Get amazing AI audio voiceovers made for long-form content such as podcasts, presentations and social media. (Get started for free)

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques - Pitch Modulation in Human Communication

Human communication relies heavily on the ability to manipulate the pitch of our voices. This pitch modulation stems from the intricate workings of our vocal apparatus, including the larynx, vocal folds, and the parts of our mouth that shape sound. Beyond simply conveying words, pitch variations allow us to express a wide array of emotions and intentions, creating a level of interaction that goes beyond straightforward speech. The way we change our pitch reveals our moods and motivations, impacting how we interact with others, even influencing aspects like courtship and mate selection.

The neural mechanisms behind pitch control are remarkably complex, highlighting our unique ability to produce and control sounds expressively. This sophistication isn't just an interesting biological quirk; it's crucial in fields like voice cloning and the production of audio content. In the age of podcasts and audiobooks, a deep understanding of the nuances of pitch modulation remains crucial for creating engaging and effective communication. While we've come a long way in replicating human voices, it's evident that the richness and complexity of human vocal expression are still a challenge to fully capture.

1. Pitch modulation isn't solely about expressing emotions; it's also a vital tool for conveying the speaker's intended message. Subtle changes in pitch can signal whether an utterance is a question, a command, or a declarative statement, highlighting its role in shaping communication.

2. The fascinating aspect of pitch modulation is its cultural context. What's interpreted as uncertainty in one culture, like a rising pitch at the end of a sentence, might signal enthusiasm in another. This illustrates how the same pitch patterns can elicit different meanings across diverse populations, leading to potential miscommunication if not considered.

3. Humans have a remarkable vocal range: in conversational speech, the fundamental frequency of adult male voices typically falls between about 85 and 180 Hz, and adult female voices between about 165 and 255 Hz. This inherent biological diversity underlines the wide spectrum of sounds we can produce.

4. In the world of voice cloning, pitch modulation plays a key role in creating synthetic voices that sound authentic. By analyzing how pitch varies during speech, researchers can replicate specific voice qualities, producing remarkably realistic synthetic voices. The prospect is both promising and a little unnerving, and it rests on a precise understanding of how the vocal apparatus shapes pitch.

5. Vocal fry, a low and creaky voice quality, has recently gained traction in some communities. However, studies suggest it might negatively impact perceptions of authority and competence. It highlights how even seemingly minor pitch variations can have significant impacts on how we are perceived by others, making understanding pitch important in different contexts.

6. In audio production for mediums like podcasts and audiobooks, mastering pitch modulation is a crucial aspect of the sound design. Voice artists use it to enhance clarity, convey emotions, and maintain listener engagement, ensuring a successful audio experience. It reveals a fascinating application of our understanding of sound production in enhancing human communication.

7. There's a close relationship between emotional states and pitch patterns. Excitement often leads to higher pitch, whereas sadness can lower it. This highlights how pitch can serve as a non-verbal cue, influencing how listeners interpret and process the content of spoken messages.

8. It's not just about the height or depth of the sound but also the speed of pitch changes. Rapid changes can evoke surprise or urgency, while slower, gradual shifts can signal calmness and reassurance. This highlights the complexity of pitch modulation in communication, showing how subtle changes in pitch can affect the perceived emotional weight of a message.

9. Researchers are finding that background music can significantly affect pitch perception in spoken language. The harmony and tempo of music can make it either easier or harder to differentiate specific vocal characteristics. This is an intriguing finding that indicates the interconnectedness of sound, where the interplay of various audio elements shapes our perception of speech.

10. Current advancements in voice synthesis and cloning allow a range of emotions to be conveyed through pitch manipulation alone. Even small variations in pitch can communicate complex emotional states without any change in wording, suggesting that future sound-production systems may express emotion without additional cues.
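Pitch analysis of the kind described above typically begins with fundamental-frequency (F0) estimation. The sketch below is a minimal autocorrelation-based estimator in plain Python; production systems use more robust methods (YIN, pYIN, and similar), so treat the function name and parameters as illustrative.

```python
import math

def estimate_f0(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency by picking the autocorrelation
    peak within the plausible lag range for human voices."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# A synthetic 220 Hz tone, near the top of a typical male speaking range
sr = 8000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(2048)]
f0 = estimate_f0(tone, sr)  # close to 220 Hz (quantized to integer lags)
```

Because the lag is an integer number of samples, the estimate is slightly quantized; real pitch trackers interpolate around the peak for sub-sample accuracy.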

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques - Laryngeal Control and its Neural Correlates


Laryngeal control is fundamental to sound production, shaping both what we communicate and how it is perceived. The larynx and its associated neural networks manage a complex interplay of muscle movements that allows precise pitch changes. This is especially crucial in tonal languages, where pitch distinctions carry lexical meaning, and it also underpins prosody in languages like English, governing emphasis and emotional tone. The sensorimotor integration required for vocalization underscores the remarkable complexity of human speech.

Understanding the mechanisms of the larynx is therefore critical for advances in voice cloning and audio content creation. As these technologies progress, they must grapple with the subtleties of human vocal expression that convey meaning and sustain listener engagement. A more thorough understanding of laryngeal control could improve our ability to replicate not just the words themselves but the emotional texture that brings communication to life. The fine control humans exercise over their voices remains difficult to reproduce artificially, and it is still an open question whether synthetic voices will achieve genuinely nuanced emotional expression.

The intricate control of the larynx, the voice box, is a fascinating aspect of human communication, particularly when considering its neural underpinnings. The larynx's role in modulating pitch is crucial for languages that rely on tone for meaning, as well as for conveying emotional nuances in languages like English. Producing speech is a complex motor act requiring a finely tuned interplay of over 100 muscles spanning the larynx, face, and respiratory system. This complexity underscores the intricacy of the neural networks that govern vocalization.

The larynx is intimately linked to a complex network of neural structures, including the motor cortex and areas like Broca's area, which play critical roles in coordinating muscle movements for speech. This intricate connection is vital for the execution of vocalizations and, equally importantly, the complex act of swallowing. Achieving smooth vocal production requires a precise interplay of sensory and motor information, allowing the brain to plan, execute, and refine vocal movements.

Interestingly, the level of sophisticated laryngeal control found in humans doesn't appear to be mirrored in non-human primates. Some researchers believe this might stem from a potential lack of direct motor cortex innervation of the laryngeal nuclei in these species. In contrast, humans seem to have developed a unique neural pathway, involving a connection between the cingulate cortex and the periaqueductal gray (PAG), which facilitates coordinated control of the larynx and respiration during vocalization. This highlights an intriguing adaptation within the human motor cortex, potentially tied to our evolved capacity for complex vocal communication.

These findings suggest that our understanding of cortical control in voice production needs further refinement. It seems that traditionally separating cortical control from the influence of limbic structures and the brainstem might be too simplistic. Instead, a more integrated perspective is likely needed to fully grasp the complexity of voice production. Furthermore, there are crucial links between the neural pathways controlling speech and the protective mechanisms of the upper airway. This highlights that the brain carefully orchestrates both vocalization and protection of the airways, a reminder of the intricate balancing act the nervous system must manage during communication.

The enhanced neural control of the laryngeal muscles in humans also seems to contribute to the stability of the vocal fold oscillations that regulate airflow during voice production. This tight control is likely crucial for producing consistent and clear sounds. While the details of this neural control are still being explored, it's clear that this sophisticated regulation is essential for the complex and expressive communication characteristic of humans. While advancements in voice cloning and AI-driven sound production continue to blur the lines between natural and artificial voices, a deeper understanding of these complex neural processes remains a core challenge. Understanding how the brain orchestrates sound will continue to shape future technologies that manipulate and replicate human voices.

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques - Source-Filter Theory in Vocal Production

Source-Filter Theory provides a framework for understanding how humans produce speech. It posits a two-part process: the larynx, or voice box, acts as the source, creating a basic sound through the vibration of the vocal folds. This sound then passes through the vocal tract, which acts as a filter, shaping the sound into the various speech sounds we recognize. This source-filter interaction isn't unique to humans; it's a fundamental mechanism found in the vocalizations of many animals, indicating a common foundation for sound production across species.

This theory has implications for fields like voice cloning and audio production. By understanding how the source and filter interact, we gain a better grasp of how to generate synthetic voices with greater clarity and emotional range. This is crucial for technologies like voice cloning, where recreating the subtleties of human expression is a continuous challenge. Podcasts and audiobook productions also benefit from this understanding, as mastering the source-filter dynamic can lead to more engaging and effective audio content.

While the field of artificial voice production has advanced significantly, accurately capturing the intricate nuances of human vocal expression remains a significant hurdle. Replicating the subtleties that convey emotion and meaning is a challenge that requires a deeper understanding of the source-filter relationship. This ongoing exploration into the mechanics of voice production, fueled by theories like source-filter, will undoubtedly continue to shape future technologies that create and manipulate human sounds.

The Source-Filter Theory posits that vocal production is a two-part process. The first part involves the creation of a sound source from the vibrating vocal folds within the larynx. Think of it as the raw material of the voice. The second part involves the shaping and filtering of that sound as it travels through the vocal tract – the mouth, throat, and nasal passages. This acts like a dynamic resonator, changing the character of the sound significantly. The shape and size of the vocal tract, things like tongue position and lip rounding, play a major role in how the sound is filtered and modified, producing the wide array of sounds we use in speech.
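The two-stage process can be sketched in code: a pulse train standing in for the glottal source is passed through resonators standing in for formants. This is a toy model, not a production synthesizer; the formant frequencies and bandwidths below are rough, /a/-like illustrative values rather than measured data.

```python
import math

def resonator(signal, sr, freq, bandwidth):
    """Two-pole IIR resonator: a crude model of a single formant."""
    r = math.exp(-math.pi * bandwidth / sr)
    theta = 2 * math.pi * freq / sr
    a1, a2 = 2 * r * math.cos(theta), -r * r
    gain = 1 - r  # rough amplitude normalization
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

sr = 16000
f0 = 120  # source: glottal pulse train at the speaking fundamental
source = [1.0 if n % (sr // f0) == 0 else 0.0 for n in range(sr // 4)]
# Filter: cascade two formant resonators for a rough /a/-like vowel
vowel = resonator(resonator(source, sr, 700, 110), sr, 1100, 120)
```

Changing only the resonator frequencies (the "filter") while keeping the pulse train (the "source") fixed yields different vowel qualities at the same pitch, which is exactly the separation the theory describes.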

It's fascinating to consider that not just the physical shape of the vocal tract, but also the airflow and vocal fold tension, can dynamically influence how the sound gets filtered. The system is not static; it is actively and subtly adjusted, which adds a layer of complexity that is still being fully understood, particularly when trying to replicate it in voice cloning. In voice cloning, for instance, capturing the unique tonal quality of someone's voice hinges on accurately replicating both the sound source from the vocal folds and the filtering done by the vocal tract. If either part is wrong, the voice ends up sounding artificial or generic.

The harmonics, or overtones, created by the vocal folds provide a rich palette of frequencies, but the filter component plays a key role in selecting and modifying those frequencies, emphasizing some and attenuating others. This interplay shapes the perceived quality of the voice, including its timbre, brightness, and warmth. For example, simply moving your tongue or jaw alters the size and shape of your vocal tract, shifting its resonances in ways that can be tricky for synthetic voices to emulate convincingly. This adaptability and dynamic range present a significant hurdle for voice cloning systems, which often struggle with smooth transitions between different registers or emotional expressions.

This theory also helps explain why male and female voices differ: not only are the vocal folds different in size, but the male and female vocal tracts differ in shape and length. The vocal fold differences account for the gap in average pitch, while the differences in tract filtering shape timbre. Furthermore, the vocal apparatus is controlled by a network of neural pathways that links emotional state to physical action, which makes the subtleties of human vocal expression hard to replicate artificially. It's a reminder that there is much more to communication than the words we use; the emotional weight carried by the voice is a complex interaction of physiology and the brain's control.

Voice synthesis technology currently often uses simplified models of pitch and timbre, failing to capture the intricate interaction between the sound source and filter that Source-Filter Theory describes. While advancements in AI and machine learning have helped generate more realistic voices, those voices often still sound robotic or sterile, especially in emotionally-charged contexts. This highlights the gap between how we humans understand and generate sound versus how current synthetic voice technologies do it.

Vocal resonance is a related concept that further underscores the influence of the vocal tract. The shape and length of the vocal tract don't just alter the pitch, they shape how the voice sounds and is perceived. It can make a voice sound warm or cold, bright or mellow, and this is an aspect often overlooked in synthetic voices. Interestingly, humans manipulate this filter component beyond verbal communication for things like laughter or crying. Developing an understanding of how these non-verbal sounds are produced could be a major step forward in refining the emotional expressiveness of synthetic voices.

Moreover, the principles of Source-Filter Theory can be applied practically in audio post-production. Sound engineers can use their understanding of this theory to manipulate audio recordings, adjust frequencies, and enhance a voice's clarity and expressiveness, making it a valuable tool across disciplines. This theory not only provides valuable insights into the science of voice production but also gives tools for how to modify and manipulate audio in practical ways. In conclusion, the Source-Filter Theory remains a powerful model for understanding how we create sounds, providing a road map for refining voice cloning, AI sound production, and audio manipulation. It's an active area of research that will likely continue to have a major influence on the future of sound technology and its applications.
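As a concrete example of the frequency shaping mentioned above, here is a standard biquad peaking equalizer, with coefficients following the widely used Audio EQ Cookbook formulas. The center frequency, gain, and Q below are arbitrary illustrative choices, not recommendations for any particular voice.

```python
import math

def peaking_eq_coeffs(fs, f0, gain_db, q):
    """Biquad peaking-EQ coefficients (Audio EQ Cookbook form)."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    a0 = a[0]
    return [x / a0 for x in b], [x / a0 for x in a]

def biquad(signal, b, a):
    """Direct-form I biquad filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out

# Gentle 3 dB "presence" boost around 3 kHz, a common way to lift
# speech intelligibility in a podcast mix
b, a = peaking_eq_coeffs(fs=48000, f0=3000, gain_db=3.0, q=1.0)
```

A peaking EQ leaves frequencies far from the center band untouched, which is why engineers favor it for surgical adjustments to a voice.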

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques - The Acoustic Complexity of Human Speech

Human speech, while occupying a relatively narrow frequency range compared to other vocalizations, demonstrates an impressive level of acoustic complexity. This complexity arises from a unique combination of evolutionary adaptations within our vocal tract and the intricate neural control of sound production. Changes like the descent of the tongue and modifications to the pharynx have produced a system capable of a wide range of sounds, from brief bursts to sustained vowels. This precise control over the vocal tract, coupled with the loss of structures such as laryngeal air sacs, has contributed to both stable and highly expressive speech.

Interestingly, the enhanced control needed for nonverbal vocalizations may have played a crucial role in the evolution of clear, intelligible speech. Humans, unlike other primates, have developed conscious control over their vocalizations, leading to diverse speaking and singing styles while retaining the capacity for spontaneous sounds. Though the basic principles governing sound production are shared with other animals, human speech stands apart in its blend of a stable sound source and rapid changes in formant frequencies. The acoustic cues used in speech, while following general principles, also vary subtly between languages, underscoring the intricacy of spoken communication.

As technologies like voice cloning and audio production aim to recreate and manipulate the human voice, a deep understanding of this acoustic complexity is paramount for creating truly realistic and engaging sound experiences. Capturing the nuances of emotion and intention in synthesized speech depends on unraveling how humans produce and interpret these intricate sound patterns.

1. The human vocal apparatus, encompassing over 100 muscles in the larynx and vocal tract, allows for a remarkable range of sounds and voice qualities. This intricate structure poses a fascinating challenge for voice cloning researchers aiming to reproduce the authenticity of human speech, particularly in mimicking the unique characteristics that make each voice distinct.

2. Humans possess an exceptional ability to differentiate minute pitch variations, as small as a few cents—a hundredth of a semitone. This sensitivity to pitch changes is crucial for understanding the nuances in speech and presents a hurdle for voice synthesis technologies that strive to replicate the subtleties of human expression. It’s not just about the overall pitch, but how it changes subtly over time.

3. The tension in the larynx has a direct impact on vocal fold vibration, consequently influencing the pitch of the voice. This relationship between tension and pitch needs to be meticulously accounted for in voice cloning to produce a truly authentic sound, otherwise, the synthetic voice may lack the richness and character present in natural speech.

4. The precise timing and coordination of vocal fold closure, happening hundreds of times per second, plays a critical role in both speech clarity and emotional expression. Accurately replicating this rapid coordination in synthetic voices is a considerable challenge in the realm of artificial voice generation. Getting the “timbre” and other nuances correct is not a simple task.

5. Emerging research suggests that the neural pathways controlling our voice are closely linked to the parts of the brain involved in emotional processing. This implies that effective voice cloning shouldn't solely focus on replicating the sound production, but must also incorporate the inherent emotional context woven into vocal expressions. This could be difficult to do artificially.

6. Voice actors expertly manipulate their pitch and resonance to create diverse characters and accents. This emphasizes the complexity of vocal characteristics that synthetic voices find challenging to convincingly emulate. Artificial voices often lack the fine-tuned adjustments that human voices effortlessly employ.

7. Formant tuning, a process where the vocal tract shape is manipulated to enhance certain frequencies, has a significant impact on both speech clarity and emotional expression. Synthetic voice technologies frequently struggle to replicate these dynamic adjustments, which can contribute to the perception of static or lifeless voices. Humans do this all the time unconsciously, while artificial systems are just beginning to catch up.

8. The length of the vocal tract is a major factor in determining perceived timbre. Longer vocal tracts, more common in men, lower the resonant (formant) frequencies and give voices a deeper quality; shorter tracts, more common in women, raise them. (Differences in average pitch, by contrast, come mainly from vocal fold size.) Effectively mimicking these anatomical differences is crucial for achieving a high degree of realism in voice cloning.

9. Studies have shown that the emotional tone of a voice can induce physiological responses in listeners, like changes in heart rate. This observation emphasizes the importance of voice quality in audio content creation; the emotional quality can affect audience engagement and perception in tangible ways. It is something that synthetic voice production needs to become better at.

10. The interplay of sound quality and its surrounding acoustic environment heavily influences how a voice is perceived. Audio production engineers utilize this knowledge to craft immersive soundscapes in podcasts and audiobooks by carefully manipulating acoustic properties. Voice cloning technologies would benefit from similarly incorporating these environmental considerations to increase the authenticity of synthetic voices.
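The tract-length effect in point 8 can be approximated with the classic quarter-wavelength model: a uniform tube closed at the glottis and open at the lips resonates at F_k = (2k - 1) * c / (4L). The speed of sound and tract lengths below are nominal textbook values, so the results are rough estimates rather than measurements of any real speaker.

```python
def neutral_formants(tract_length_m, n=3, speed_of_sound=350.0):
    """Formant estimates for a uniform tube closed at one end:
    F_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * speed_of_sound / (4 * tract_length_m)
            for k in range(1, n + 1)]

male = neutral_formants(0.175)    # ~[500, 1500, 2500] Hz
female = neutral_formants(0.145)  # shorter tract, higher formants
```

Real vowels deviate from these neutral values because the tract is not a uniform tube, but the model captures why shorter tracts shift all formants upward.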

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques - Voice Cloning Technology and its Applications

Voice cloning technology employs artificial intelligence to create a digital copy of a human voice, replicating its distinct characteristics like tone, pitch, and emotional nuances. This technology has significantly improved the realism of synthetic speech, finding applications in fields like entertainment, audiobook production, and podcasting. Efforts in voice cloning are focused on creating systems that can operate in real-time, enhance the quality of synthesized voices, and generate speech that sounds natural and emotionally expressive. One challenge is replicating the voices of people who were not part of the initial training data for these systems. While voice cloning offers potential cost savings for industries dependent on voice work by automating parts of the speech creation process, fully replicating the subtleties of human vocal communication remains a challenge. The future of this technology will depend on how effectively we can capture the essence of human interaction through voice, going beyond just the production of sound.

1. Voice cloning technology is advancing to the point where it can not only replicate speech but also capture and reproduce emotional nuances. By skillfully manipulating aspects like pitch, tone, and speech speed, researchers are able to imbue synthetic voices with a wide range of emotional expressions. This ability to generate emotionally expressive speech without needing specific emotional instructions in the script showcases how deeply voice cloning can tap into the complexities of human communication.

2. The quality of the audio recordings used to train voice cloning algorithms is paramount to the success of the technology. A higher sampling rate within those recordings is crucial as it allows the algorithm to capture very fine details in the voice, such as rapid fluctuations in pitch or the subtle warble of vibrato. This highlights that the starting point – the initial audio data – needs to be of a high quality to produce accurate replications of individual vocal characteristics.

3. Interestingly, voice cloning is now reaching a level of sophistication where it can even distinguish and replicate subtle dialectal differences within a language. This capacity to mimic regional variations in accent or pronunciation suggests the potential to create highly localized voice assistants. This technology could provide more personalized user experiences, catering to the specific linguistic and cultural preferences of individuals, enhancing user engagement and satisfaction.

4. One of the primary limitations of voice cloning technology today lies in its need for extensive and high-quality datasets. If there isn't enough training data to encompass the wide variety of emotions and contextual nuances present in natural human speech, the resulting cloned voices will lack the richness and individuality of authentic human voices. This is a crucial aspect for improving the technology, as the current dependence on a large pool of data can be a barrier to wider applicability.

5. "Voice leakage" is a phenomenon where multiple voices in a recorded audio scene inadvertently mix or bleed into each other. This phenomenon is a significant challenge for voice cloning and audio synthesis. Understanding the origins of this leakage and developing methods to minimize it could significantly improve the clarity of synthetic voices, particularly when those voices are meant to operate within complex audio environments containing multiple speakers.

6. The rapid advancements in voice cloning are inevitably sparking discussions around ethical considerations, specifically in relation to consent and the accurate representation of individuals. There's increasing concern that extremely accurate synthetic voices can be generated from existing audio recordings without the explicit permission of the individual whose voice is being cloned. This raises questions around how individuals retain control over their own voices and the broader implications this could have for identity and the concept of personal agency.

7. Despite significant advancements in the field, there's a persistent challenge in achieving completely natural-sounding synthetic voices. Research has shown that listeners can often identify a synthetic voice even when those voices are attempting to sound extremely lifelike. This suggests that while voice cloning can generate speech that sounds incredibly realistic, there are still some subtle, almost subconscious, cues that help listeners discern between human and artificial voices. Capturing those elusive nuances is a major challenge and one that is crucial for the next generation of voice cloning technologies.

8. The relationship between the shape of the vocal tract and how it shapes sound is a rich and active area of research in voice production. Scientists are learning that the unique shape of an individual's vocal tract not only influences the sound frequencies it produces but also contributes to the emotional resonance of speech. It suggests that future voice cloning technologies might consider incorporating the anatomical shapes of vocal tracts as a means to enhance the emotional quality of synthetic voices.

9. A newly emerging area of voice technology focuses on multi-speaker voice synthesis. This research seeks to create audio scenes with multiple synthetic voices engaged in a conversation or narration. The challenges here involve maintaining natural intonation, rhythm, and the individuality of each voice within the scene. The successful development of multi-speaker voice synthesis could fundamentally alter the creation of audio content, possibly transforming applications like interactive storytelling and audiobook creation.

10. Prosody, the rhythmic and melodic aspect of speech, plays a vital role in how we communicate, offering context and meaning that is independent of the specific words we use. The ability to successfully reproduce prosody in voice cloning is critical because it's what can make a synthetic voice sound engaging and natural, rather than overly robotic or monotonous. It demonstrates the remarkable potential of voice cloning to generate sounds that mimic the intricate and nuanced interplay of human speech patterns and thereby create a more authentic and captivating auditory experience.
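The role of prosody in point 10 can be illustrated with a toy transform: scaling a pitch contour's excursions around its mean exaggerates or flattens intonation. Real prosody models work in semitones and also adjust duration and energy; the contour values below are hypothetical F0 samples, chosen only for illustration.

```python
def scale_prosody(f0_contour, factor):
    """Scale pitch excursions around the contour mean.
    factor > 1 exaggerates intonation; factor = 0 yields a monotone."""
    mean = sum(f0_contour) / len(f0_contour)
    return [mean + factor * (f - mean) for f in f0_contour]

contour = [110, 130, 150, 125, 105]   # hypothetical F0 values in Hz
flat = scale_prosody(contour, 0.0)    # monotone: every value is the mean
lively = scale_prosody(contour, 1.5)  # wider excursions, livelier delivery
```

A flattened contour is a crude stand-in for the "robotic" quality listeners report in synthetic speech: the words are unchanged, but the melody that carries engagement is gone.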

The Science Behind Dune's 'Voice' Exploring Real-World Vocal Control Techniques - Ethical Considerations in Voice Replication

The rapid progress in voice replication technologies necessitates a concurrent examination of the ethical implications they present. Concerns regarding the appropriate use of these tools are rising, particularly when it comes to issues of consent and safeguarding individual privacy. The potential for misuse in generating deepfakes or spreading misinformation adds a layer of complexity, demanding a critical evaluation of accountability and transparency in the development and deployment of voice cloning. Furthermore, the ability to fabricate remarkably realistic synthetic voices poses a threat to the trustworthiness of audio content, raising serious questions about identity and personal integrity. To navigate these challenges, a robust ethical framework is needed to guide the development and application of voice cloning. This framework should incorporate regular audits and strict adherence to principles of fairness and respect for individuals' rights. As the use of synthetic voices becomes more prevalent in audio production, notably in fields like podcasting and audiobook creation, the balancing act between innovation and ethical considerations will become ever more critical. It's crucial to ensure that these technologies enhance, rather than undermine, the integrity and trust inherent in human communication.


The ability to replicate human voices with increasing fidelity using artificial intelligence presents a fascinating array of ethical considerations. One core issue lies in the question of consent and ownership. It's becoming increasingly possible to clone a person's voice without their knowledge or permission, leading to a gray area surrounding personal rights and the very concept of identity within audio recordings. This raises complex questions about who truly owns a voice and how we can protect it from unauthorized use.

The rise of realistic clone voices also creates a risk of misinformation and manipulation. Deepfakes, for example, can use cloned voices to spread false narratives or damage reputations, potentially undermining public trust and eroding the value of authentic communication. This points to the need for ethical guidelines related to authenticity and deception in the context of audio content.

Furthermore, the ability to evoke emotions with cloned voices creates the possibility for unethical manipulation. Because these voices can be very convincing, they can be used in advertising or propaganda to elicit emotional responses without the recipient fully understanding they are interacting with a synthetic voice. This highlights a potential risk that needs to be acknowledged and mitigated by ethical frameworks.

Another concern stems from the potential for biases to be amplified within voice cloning technologies. If the training data for these systems is not diverse and representative, the resulting voices may also be biased, potentially marginalizing underrepresented communities. This illustrates the need for carefully considered training procedures that aim to achieve equity in synthetic voices.

Moreover, voice cloning must consider cultural nuances. What might sound authentic in one cultural setting could be considered disrespectful or inauthentic in another. The development and application of this technology should therefore be informed by cultural sensitivity and a deep understanding of how voice and communication are valued across different communities.

Existing legal frameworks may not be adequate to grapple with these emerging ethical concerns. Defining voice ownership, for instance, presents challenges under existing copyright laws. We need to develop a comprehensive legal framework that adequately protects individuals and addresses the potential harms that can arise from voice cloning.

There is a bright side to this technology, however. Ethical applications exist within healthcare, for example, where cloned voices could give individuals who have lost their ability to speak a means of communication. These beneficial uses, however, should be governed by clear ethical guidelines to ensure patient safety and informed consent.

The ability of voice to convey emotion is a crucial part of human interaction. When synthetic voices lack this emotional depth, we lose an important layer of authentic communication, raising questions about the nature of human-human connection in a world where artificial voices are increasingly sophisticated.

Moreover, the commercialization of voices can occur without the original speaker's knowledge or consent. This highlights a complex ethical issue surrounding the commodification of personal identity and how individuals can maintain control over their voices and likeness in a commercial setting.

Finally, as synthetic voices become more realistic, we need to grapple with the potential for users to form genuine emotional bonds with them. This raises questions about the nature of future human-machine relationships and the role of authentic communication in a world where the boundary between humans and AI becomes increasingly blurred. These considerations are vital as we navigate the implications of voice replication in an ever-evolving technological landscape.



More Posts from clonemyvoice.io: