by Rocío Txabarriaga and Jacopo Madaro Moro
Based on the authors' "Voice-Over Workshop", sponsored by the New England Translators Association, Boston, MA. January 2008.
The A.T.A. Chronicle Vol XXXVII, No. 6, June 2008

Voice-over, often abbreviated as VO, is a term dating from the late 1940s that describes the voice of an unseen person speaking in a variety of media. It is also used to present the audible thoughts of a visible character in a film. VO is just another form of language transfer or translation.

Viewers of news programs are familiar with the voice-over translation of statements or responses by interviewees who do not speak the language of the viewing audience. This technique lets the first few words be heard in the original language, then fades them down so that a full translation can be voiced over them.

The speaker or voice talent is a professional who, by skill and schooling, has achieved breathing control, proper enunciation and tone, and the ability to convey the right feeling in the context of a recording session. All types of voices will do, although preferences seem to favor basses and altos for training materials, news and documentaries, and tenors and sopranos for ads and instructional materials.

Because their applications partly overlap, it is important to distinguish between "dubbing" and "voice-over" work. The difference is lip synchronization: dubbing is a technique that makes the translated dialogue match the lip movements of the actors on screen. Both share the same extra-linguistic requirements, expecting the voice talent to match the original speaker as closely as possible on several markers, including gender, age and ethnicity. In practice, voice-over work departs from these ideal equivalences more often, and parity is not always maintained.



Voice-over's range of applications is quite broad. It encompasses film, radio and TV productions, multimedia presentations, IVRs (Interactive Voice Response systems; telephone prompts), video games, educational materials, and audible messages in public places (e.g., airports, train stations, terminals).


FILM  Many films and documentaries have no acting. Instead, there is a narration (e.g., the BBC’s Planet Earth series, narrated by Sigourney Weaver in its U.S. broadcast).

Many character films also have a narration component, usually delivered by one of the main characters (e.g., the movie “The Shawshank Redemption,” with Morgan Freeman’s character narrating over several scenes).

RADIO           VO is applied to promotional spots or promos ("spot" and "promo" are advertising industry terms for the 30- or 60-second promotional segments that run between scheduled programs on radio and TV).

TV       VO is used in every newscast, in just about every commercial, in sitcoms, in educational programming … almost every TV show has a narration component done by a voice talent.

In many cases, when the person on screen speaks a different language, and the broadcast is live, a voice talent (a simultaneous interpreter in many cases) speaks over the foreign voice.

MULTIMEDIA         When you visit a museum, chances are you will come across an exhibit with a voice component. The visitor usually presses a button and a voice plays in the background.

Corporate training videos normally have a voice component, so as to enable learning across sensory preferences.

Multimedia educational materials always include a voice component.

IVRs   Interactive Voice Response systems, by their very nature, depend on good, clear voices. In this type of system, a caller responds to a set of prompts (spoken by a voice talent) to access a system or a person.

Recent developments in the field of natural language processing have turned IVRs into very effective systems for automating services that require a large exchange of information and data capture. Examples include utility companies, large corporations, and clinical trials.

VIDEO GAMES        Even if a character in a video game just grunts, a voice talent is required to bring it to “virtual life.” Not all games have voice components, but those with educational purposes always do. There is no doubt that doing the voice-over for a video game requires voice characterization (i.e., acting). In this sense, when done in a foreign language this is like dubbing, only the character is a line of code!

EDUCATIONAL MATERIALS      Many learners are auditory, i.e., they favor acquiring information through their hearing sense. If no voice is incorporated into educational materials, those learners are at a disadvantage.

Adding voice and sound to interactive materials enriches the user experience and ensures that learning occurs at every sensory level. It also makes the materials more versatile, since users can sometimes simply “listen” to them.

AUDIBLE MESSAGES        These messages are usually playing in a loop at airports, terminals and other public places, often in several languages. In most cases, they are recorded by professional voice talents.

Visitors at many world museums can also get a guided tour using an interactive device that plays a narration (through headphones).

Audible messages are also used for the visually impaired at ATMs and elevators, for individuals entering a secured facility as confirmation of their credentials, and for many “talking” machines. These, however, are instances of screen readers, not voice-overs done by humans.



These basic questions define the areas that might need work:


1.         Is every voiced syllable perfectly audible?

            It should be. Enunciation is key for a successful message.

2.         Is every word pronounced with the right stress?

Some words have very similar sounds, and if not stressed adequately, may sound like something else entirely. Proper pronunciation is critical.

3.         Is every sentence pronounced with the right tone for the context?

If you are reading a warning, a soft, relaxed tone will not engage the target audience. Every context has a particular tone associated with it. (See PITFALLS.)

4.         Are the sibilants particularly loud in the target language, sometimes sounding like static on a telephone line?

Muting these “Ss” is one of the most difficult skills to acquire, and it takes lots of practice to master the art of pronouncing sibilants without residual noise. The other extreme should be avoided as well: suppressing the sibilants so much that you sound as if you have a speech impediment. Balance is key.

5.         Can you hear your tongue clicking loudly, as when you take a sip of a really cold liquid?

Clicking is common, and your goal is to minimize it, since eliminating it entirely is impossible. Current audio recording technology removes much of it, but keeping your throat well hydrated while recording is still the key.

6.         Can you hear yourself inhaling or exhaling?

Breathing control is not about holding your breath or inhaling deeply so that you can speak for a long time without inhaling again. It is about taking in the right amount of air for every segment that you and the client or producer have agreed will be recorded. This is not just the air that you hold in your lungs, but the kind of air that lyric singers, for example, pull from their entire abdomen so that their voice can carry through a space and hold a particular note in tune.

7.         Do you hear noises such as jewelry jingling, hair rustling or hands rubbing?

Studio mics are extremely sensitive and, although most have noise cancellation features, they pick up noises from the person speaking directly into them, including the examples given in this question. Be prepared to wear “quiet” clothes and no jewelry, and do not touch your hair or fidget with your hands.


The following means of developing VO skills are recommended:

Understand phonology and phonetics: Knowing how your vocal apparatus works, the difference between voiced and unvoiced phonemes, etc. will help you enunciate correctly.

Learn to breathe like a singer: The techniques used by singers are sure to be beneficial for voice talents. Currently, many singing technique self-study materials include breathing techniques.

Observe news anchors with different eyes: Study their intonation, the way they pronounce and enunciate.

Pay attention to the “feeling” of the voice-overs for TV commercials. Notice how different the voice choices and the intonation are for the various products.

Take an acting class: If you are serious about becoming a professional voice talent, this will help you stand out in a sea of voices.

And of course … learn from professional voice talents!



The US VO market can be organized in the following four segments:

·           News organizations: most are in New York, Atlanta, and Washington. Some local cable stations might also need VO. This work is often pro bono, but it is a way to gain valuable experience.

·           Ad work: inquiries should be directed to ad agencies, talent agencies, and recording studios.

·           Documentaries: public television stations, colleges, and private TV and radio stations are the most likely sources of work.

·           Instructional materials: translation agencies are often in charge of localizing these materials and will be the source of related voice-over work. Often, they will need voice talents in their own geographic area (see A STARTING KIT).


The full range of options often applies only to work in English. In reality, language services companies often manage foreign language voice-overs, seeking and booking the various talents. Agencies may first contact local linguists for the job (and be willing to train those without voice-over work experience). Jobs are also advertised on the usual job boards for linguists.

In other instances, the companies that produce the materials in need of foreign language VO hire the voice talents directly. They usually advertise the projects on Internet boards dealing specifically with voice-over or acting work, and invite candidates to audition for the project. To get a sense of these sites, you could start, for example, with http://voice123.com/, http://searchvoices.com or http://voicerecruiters.com/.



Several materials are needed:

1.         A voice sample in MP3, CD-ROM or .wav format. Avoid tapes and videotapes, as well as cell phone and digital videos. Cell phone videos offer low bandwidth and quality, and digital videos are distracting and too costly to produce professionally. Read only public domain texts or texts you may use with the author's permission. If you choose a newspaper article, credit the paper and the journalist. Comply with international copyright laws.

2.         Leads - Visit the public library, consult the US yellow pages and search the Internet for recording studios, public television stations, (voice) talent agencies and translation companies in your geographical area (see THE MARKET). A talent should cover as much ground as s/he is willing to pound, e.g.: East & Atlantic Coasts: “have voice, will travel”.

3.         CV - Write a résumé highlighting voice-over and dubbing as objectives/goals, listing singing and acting courses or classes/experience, mother tongue if working into foreign languages, etc.

4.         Mailing campaign - Send the résumé and the sample by regular mail or e-mail to the leads. Do not be discouraged: 5 answers in 100 is a very good score and one solid contact out of twenty answers is a phenomenal result: you are still unemployed, but in business!



Voice-over is done in a variety of ways and settings, at times using specialized equipment with peculiar kinks. Nevertheless, the methodology, basic equipment and pitfalls seem to be constant:




Whether recording at a studio or on a home computer (for instance, your first sample), a recording session must occur in a controlled environment (see PITFALLS).

The first rule is: NO NOISE. We have already mentioned the example of “body noise” which should be avoided when recording. Recording from home will obviously not be possible if your dog is barking at a passerby. And electronic feedback can be a problem while recording if equipment is not correctly handled.



            While improvisation may be acceptable in certain settings (for instance, when a director is present at the recording session and has authorized it), a script is necessary and must be followed. When scripts are long, they must be divided into workable segments and a time cue should precede each segment. When there are multiple voice talents in the same recording session, all participants must be clearly marked on the script so that each person knows with certainty when it is his or her time to speak (see TIMING).



A voice talent must be punctual. Studio time is booked for a specific schedule and the norm is to hold back-to-back sessions. It is also a costly service. Every minute lost from the original schedule of a session because a voice talent is late costs money, and means that a makeup session will need to be scheduled. Clients will not appreciate this.



The script must be fully read and practiced ahead of time. Preparation time is not part of a scheduled session. At the studio, once the voice talents and all other participants are properly situated, a sound test is conducted and the session begins immediately afterwards (see PITFALLS).



The following elements are standard recording items:



            Make sure you can see and hear your recording tech or coach. Find a good position for the reading stand, the stool and yourself. You may be there for a few hours.


            During the voice level test, make sure you are not adapting to the microphone. Save your neck and put the microphone in a position that is completely comfortable. Ergonomics first!


            You are supposed to hear voices! Make sure they do not bleed into the microphone. Normally, the original voice is piped through one channel/ear, and your own through the other.

            Be careful when putting on your earphones and be ready to remove them fast in case of a deafening feedback.


            Video monitors can be useful in providing visual cues (e.g., a timer at the bottom of the screen) that allow you to fine-tune your starting points. However, not everyone can integrate two sound tracks and two visual inputs (text and monitor) on different visual planes.


            The recording technician is in charge of timing, and if s/he asks for a retake, a retake should be done. Checking the recording against the text, on the other hand, is a much vaguer domain. It could be done by the producer, the project manager or, in a multilingual event, a fellow talent waiting for his or her turn.

            Classy productions have real booth support. They require another talent of similar language, background, etc., who can check diction and enunciation as well.


Be prepared. Voice-over is thirsty work.



Whatever can go wrong, will do so, especially in three areas: communication, timing and recording.



            The text presents its own obstacles, mainly:

·          Pronunciation: as in Apalæcian or Apalacian (two competing pronunciations of "Appalachian"); snippets of English left in the translated script

·          Alliteration: for example, “I bought a box of biscuits, a box of mixed biscuits and a biscuit mixer”

·          Translation errors from mild to catastrophically offensive

·          Run-on passages, which are difficult to read in any language because the pauses fall in the wrong places.



            Normally, a translation that has not been timed properly runs too long in a foreign language, especially a Romance language, and too short in English. In these cases, only two options are available: speed up or slow down your speech, or trim or expand the text. Sometimes even both together are insufficient.

            The choice depends on the assertiveness of the talent and the producer and on the seriousness of the timing gap. In the best-case scenario, the voice talent is also the translator, so that timing has already been considered and, at voice-over time, the talent has greater control over the final script. At the opposite end of the spectrum we find the so-called HKS (Hong Kong Studios) syndrome: a talking head delivers a long sentence on screen while the voice-over offers a simple “Yes.”
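The size of a timing gap can be estimated before the session. The following is a minimal sketch in Python, not part of the original workshop: the function names and the comfortable range of 130 to 170 words per minute are illustrative assumptions, and a words-per-minute figure is only a rough guide because word length varies by language.

```python
# Hypothetical helper for checking a translated script against its time slot.
# The default rate limits are assumptions; adjust them to your own delivery.

def required_rate_wpm(word_count: int, slot_seconds: float) -> float:
    """Return the words per minute needed to read the script within the slot."""
    if slot_seconds <= 0:
        raise ValueError("slot_seconds must be positive")
    return word_count * 60.0 / slot_seconds


def timing_advice(word_count: int, slot_seconds: float,
                  slowest_wpm: float = 130.0, fastest_wpm: float = 170.0) -> str:
    """Suggest which of the two options (delivery speed or text length) applies."""
    rate = required_rate_wpm(word_count, slot_seconds)
    if rate > fastest_wpm:
        # Even the fastest comfortable delivery will not fit the slot.
        return "trim the text"
    if rate < slowest_wpm:
        # Even the slowest comfortable delivery will leave dead air.
        return "expand the text"
    # The gap can be absorbed by speaking a little faster or slower.
    return "adjust delivery speed"
```

For example, a 200-word segment in a 60-second slot would call for 200 words per minute, which is above the assumed upper limit, so the sketch suggests trimming the text rather than rushing the read.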



            The problems encountered during recording include but are not limited to:

·          Poor breathing

·          Slurred enunciation, especially at the end of long sentences

·          Uneven tempo. If the talking head is vivacious and bubbly, the talent cannot give the impression of reading an accountant's obituary

·          Missed passages. Dropping word endings is very Bostonian and far from ideal. Skipping a line is not acceptable, no matter where or in which language

·          Volume levels. Especially during retakes and in between scenes, it's important to keep an even level. Discrepancies can be digitally corrected, up to a point, but it is a matter of professional pride (and cost-control) to keep the takes to a minimum.



Once more, control your breathing!

            If you don't fill your lungs with air before you start, you will produce the Brenda Vaccaro effect: some 20 years ago, in a nationally televised ad, Brenda started with a deep aspirating groan, saying: “AAAAAH Tampax.” It sounded as though she were dying of anaphylactic shock. Sales suffered and so did Brenda's advertising career.

            Take a full breath on 5 and start on cue.


Do not get flustered.

            If you hit a rough spot, call for a break, have a brief normal conversation and a glass of water, and try again. What you cannot afford is to act like a prima donna with a short fuse.

Retake whatever needs to be re-recorded. When mistakes are made, or one of the participants is not satisfied with the quality of a sound bite, remember that communication must be crystal clear: the point in the script where a repeat is needed must be clearly indicated. In many studios, the segment just recorded can be replayed immediately to spot problems.


Do not overact.

            Unless you are dubbing or doing voice-characterization, all you need to do is convey the feeling of the message, reading the script as if it meant something to you.


Aim for clarity and proper tempo.

            Even when others check what you do, check your own work yourself and be brutally critical. Better that the criticism come from you than from the final audience.




            Listening to your own recording is a paramount component of voice-over quality control. You know your own phrasing, pacing, parsing and projecting best. If you sound hollow, tentative or out of breath to your own ears, consider whether that passage meets minimum standards, ask the opinion of others and then decide. Do not leave a job feeling that you should have done better. The end client might think the same.