While we were chatting over the recycling bins recently, my new neighbor, Aji, asked me what I do for a living. I told him that I help people make podcasts. Most people respond to that statement by asking me, “What’s a podcast?” Aji surprised me not only by knowing what a podcast is, but also by having a podcasting question that was relevant to my job.
Aji has a pretty successful YouTube channel in his native language, Malayalam. Now that he’s comfortable with producing his show regularly, he wants to start offering episodes in English, too. He speaks both languages fluently, but the idea of re-recording all his past content in another language feels daunting.
As it happens, software exists that can not only translate your podcast’s words but also generate audio in a version of your voice. Can AI voice dubbing software help you expand your audience? Let’s look at some different ways to use technology to translate your podcast and whether it can help you engage your audience more effectively.
Why Translate Your Podcast?
The most obvious answer is to expand your current audience. But it can also stabilize your relationship with new audiences.
My podcast teaches people how to write audio drama. In the spring of 2022, the podcast’s download numbers had a sudden bump in Ukraine. My podcast is in English, so this was surprising. My co-host and I thanked listeners and asked them to write about how they found the show, but our response felt weak. In hindsight, we would have had an opportunity to build and validate our new audience if we’d translated the podcast into Ukrainian. Plus, we could have communicated more effectively.
Additionally, when you translate your podcast, your story can expand its perspective. If your podcast is about the history or culture of a particular region, translating it into the languages spoken in that region can help you get guests outside of your comfort zone and information you might not otherwise have access to.
When you translate your podcast, you can expand your options. Let’s look at some ways to do that.
Translate Your Podcast for Free, Then Re-Record
As my grandmother always said, you either pay with your wallet or with your back. This option doesn’t cost much money, but you’ll spend time and effort doing it. You take a transcript of your podcast and enter it, a few paragraphs at a time, into Google Translate or ChatGPT (and ask it to translate). Once you’ve translated the material, copy and paste it into a new script. Then, re-record. Success here depends on a few things:
Pronunciation: How comfortable are you speaking your words in a foreign language? Many of us study Spanish in high school, but how’s your Japanese?
Translation accuracy: Google’s support documents claim that Google Translate may be as much as 94% accurate. But that doesn’t account for colloquialisms. How would it translate the expression “cat got your tongue” or “in the zeitgeist”?
Patience: Your willingness to re-record and re-edit.
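If you go the copy-and-paste route, splitting the transcript into paste-sized pieces by hand gets tedious. Here is a minimal Python sketch of one way to do it, assuming your transcript is plain text with a blank line between paragraphs. The function name and the 3,000-character limit are my own placeholders, not a limit published by Google Translate or ChatGPT, so adjust to taste.

```python
def chunk_transcript(transcript: str, max_chars: int = 3000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars characters.

    Assumes paragraphs are separated by blank lines. A single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in transcript.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in paragraphs:
        # The +2 accounts for the blank line that joins paragraphs in a chunk.
        added_len = len(paragraph) + (2 if current else 0)
        if current and current_len + added_len > max_chars:
            # The chunk is full: save it and start a new one.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added_len = len(paragraph)
        current.append(paragraph)
        current_len += added_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for number, chunk in enumerate(chunk_transcript(sample, max_chars=40), start=1):
        print(f"--- chunk {number} ---")
        print(chunk)
```

Keeping paragraphs whole gives the translator more context for each sentence, which may help with idioms, though it won’t rescue “cat got your tongue” on its own.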
There’s no getting around it; this is a mammoth task, even just translating a handful of episodes into one other language. So, if you can afford a helping hand, what are your options?
HeyGen: AI-Generated Audio and Video, Plus Translation
Paid translation services like DeepL Translate or QuillBot may have better accuracy and security than the free alternatives discussed above. But these paid services only translate text. You still have to record and edit new audio. Most podcasters want to avoid doing the same task twice, which is why the companies discussed in this section grab so much attention. Let’s move on to the software that promises a voice double.
Our links here are affiliate links, which means we’d earn a commission should you choose to buy through them, never at any extra cost to you!
HeyGen is meant for people who need to make many videos quickly and who want an AI avatar in front of the camera. All you need for this sort of translation is a script. If you already have a video, you can upload it and let HeyGen make more videos with your face and voice, incorporating some mannerisms. Or, you can pick out an avatar based on the video and audio of different actors. HeyGen can do this in 28 languages.
Not only does HeyGen help you make videos in different languages, but it can save time if you need to make many videos without re-shooting.
Case Study: Learn With Aji and HeyGen
Let’s get back to my neighbor. His YouTube channel, Learn With Aji, is in Malayalam, and he wants to add content in English. He’s got 43 videos; reshooting and re-editing would be time-consuming. I’ve seen his videos, and they’re warm, engaging, and accessible. So, I suggested that he and I check out HeyGen together.
He created two videos to test the avatar, and HeyGen did what it promised. Here’s what we learned.
Despite HeyGen’s variety of languages to translate your podcast to, it doesn’t appear to have any languages other than English to translate from. So, for our initial test, I suggested he translate his first video from English to English to see what would happen. This was an option I got stuck with as well (Scroll on, dear reader, to learn how your gentle narrator turned into a Mary Shelley creation. Again).
Neither Aji nor I could put our finger on it, but something felt inauthentic to both of us.
Aji’s an energetic guy who moves like a trained public speaker. HeyGen’s avatar software is clearly trained to move like a public speaker, too. But the avatar’s motion appeared to stop and start unnaturally. The uncanny valley effect wasn’t huge, but it was noticeable enough to make both of us hesitant about HeyGen.
The second video, which he translated into Spanish, illustrated a similar problem. Though HeyGen could help Aji make many videos quickly, those videos lacked the human authenticity he needed to expand his audience.
Aji’s got a great tone and presence. I don’t want his videos to lose that. Though HeyGen helps him translate his podcast’s words, it may compromise his natural style.
Case Study 2: Lindsay vs. The Zeitgeist
After working with Aji, I decided to try HeyGen by taking an mp4 video with a static image and an excerpt from my podcast.
Remember, the first time I tried it, HeyGen asked for the language to target. And I thought that meant the language in the video to translate from, not to. So, I picked English. In a blink, HeyGen showed me that my spot in the queue was somewhere above 34,000. They would tell me when my file was translated into English (If I paid more, I could upgrade and skip to the front of the line).
A few hours later, I found that HeyGen had Frankensteined my voice and that of my co-host, Sarah, into one. I’m American, and she’s English. The resulting accent was neither American nor English, and it definitely distracted from what we were talking about.
So I tried something new. I submitted a new excerpt from my podcast, with only me talking. I deliberately used the expression “in the zeitgeist” to find out how that would translate. Then, I took the whole clip and submitted it to HeyGen to translate into Spanish. HeyGen’s result came out in formal, academic Spanish with a labored and choppy cadence. “In the zeitgeist” came out as a garbled sentence. The voice sounded depressed.
I wanted to give HeyGen the benefit of the doubt. I tried translating the same excerpt into Polish and then shared it with my friend Peter, a native Polish speaker. He felt the same: the voice was discomfiting, and the language oddly formal. He mentioned that some sentence constructions sounded like Polish and Ukrainian mashed together (much like Spanglish in the US).
Pricing for HeyGen
HeyGen is most effective for a company or individual needing to make lots of videos in a short period of time. But it’s not cheap to create this kind of software. HeyGen’s pricing reflects both ideas. The credit-based system means that HeyGen’s different services can charge different rates per minute of data. Fortunately, HeyGen offers a free tier. I couldn’t believe my luck when I saw how much time HeyGen gives you in the free trial! Here’s how much it costs when you pay annually.
Free tier: One credit for one minute of video. Your one login has access to over 120 avatars and over 300 voices. Your video will be watermarked.
$24/month gets you 15 credits per month and everything from the free pricing tier. Each video can be up to five minutes long. You also get three instant avatars, premium voices, and auto-captions. And, they remove their watermark.
For $72/month, you get 30 credits per month, and each video can be as long as 20 minutes. You also get everything from the Creator level, plus three logins, three instant avatars, API access, 4K resolution, and a brand kit.
Add-ons include a custom voice clone for $99 a year, the ability to fine-tune an avatar for $39 a month, and a studio avatar for $1000 a year. If you’re already paying for recording and editing software and hosting services, HeyGen’s pricing may not be sustainable. An independent podcaster using this service would have to be sure they want to reach an audience in another language.
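To get a feel for whether that’s sustainable, you can run the numbers before you subscribe. The Python sketch below uses the prices and credit allowances quoted above and assumes one credit covers one minute of video. Everything else is my assumption: the function name, rounding each video up to a whole credit, and the four-minute video length, which is a made-up figure and not a measurement of Aji’s channel. It also ignores each tier’s per-video length cap.

```python
import math


def months_to_translate(video_minutes, credits_per_month, monthly_price):
    """Estimate credits, months, and total cost to translate a back catalog.

    Assumes one credit per minute of video, with each video rounded up to
    a whole credit (an assumption, not HeyGen's documented billing rule).
    """
    credits_needed = sum(math.ceil(minutes) for minutes in video_minutes)
    months = math.ceil(credits_needed / credits_per_month)
    return credits_needed, months, months * monthly_price


if __name__ == "__main__":
    # Hypothetical catalog: 43 videos at 4 minutes each.
    catalog = [4.0] * 43
    for label, credits, price in [("$24/month", 15, 24), ("$72/month", 30, 72)]:
        needed, months, total = months_to_translate(catalog, credits, price)
        print(f"{label}: {needed} credits, {months} months, ${total} total")
```

With those made-up numbers, the cheaper tier takes about twice as long but costs less overall, and that total covers only one target language.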
So, now that you’re clued up on HeyGen, are there any alternatives?
Eleven Labs: Translation and a Wide Variety of Voices
When I tested HeyGen, I noticed a tiny watermark on the drop-down menu that said, “Voice powered by Eleven Labs.” Curious, I decided to explore Eleven Labs.
The voices Eleven Labs generates are beautiful. They should be: the founders’ original motivation was to create better voice dubbing in their native Poland for Hollywood movies released in English. If, for whatever reason, you can’t speak and you need to make a podcast, Eleven Labs’ Speech Synthesis has hundreds of artificially generated voices for nearly any accent, age, and gender you can imagine. All you do is add the text. Their voice library can speak in 29 languages. Many are tagged with tones such as “Calm” or “Hyped.”
Eleven Labs may seem similar to Descript. The simplest way to differentiate these two companies is that Descript has more editing tools, and Eleven Labs has more voices. More importantly, Eleven Labs has translation: you enter text, and it provides an audio file.
I tried using Eleven Labs’ dubbing tool at the free level. I took the same sample I’d tried out earlier on HeyGen and had Eleven Labs translate it into Spanish and Polish. The resulting audio sounded smooth and fluid. Despite the faster pace, the tone was relaxed and more inviting.
Be careful to back up your files more than once. I logged out and logged back in later in the day to download the sample, and the Spanish one wasn’t in my library. I uploaded the sample and translated it into Spanish again. The result was not Spanish.
I’m not proud of it.
This might not have happened had I used a paid tier. What was that my grandmother said again? No such thing as a free lunch?
Pricing for Eleven Labs
Comparatively speaking, pricing for Eleven Labs is reasonable. They charge by the character, and their Pricing page includes a calculator to estimate the number of characters for different kinds of projects. For example, one ten-minute voice-over is 10K characters.
Free tier: 10K characters per month of speech synthesis and up to three custom voices. This tier includes the ability to create random voices using voice design. You also get access to voices in the Shared Library in 29 languages and automatically dubbed content from 57 languages into 29 languages at a rate of 2K characters per minute (crushing that pesky “from” and “to” problem). All of the content must be watermarked or otherwise attributed to elevenlabs.io.
For $5 a month, you get 30K characters per month, plus everything at the free pricing tier. You also get the ability to create up to 10 custom voices and access to Instant Voice Cloning. This level includes a commercial license so that you can monetize your content.
$22 a month gets you 100K total characters per month (roughly two hours of audio) and everything from the Starter tier. You can also create up to 30 custom voices, access Projects (their long-form speech synthesis editor), use Professional Voice Cloning of your own voice, and add on more characters for 30 cents per 1K characters.
For $99 per month, you get 500K total characters per month (about 10 hours of generated audio, going by the per-hour rate of the other tiers), 160 custom voices, a usage analytics dashboard, and you can add on characters at a rate of 24 cents per 1K characters.
$330 a month gets you 2 million characters per month (roughly 40 hours of generated audio) and 660 custom voices. Plus, you can add on characters at a rate of 18 cents per 1K characters.
Eleven Labs also has an Enterprise level, which is priced at “Let’s Talk.” How many voices, languages, and characters can you come up with?
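If you’d rather not eyeball the tiers, here is a rough Python sketch that picks the cheapest one for a given monthly workload. The quotas and prices are the ones listed above, and the 1,000-characters-per-minute estimate comes from the ten-minute voice-over example. The function names are my own, and the sketch covers speech synthesis only: it ignores add-on characters and the different rate for dubbing, so treat it as a starting point and not as Eleven Labs’ own calculator.

```python
# (monthly price in dollars, characters per month), as listed in this article.
TIERS = [
    (0, 10_000),
    (5, 30_000),
    (22, 100_000),
    (99, 500_000),
    (330, 2_000_000),
]

# Rule of thumb from the article: a ten-minute voice-over is about 10K characters.
CHARS_PER_MINUTE = 1_000


def estimate_characters(minutes_of_audio: float) -> int:
    """Rough character count for a given number of minutes of speech."""
    return round(minutes_of_audio * CHARS_PER_MINUTE)


def cheapest_tier(chars_per_month: int):
    """Return (price, quota) for the first tier that fits, or None if none does."""
    for price, quota in TIERS:
        if chars_per_month <= quota:
            return price, quota
    return None  # Beyond the listed tiers: that's "Let's Talk" territory.


if __name__ == "__main__":
    # Hypothetical workload: four 30-minute episodes a month.
    needed = estimate_characters(4 * 30)
    print(needed, cheapest_tier(needed))
```

For a workload that lands just over a quota, buying add-on characters on the tier below may work out cheaper than moving up, so it’s worth checking both.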
Speak Your Audience’s Language
These translation tools are definitely fun to play with. As shiny as they are, though, they don’t replace human-generated cadence, subtext, or intent. They’re improving rapidly, but for now, they’re still distracting.
Don’t get so tangled up in nifty technology that you lose sight of what makes a great podcast. Language is important, but not as much as meaning, context, and utility.
Yes, please, translate your podcast to grow your audience. Nothing wrong with that. In fact, these tools are a worthwhile exercise to make some bite-sized content and test its impact. Translating hundreds of episodes into dozens of languages in one go is a lot of work for little reward. But to reach new demographics, content and context matter more. What does your audience want?
What about guests or a co-host from the demographic you want to reach? More importantly, what are the topics they want to learn about? What do you want to learn with that audience?
Our Growth Essentials Course can provide you with all the ways to get your show in front of new audiences. Plus, our IndiePod Community provides camaraderie and support, and that translates to good news for you.
Originally posted on November 22, 2023 @ 2:24 am