During Vu Van’s 20-plus-year journey learning English, she explored just about every available avenue to perfect her skills. She attended classes in her home country of Vietnam. She watched American television shows and listened to music in English. She took the immersion approach, living in Denmark before moving to California to pursue a master’s degree at Stanford University. She aced language tests at school.
But when she went out into the workforce, she had a rude awakening: Despite her extensive education, many coworkers still had trouble understanding her.
“Every time I spoke, people would ask me to repeat myself, and they’d just keep asking over and over. You start feeling embarrassed and withdrawing into yourself because you’ve lost your confidence,” she recalls.
Determined to improve, Van hired a speech therapist as a one-on-one coach. In short order, her pronunciation became clearer. Coworkers stopped asking her to repeat herself.
But she quickly realized that this solution pointed to a gap in the market: One-on-one speech coaching is cost-prohibitive for most language learners, with therapists tending to charge anywhere from $100 to $300 per hour. Van wanted to create a more accessible way for people to perfect the nuances of speaking English.
“I had the idea that maybe I could replicate the [speech therapist] solution and democratize it so that a billion more language learners who face the same challenge could use it—but at a fraction of the cost,” she says.
A Pronunciation Coach in Your Pocket
Five years ago, Van founded ELSA Speak, a platform fueled by artificial intelligence (AI) that helps non-native English learners improve their speaking skills. The app functions as a pronunciation coach in your pocket, using proprietary speech recognition technology and personalized, gamified programming to give users instant feedback via short, fun dialogues.
The platform features more than 3,000 lessons that cover common English idioms, 10-minute daily practice sessions, and an interactive pronunciation dictionary. There’s a free version for both iOS and Android, but to unlock all the content (including the dictionary), users must purchase the PRO version.
AI and machine learning are the beating heart of ELSA, says Van. “Multiple levels of AI have been part of the core technology since the beginning,” she explains. “Since then, it’s been a lot of model training, retraining, adding more data, and fine-tuning.”
Getting ELSA off the ground involved some manual legwork. The platform doesn’t rely on existing APIs such as the ones behind popular voice-recognition tools like Google Translate, Alexa, or Siri, so Van and her team had to hit the streets to collect their own data.
They started in Vietnam, paying passersby to speak English into an early prototype of the app for 20 minutes at a time. They canvassed different acoustic environments, asking beta testers to speak in situations ranging from silent rooms to noisy street corners. From this initial data set, the platform’s automatic speech recognizer (ASR) was born.
“We use AI and the ASR to identify the mistakes that people make when they speak English. We compare what accented English sounds like compared to American English,” explains Van. “[Then the AI flags,] ‘Hey, you made this very specific mistake,’ and we pinpoint it for you at a phoneme level, which is the single individual sound.” There are approximately 44 phonemes in English. The word “hen,” for instance, has three: /h/, /e/, and /n/.
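To make the idea of phoneme-level feedback concrete, here is a minimal sketch in Python. It is not ELSA’s method: the company’s system works on audio with proprietary speech recognition, while this example assumes the speech has already been transcribed into a list of phonemes and simply aligns it against the expected list using the standard library’s `difflib`. The function name `phoneme_errors` and the sample mispronunciation are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def phoneme_errors(expected, spoken):
    """Align two phoneme sequences and list the places where they differ.

    Illustrative only: assumes both inputs are already lists of phoneme
    symbols. A real system would have to derive them from audio first.
    """
    matcher = SequenceMatcher(a=expected, b=spoken, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # this stretch was pronounced as expected
        errors.append({
            "kind": tag,                  # "replace", "delete", or "insert"
            "position": i1,               # index into the expected sequence
            "expected": expected[i1:i2],
            "spoken": spoken[j1:j2],
        })
    return errors


# "hen" has three phonemes; suppose the learner's vowel came out as /i/.
for error in phoneme_errors(["h", "e", "n"], ["h", "i", "n"]):
    print(error)
```

In this toy case the alignment reports a single substitution at the second sound, which is the kind of pinpointed mistake Van describes. Turning that into advice a learner can act on is a separate step.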
ELSA uses a neutral American English accent as its reference, rather than, say, British or Australian English, because that is the variety for which Van and her team were able to gather the most data. Even so, Van says the company serves a global audience. “We’re not building the product for one country. We aim to serve language learners from different places and different parts of the world.”
Human-Machine Collaboration Behind the Scenes
After identifying pronunciation mistakes at the phoneme level, the AI’s next task is to provide feedback to the user. Training ELSA’s algorithms also involved some manual oversight for this element. Van says the company works closely with linguists and language experts to develop intuitive and easy-to-understand feedback solutions—shapes to make with your mouth or where to put your tongue to pronounce a certain syllable, for instance. The algorithm learns and adapts based on users’ successes and progress.
“Once we bootstrapped the system by collecting data and labeling it manually, we had linguists listen to all of our samples and highlight what mistakes were made,” says Van. “Then, we built a system to learn from what the linguists said to do, and then to auto-scale all of this labeling.”
Although the AI itself is now largely on autopilot, there are still areas of human involvement. AI researchers on ELSA’s team continuously evaluate how the algorithm might be improved, and data scientists help parse the data to figure out which points to use and which to discard. Linguistic experts are still heavily involved in the feedback loop, too. “The linguists double- and triple-check everything,” says Van.
The Next Step: Building a Conversational AI Partner
To date, ELSA has about 14 million downloads. The app has users based in more than 100 countries, with large concentrations in Asia, including Vietnam, Japan, India, and Indonesia. Van says there are also a lot of U.S.-based users, including Spanish-speaking immigrants. Latin American countries like Brazil, Mexico, Chile, and Colombia represent another large chunk of ELSA’s user base.
Today, ELSA employs around 80 full-time people around the world. “Our team has been super global since the beginning—we have offices in Asia, Europe, the U.S., and Latin America,” says Van. “We have team members coming from different backgrounds, and people speaking English with different accents. Why we are doing this resonates with everybody in the company, either because they’ve gone through the challenge themselves or have family members, friends, and colleagues who have gone through it.”
Van says that user feedback to the app has been overwhelmingly positive. The platform has helped people prepare for college interviews and land a spot at their dream university. It’s helped others get jobs. It’s even helped users with their dating lives.
“I once deleted all the English apps because [I didn’t] find them necessary for my learning. I can still improve by myself. Yet, ELSA was the exception,” says Trinh Minh Trang, an ELSA user, in a testimonial for the platform. “It’s not just an app. It’s my diligent coach, 24/7.”
The app also tends to produce quick improvement in the first few months of use. “Our internal data shows that 90 percent of users show extremely fast progress in the first three months, improving the clarity of their speech,” says Van. In a recent user survey, 95 percent of respondents also reported feeling significantly more confident after using ELSA for several months.
This is a stat Van is particularly proud of. “When it comes to using a language, if you’re confident, you go a long way,” she says. “Once you’re confident, you keep speaking, and once you keep speaking, you improve.”
In the future, Van hopes to turn ELSA into a comprehensive language-learning companion—a conversational AI that can absorb your communications in the background (with permission), and provide real-time feedback.
“Right now, we are teaching you English. But we also want to turn into a conversational partner. We’re constantly looking for new technology and innovation within the AI world, and how we can close the gap between research and academia to real application,” she says.
Although ELSA continues to innovate, Van says the core company mission hasn’t changed from her initial vision. “We want to enable a billion people to learn to speak better English, to let their voice be heard and unleash their full potential—whether that’s in life, in education, or out in the world.”
Lead photo courtesy of ELSA