Can Natural Language Processing Distill the Taste of Whiskey?

Researchers are standardizing the language of whiskey flavors—like we've done for colors—by using natural language processing (NLP) and machine learning. Consumers can purchase the flavors they love but avoid the premium prices.

By Betsy Vereckey

Many whiskey aficionados would agree: There’s no greater joy than kicking back with a glass of single malt or a nip of bourbon. But how would they describe exactly what they’re tasting? One might detect a hint of chocolate, while someone else says coffee. This becomes even harder for novice drinkers, who don’t often have the vocabulary to describe what they’re tasting and why they like it.

Researchers at Virginia Tech’s Department of Food Science and Technology are using natural language processing (NLP) and machine learning to develop a standardized language for whiskey flavors, like the one we have for colors. Graduate student Leah Hamilton, a food scientist earning her PhD at the university and one of the researchers behind the project, says that the idea surfaced in an NLP class she was taking with Chreston Miller, a data consultant with a PhD in computer science who works at the school’s University Libraries division. Their project is the first to use NLP to evaluate sensory experiences, Hamilton says. “It just seemed like there should be a better way to [extract the words we use as flavor descriptions], and we figured we could probably automate it.”

Why Whiskey?

Most of the research that already exists on flavor focuses on wine, Hamilton says, noting that wine was actually one of the first products to have a research-based flavor vocabulary developed for it.

“Whiskey has had some amount of research interest, but most of the published research is on Scotch,” she explains, noting that bourbon and other American whiskeys can be pretty different. “A lot of the flavor research on spirits is done in-house by the big spirits conglomerates, and we wanted to have some tools to do our own research on whiskey. It’s still a growing American market.”

Combined U.S. sales for bourbon, Tennessee whiskey, and rye whiskey rose 8.2 percent, or $327 million, to $4.3 billion in 2020, according to the Distilled Spirits Council of the United States.

Another benefit of studying whiskey? There’s already a large online community of whiskey aficionados who are versed in describing its flavor with popular words such as vanilla, spice, honey, oak, fruit, dry, sweet, malt, cinnamon, spicy, caramel, chocolate, pepper, smoke, orange, apple, toffee, ginger, fruity, sherry, lemon, and clove.

How whiskey companies use these words to market their products varies with the company’s size. A smaller distillery, for example, is more likely to have one or two people doing tastings and note-taking, whereas a larger corporation has trained panels of 10 to 20 people tasting its own products as well as its competitors’.

A Machine Learning Algorithm That Makes Sense of the Senses

Hamilton and her team started the project in 2018 by building a dataset of whiskey flavor descriptors from around 7,800 whiskey reviews from two websites, WhiskyCast and Whisky Advocate.

With the help of sensory scientists who had expertise in identifying words used to describe flavor, they developed an algorithm that sifted through each review and extracted the flavor words. Among the most popular terms, vanilla was used 3,517 times and honey was used 3,111 times. The algorithm was also able to identify similar words and group them together. Campfire, brine, and smokey were all words used to describe peat, for example.

Using NLP to make these associations helped them analyze faster and more accurately. “Humans can only read so fast, and it is like a sort of semi-skilled kind of work,” Hamilton says. “It takes your attention as well, which can make it kind of exhausting if you’re doing it for too long. The appeal was really to be able to do it faster and on bigger sizes of datasets.”
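The article does not describe how the team's algorithm works internally, so the following is only a minimal sketch of the two steps mentioned above: pulling flavor words out of review text and counting them, with related words grouped under one descriptor. It assumes a hand-written lookup table rather than a trained model. The names DESCRIPTOR_GROUPS, extract_descriptors, and count_descriptors are hypothetical, and the sample reviews are made up; only the grouping of campfire, brine, and smokey under peat comes from the article.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping a word found in a review to the descriptor
# it is counted under. The peat grouping follows the article's example;
# this lookup table is an assumption, not the researchers' method.
DESCRIPTOR_GROUPS = {
    "vanilla": "vanilla",
    "honey": "honey",
    "oak": "oak",
    "peat": "peat",
    "campfire": "peat",
    "brine": "peat",
    "smokey": "peat",
}


def extract_descriptors(review: str) -> list[str]:
    """Return the grouped descriptors found in one review, in order of appearance."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [DESCRIPTOR_GROUPS[t] for t in tokens if t in DESCRIPTOR_GROUPS]


def count_descriptors(reviews: list[str]) -> Counter:
    """Count how often each grouped descriptor appears across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(extract_descriptors(review))
    return counts


# Made-up reviews, for illustration only.
SAMPLE_REVIEWS = [
    "Vanilla and honey on the nose, with a campfire finish.",
    "Brine, oak, and a smokey note; honey lingers.",
]

if __name__ == "__main__":
    for term, n in count_descriptors(SAMPLE_REVIEWS).most_common():
        print(term, n)
```

A fixed lookup like this only finds words someone has already listed. The appeal of the approach described in the article is that it scales to thousands of reviews without a person reading each one.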

Same Flavors at a Cheaper Price

As part of their research, the authors are using predictive modeling to estimate how much the flavors in a given whiskey contribute to its price. Peated whiskeys, which have a smoky flavor left over from the malting process, go for a bit of a premium, says Hamilton. “We’re expecting that to come up, but we’re also hoping to find stuff that we don’t already know. That’s where the project is right now, trying to actually take the output of our early stages and get some predictions.”
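The article does not say which predictive model the researchers use. As a rough sketch of the underlying idea, relating a flavor descriptor to price, the example below compares the average price of bottles whose notes mention a descriptor with the average price of those that don't. That difference is what a linear regression with a single yes/no predictor would estimate. The function name price_premium is hypothetical and the bottle data is made up for illustration.

```python
from statistics import mean

# Made-up (descriptors, price in USD) pairs, for illustration only.
BOTTLES = [
    ({"peat", "vanilla"}, 90.0),
    ({"peat", "honey"}, 70.0),
    ({"vanilla", "honey"}, 40.0),
    ({"oak"}, 50.0),
    ({"honey", "oak"}, 30.0),
]


def price_premium(bottles, descriptor):
    """Mean price of bottles with the descriptor minus mean price of those without."""
    with_d = [price for notes, price in bottles if descriptor in notes]
    without_d = [price for notes, price in bottles if descriptor not in notes]
    if not with_d or not without_d:
        raise ValueError("need bottles both with and without the descriptor")
    return mean(with_d) - mean(without_d)


if __name__ == "__main__":
    for d in ("peat", "honey"):
        print(d, round(price_premium(BOTTLES, d), 2))
```

A real model would need to account for many descriptors at once, along with factors such as age and brand, which this comparison ignores.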

There are plenty of implications for consumers and producers alike. For drinkers, the authors’ research could be helpful to those who may want to buy something that tastes like an expensive whiskey, but is more affordable. “If you don’t want to spend $150 on the bottle, you could spend $25 for something with similar tasting notes,” Miller says.

Hamilton agrees, even though some whiskey experts may not. “I always tell everybody that there’s no reason to buy Jack Daniels because George Dickel tastes exactly the same, and it’s five bucks cheaper.”

Meanwhile, some makers might decide to produce the flavors associated with a higher price point, but at a lower cost. The findings could also help producers communicate better with consumers who may want to buy their product but need more information on what they’re tasting. Winemakers, for example, will usually put tasting notes on the bottle, whereas that practice is less common with whiskey producers, Hamilton says.

“I think that is one of the reasons to use these text-based methods, where you can take whatever language people are using and just try and find which words are related,” Hamilton says. “Maybe that lets you translate between the expert language and the consumer language, or maybe it just helps people find other products that are like the ones they liked, even if they don’t know how to explain that.”

In fact, Miller says that some companies are already spending a lot of time and effort trying to identify descriptors for whiskeys.

“We’ve had some interest from the private sector of how this can affect processing this type of data by taking hopefully a lot of human effort out of it in a good way.”

What’s Next?

The authors hope to expand their work by exploring different food products. They say that they have existing datasets of beer and teas with “hundreds of thousands of reviews” that they may use to do more research.

The authors don’t have an exact completion date yet for the whiskey project, though they plan to complete most of the work before Hamilton graduates next spring.

“It [the project] has so much potential and there’s so much we want to do with it,” Miller says. “We are on the verge of starting to push some papers out and everything and get this more formally out in the world.”

Added Hamilton, “We get a lot of interest when we present this at food conferences from all kinds of industries, but right now, the startup to get in is pretty high, so we’re hoping that the time that we’re investing now will be applicable for other food products.”

Photo by Carl Folscher/Unsplash