Solving AI’s Training Problem With the Wisdom of Crowds

Artificial intelligence (AI) promises to change the world by driving our cars, assessing our insurance claims and writing reports for us, but who is training the AI software to do all these things?

By Danny Bradbury, Contributor

Today, this daunting task is falling to crowds, as crowdsourcing is making it easier for data scientists to create smarter algorithms. For example, the AI that scans the road in your self-driving car may have been taught by an army of commuters sitting on the bus, new mothers working from home, or disabled veterans who have found a new way to make ends meet.

Many AI algorithms learn their tasks through a process known as supervised learning, which teaches a program what an object does and doesn’t look like by showing it labeled examples. Data scientists, for instance, train AI to recognize a stop sign by showing it two sets of pictures. The first set contains stop signs, while the second consists of other things. The scientists must electronically label each picture (“stop sign” or “not stop sign”) so that the AI program knows what it is seeing.

With machine learning, the algorithm then processes that information to help it understand—or learn—what looks like a stop sign and what doesn’t. Then, it can apply this understanding to classify new images when it sees them.
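To make the idea concrete, here is a minimal sketch of supervised learning in Python. It is an illustration only, not how any company mentioned here trains its models: the two numeric features (how red a picture is and how octagonal its main shape is) are hypothetical stand-ins for real image data, and the nearest-centroid method is chosen for brevity.

```python
# Toy supervised learning: a nearest-centroid classifier.
# Each example is (features, label). The features are hypothetical
# scores in [0, 1]: (redness, octagon-likeness).

def train(examples):
    """Average the feature vectors of each label to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# The labeled training set: two pictures of each kind.
labeled = [
    ((0.9, 0.8), "stop sign"),
    ((0.8, 0.9), "stop sign"),
    ((0.2, 0.1), "not stop sign"),
    ((0.1, 0.3), "not stop sign"),
]

model = train(labeled)
prediction = classify(model, (0.85, 0.75))  # a new, unlabeled picture
```

The labels do the teaching: without them, `train` would have no way to know which examples belong together.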

While people can learn to recognize things with just a few examples, AI programs need thousands of them to produce accurate results. The more data points they have, the better they get at recognition. But finding and labeling these large collections of data points is a monumental task in itself. So data scientists are turning to crowdsourcing for help.

Reciprocal Feedback

Large communities of people are great at producing examples for supervised learning. While each individual contributes a small part, collectively the information accumulates into the large data set needed to train AI models. Consider mobile devices. Nexar, which produces a dash camera app for consumers, collects street-level images and driving videos from its users and makes them available for research. This enables data scientists at universities like Berkeley to train autonomous vehicles on images collected in all kinds of weather and light situations.

Yet, user-submitted data can train AI to do more than just drive cars. Yelp has been training its machine learning software to identify the characteristics of a restaurant from user-submitted photos. From photos alone, it can assess a restaurant’s ambience and take a good guess at whether or not it is child-friendly. Other companies are collecting data from users to help refine what is sold in some of those bars and restaurants. IntelligentX, for example, is a brewing company that refines its beer by encoding everything from the ingredients to the methods used to produce it. It uses a mobile app to ask users what they think of its beer as they drink it; by feeding this crowdsourced information into the machine-learning algorithm, it gets electronically generated recommendations about how to refine its brewing process. Together, a crowd of enthusiasts is teaching the company’s computers how to make the perfect beer.

But getting the data is just one part of the challenge. The other is labeling it so that AI-training algorithms can understand it. Companies are again turning to crowdsourcing for help, using a concept known as microtasking.

A Global Community of Microtaskers

Microtasking divides the workload among thousands of people distributed around the world. Using a mobile app, they concurrently perform tasks that take just a few seconds each, such as drawing a box around an object in an image or typing a text label describing what they see. In return, they can earn small amounts of money per task, ranging from under a cent to a few cents, based on the complexity of the task at hand.
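A rough sketch of how such a workload might be split and priced is shown below. The round-robin assignment, the function names and the half-cent rate are all assumptions made for illustration; the platforms described in this article do not publish their actual scheduling or pay logic here.

```python
from fractions import Fraction

def assign_round_robin(task_ids, worker_ids):
    """Deal tasks out to workers one at a time, like cards (illustrative only)."""
    assignments = {worker: [] for worker in worker_ids}
    for i, task in enumerate(task_ids):
        assignments[worker_ids[i % len(worker_ids)]].append(task)
    return assignments

def total_payout_cents(num_tasks, cents_per_task):
    """Total cost in cents; Fraction keeps sub-cent rates exact."""
    return num_tasks * Fraction(cents_per_task)

tasks = [f"image_{n}" for n in range(10)]
workers = ["worker_a", "worker_b", "worker_c"]  # hypothetical worker IDs

assignments = assign_round_robin(tasks, workers)

# Hypothetical rate: half a cent per task, across 10,000 tasks.
cost = total_payout_cents(10_000, "0.5")
```

At that assumed rate, labeling 10,000 images once each would cost 5,000 cents, or $50.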

Some of the most promising developments are happening outside the United States. In 2017, Bangalore, India-based firm Playment raised $1.6m to help develop its microtasking community.

The community, made up of 250,000 contributors, helps Playment label AI-training data by identifying objects and other features in images, either drawing boxes around them or tracing their outlines. Seventy percent of Playment’s business comes from autonomous driving clients, who use its community to help identify cars, road edges, and specific vehicle lanes.
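A box drawn around an object can be stored as a small record. The sketch below shows one plausible shape for such a record, along with a basic sanity check; the field names and pixel-coordinate convention are assumptions for illustration, not Playment’s actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoxLabel:
    """One bounding-box annotation (hypothetical schema, pixel units)."""
    image_id: str
    label: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int

def fits_inside(box, image_width, image_height):
    """Check that the box has positive size and lies within the image."""
    return (
        box.width > 0 and box.height > 0
        and box.x >= 0 and box.y >= 0
        and box.x + box.width <= image_width
        and box.y + box.height <= image_height
    )

car = BoxLabel("frame_0001", "car", x=120, y=80, width=200, height=150)
is_valid = fits_inside(car, image_width=640, image_height=480)
```

A check like `fits_inside` is the kind of simple validation that could catch a box accidentally dragged past the edge of the picture.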

Other microtasking activities include voice recognition. Crowdsourced AI-training company Clickworker has freelancers spread across 140 countries in Asia, Europe, and North America. Independently, these freelancers record and classify phrases in their own accents to help make voice recognition systems more accurate for users across the world.

The Sky Is the Limit

The stars are aligning for AI. The advent of graphics processing units (GPUs) has supercharged computing power and given companies the ability to train algorithms using more data than ever before. The explosive growth in mobile devices, too, has enabled an army of people to quickly and easily generate and classify that data.

These developments can supercharge the customer experience for companies willing to invest in them. When customers interact with businesses of the future, AI algorithms will give them results that are more accurate and meaningful than a single person could hope to deliver. Those algorithms will base their recommendations on thousands of small decisions that contribute to a collective intelligence.

It’s the wisdom of crowds at work.