Data Science and Biology Are Being Used to Map Human Behavior Online

Researchers and data scientists are harnessing the transformative power of data for the common good by studying digital media consumption, using algorithms to shut down bad bots, and thwarting fake news with digital DNA mapping.

By Stephanie Walden, Contributor

From socializing via video calls to onscreen workouts to app-fueled food delivery, digital experiences are deeply ingrained in our everyday lives. This has become all the more evident over the past several weeks, as people around the world move their work, hobbies, and social lives online.

Until recently, however, little was being done to comprehensively codify humans’ digital behavior and the practices that constitute healthy—or even real—online interactions. In January, Stanford University announced the Human Screenome Project, an effort to record and map the way people use devices to consume digital media.

It’s not the only instance in which data scientists are taking a page out of biology’s playbook. Here are three inventive ways researchers and data scientists are harnessing the transformative power of data for the common good, starting with the researchers at Stanford.

The Human Screenome Project

The Human Screenome Project aims to quantify online behavior and glean insights into an intimate yet relatively unstudied part of people’s personal and social lives: screen time. The body of research falls under the umbrella of “screenomics,” a term coined largely by the initiative itself that may soon describe an established field of study.

The project relies upon voluntarily submitted user data and a robust computational framework. Using moment-by-moment screen grabs captured every five seconds during multiple intervals throughout the day, researchers produce a detailed “screenome analysis,” bolstered by algorithms that parse data points related to content consumption and user behavior. Tracked actions include “channel switching, swiping, zooming, and navigating,” according to a CNET interview with Nilam Ram, a researcher associated with the initiative.
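
To make the methodology concrete, here is a minimal sketch of what a screenome-style capture loop could look like, assuming the five-second interval described above. The library choice (Pillow), file layout, and function names are illustrative assumptions, not the project’s actual tooling.

```python
# Hypothetical sketch of a screenome-style capture loop; not Stanford's actual tooling.
# Assumes the Pillow library for screenshots and the five-second interval from the article.
import time
from datetime import datetime
from pathlib import Path

from PIL import ImageGrab  # pip install pillow

CAPTURE_INTERVAL_SECONDS = 5
OUTPUT_DIR = Path("screenome_captures")


def capture_screenome(duration_seconds: int = 60) -> list[Path]:
    """Grab a screenshot every few seconds and save it with a timestamp."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    saved: list[Path] = []
    end_time = time.time() + duration_seconds
    while time.time() < end_time:
        timestamp = datetime.now().strftime("%Y%m%dT%H%M%S")
        path = OUTPUT_DIR / f"screen_{timestamp}.png"
        ImageGrab.grab().save(path)  # one frame of the "screenome"
        saved.append(path)
        time.sleep(CAPTURE_INTERVAL_SECONDS)
    return saved


if __name__ == "__main__":
    frames = capture_screenome(duration_seconds=30)
    print(f"Captured {len(frames)} frames for later analysis.")
```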

An overview of the project on Stanford’s Cyber Policy Center posits there are real-world applications for screenomics. For instance, the site notes studies have found telling relationships between readers’ interpretation of news stories and their personal messaging habits. Screenomes also tend to differ markedly across countries, cultures, and generations. Ultimately, studying screenomes could inform policies for healthy digital interactions, such as establishing data-driven screen time recommendations for children and adolescents.

“In the same way that genomics reshaped understanding, prevention, and treatment of different diseases, the screenome will inform and reshape understanding of a broad array of social problems,” write the Stanford researchers.

Bot Blockers: ‘Digital Antibodies’ of the Internet

Bots—automated software applications that run scripts to conduct repetitive tasks online—are responsible for more than half the activity on the internet.

As with most technologies, bots are agnostic—not inherently good or bad—but beholden to the directives of human programmers. This is to say that there are literally “good bots” and “bad bots.” Bad bots are a significant problem for businesses: When this type of traffic floods a system, it can cause network downtime or outages, which translates to lost revenue—and that’s before taking into account the soaring costs of a data breach.

“Blocking bad bots used to be pretty simple with web applications or firewalls, because bots were pretty basic. But [the hackers] have adapted.”

—Benjamin Fabre, co-founder and chief technology officer, DataDome

Benjamin Fabre is the co-founder and chief technology officer of DataDome, a company that provides protection from bad bot traffic. DataDome’s artificial intelligence (AI) learns from hundreds of billions of user “events,” or data points, per day—the resolution of the screen on which a user accesses a site, how their mouse moves, the direction of the page, etc. It records this information to differentiate between human and bot behavior.
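
As a rough, hypothetical illustration of this kind of feature-based human-versus-bot classification, the sketch below trains a simple classifier on session-level signals like those mentioned above. The feature names, toy data, and model choice are assumptions for illustration; DataDome’s actual system is far more sophisticated.

```python
# Toy illustration of feature-based bot detection; not DataDome's actual model.
from dataclasses import dataclass

from sklearn.ensemble import RandomForestClassifier  # pip install scikit-learn


@dataclass
class SessionEvent:
    screen_width: int         # resolution of the visitor's screen
    mouse_path_length: float  # total distance the mouse moved, in pixels
    scroll_events: int        # number of scroll actions on the page
    form_fill_ms: float       # time spent filling a form, in milliseconds


def to_features(event: SessionEvent) -> list[float]:
    return [event.screen_width, event.mouse_path_length,
            event.scroll_events, event.form_fill_ms]


# Labeled training data: 1 = human session, 0 = automated (bot) session.
training_events = [
    SessionEvent(1920, 3400.0, 12, 8200.0),
    SessionEvent(1440, 2100.0, 7, 6500.0),
    SessionEvent(800, 0.0, 0, 40.0),
    SessionEvent(1024, 5.0, 1, 55.0),
]
labels = [1, 1, 0, 0]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit([to_features(e) for e in training_events], labels)

# Score a new visitor's session.
new_session = SessionEvent(1366, 12.0, 0, 60.0)
print("human" if model.predict([to_features(new_session)])[0] == 1 else "bot")
```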

Bad bots, Fabre explains, come in dozens of different classifications, from credential-stuffing bots that take over accounts and harvest stored credit card information on e-commerce sites to bots that scan media organizations for network vulnerabilities.

Stopping bad bot traffic has become more challenging for digital businesses, says Fabre. “Blocking bad bots used to be pretty simple with web applications or firewalls, because bots were pretty basic,” he says. “But [the hackers] have adapted. Now, they are using artificial intelligence to mimic user interactions. They are moving the mouse, scrolling on the page, filling out forms using AI-trained algorithms, etc. Old methods of protection aren’t relevant anymore.”

To fight fire with fire—and gain an upper hand in what Fabre calls a “technological war with the hackers”—DataDome also uses AI. Thanks to its robust client base, DataDome has a massive amount of data with which to train its algorithms. It deploys these algorithms to seek out and shut down bad bots—sort of like the digital antibodies of the internet.

Fabre says the comparison to human biology is apt. “It’s a relevant comparison. When we turn on [DataDome] protection on a website, we see that the bad bot traffic tries to change shape to work around the protection. Our AI adapts in turn.”

Fabre says that, for now, DataDome appears to have the upper hand thanks to its huge data set and advanced AI-training capabilities. He notes, however, that the race for computational power is heating up. “Our R&D team makes sure the evolution of our protection is always up to date and a step ahead of hackers,” he says.

Digital DNA Mapping

It’s not just the business world that suffers from nefarious attacks. Social media, too, can be influenced by bad bot traffic, particularly when it comes to the spread of fake news.

In 2018, researchers Stefano Cresci and Maurizio Tesconi won the SAGE Ocean Concept Grant for their idea to create a “Digital DNA (DDNA) Toolbox,” or a set of methods to help scientists make sense of user data on social media sites. One of the project’s main goals is to identify sources of fake news. Two years later, the DDNA Toolbox continues to evolve.

“The basic premise [for a Digital DNA Toolbox] is quite simple. We look at the sequence of actions that users perform on social media—so when they share a post or comment or something on Facebook, Twitter, Instagram, Reddit, etc. We assign a character to each of these actions.”

—Stefano Cresci, Ph.D., researcher at the Institute of Informatics and Telematics and DDNA Toolbox developer

Cresci holds a Ph.D. in information engineering and is currently a researcher at the Institute of Informatics and Telematics of the National Research Council in Pisa, Italy. He explains how DDNA fingerprinting and modeling works in layman’s terms: “The basic premise is quite simple. We look at the sequence of actions that users perform on social media—so when they share a post or comment or something on Facebook, Twitter, Instagram, Reddit, etc. We assign a character to each of these actions. So, we end up with quite long strings of characters encoding user behavior.”
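
A minimal sketch of this encoding step, based on Cresci’s description, might look like the following. The specific action-to-character alphabet here is an assumption for illustration; the DDNA Toolbox defines its own encoding.

```python
# Sketch of digital-DNA encoding as described: each user action becomes a character,
# and an account's timeline becomes a string. The alphabet below is illustrative only.

# Map each action type to a single character, like a DNA base.
ACTION_ALPHABET = {
    "post": "T",     # original post / tweet
    "reshare": "R",  # retweet / share
    "comment": "C",  # reply or comment
    "like": "L",     # like / favorite
}


def encode_ddna(actions: list[str]) -> str:
    """Turn a chronological list of actions into a digital-DNA string."""
    return "".join(ACTION_ALPHABET.get(action, "X") for action in actions)


# Example timelines for two accounts.
human_timeline = ["post", "like", "comment", "reshare", "post", "like"]
bot_timeline = ["reshare", "reshare", "reshare", "reshare", "reshare"]

print(encode_ddna(human_timeline))  # -> TLCRTL
print(encode_ddna(bot_timeline))    # -> RRRRR
```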

AI is part of the process, as well. “The algorithm for computing the similarities between the DDNA streams of different accounts leverages bioinformatics and machine learning. In other cases, we look at the temporal synchronization of the actions of different accounts, and we examine those with deep learning.”
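
And a hedged sketch of the similarity step: the comparison below uses the longest matching block between two encoded strings (via Python’s standard difflib) as a simple stand-in for the bioinformatics-style sequence comparison Cresci describes. The threshold and example data are assumptions.

```python
# Illustrative comparison of digital-DNA strings; a stand-in, not the DDNA Toolbox's method.
from difflib import SequenceMatcher
from itertools import combinations


def ddna_similarity(a: str, b: str) -> float:
    """Length of the longest shared block, normalized by the shorter string."""
    if not a or not b:
        return 0.0
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


accounts = {
    "alice": "TLCRTLCT",    # varied, human-looking behavior
    "bot_001": "RRRRRRRR",  # repetitive, automated-looking behavior
    "bot_002": "RRRRRRRT",
}

# Pairs of accounts with suspiciously similar behavior are a red flag for automation.
for (name1, seq1), (name2, seq2) in combinations(accounts.items(), 2):
    score = ddna_similarity(seq1, seq2)
    flag = "  <- suspicious" if score > 0.8 else ""
    print(f"{name1} vs {name2}: {score:.2f}{flag}")
```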

Automated accounts like social bots, says Cresci, tend to have more similar DDNA sequences than human-operated accounts. Humans are more heterogeneous, less predictable. Cresci believes this is largely because accounts responsible for sharing fake news have a common goal: to inflate the popularity of a politician or brand, or to spread misinformation. When examined as part of a larger picture versus as individual tweets or comments, patterns emerge.

Today, the process for spotting fake news is getting a “little bit more complicated,” admits Cresci. “Social bots are not the only issue we face. There are trolls—human-operated accounts—and their [sequence] is somewhat in between that of social bots and legitimate human users.”

Anyone studying the issue of fake news right now is struggling with these challenges, notes Cresci. “The fact that hackers are using our same technologies [like AI]… that has changed the landscape. Until some time ago, we could look at one individual account and spot the differences between [a bot] account and human ones. That’s now much more difficult,” he says. “Looking at individual accounts is not enough anymore; we need to spot suspicious similarities between large groups of accounts. That is a red flag for automation now.”

As our on-screen lives become more entangled with the lives we lead offline, data scientists are turning to biological studies to provide insight into a new state of being—one in which humans and machines are increasingly intertwined.