Why Digital Equity Depends on Democratized Data

By Marty Graham, Contributor

Many government officials and residents agree that the shift from brick and mortar to pixels and data has transformed, streamlined, and improved daily operations.

Citizens can pay their utility bills and taxes, apply for permits to have a parade, and report neighborhood problems without traveling to sometimes inconveniently located municipal offices. Governments similarly reap the benefits of digital technologies. In Indiana, for instance, a mobile portal sends users traffic updates directly to their phones, easing congestion; meanwhile, in Las Vegas, sensors and machine learning keep the city on top of picking up trash before it becomes a health hazard and eyesore.

But as more of the business of governing moves to the web, data scientists increasingly worry that the user information they leverage to improve existing services, make policy decisions, and plan for the future is being skewed by the amount of data they collect from frequent users and by the dearth of data from people who aren’t engaged. The heavy tilt of data toward current users can result in bias that benefits one group over another.

“[This inequity] is one of the most important pieces [of virtual government],” Michael Lake, CEO of Leading Cities, said in a September interview shortly after his company released its smart city report card, a measure of how 24 cities worldwide are accomplishing their smart city goals. “A smart city has the people of a city at its core,” he continued. “If you’re not servicing their immediate needs, then by definition you can’t achieve a smart city status.”

Data Bias: Who Is Left Out?

From the mid-1990s to 2016, the number of American adults connected to the internet increased from 6 percent to about 90 percent, according to Pew Research. In 2016, the numbers flattened, leaving about 10 percent of adults who either don’t have access to the internet or have opted not to use it. A good portion of those people are older, though a growing number of seniors use the web almost as frequently and skillfully as any millennial.

The top reasons people are not on the web or have limited access tend to overlap. Race and ethnicity play a big role: Black and Hispanic people are nearly twice as likely as white people not to use the web, Pew Research found. When they do connect, it’s mainly through their phones, which are more limited than computers. Lower income and limited education are also significant factors, and the longstanding divide between rural and urban settings remains, largely because of a lack of broadband or Wi-Fi access. Language and citizenship barriers also factor in, and most people who face one barrier face several.

The 10 percent of the population that’s not digitally connecting to local and regional government is facing a major challenge: These individuals are the most likely to need the assistance of government agencies, but are the least able to access the necessary and available resources, including food and housing benefits, health insurance, public transit information, and help with job searches.

Their digital absence also deepens the problem that many governments are already grappling with: How do you engage vulnerable populations when they’re not providing personal data?

According to Jeremy Gillula, tech projects director at the Electronic Frontier Foundation, that becomes a serious problem when it comes to training algorithms for public service.

“The first and foremost rule is: Don’t just assume a data set you found will be representative of the world,” Gillula explains. “The best way to ensure high quality data is to first think about the problem you’re trying to solve, brainstorm about the possible ways a solution could go wrong, and then collect data specifically to solve the problem while addressing those issues.”

Data bias has the potential to skew outcomes in favor of people whose advantages include owning devices and having greater web access.

“There’s two types of bias, and both can undermine democracy,” Gillula says. The first is socio-cultural bias, which includes racism, sexism, ageism, and discrimination against the disabled. The second, he continues, is statistical bias, which is when a model consistently makes the wrong prediction about something in a systematic way. For example, a model to predict which properties in California might be susceptible to earthquake damage could consistently under-predict or over-predict the possible damage, regardless of the value of the property.

“The harm to society, there, is that resources just get wasted or allocated poorly,” Gillula says. “But when these two types of bias intersect, it can be really awful. For instance, if you had that earthquake damage model, but it consistently under-predicted damage in poorer communities and over-predicted damage in wealthy communities, then the model would end up reinforcing inequalities that already exist in society—all with the veneer of being ‘scientific.’”
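
To make the intersection Gillula describes concrete, here is a minimal Python sketch. The records, community labels, and damage figures are invented for illustration and do not come from any real earthquake model. It shows how a model can look unbiased on average while under-predicting for one group and over-predicting for another.

```python
from statistics import mean

# Hypothetical records: (community, predicted_damage, actual_damage).
# All values are made up for illustration, in arbitrary units.
records = [
    ("lower_income", 40, 60),
    ("lower_income", 30, 50),
    ("higher_income", 80, 60),
    ("higher_income", 70, 50),
]


def mean_signed_error(rows):
    """Average of (predicted - actual); negative means under-prediction."""
    return mean(pred - actual for _, pred, actual in rows)


def signed_error_by_group(rows):
    """Compute the mean signed error separately for each community."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return {name: mean_signed_error(members) for name, members in groups.items()}


overall = mean_signed_error(records)
by_group = signed_error_by_group(records)

print("overall:", overall)
for name, error in by_group.items():
    print(name, error)
```

In this toy data the overall error averages out, so the model looks fine in aggregate. Only the per-group breakdown reveals that the errors fall in opposite directions along income lines.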

But Gillula believes there are ways of trying to add people who aren’t leaving a trail of data behind because they don’t use the internet.

“The methods are generally called ‘resampling’ methods,” he says. “They vary in their technical details, but at a really basic level, they perform by taking the data that is available from people who aren’t participating and, for a very simple example, [extracting] what you know from the data gathered from 1,000 people who normally aren’t visible and enlarging it to represent thousands more.”

Some people who do connect with government demographically resemble those who do not: individuals who have applied for a handicapped parking space near their homes, or seniors who are connected to social services and transit subsidies but live in a neighborhood where many others are not. Likewise, immigrant families whose children attend school may resemble immigrants without children in their needs and challenges, and in how they access city resources like health fairs and food programs.

But Gillula notes that what the algorithm produces should still be reviewed by humans before it is applied, and that those applications should be monitored by skeptical, concerned people.
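
One simple member of the resampling family is weighted resampling with replacement: records from under-represented groups are drawn more often, so the resampled data matches each group's known share of the population. The Python sketch below is an illustration under stated assumptions, not the specific method Gillula or any agency uses. The group labels, the function name `resample_to_population`, and the shares are hypothetical: a 90/10 population split echoing the Pew figure cited earlier, with only 2 percent of collected records coming from unconnected residents.

```python
import random
from collections import Counter


def resample_to_population(records, group_of, population_share, size, seed=0):
    """Draw `size` records with replacement so that each group's expected
    share matches `population_share` instead of its share in the raw data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(group_of(r) for r in records)
    total = len(records)
    # Weight each record by (target share) / (observed share) of its group.
    weights = [
        population_share[group_of(r)] / (counts[group_of(r)] / total)
        for r in records
    ]
    return rng.choices(records, weights=weights, k=size)


# Hypothetical collected data: 980 connected residents, 20 unconnected ones.
collected = [{"group": "connected"}] * 980 + [{"group": "not_connected"}] * 20
target = {"connected": 0.90, "not_connected": 0.10}

balanced = resample_to_population(
    collected, lambda r: r["group"], target, size=10_000
)
share = sum(r["group"] == "not_connected" for r in balanced) / len(balanced)
print(f"not_connected share after resampling: {share:.3f}")
```

Resampling of this kind only amplifies what the small sample already contains, including its quirks. That is one reason the human review described above still matters.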

More Privacy, More Data Gaps

As privacy concerns grow and people gain more rights to control their information, what data is gathered, including by governments, will change. Governments increasingly depend on sensors, from traffic cameras to smart meters that measure utility usage. That approach gathers data in much the same way as Facebook and Google do, a model often referred to as surveillance capitalism.

“Letting people opt out of being collected only increases the likelihood of biased data sets if your primary data collection model is surveillance capitalism,” Gillula says. “If you’re doing your data science right, you can’t just rely on vacuuming up whatever data is handy; you have to explicitly go out and collect data, which may mean incentivizing people to participate in something like an anonymous study.”

Most people understand that interactions with the government are public; think court and voting records. But people don’t necessarily know, or agree, that their daily movements, such as getting to work, picking up dry cleaning, and driving to pick up their kids, can be public domain.

Udo Kock, deputy mayor of Amsterdam—the 13th smartest city in the world—believes governments must assure people that they’ll use their data responsibly. “Don’t think of smart cities as just a technology solution, think of it as a collaboration,” he said in an interview with CNN last February. “Involve communities, involve citizens—it’s very important for governments to work together with businesses and private citizens.”

Government by the People, for the People

The recurring theme among scientists, social critics, and concerned residents is that data and its use must be of service to the people who provided it and a tool, rather than an imperative, for the people who use it.

For example, an algorithm in Johnson County, Kansas, predicted higher recidivism for some mentally ill people who had already been incarcerated. County officials used that information to aggressively provide social services and mental health treatment to those individuals as part of their terms of release, not as a justification for intensely policing them until they offended again.

This effort succeeded because it relied on data specific to the people Johnson County wanted to help, and their information was available because of their previous engagement with the criminal justice system. Without data from that cohort (mentally ill people are less likely to be on the web), detecting the pattern and their vulnerability, and then matching services to the people who need them, would have been far less likely.

In 2017, the first year of the program, the algorithm accurately predicted who would return to jail within a year. With that information, the county was able to identify people in whom to invest social and mental health resources.

“It’s far too easy to make assumptions about how ‘everyone’ fits into our ideal ‘smart’ environment,” says Hannah Kaner, a smart cities advocate. “It is easier still to assume that the people we are designing for are able-bodied, digitally literate, and financially stable. [But] when we’re creating the environment of the future, it should work for real people, not idealized and unrealistic model citizens.”

Building inclusion into government AI efforts, and accepting how challenging it is to get the data right, has become part of the data science industry’s short list of ethical topics. At the March 2018 G7 conference, for instance, Canada and France called for an international study group to figure out how to implement inclusionary policies; three months later, India announced a national strategy for AI called AI for All.

“Countries should take note of India’s goal of inclusive technology leadership,” says Tim Dutton, founder of the digital publication Politics + AI. “AI can be used to increase productivity, competitiveness, and economic development, but it must also be used to enhance the ability of every person to actively and fully participate in all aspects of life that are meaningful to them.”