How to address new privacy issues raised by artificial intelligence and machine learning

For generations, companies have collected large amounts of information about consumers and have used it for marketing, advertising, and other business purposes. They regularly infer details about some customers based on what others have revealed. Marketing companies can predict what television shows you watch and what brand of cat food you buy because consumers in your demographic and area have revealed these preferences. They add these inferred characteristics to your profile for marketing purposes, creating a “privacy externality” where information others disclose about themselves also implicates you.

Machine learning increases the capacity to make these inferences. The patterns found by machine learning analysis of your online behavior disclose your political beliefs, religious affiliation, race, ethnicity, health conditions, gender and sexual orientation, even if you have never revealed this information to anyone online. The presence of a digital Sherlock Holmes in virtually all online spaces making deductions about you means that giving consumers control over their own information will not protect them from indirectly disclosing even their most sensitive information.
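
To make the mechanism concrete, here is a minimal sketch, using synthetic data and the scikit-learn library, of how a model trained only on people who voluntarily disclosed a sensitive attribute can then infer that attribute for people who never disclosed anything. Every feature, number, and name in it is invented for illustration and drawn from no real system.

```python
# Hypothetical sketch of a "privacy externality": a model trained on users who
# DID disclose a sensitive attribute infers it for users who never did.
# All data here is synthetic; no real behavioral features are used.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_disclosing, n_silent, n_features = 5000, 1000, 20

# Synthetic "online behavior" features for users who revealed the attribute.
X_disclosing = rng.normal(size=(n_disclosing, n_features))
true_weights = rng.normal(size=n_features)  # hidden link between behavior and attribute
y_disclosing = (X_disclosing @ true_weights
                + rng.normal(scale=0.5, size=n_disclosing)) > 0

# Train only on the people who chose to disclose.
model = LogisticRegression(max_iter=1000).fit(X_disclosing, y_disclosing)

# "Silent" users share the same behavioral patterns but revealed nothing.
X_silent = rng.normal(size=(n_silent, n_features))
y_silent = (X_silent @ true_weights
            + rng.normal(scale=0.5, size=n_silent)) > 0

# The model still predicts their attribute far better than chance.
print("accuracy on users who never disclosed:", model.score(X_silent, y_silent))
```

The specifics of the model do not matter; the structural point is that the silent users’ privacy depends on what other, similar users chose to reveal.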

For this reason, policymakers need to craft new national privacy legislation that accounts for the numerous limitations that scholars such as Woody Hartzog have identified in the notice-and-consent model that has guided privacy thinking for decades. The privacy externalities exacerbated by machine learning techniques are just one more reason why new privacy rules are needed.

Should New Privacy Legislation Regulate Artificial Intelligence Research?

For decades, university institutional review boards (IRBs) have regulated academic research, aiming primarily to protect human research subjects against harm from the research process itself. But now online companies regularly conduct similar research on human subjects outside the academic setting, often through simple A/B testing aiming to improve customer engagement, without needing to seek IRB approval. As Facebook discovered several years ago in the reaction to its emotional contagion study, many people are concerned about this loophole and have called for greater controls on company-sponsored research.

Some businesses have responded by setting up internal review boards to assess projects that might have significant ethical consequences. Interestingly, these reviews go beyond protecting the human research subjects and touch on the broader question of whether the insights gained from the research might have harmful downstream consequences for a wider population.

The recent controversy over facial recognition software that claims to predict people’s sexual orientation from their facial characteristics shows why ethical review must move beyond protecting human subjects. Dubbed “gayface” software, this experimental tool was trained on publicly available photographs. With no foreseeable beneficial use for the technology, developing an algorithm that can only serve harmful, discriminatory purposes may itself be unethical.

Ethical review of this research has almost nothing to do with protecting the rights of the human subjects involved in training the software. The real ethical problem is the downstream use of the technology to harm other individuals in vulnerable groups. Clearly, universities, research institutions, companies, and government agencies will have to evaluate research programs using a wider lens than just protecting the rights of human subjects. With the development of machine learning capabilities able to predict the most intimate and sensitive aspects of people’s lives, these ethical questions urgently need to be addressed.

But should these broad ethical issues be resolved through legislation? Almost everyone who thinks about the “gayface” study is troubled by it, but few want the government to outlaw such research. Congress might need to monitor the research process and encourage the development of best practices in this area. Outright legislative bans on research open risky questions about political interference with scientific research.

Should Privacy Legislation Address Bias in AI?

Current non-discrimination laws cover specific sectors such as housing, employment, credit, and insurance and specific groups of people who might be the victims of discrimination because of their race, gender, religion, national origin, age or disability. There is no exemption from these rules simply because a new advanced analytic technique such as AI or machine learning is being used.

It is important to consider amending current discrimination laws to include sexual orientation and gender identity as protected characteristics, as the recently introduced Equality Act does. But even though such improvements in discrimination law would affect the use of AI systems and other statistical models, they are not part of the governance of AI as such.

It might make sense to provide an incentive for companies to go beyond anti-discrimination law and assess possible discriminatory effects on other groups and in other contexts. But which contexts and which additional groups? And why only when AI systems are used and not older statistical techniques? Ultimately, this is a reform of anti-discrimination laws, not a matter to be addressed in privacy or AI legislation alone.

Should a New AI Law Contain a Requirement for Explainability?

Current law already requires explanations in certain circumstances. Financial services companies, for instance, must list the major factors affecting a consumer’s credit score and provide reasons for any adverse action in a lending decision.

Many machine learning programs raise new issues of explainability. Models derived from machine learning are hard to explain, even if the underlying algorithm is transparent to the user, because the pattern of interactions is very complex and often uses clusters of factors that make no intuitive or theoretical sense. DARPA has documented this trade-off between accuracy and explainability and has funded research aimed at increasing the level of explainability for each level of accuracy.
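
One rough way to see the trade-off, sketched here on synthetic data rather than anything from DARPA’s program, is to train a deliberately simple, readable model and a more accurate but opaque one on the same task and compare them; the dataset and model choices below are illustrative assumptions.

```python
# Illustrative (synthetic-data) comparison of an interpretable model with a
# more accurate but opaque one -- not a benchmark of any real system.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=5000, n_features=30,
                           n_informative=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A shallow tree: every decision it makes can be read off in a few lines.
simple = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# A boosted ensemble: typically more accurate, but built from hundreds of
# interacting splits that resist any short, intuitive summary.
opaque = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print("interpretable tree accuracy:", simple.score(X_te, y_te))
print("boosted ensemble accuracy: ", opaque.score(X_te, y_te))

# The entire interpretable model fits on a screen; the ensemble does not.
print(export_text(simple))
```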

This trade-off between accuracy and explainability need not be dictated by a one-size-fits-all approach embodied in law. The best trade-off will differ by sector because the risks and benefits of analytic techniques depend less on their intrinsic characteristics and more on the domain in which they are used.

Even within a context such as health care, the trade-offs differ. As data scientists have documented, doctors and nurses in a Canadian hospital take preventive action based on a machine learning correlation between the onset of a dangerous fever in premature infants and the unusual stability of vital signs twenty-four hours earlier, even though they have no causal explanation for this correlation.

In other medical contexts, data scientists and medical professionals are aware of the dangers of relying on correlations that reflect confounding treatment variables. Asthma patients, for example, have a better record of surviving severe cases of pneumonia than the average patient. The correlation is real, but it reflects the fact that doctors know people with asthma are at greater risk from pneumonia and hospitalize them right away. It would be a deadly mistake to use this correlation as the basis for hospitalization decisions.
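
The danger is easy to reproduce on synthetic data. The sketch below, with invented numbers rather than real clinical records, shows how a treatment confound can make asthma look protective in the observed outcomes, and how a transparent model surfaces the problem as a single implausible coefficient that a clinician can question.

```python
# Synthetic illustration of the pneumonia/asthma treatment confound described
# above. All coefficients and rates are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20000

severity = rng.normal(size=n)           # pneumonia severity score
asthma = rng.binomial(1, 0.1, size=n)   # 10% of patients have asthma

# True biology: asthma makes the disease more dangerous (+0.8 to risk).
risk = 1.2 * severity + 0.8 * asthma

# Treatment policy: asthma patients are hospitalized immediately, which
# sharply lowers the risk they actually experience (-2.0). This is the confound.
observed_risk = risk - 2.0 * asthma

p_death = 1 / (1 + np.exp(-(observed_risk - 2.0)))
death = rng.binomial(1, p_death)

# A transparent model fit to the observed data exposes the problem: the
# asthma coefficient comes out negative, i.e., asthma looks "protective."
model = LogisticRegression().fit(np.column_stack([severity, asthma]), death)
print("severity coefficient:", round(model.coef_[0][0], 2))  # positive, as expected
print("asthma coefficient:  ", round(model.coef_[0][1], 2))  # negative and suspicious
```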

For this reason, some researchers consciously use less accurate statistical models that allow them to see clearly the effect of each factor on the variable being predicted. In this way, they can avoid problems such as a confounding treatment variable hidden in the models created by more accurate, but less understandable, machine learning techniques. It would be risky to use privacy or AI legislation to reduce these complex, context-dependent questions about explainability and accuracy to a single rule requiring the same thing in all circumstances.

The Future of AI and Privacy Legislation

The new privacy laws that Congress is considering would create new obligations regarding disclosure, consent, access, correction, portability, and reasonable use of personal information that would certainly apply to AI systems. But legislators should be careful about devising new obligations that apply uniquely to AI systems. AI is heavily context-dependent, and progress will be made not by trying to regulate the technology as such, but by examining its use in each sector and setting out specific guidelines governing that use where necessary.

In the meantime, the Trump administration’s recent Executive Order on AI offers a useful perspective by instructing agencies to develop “regulatory and non-regulatory” approaches to AI technologies, under guidance provided by the Office of Management and Budget. Oversight of this process by the congressional committees with relevant jurisdiction would be a good way to ensure that the review is done properly, and it might provide insight into where further legislation is needed.