# Sampling Bias: a Careless Knife.

Election polls, medical studies, and even the creation of artificial intelligence have one thing in common: they use a sample of the available data to get the job done. This week’s Occupy Math will look at several examples of ways that badly designed sets of samples can result in bad outcomes (ranging from unfortunate to deadly). Occupy Math has already posted on a closely related area – picking a benchmark point – and the classic example of a benchmark chosen to obscure the truth is choosing 1998, a very hot year, as the starting point for measuring increase in global temperature. Sampling bias is different from selecting a benchmark point, because you are picking a large number of data points, not just one.

Polls that try to predict the outcome of an election call people on the telephone. Since it is possible to get a lot of information about where a person lives from the number on their land-line, polling organizations can avoid geographic and economic bias. They learned to do this after a very harsh lesson: the Chicago Tribune ran a headline awarding the Truman/Dewey election to the loser.

The problem was that rich people were pro-Dewey and far more likely to have telephones in 1948.
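To see how much damage this kind of sample can do, here is a minimal sketch in Python. Every number in it is invented for illustration (the group shares, support levels, and phone ownership rates are assumptions, not 1948 data); the point is only the arithmetic of who gets counted.

```python
# All numbers below are invented for illustration; they are not 1948 data.
GROUPS = [
    # share: fraction of all voters in the group
    # support: fraction of the group backing candidate A
    # phone_rate: fraction of the group a telephone poll can reach
    {"name": "wealthy", "share": 0.30, "support": 0.65, "phone_rate": 0.80},
    {"name": "everyone else", "share": 0.70, "support": 0.40, "phone_rate": 0.25},
]


def true_support(groups):
    """Support for candidate A across the whole electorate."""
    return sum(g["share"] * g["support"] for g in groups)


def phone_poll_support(groups):
    """Expected support seen by a poll that only reaches phone owners."""
    # Each group's weight in the poll is its share of the reachable voters,
    # not its share of the electorate.
    reachable = [g["share"] * g["phone_rate"] for g in groups]
    weighted = sum(r * g["support"] for r, g in zip(reachable, groups))
    return weighted / sum(reachable)


print(f"true support:       {true_support(GROUPS):.3f}")
print(f"phone-poll support: {phone_poll_support(GROUPS):.3f}")
```

With these made-up numbers candidate A loses the election with 47.5 percent of the vote, while the phone poll reports a comfortable win of roughly 54 percent. Nobody lied to the pollster; the sample simply over-counted the group that owned telephones.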

It would be nice to think that polling organizations could avoid making this mistake again, but that turns out to be too much to hope for. The younger a person is, the more likely they are to have a cellphone but no land-line. That means that Obama’s solid victory in 2012 was under-predicted by a number of polling organizations because they only called land-lines. The organization 538 did much better than other organizations by using more sophisticated statistics – which compensated for many known biases in the sampling. Contemplate for a minute the effect of biasing any sort of poll or study against soliciting the input of younger people.
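One standard way to compensate for a known bias is to reweight each group in the sample to match its share of the population (statisticians call this post-stratification). The sketch below shows the idea with invented numbers; it is not a description of what 538 or any other organization actually does, and the age groups, responses, and population shares are all assumptions.

```python
from collections import defaultdict

# Invented poll: (age_group, 1 if the respondent supports the candidate else 0).
# Land-line calling reached 90 older respondents but only 10 younger ones.
RESPONSES = (
    [("young", 1)] * 7 + [("young", 0)] * 3
    + [("old", 1)] * 36 + [("old", 0)] * 54
)

# Assumed share of each age group among actual voters (hypothetical).
POPULATION_SHARE = {"young": 0.40, "old": 0.60}


def raw_estimate(responses):
    """Plain average of the responses, ignoring who was reached."""
    return sum(vote for _, vote in responses) / len(responses)


def poststratified_estimate(responses, population_share):
    """Average each group separately, then weight by population share.

    Assumes every group in population_share has at least one respondent.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for group, vote in responses:
        totals[group] += vote
        counts[group] += 1
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_share.items()
    )


print(f"raw estimate:      {raw_estimate(RESPONSES):.2f}")
print(f"weighted estimate: {poststratified_estimate(RESPONSES, POPULATION_SHARE):.2f}")
```

In this toy example the raw poll says the candidate is losing with 43 percent, while the reweighted estimate puts the candidate at 52 percent. The correction only works for biases you know about and can measure, which is why the ten young respondents carry so much weight here.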

How can sampling bias be deadly?

The biggest sampling bias in the history of science is leaving women out of most medical studies. An earlier Occupy Math looked at this issue. Women (and female mice and rats) are left out of studies because, if they became pregnant, the massive systemic changes would mess up the studies. This is not only completely bogus given modern analysis techniques, but it has deadly effects. It masks, for example, the fact that women have a different set of symptoms when having a heart attack. Being sent home with a prescription for a sedative to help you with your anxiety when you are actually having a heart attack fits Occupy Math’s definition of deadly.

Bias of a different sort appears in the treatment of black patients with cancer. The linked study looked at physicians’ treatment of their black patients and also independently measured the physicians’ degree of racial prejudice. Not only did the black patients of racially biased physicians receive inferior care, but this bias will also show up in the statistics about blacks with cancer as a set of inferior outcomes, which in turn affects treatment.

Accurate data on minorities is critical because they sometimes have different reactions to medications and treatments.

Before moving on to our next example, let’s reflect for a moment on what we need to do to avoid sampling bias. Men and women, Europeans and Africans, the young and old – these are pairs of groups that are different in ways that inform how a sample should be taken. Including everyone in a representative manner is absolutely the correct solution for a political poll and a work of idiocy for a medical study. Medical studies should be done with adequate, separate samples of men and women, blacks, whites, Hispanics, and Asians. There is no “correct” sampling strategy; rather, there are appropriate sampling strategies for each situation.

In politics, the average outcome is king; in medicine it is often useless, even harmful.
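A tiny numerical sketch shows why the average can mislead. The measurements below are invented (a hypothetical change in some health score after treatment, for two hypothetical groups), and the group names are placeholders.

```python
from statistics import mean

# Invented data: change in a health score after a hypothetical treatment.
# Positive means the patient improved, negative means the patient got worse.
GROUP_A = [8, 10, 12, 9, 11]
GROUP_B = [-9, -11, -10, -8, -12]

pooled_effect = mean(GROUP_A + GROUP_B)   # what a single mixed sample reports
effect_a = mean(GROUP_A)                  # what a separate sample of group A reports
effect_b = mean(GROUP_B)                  # what a separate sample of group B reports

print(f"pooled effect:  {pooled_effect}")
print(f"group A effect: {effect_a}")
print(f"group B effect: {effect_b}")
```

The pooled sample reports an average effect of zero, which suggests a treatment that does nothing. Analyzed separately, the same data say the treatment clearly helps one group and clearly harms the other. A pollster only needs the first number; a physician needs the other two.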

There has been a lot of kerfuffle in the media lately about artificial intelligence. Several completely unqualified people with lots of money or a Nobel prize have been sounding like they took the Terminator movies way too seriously. A technique called deep learning has managed to do some extraordinary things lately, like beating the current human Go champion a decade earlier than any reasonable prediction and learning to play Atari video games from nothing more than the screen image and the score. Occupy Math feels these results are inordinately cool – but they are not signs of the AI apocalypse.

Humans have a survival instinct, they show off to get mates, and they compete for resources for themselves or their children. Most of all, they are self-aware. We don’t currently know what self-awareness is to the resolution needed to code it into an AI and, here’s a thought, we could avoid giving our AIs the kind of personalities and motivations that would make them want to take over. Misters Musk and Hawking, mistaking genius in one area for competence in another, are engaging in a type of anthropomorphism that Isaac Asimov called the Frankenstein complex.

Occupy Math still owes his readers an example of sampling bias in the programming of an AI. Google is building an AI-based image classifier. As with most AIs, there are two parts to its programming. First, you build a general purpose learning algorithm (e.g., deep learning). Second, you feed it a whole bunch of examples of pictures with labels. The algorithm learns from these examples. The problem is that a beta version of the algorithm identified black people as gorillas. In a badly titled article, “Artificial Intelligence’s White Guy Problem”, the New York Times managed to mistake sampling bias for racial bias. If the labeled pictures of humans were mostly white people, and the gorillas were a better shade match for the black people, then the algorithm correctly learned from its examples while incorrectly modelling reality. This outcome of a bad sample design is unfortunate – but as example-driven AI becomes more prevalent, it also gains potential to be deadly.
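The mechanism is easy to reproduce in miniature. The sketch below is not deep learning and has nothing to do with Google's actual system; it is a toy nearest-centroid classifier on a single made-up numeric feature, with abstract labels `"a"` and `"b"`. The assumption built into the data is that the training examples for class `"a"` cover only part of the range that class `"a"` really occupies.

```python
from statistics import mean

# Invented training set: (feature_value, label).
# Class "a" really spans feature values from about 0.3 to 0.9, but the
# examples collected for training all come from the top of that range.
TRAINING = [
    (0.80, "a"), (0.85, "a"), (0.90, "a"), (0.75, "a"),
    (0.20, "b"), (0.25, "b"), (0.15, "b"), (0.30, "b"),
]


def fit_centroids(examples):
    """Learn one centroid (mean feature value) per label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


centroids = fit_centroids(TRAINING)

# The model is perfect on the examples it was shown ...
training_accuracy = mean(
    1.0 if predict(centroids, value) == label else 0.0
    for value, label in TRAINING
)

# ... but a member of class "a" from the unsampled part of the range
# lands closer to the "b" centroid.
unseen_a_value = 0.35
unseen_prediction = predict(centroids, unseen_a_value)

print(f"training accuracy: {training_accuracy}")
print(f"prediction for an unseen class-a example: {unseen_prediction}")
```

The classifier scores 100 percent on its training examples and still labels the unseen class `"a"` example as `"b"`. The learning algorithm did its job; the sample it learned from did not represent the world it was later asked about.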

We require a science of sample choice!

The Times is a crusader against racial bias – and the AI did have racial bias, but an innocent bias caused by (quite literally) a lack of experience – so its viewpoint is understandable. In a future post Occupy Math will discuss experimental design, a solid beginning to a science of sample choice. This is a field of statistics that gives you tools for avoiding the types of errors discussed in this post. Do you have examples of sampling bias that have caused problems or personally inconvenienced you? Occupy Math would like to hear about them in your comments or tweets.

I hope to see you here again,
Daniel Ashlock,
University of Guelph,
Department of Mathematics and Statistics