Stereotypes and Statistical Generalizations

This post is an extended version of a piece which originally appeared at the Prindle Post.


Let’s look at three different stories and use them to investigate statistical generalizations.

Story 1

This semester I’m teaching a Reasoning and Critical Thinking course. During the first class, I ran through various questions designed to show that human thinking is subject to predictable and systematic errors. Everything was going swimmingly. Most students committed the conjunction fallacy, ignored regression towards the mean, and failed the Wason selection task.

I then came to one of my favorite examples from Kahneman and Tversky: base rate neglect. I told the students that “Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail,” and then asked how much more likely it is that Steve is a librarian than a farmer. Most students thought it was moderately more likely that Steve was a librarian.

Delighted with this result, I explained the mistake. While Steve is more representative of a librarian, you need to factor in base rates before concluding that he is more likely to actually be one. In the U.S. there are about two million farmers and fewer than one hundred and fifty thousand librarians. Additionally, while 70% of farmers are male, only about 20% of librarians are. So for every librarian named Steve, you should expect at least forty-five farmers with that name.

This culminated in my exciting reveal: even if you think that librarians are twenty times more likely than farmers to fit the personality sketch, you should still think Steve is more than twice as likely to be a farmer.
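The arithmetic behind that reveal is Bayes’ rule in odds form: the posterior odds that Steve is a farmer equal the prior odds from the base rates, divided by how many times more likely the personality sketch is for a librarian. Here is a minimal Python sketch of that calculation. It takes the rough figures above as inputs (approximations, not official statistics), treats the librarian count as an upper bound, and assumes, as the example does, that someone named Steve is male. The function names are illustrative.

```python
from fractions import Fraction

# Rough U.S. figures from the text, used as inputs (not official data).
FARMERS = 2_000_000
LIBRARIANS = 150_000                    # "fewer than", so this is an upper bound
MALE_SHARE_FARMERS = Fraction(7, 10)    # about 70% of farmers are male
MALE_SHARE_LIBRARIANS = Fraction(1, 5)  # about 20% of librarians are male


def prior_odds_farmer(farmers, librarians, male_share_f, male_share_l):
    """Odds that a man drawn from the two groups is a farmer rather than a librarian."""
    return (farmers * male_share_f) / (librarians * male_share_l)


def posterior_odds_farmer(prior_odds, likelihood_ratio_librarian):
    """Bayes' rule in odds form: divide by how much better the sketch fits librarians."""
    return prior_odds / likelihood_ratio_librarian


prior = prior_odds_farmer(FARMERS, LIBRARIANS, MALE_SHARE_FARMERS, MALE_SHARE_LIBRARIANS)
posterior = posterior_odds_farmer(prior, 20)  # sketch is 20x more likely for a librarian

print(f"prior odds farmer:librarian ~ {float(prior):.1f} : 1")           # 46.7 : 1
print(f"posterior odds after 20x likelihood ratio ~ {float(posterior):.2f} : 1")  # 2.33 : 1
```

Because the true number of librarians is below 150,000, the real prior odds are, if anything, even more lopsided than the 140:3 computed here.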

This is counterintuitive, and I expected pushback. But then a student asked a question I had not anticipated. He didn’t challenge my claim’s statistical legitimacy; he challenged its moral legitimacy. Wasn’t this a troubling generalization from gender stereotypes? And isn’t reasoning from stereotypes wrong?

It was a good question, and in the moment I gave an only so-so reply. I acknowledged that judging based on stereotypes is wrong, and then I…

  1. distinguished stereotypes proper from empirically informed statistical generalizations (explaining the psychological literature suggesting stereotypes are not statistical generalizations, but unquantified generics that the human brain attributes to intrinsic essences);
  2. explained how the most pernicious stereotypes are statistically misleading (e.g., we accept generic generalizations at low statistical frequencies about stuff we fear), and so would likely be weakened by explicit reasoning from rigorous base-rates rather than intuitive resemblances;
  3. and pointed out that racial disparities present in statistical generalizations act as important clarion calls for political reform.

I doubt my response satisfied every student — nor should it have. What I said was too simple. Acting on dubious stereotypes is often wrong, but acting on rigorous statistical generalizations can also be unjust. Consider a story recounted in Bryan Stevenson’s Just Mercy:

Story 2

“Once I was preparing to do a hearing in a trial court in the Midwest and was sitting at counsel table in an empty courtroom before the hearing. I was wearing a dark suit, white shirt, and tie. The judge and the prosecutor entered through a door in the back of the courtroom laughing about something.

When the judge saw me sitting at the defense table, he said to me harshly, ‘Hey, you shouldn’t be in here without counsel. Go back outside and wait in the hallway until your lawyer arrives.’

I stood up and smiled broadly. I said, ‘Oh, I’m sorry, Your Honor, we haven’t met. My name is Bryan Stevenson, I am the lawyer on the case set for hearing this morning.’

The judge laughed at his mistake, and the prosecutor joined in. I forced myself to laugh because I didn’t want my young client, a white child who had been prosecuted as an adult, to be disadvantaged by a conflict I had created with the judge before the hearing.”

This judge did something wrong. Because Bryan Stevenson is Black, the judge assumed he was the defendant, not defense counsel. Now, I expect the judge acted on an implicit racist stereotype, but suppose the judge had instead reasoned from true statistical background data. It is conceivable that more of the Black people who enter that judge’s courtroom, even those dressed in suit and tie, are defendants than defense attorneys. Would shifting from stereotypes to statistics make the judge’s behavior ok?

No. The harm done did not depend on the outburst’s mental origins, whether statistics or stereotypes. Stevenson explains that what is destructive is the “accumulated insults and indignations caused by racial presumptions,” the burden of “constantly being suspected, accused, watched, doubted, distrusted, presumed guilty, and even feared.” This harm is present whether the judge acted on ill-formed stereotypes or on statistically accurate knowledge of base rates.

So, my own inference about Steve is not justified merely because it was grounded in a true statistical generalization. Still, I think I was right and the judge was wrong. Here is one difference between my inference and the judge’s. I didn’t act as though I knew Steve was a farmer; I just concluded it was more likely he was. The judge didn’t act the way he would have if he thought it merely likely that Stevenson was the defendant. He acted as though he knew Stevenson was the defendant. But the statistical generalizations we are considering cannot secure such knowledge.

Knowing that someone is a defendant justifies different behavior than merely thinking someone is likely a defendant. The latter might justify politely asking Stevenson whether he is the defense attorney. But it couldn’t justify the judge’s actual behavior, behavior that is unjustifiable unless the judge knows Stevenson is not an attorney (and dubious even then). A curious fact about ethics is that certain actions (like asserting a claim or punishing a criminal) require not just high subjective credence but knowledge. And since mere statistical information cannot secure knowledge, statistical generalizations are unsuitable justifications for some actions. (I’ll be dedicating a whole blog post to this feature of knowledge in a new series where I explain my central philosophical projects.)

Statistical disparities can justify some differential treatment. For instance, seeing that so few of the Black people in his courtroom are attorneys could justify the judge in funding mock trial programs only at majority Black public schools. Indeed, it might even justify the judge, in these situations, only asking Black people if they are new defense attorneys (and just assuming white people are). But it cannot justify behavior, like harsh chastisement, that requires knowledge the person did something wrong.

I didn’t do anything that required knowledge that Steve was a farmer. So does this mean I’m in the clear? Maybe. But let’s consider one final story from the recent news:

Story 3

Due to COVID-19, the UK canceled A-level exams, a primary determinant of UK college admissions. (If you’re unfamiliar with the A-levels, they are sort of like really difficult subject-specific SAT exams.) The UK replaced the exams with a statistical generalization: the grades that teachers and schools submitted were subjected to a statistical normalization based on the historical performance of each student’s school. Why did Ofqual (the Office of Qualifications and Examinations Regulation) feel the need to normalize the results? Well, for one thing, the predicted grades that teachers submitted were 12% higher than last year’s scores (unsurprising without any external test to check teacher optimism).

The normalization, then, adjusted many scores downward. If Ofqual predicted, based on historical data, that at least one student in a class would have failed the exam, then the lowest-scoring student’s grade was adjusted to that failing grade (irrespective of how well the teacher predicted the student would do).
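To make the mechanism concrete, here is a deliberately simplified Python sketch of rank-based normalization. It is not Ofqual’s actual model. It assumes only a teacher-supplied rank order and a school’s historical grade distribution, and assigns grades so the class reproduces that distribution. The function name, grade list, and sample distribution are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List

# Grades from best to worst; "U" stands in for a failing grade.
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]


def normalize(ranked_students: List[str],
              historical_share: Dict[str, Fraction]) -> Dict[str, str]:
    """Hypothetical sketch: grade students by rank so the class mirrors past results.

    The teacher's predicted grades play no role; only rank order matters.
    """
    n = len(ranked_students)
    graded = {}
    for i, name in enumerate(ranked_students):
        # Midpoint percentile of this student's position (near 0 = top of class).
        percentile = Fraction(2 * i + 1, 2 * n)
        cumulative = Fraction(0)
        grade = GRADES[-1]  # fall back to the lowest grade
        for g in GRADES:
            cumulative += historical_share.get(g, Fraction(0))
            if percentile <= cumulative:
                grade = g
                break
        graded[name] = grade
    return graded


# Hypothetical history for a historically low-performing school (no past A* grades).
history = {"A": Fraction(1, 10), "B": Fraction(2, 10), "C": Fraction(3, 10),
           "D": Fraction(2, 10), "E": Fraction(1, 10), "U": Fraction(1, 10)}
# Four students the teacher ranks at the top, then six placeholder classmates.
ranking = ["friend1", "friend2", "friend3", "friend4"] + [f"student{k}" for k in range(5, 11)]
print(normalize(ranking, history))
```

On this hypothetical class, the four top-ranked students receive A, B, B, and C, and the last-ranked student receives a U, whatever grades their teacher predicted. Because only rank order enters the calculation, no group at this school can be graded above the school’s historical ceiling.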

Unsurprisingly, this sparked outrage and the UK walked back the policy. Students felt the system was unfair since they had no opportunity to prove they would have bucked the trend. Additionally, since wealthier schools tended to perform better on the A-levels in previous years, the downgrading hurt students at poorer schools at a higher rate.

Now, this feels unfair. (And since justifiability to the people matters for government policy, I think the government made the right choice in walking back the policy.) But was it actually unfair? And if so, why?

It’s not an issue of stereotypes: the changes weren’t based on hasty stereotypes but on a reasonable statistical generalization. It’s not an issue of compounding algorithmic bias (of the sort described in Cathy O’Neil’s Weapons of Math Destruction), as the algorithm didn’t produce results more unequal than actual test results. Nor was the statistical generalization used in a way that requires knowledge. College admissions don’t assume we know one student is better than another. Rather, they use lots of data to make informed guesses about which students will be the best fit. The algorithm might sometimes misclassify, but so could any standardized test.

So what feels unfair? My hunch is that the algorithm left no space for the exceptional. Suppose four friends who attended a historically poor-performing school spent the last two years frantically studying together in a way no previous group had. Had they sat the test, all could have secured top grades, a first for the school. Unfortunately, they couldn’t sit the test, and because their grades are normalized against previous years, the algorithm eliminates their possibility of exceptional performance. (To be fair to the UK, it said students could sit the exams in the fall if they felt they could outperform their predicted score.)

But what is unfair about eliminating the possibility of exceptional success? My further hunch is that seeing someone as having the possibility of exceptional success is part of what it is to see them as an individual. Sure, we can accept that most people will be like most people. We can even be ok with wealthier schools, in the aggregate, consistently doing better on standardized tests. But we aren’t ok with removing the possibility for any individual to be an exception to the trend.

Why does seeing someone as an individual require us to acknowledge the possibility that they might be an exception? I think it has something to do with Kant’s insight that to see someone as an autonomous person is to see them as a first cause. Most things in nature are mediate causes: they are one step in a causal chain. The four-four domino might cause the next domino to fall over, but it was also caused to fall over by the domino that preceded it. The domino is a link in the chain; it connects to what comes after in exactly the same way that it connects to what came before.

Thus the domino makes a difference to what happens in the future (if you remove a domino from your Rube Goldberg machine, that will change what chain of events unfolds), but it is not up to the domino what difference it ends up making. And of course I can think about humans the same way I think about dominoes. I can see them as really, really complex dominoes; but Kant thinks that the moment you see a human that way, you are no longer seeing them as a person.

This point is clearest if we note how it is impossible, while deliberating, to see yourself as merely a mediate cause. Suppose you are trying to decide whether to cross the street. You won’t answer that question by trying to assess what past causal influences bear on your future behavior. Instead, you will try to figure out whether you should cross the street. You will identify reasons it would be good to do so and reasons it would be bad to do so. You can only deliberate about which of two choices to make if, from within the deliberative perspective, you see it as up to you what will happen.

The same goes for belief. When you try to figure out what to believe, you must see yourself as a ‘first cause’ of your beliefs. If you identify some cause that is likely to make you believe something (e.g., suppose you know someone has been trying to brainwash you), that does nothing to answer the question ‘what should I believe?’, because attending to it would involve seeing yourself as a mediate cause of your beliefs rather than a first cause. Likewise, in trying to figure out what to do, you must see yourself as a ‘first cause’ of your actions. When the bully asks his victim ‘why are you hitting yourself?’, the victim could reasonably reply, ‘I am only hitting myself as a mediate cause, so I don’t have access to the reasons I am being hit; rather, you, the first cause, are the one in the deliberative position to answer that why-question.’ (Of course, I do not recommend this reply; most middle school bullies are Humeans about action and so will likely be infuriated by this deviation from their preferred philosophical orthodoxy.)

The deliberative standpoint is the standpoint we occupy as agents. To see ourselves as agents, then, requires us to see ourselves as first causes. Likewise, to see other people as agents, or as persons, or as individuals, requires us to see them as first causes. To assume these students could not buck the previous trend is to see them as mediate causes. Each student is a domino in a chain stretching between the background school conditions and the future test score. The score the student would get is not ‘up to the student’ but is rather a predictable output of a prior chain.

Now, of course, most people will not be exceptional; that is true by definition. But it still seems that if we want to see someone as a first cause, we must see it as up to each individual whether they will be exceptional. We intuit something objectionable about the UK’s formula because it seems to wipe out that possibility.

When my students resisted my claim that Steve was likely a farmer, they did not resist the generalization itself. They agreed most farmers are men and most librarians are women. But they were uncomfortable moving from that general ratio to a probabilistic judgment about the particular person, Steve. They seemed to worry that applying the generalization to Steve precluded seeing Steve as an exception.

While I think the students were wrong to think the worry applied in this case — factoring in base-rates doesn’t prevent the exceptional from proving their uniqueness — they might be right that there is a tension between seeing someone within a statistical generalization and seeing someone as an individual. It’s a possibility I should have recognized, and a further way acting on even good statistical generalizations might sometimes be wrong.