Selection Bias: Why Your Sample of Reality Isn't Real

If you’re getting older and spend time online, you may have felt a melancholy that culture and society are in decline compared to the past.

Nowadays the internet is flooded with clickbait garbage and petty domestic gossip: people talk about nothing but hardship, dowries, and infidelity. Even mainstream media seems to have forgotten how to write proper Chinese. Occasionally you dig up the old TV adaptations of Dream of the Red Chamber or Romance of the Three Kingdoms, and you can’t help but think that this generation of audiences has let the culture down, that Chinese cultural refinement is in serious collapse.

Offline, too, many people pine for the good old days of the 1980s and 1990s. Back then, if you worked at a state-owned enterprise, the job was easy and secure, and life was rich with cultural activities. College graduates walked straight into good jobs, and people were truly cultured…

But here’s what I want to tell you: even if you actually lived through those supposedly glorious days, your nostalgia is an illusion.

In the early 1980s, only about 20% of China’s population lived in cities. In 1990, only about 3% of the college-age population was enrolled in higher education. Today the urbanization rate has reached 67% and the university enrollment rate exceeds 60%. Those happy workers and proud college graduates you remember were a tiny minority; the vast majority of Chinese lived in rural areas in extreme poverty. They weren’t consumers of culture and weren’t part of the narrative; they could barely complain loudly enough for you to hear.

The reason today’s culture seems so vulgar is that these previously silent people are now online. They’ve become the main audience and advertising targets; they’re the ones deciding what culture becomes.

Chinese culture didn’t degrade—it expanded.

Today those living in the top 20% of neighborhoods with education levels in the top 3%—the elite—may still be doing fine and have even more sophisticated tastes than before. But their cultural presence has been drowned out: there’s no point making TV shows specifically for such a small group.

What people actually miss isn’t the average of past China, but a heavily filtered version of it. Your impression isn’t determined by the whole population, but by the portion that got sampled in: back then, a tiny urban elite passed for all of China; now what reaches you is a far more ordinary crowd, plus algorithmic amplification.

This mechanism is called selection bias. The real world is full of cognitive traps, and you need to understand this one to judge accurately.

The Foundation of Distorted Reality

Simply put, selection bias means the sample that enters your vision doesn’t equally represent the real world.

You think you’re observing the whole ocean, but you’re really just looking at the fish a fishing net catches—and the net’s mesh size and location were determined long before, deciding exactly what you’d see.

Or imagine standing at an emergency room entrance all day and concluding that “everyone in this city is bleeding.” Everything you saw is real and nobody lied to you, yet your picture of the world is false. I once saw a short video in which a woman working at a paternity testing clinic said that, after handling so many cases, she believes men are the disadvantaged sex… She apparently didn’t realize that most couples never get a paternity test, and that those who do usually already have a reason for suspicion.
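A rough back-of-the-envelope sketch of the clinic example may help. Every number here is hypothetical and chosen only to show the mechanism, and the function name `rate_seen_at_clinic` is made up for illustration: if couples with a real problem are far more likely to seek a test, the rate seen inside the clinic can sit far above the rate in the population.

```python
# All numbers are hypothetical, chosen only to illustrate the mechanism.
def rate_seen_at_clinic(base_rate, test_rate_if_true, test_rate_if_false):
    """Share of true cases among the people who select themselves into testing."""
    tested_true = base_rate * test_rate_if_true          # true cases that get tested
    tested_false = (1 - base_rate) * test_rate_if_false  # other couples that get tested
    return tested_true / (tested_true + tested_false)

population_rate = 0.02  # assumed rate in the whole population
# Assumed: 20% of true cases seek a test, versus 1% of everyone else.
clinic_rate = rate_seen_at_clinic(population_rate, 0.20, 0.01)

print(f"population: {population_rate:.0%}, clinic: {clinic_rate:.0%}")
# prints "population: 2%, clinic: 29%"
```

The clinic worker is not misreporting anything; under these assumed numbers her sample simply contains true cases at roughly fourteen times the population rate.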

Academics have identified over a dozen types of selection bias. I’ll roughly categorize them into four types.

Type 1: Self-Selection Bias

“Self-selection bias” is like throwing a party—everyone attending chose to attend.

The most typical phenomenon might be called the “social media paradox”: you feel like almost everyone on your feed is living better than you. Xiaoming went diving in the Maldives, Xiaoli just got a new car, Laowang’s daughter won first prize in a piano competition, while you’re eating takeout and filling out forms.

They aren’t lying. But people typically share only their good side; posting is an act of self-selection. The couple who just fought about their mortgage, the worker who just got chewed out by the boss, and you right now are not posting.

Your feed is a highlight reel, not a random slice of life. Anyone who doesn’t understand this and compares their everyday reality to other people’s promotional posters will fall for selection bias and feel inadequate. One study found that the more time people spend on Facebook, the more they believe others are happier and living better lives [1]; in an experiment, people who limited their social media use to about 30 minutes a day reported significantly less loneliness and depression [2].

Another example is online reviews. Why do many movies, products, and restaurants show what’s called a “J-shaped distribution”—with the main peak at five stars praising enthusiastically and a secondary peak at one star attacking viciously, with almost nothing in between? Because you only have motivation to open an app, log in, type, and post when something has stirred strong emotions in you. The “it’s okay” middling experience generates no motivation to speak.

More detailed research [3] suggests that behind reviews lie two layers of self-selection—in addition to the reporting bias of extreme voices, there’s a purchasing bias: people willing to buy are already more likely to be positive…
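The reporting half of this can be sketched with a few lines of arithmetic. The shares and posting probabilities below are hypothetical, not taken from [3]; the sketch only assumes that buyers with extreme experiences are much more likely to post than buyers with middling ones.

```python
# Hypothetical share of buyers by true satisfaction (1 to 5 stars): bell-shaped.
true_share = {1: 0.05, 2: 0.10, 3: 0.35, 4: 0.35, 5: 0.15}
# Hypothetical probability that a buyer with that experience bothers to post.
post_prob = {1: 0.40, 2: 0.05, 3: 0.02, 4: 0.03, 5: 0.40}

def posted_distribution(true_share, post_prob):
    """Distribution of star ratings among the reviews that actually get written."""
    weights = {s: true_share[s] * post_prob[s] for s in true_share}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

posted = posted_distribution(true_share, post_prob)
for stars in sorted(posted):
    print(f"{stars} stars: true {true_share[stars]:.0%}, posted {posted[stars]:.0%}")
```

With these assumed inputs, the most common true experience (three or four stars) nearly vanishes from the posted reviews, while five stars and one star form the two peaks of the J.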

Platforms have racked their brains trying to make reviews more useful—some default to good reviews, allow reviews only with purchase, use point incentives, send review reminders—but bias cannot be eliminated. The fundamental problem isn’t technology, it’s motivation: not everyone wants to speak, and those who do aren’t average people.

When you open international news, the world seems chaotic; when you scroll local news feeds, public morals seem to be in decline; you come to think certain countries are extremely dangerous, that people in certain places are all bad… All of this is self-selection bias too: only the rarest, most terrible events are deemed newsworthy; peaceful daily life never trends.

Self-selection bias keeps people mentally surrounded by extreme samples, easily slipping into unnecessary anxiety, depression, and psychological imbalance.

Remember: social media isn’t life, comments aren’t public opinion, trending topics aren’t humanity.

Type 2: Survivorship Bias

If self-selection bias means “only the people who wanted to come showed up,” then “survivorship bias” covers the other case: the people who are missing didn’t stay away for lack of interest; they couldn’t come.

Survivorship bias might be the deadliest cognitive trap in the business world, because it helps you draw the wrong lessons about success.

If you only read biographies of tech titans, you’ll conclude that success requires extreme passion, willingness to take risks, breaking rules, even obsession, “going all in” on new opportunities—especially starting young: Bill Gates, Steve Jobs, Mark Zuckerberg, Sam Altman all dropped out of college… So you conclude: success = dropping out + going all in.

But you don’t know how many other people who were just as young and went just as all-in failed. Those people went bankrupt, lost their jobs, went home—they never entered your field of vision.

Include the failures in the statistics and you’d find that most startups disappear along the way. The average founder age at the fastest-growing new companies isn’t in the twenties; it’s 45. These founders succeeded not because they rebelled, but because they had accumulated industry experience [4].

You can’t hear about a few people who got rich from lottery tickets and conclude that playing the lottery is an ideal wealth strategy. Success mythology often mistakes a survivor’s luck for causation.

Investment funds are another hard-hit area. When you want to invest, the fund company representative opens a chart and says “look at our beautiful historical returns.” She probably isn’t lying to you. But the chart can’t tell you whether the performance reflects the company’s skill or just a lucky run for that particular fund, because it leaves out the funds that failed.

In reality, funds with terrible performance get liquidated or merged away, so they never get the chance to appear before you. Finance researchers have long studied this phenomenon: if you look only at the surviving funds, you will systematically overestimate true performance [5].
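A toy simulation shows how this overstatement can arise even when no fund has any skill at all. This is a sketch under invented assumptions, not the method used in [5]: every fund draws yearly returns from the same distribution (mean 5%, standard deviation 15%), and a fund is closed as soon as its value drops below 80% of its starting value.

```python
import random

def simulate_funds(n_funds=2000, years=10, seed=1):
    """Compare average returns of all funds with those of survivors only."""
    rng = random.Random(seed)
    all_funds, survivors = [], []
    for _ in range(n_funds):
        value, returns, alive = 1.0, [], True
        for _ in range(years):
            r = rng.gauss(0.05, 0.15)  # identical luck-only distribution for every fund
            returns.append(r)
            value *= 1 + r
            if value < 0.8:            # assumed closure rule: fund is liquidated
                alive = False
                break
        avg = sum(returns) / len(returns)  # average over the years the fund existed
        all_funds.append(avg)
        if alive:
            survivors.append(avg)
    mean_all = sum(all_funds) / len(all_funds)
    mean_survivors = sum(survivors) / len(survivors)
    return mean_all, mean_survivors, len(survivors)

mean_all, mean_survivors, n_survivors = simulate_funds()
print(f"all funds: {mean_all:.1%}, survivors only: {mean_survivors:.1%}, "
      f"surviving: {n_survivors} of 2000")
```

The survivors’ average comes out clearly higher than the average over every fund that ever existed, even though by construction skill plays no role in this model.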

You see the champion display, but not the graveyard behind it.

Some people swear that a folk remedy cures serious illness, or that a fitness method or diet works like magic. But the people who felt better are simply more likely to tell the story; those it didn’t work for have already left the stage.

Some look at office workers who seem depressed and unhealthy, notice that factory workers always seem robust and sturdy, and conclude that physical labor is good for body and mind. In truth, unhealthy people simply couldn’t withstand heavy factory labor; they were filtered out. Epidemiology even has a term for this: the “Healthy Worker Survivor Effect” [6].

Nostalgia too carries a strong component of survivorship bias [7]. Human memory has this feature: over the long term, what we remember are often just high moments. We forget the confusion and trouble.

Some miss past works of art, some miss old community relationships, some miss past politics. But if you could actually visit the past, you’d find those people had the same worries as us—they were missing even earlier times. You were very young back then, and youth always comes with beautiful memories; your past troubles are all solved, so you don’t think of them as troubles anymore, while your current troubles haven’t been solved yet.

The era didn’t regress. Time just ran a filter on the past.

Type 3: Sorting Bias

“Sorting bias” (selection bias in the narrow sense) arises at the very moment groups are formed: people were never randomly assigned to them.

Here’s a counter-intuitive example that strikes at the heart of middle-class nerves: the prestige school effect.

You mortgage everything to buy a house in a good school district, sacrifice all your free time taking kids to activities, all to get into a top middle school and good university, ideally an Ivy League school. You believe elite schools represent the best education and can develop children into superior talents… but this is a deeply unscientific illusion.

Children aren’t randomly assigned to schools. They’re admitted through strict selection. The children who get selected are already smart, hardworking, and from good families, so they were more likely to turn out talented anyway. Did the elite school develop them, or is it just riding their coattails?

The American economists Stacy Dale and Alan Krueger conducted a series of studies [8]. Rather than simply comparing elite school graduates with ordinary school graduates, they looked at students of roughly equal ability, all capable of getting into elite schools, some of whom attended while others, for various reasons, chose ordinary schools instead. The result: elite schools didn’t give graduates notably higher earnings; looking at long-term returns in particular, the school effect shrank to nearly zero.
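The logic of that comparison can be imitated in a toy simulation. This is a simplified sketch, not Dale and Krueger’s actual method or data: all parameters are invented, earnings depend only on ability plus noise, and the school contributes exactly nothing. Admission depends on ability, and 70% of admitted students attend for reasons unrelated to ability.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate_schools(n=50000, seed=2):
    """Naive vs. matched earnings gap when the school itself has zero effect."""
    rng = random.Random(seed)
    attended, others, declined = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        admitted = ability + rng.gauss(0, 0.5) > 1.0     # selective admission
        attends = admitted and rng.random() < 0.7        # attendance unrelated to ability
        earnings = 50 + 10 * ability + rng.gauss(0, 5)   # note: no school term at all
        if attends:
            attended.append(earnings)
        else:
            others.append(earnings)
            if admitted:
                declined.append(earnings)                # admitted but went elsewhere
    naive_gap = mean(attended) - mean(others)      # elite graduates vs. everyone else
    matched_gap = mean(attended) - mean(declined)  # elite graduates vs. equally able decliners
    return naive_gap, matched_gap

naive_gap, matched_gap = simulate_schools()
print(f"naive gap: {naive_gap:.1f}, matched gap: {matched_gap:.1f}")
```

The naive comparison shows a large earnings premium for the elite school, while the comparison among admitted students shows a gap near zero, which is what a pure filter would produce.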

Dale and Krueger found that an elite school credential showed clear benefits only for students from disadvantaged backgrounds, such as minority students and those whose parents had little education, possibly because they had more to gain from the school’s social networks.

In short, elite schools are more like filters than alchemical furnaces.

By the same logic, you can’t see some children improve their grades with tutoring and conclude that tutoring improves grades; you can’t see children attending various extracurriculars from childhood grow into successful adults and conclude that enrichment education leads to success. It may simply be that kids who love learning love tutoring, wealthy families can afford fencing and horseback riding, and their success is merely a spillover of personal and family capability.

Type 4: Threshold Bias (Berkson’s Paradox)

“Threshold bias,” also called “Berkson’s Paradox,” highlights a curious phenomenon: you only see samples that crossed a certain threshold [9].

Some folk wisdom says:

  • Handsome men are all cads.
  • Beautiful women have terrible tempers.
  • Those with strong business skills have low emotional intelligence.
  • Athletic students do poorly in academics; top academic students do poorly in sports.

Does God insist on fairness—give someone one advantage and take away another? Actually these are all illusions.

Take two variables that may be completely unrelated. If you add them together, set a threshold on the total, and then look only at the samples that crossed it, the two variables will appear negatively correlated [10].

Take the marriage market: suppose the two main things women value in men are looks and character, ideally having both, but absolutely not being both ugly and a cad. You silently set the formula: male attractiveness = looks + character, and calculate total scores for everyone.

Even if a man wants to be your backup option, his total score must cross a threshold. As you can imagine, above the threshold, those scoring high in both are a rare minority; most people must score high in one and low in the other—so you conclude that looks and character are mutually exclusive.

The truth is simply that men who are both handsome and faithful were already rare to begin with, and they were snatched up early, while men who are both ugly and cads never entered your sample at all.
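The effect is easy to reproduce. The sketch below assumes looks and character are independent standard normal scores and uses an arbitrary threshold of 1.0 on their sum; these choices, like the function names, are illustrative assumptions only.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_threshold(n=20000, threshold=1.0, seed=0):
    rng = random.Random(seed)
    looks = [rng.gauss(0, 1) for _ in range(n)]
    character = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of looks
    # Keep only the people whose total score crosses the threshold.
    selected = [(l, c) for l, c in zip(looks, character) if l + c > threshold]
    r_all = pearson(looks, character)
    r_selected = pearson([l for l, _ in selected], [c for _, c in selected])
    return r_all, r_selected, len(selected)

r_all, r_selected, n_selected = simulate_threshold()
print(f"everyone: r = {r_all:.2f}; above threshold ({n_selected} people): r = {r_selected:.2f}")
```

In the full population the correlation is close to zero, as constructed. Among those above the threshold it turns strongly negative, although nothing about any individual changed.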

By the same logic, why do those with strong business skills have low emotional intelligence? Because companies hire people whose “business skills + emotional intelligence” exceeds a certain threshold. Those strong in both are a tiny minority, those weak in both can’t get hired, leaving you to see those who are strong in one and weak in the other.

Speed dating isn’t a population census, hiring isn’t social sampling; thresholds create false oppositions, sieves forge spurious patterns.

Moving Beyond Biased Sampling

Selection bias occurs because of flawed samples. But sometimes even with good samples, your eyes can malfunction. You might only notice extreme cases while overlooking mundane things; you might only see evidence supporting your views—phenomena called “availability bias” and “confirmation bias,” which we needn’t elaborate on here.

Gaining any real knowledge about the actual world is extraordinarily difficult.

Scientists go to great lengths to combat selection bias. The ideal method is random assignment of people for experimentation. But if you can’t experiment and must passively analyze existing data, you must ensure your sample is clean—most importantly, you must track those who didn’t appear. They might have exited the game through failure, but their participation still matters: they contribute knowledge.

When you hear any story, you can always ask four critical questions first: Who didn’t come? Who couldn’t come? Was the grouping random? Is there a threshold here?

If you hear lots of bad news, remind yourself that the real world is probably a little better than it sounds. And when you generalize from everyday experience, yours or anyone else’s, remind yourself how easily people fall into bias.

As the saying goes:

Lights make hurried people in the square,
Waterfalls show rushing torrents everywhere.
To know the world’s true face, you must see
Those silent ones who left the reverie.

References

[1] Chou, Hui-Tzu Grace, and Nicholas Edge. “They Are Happier and Having Better Lives Than I Am.” Cyberpsychology, Behavior, and Social Networking 15, no. 2 (2012): 117–121.

[2] Hunt, Melissa G., et al. “No More FOMO: Limiting Social Media Decreases Loneliness and Depression.” Journal of Social and Clinical Psychology 37, no. 10 (2018): 751–768.

[3] Hu, Nan, Paul A. Pavlou, and Jie Zhang. “Why Do Online Product Reviews Have a J-Shaped Distribution? Overcoming Biases in Online Word-of-Mouth Communication.” 2009.

[4] Azoulay, Pierre, Benjamin F. Jones, J. Daniel Kim, and Javier Miranda. “Age and High-Growth Entrepreneurship.” American Economic Review: Insights 2, no. 1 (2020): 65–82.

[5] Elton, Edwin J., Martin J. Gruber, and Christopher R. Blake. “Survivorship Bias and Mutual Fund Performance.” Review of Financial Studies 9, no. 4 (1996): 1097–1120; Carhart, Mark M., Jennifer N. Carpenter, Anthony W. Lynch, and David K. Musto. “Mutual Fund Survivorship.” Review of Financial Studies 15, no. 5 (2002): 1439–1463.

[6] Arrighi, H. Michael, and Irva Hertz-Picciotto. “The Evolving Concept of the Healthy Worker Survivor Effect.” Epidemiology 5, no. 2 (1994): 189–196.

[7] Elite Daily, Season 5, “What Nostalgia Longs For”; Johan Norberg, Open: The Story of Human Progress, Atlantic Books, 2020.

[8] Dale, Stacy Berg, and Alan B. Krueger. “Estimating the Payoff to Attending a More Selective College: An Application of Selection on Observables and Unobservables.” The Quarterly Journal of Economics 117, no. 4 (2002): 1491–1527.

[9] Elite Daily, Season 4, “Berkson’s Paradox.”

[10] Westreich, Daniel. “Berkson’s Bias, Selection Bias, and Missing Data.” Epidemiology 23, no. 1 (2012): 159–164.