Reference Class: The Spectator Sees Clearly, You Are Not Special

A high school student grinds through problems until late at night, often moved to tears by his own hard work. His parents tell him he’ll succeed, and he is convinced that with such effort, he is bound for Tsinghua University.

A city woman firmly believes that as long as she doesn’t settle, she will eventually meet her “Great Hero” who arrives on colorful clouds to marry only her.

An entrepreneur has just conceived a product prototype and is already imagining a user growth curve that looks like a rocket.

These are all very endearing people. They possess a sincere heroism; they are the protagonists of their own lives, making the world more vibrant. However, they are all living in a dream.

As we’ve discussed, the first principle of this universe is narrative. Humans need narratives to give themselves meaning. It’s best for us to have a bit of passion, imagination, and even self-inspiration. But if you want to plan scientifically and make reliable predictions about the future, the most effective guide isn’t the narrative in your dream, but the experiences of similar people and events that have already happened.

The thinking tool for this session is called “Reference Class Forecasting (RCF).” It began as a forecasting method, but I see it more as an “anti-narcissism device.” If you think “I am special” or “this time will be different,” you had better use this tool first.

The Gap Between Dreams and Reality #

Regarding the gap between dreams and reality, there is a well-established law called the “Planning Fallacy.”

A study by psychologist Roger Buehler and others illustrates this [1]. A group of students was asked to estimate how many days they needed to complete their theses. On average, the students estimated 33.9 days. Buehler challenged them, saying they were too optimistic, and asked for a “worst-case scenario” estimate. This time, the average was 48.6 days.

Can you guess how many days they actually took? 55.5 days.

The Planning Fallacy states that when people make plans, they systematically underestimate time, costs, and risks while overestimating benefits. Reality is often more pessimistic than your most pessimistic estimate.
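To make that last sentence concrete, here is a minimal sketch that computes how far the students' actual time ran past each of their two estimates. The three numbers are the ones quoted above; the function name `overrun_pct` is my own label for illustration.

```python
# Figures quoted above from the thesis study, in days.
best_guess = 33.9
worst_case = 48.6
actual = 55.5


def overrun_pct(estimate: float, outcome: float) -> float:
    """Percentage by which the outcome exceeded the estimate."""
    return (outcome - estimate) / estimate * 100


print(f"vs. best guess: {overrun_pct(best_guess, actual):.1f}% over")  # 63.7% over
print(f"vs. worst case: {overrun_pct(worst_case, actual):.1f}% over")  # 14.2% over
```

Even the deliberately pessimistic estimate was beaten by reality by about a seventh.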

You might wonder if this only applies to ordinary people and if large-scale formal projects are estimated more accurately. They are not. The Danish economist Bent Flyvbjerg, an expert in mega-projects whom we mentioned earlier, studied 258 transportation infrastructure projects with his collaborators. They found that delays and cost overruns were universal: in cost terms, rail projects overran by an average of 45%, bridges and tunnels by 34%, and roads by 20% [2]. Flyvbjerg wrote a book specifically about the failure of mega-project planning [3], with the most famous example being the Sydney Opera House: originally budgeted at A$7 million and planned for completion in 4 years, it ended up costing A$102 million and taking 14 years to finish.

The Planning Fallacy is the modern version of “wishful thinking,” created by a series of illusion generators in the brain:

  • “Optimism Bias” tells you “it will go smoothly this time”
  • “Confirmation Bias” makes you collect only favorable evidence
  • The “Illusion of Control” makes you think the world is orderly and accidents are under control
  • “Self-Serving Attribution” lets you blame the last failure on the weather, teammates, or the client
  • “Survivor Bias” and “Availability Bias” ensure you see only the success stories and not the failures that sank out of sight
  • Finally, the “Narrative Fallacy” pieces these fragments together into a heroic movie

What you rehearse in your mind is a script, not the real world.

The reason you think in scripts is that you are using an “inside view.” Looking at yourself, you only focus on the details of this specific case, your efforts, your sincerity, and your resource allocation—thinking, “What could possibly go wrong?”

The spectator sees clearly while the player is blind. You need an “outside view.”

Inside vs. Outside View #

The “Outside View” and Reference Class Forecasting both originated with Daniel Kahneman [4], and were later carried over from psychology into engineering practice by Bent Flyvbjerg [5].

Simply put, the inside view looks at your “singular information,” whereas the outside view looks at “distributional information”: in the eyes of others, you are just a statistical data point subject to objective laws; you are not special.

Outside observers care about the distribution of similar events—your “Reference Class”—asking: What is the average? What is the median? What are the tail risks? What is the failure rate? They use this statistical data to infer how you will fare.

The inside view stares at the project at hand; the outside view looks at similar projects first. This is the modern cognitive science version of “The spectator sees clearly, while the player is blind.” The spectator isn’t necessarily smarter than you, but their advantage is that they don’t have your narcissistic complexes: they treat you as a sample, not the protagonist.

Here is the most classic story, from Kahneman himself, recorded in Thinking, Fast and Slow.

In the 1970s, Kahneman led a team to write a decision science textbook for high school students. At the team’s first meeting, Kahneman asked, “How long will it take us to finish this book?” Everyone was enthusiastic and swore: “Two years! Two years at most!” But Kahneman had a hunch and turned to an education expert on the team: “For teams you know that are similar to ours, how many years does it usually take to write such a book?”

The expert said about 7 to 10 years, and that 40% of the teams eventually gave up entirely.

Kahneman was shocked, but he thought his team was surely an exception and would be more efficient.

…As it turned out, Kahneman’s team took a full 8 years to produce the book. By then, the Ministry of Education had already canceled the curriculum requirement for the course.

You think you are different, but in fact, everyone thinks they are different. The reality is that, in a statistical sense, everyone is the same.

Three Steps to Reference Class Forecasting #

Flyvbjerg standardized Reference Class Forecasting into three steps [6]:

  1. Find a “Reference Class”: a sufficiently large group of completed cases that genuinely resemble your project, not the idols you admire.
  2. Examine the distribution of this reference class to find its baseline, including average cost, time, and failure rate.
  3. Compare your own project with that baseline and make fine adjustments based on objective conditions: Am I more like the median, or should I be more conservative?

Step three is the most challenging. In fact, the simplest method is to directly use the median of the reference class to predict your own outcome. Flyvbjerg repeatedly emphasizes that if you insist you are different from others, you must have very strong evidence—otherwise, you are just secretly reintroducing bias.
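The three steps can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Flyvbjerg's procedure as published: the function name, the `adjustment` parameter, and the renovation figures are all hypothetical. The baseline here is the median ratio of actual to estimated time, and the default adjustment of zero reflects the advice above to simply trust the median unless you have strong evidence.

```python
from statistics import median


def reference_class_forecast(my_estimate, past_projects, adjustment=0.0):
    """Forecast an outcome from completed, similar cases.

    past_projects: list of (estimated, actual) pairs (step one).
    adjustment: fractional tweak for step three; 0.0 means "use the median".
    """
    if not past_projects:
        raise ValueError("need at least one completed reference case")
    # Step two: the baseline is the median ratio of actual to estimated.
    ratios = [actual / estimated for estimated, actual in past_projects]
    baseline = median(ratios)
    # Step three: adjust only if there is strong evidence you differ.
    return my_estimate * baseline * (1 + adjustment)


# Hypothetical renovation data: (planned days, actual days).
renovations = [(90, 120), (60, 75), (100, 150), (80, 88), (120, 180)]
forecast = reference_class_forecast(90, renovations)
print(f"Plan says 90 days; the reference class says about {forecast:.0f} days")
```

With these made-up numbers the median project took a third longer than planned, so a 90-day plan becomes roughly 120 days.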

The underlying logic of some modern machine learning algorithms is Reference Class Forecasting. It all boils down to how large your “training set (reference class data)” is and how accurately your “similarity metric” is captured; essentially, it industrializes the act of “finding similar cases” [7].
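To show what “similarity metric” means here, below is a minimal nearest-neighbor sketch. It is a stand-in I chose for illustration, not the neural-network method used in [7]. The case data, the two features, and the assumption that features are already scaled to a 0 to 1 range are all hypothetical.

```python
import math
from statistics import median


def nearest_cases(target, cases, k=3):
    """Return the k past cases whose features are closest to the target."""
    return sorted(cases, key=lambda c: math.dist(target, c["features"]))[:k]


def knn_forecast(target, cases, k=3):
    """Predict the target's outcome as the median outcome of its neighbors."""
    neighbors = nearest_cases(target, cases, k)
    return median(c["outcome"] for c in neighbors)


# Hypothetical past cases: features are (scaled budget, scaled team size),
# outcome is the ratio of actual cost to planned cost.
cases = [
    {"features": (0.20, 0.10), "outcome": 1.2},
    {"features": (0.30, 0.20), "outcome": 1.4},
    {"features": (0.25, 0.15), "outcome": 1.3},
    {"features": (0.90, 0.80), "outcome": 2.5},
    {"features": (0.80, 0.90), "outcome": 3.0},
]

print(knn_forecast((0.25, 0.20), cases, k=3))  # 1.3
```

The choice of features and distance function is the similarity metric: change it and you change who counts as “people like you,” which is why a bigger training set alone is not enough.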

To know how you will fare in a task, the most important thing is to examine how people similar to you, doing similar things, ended up.

Reference Class Forecasting forces you to switch from “I am a story” to “I am a sample”: the story explains who you are, while the reference class predicts how you will turn out.

Real-World Applications #

Every overconfident project eventually gets a lesson from RCF. Let’s look at a few notable applications.

One is large-scale engineering and public investment, the main battlefield for RCF. Flyvbjerg’s conclusion after reviewing countless projects is: if you don’t use RCF for mandatory correction, your project will almost certainly overrun its schedule and budget. How do you correct? The Hong Kong government provides a model example.

In 2012, Hong Kong began introducing Reference Class Forecasting into roadwork assessments. They compared their 25 road projects with 863 similar international projects, looking first at how similar projects typically overran budgets and schedules in the past, and then correcting their own plans accordingly. They found that in the preparation and justification phase (Category C), because the plans were still rough and uncertainty was high, if you wanted to control risk to P80—meaning an 80% certainty of not exceeding the budget or schedule—then the initially reported figures could not be accepted at face value. Instead, an “uplift” was required: costs needed to be increased by 44%, and schedules by 75% [8].
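Where does an uplift like that come from? A rough sketch, assuming you have the past overruns of the reference class as fractions of the original budget: sort them and read off the value at the certainty level you want. The overrun list below is hypothetical and is not the Hong Kong data; the nearest-rank percentile is my own simplification of the published method.

```python
import math


def uplift_at(overruns, certainty=0.8):
    """Nearest-rank percentile of past overruns (fractions of budget).

    Adding this uplift to a budget would have covered roughly `certainty`
    of the reference-class projects.
    """
    if not overruns:
        raise ValueError("need at least one past overrun")
    ordered = sorted(overruns)
    rank = math.ceil(certainty * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical reference class: overrun as a fraction of the original budget.
past_overruns = [-0.05, 0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.50, 0.80]

p80 = uplift_at(past_overruns, certainty=0.8)
reported = 100.0  # the figure the project team submitted
print(f"P80 uplift: {p80:.0%}, budget to approve: {reported * (1 + p80):.0f}")
```

Raising the certainty level pushes you further into the tail of the distribution, so the uplift grows quickly: in this made-up list, P90 already requires 50% on top of the reported figure.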

The earlier a project is in its life, the more passionate everyone is, the more likely they are to make grand promises, and the less you should trust the inside view. The most expensive sentence in the engineering world is: “We can control it this time.”

Another is corporate mergers and acquisitions (M&A). Every CEO loves acquiring other companies because it expands their power—it’s like territorial expansion. CEOs claim mergers will bring synergy and that strategic complementarity will inevitably integrate the market… but as soon as you look at the reference class, you see that, on average, the performance of the acquiring company does not improve due to the merger and may even suffer a slight negative impact [9].

To this, CEOs will say: “Other mergers failed because their integration capabilities were poor; our company is different this time, our corporate culture is invincible!” Scholars have had to come up with more advanced methods, using machine learning to automate RCF [7] and improve prediction accuracy—finding a batch of historically highly similar mergers, laying out their outcomes, and then listening to the CEO explain why this one won’t crash.

Another is movie box office. If you are going to invest heavily in a blockbuster, you had better have an effective way to predict whether it will be a hit. In the past, Hollywood’s success was a form of mysticism: of the movies released in the U.S. from 2008 to 2012, more than half were not profitable, while the top 10% of movies captured nearly 70% of the box office revenue… yet no one knew who could enter that top 10%.

Rather than talking about sentiment, look at RCF. In a study published in 2023 [10], researchers used a “Random Forest” model based on RCF with finer granularity to classify movies, increasing the accuracy of “cost recovery” predictions to 90%.

Then there is a current one: nuclear fusion. AI data centers are now overwhelming the U.S. power grid, and tech giants are considering nuclear fusion as a power source. I used to do research in nuclear fusion, and I am deeply skeptical that it can be commercialized within ten years, yet the giants are clearly much more optimistic.

OpenAI CEO Sam Altman himself invested in a nuclear fusion company called Helion. In 2021, Helion claimed it would generate electricity by 2024. However, by the time they signed a contract with Microsoft in 2023, the timeline for power delivery was pushed back to 2028. Helion only began constructing the site for Microsoft’s power supply in 2025. By 2026, Helion said its prototype had reached “new milestones”… I don’t know how far those milestones have progressed, but I do know the power generation schedule has been constantly retreating.

Since no nuclear fusion facility can yet generate electricity, we have no ready-made reference class. However, we can refer to projects of similar high complexity. Research [11] using similar high-difficulty projects estimates the fusion uplift to be about 118%; if you use international large-scale scientific research infrastructure as a reference class, the uplift could even reach 220%.

Simply put, if a nuclear fusion project asks for a certain amount of investment, you had better prepare more than three times that amount.
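The arithmetic behind “more than three times” is worth spelling out, because an uplift is added on top of the original figure. A small sketch using the two uplift percentages quoted above (the function name and labels are mine):

```python
def required_budget(asked: float, uplift_pct: float) -> float:
    """Budget to prepare once an uplift percentage is added to the ask."""
    return asked * (1 + uplift_pct / 100)


for label, uplift in [
    ("similar high-difficulty projects", 118),
    ("large-scale research infrastructure", 220),
]:
    multiple = required_budget(1.0, uplift)
    print(f"{label}: {multiple:.2f}x the amount asked for")
# similar high-difficulty projects: 2.18x the amount asked for
# large-scale research infrastructure: 3.20x the amount asked for
```

So a 118% uplift already means more than double, and a 220% uplift means 3.2 times the original request.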

Even Elon Musk, who claims to think in “First Principles” all day, cannot escape the planning fallacy. His promised Tesla Full Self-Driving (FSD) has been said to be “achievable next year” almost every year, dragging on for nearly a decade. When founding SpaceX, Musk used an inside view for budgeting, believing $100 million was enough to launch three rockets and surely succeed—as it turned out, all three exploded, the company was on the brink of bankruptcy, and it only narrowly succeeded with the fourth launch, funded by scraping together everything left. Recently, Musk said xAI will soon significantly lead all AI companies. Do you believe him?

From Narrative to Sample #

When you start learning a new skill and think you can master it in a month and make money from it in two; when you prepare to move on the weekend and think you can finish packing in half a day; when you set a budget of 300,000 RMB and three months for a new home renovation, please think of these cases and Kahneman’s story.

A grand wish is not a prediction; a determination to work hard is not a prediction; and a schedule based on needs is certainly not a prediction. You had better think about when you gave up during your last three attempts to learn a new skill; look at how many hiccups typically occur when others move; and ask around the same neighborhood and floor plan how many days renovations are usually delayed and how much they go over budget.

Treating yourself as just a sample will make your plans much more accurate and give you more confidence.

Many job seekers, especially fresh graduates, have no concept of money and don’t know how much salary to ask for. Some tend to ask for too much, but many ask for too little. What you should consider is not how much money you need for monthly rent and living expenses, but the reference class. Do some research on major recruitment websites, ask senior alumni, and see how much someone with your education and skills is roughly worth in the corresponding city. Don’t forget, the recruiters have already studied people like you very thoroughly.

It’s like finding a partner. If your internal narrative is “Someone as outstanding as I am must be paired with someone like this or that,” the recommendations from a reliable matchmaking service will definitely disappoint you, because the service uses RCF.

Conclusion: Use the Spectator’s Eye #

Of course, not everyone should live as a median. Many people are different from the majority in some respect—but if you find the right reference class, you will find that you are very similar to a significant number of people.

Everyone is special. But are your talent, resources, and methodology special enough to make you not belong to the current reference class? If that is the case, you still shouldn’t listen to the internal narrative—you belong to the next reference class.

Every scholar traveling to the capital for the imperial examination felt he would save China, intending to lead reforms and uphold justice. Little did he know that the examiner saw not his passion but his connections, lineage, age, and character. The examiner knew all too well where people with an opening like his eventually ended up…

For specific actions, you might use internal narratives to cheer yourself up; but for decision-making, you must use external reference classes to pour cold water on yourself.

Notes #

[1] Buehler, Roger, Dale Griffin, and Michael Ross. 1994. “Exploring the ‘Planning Fallacy’: Why People Underestimate Their Task Completion Times.” Journal of Personality and Social Psychology 67 (3): 366–81.

[2] Flyvbjerg, Bent, Mette K. Skamris Holm, and Søren L. Buhl. 2003. “How Common and How Large Are Cost Overruns in Transport Infrastructure Projects?” Transport Reviews 23 (1): 71–88.

[3] Flyvbjerg, Bent, and Dan Gardner. 2023. How Big Things Get Done: The Surprising Factors That Determine the Fate of Every Project, from Home Renovations to Space Exploration and Everything In Between. New York: Currency. This column covered the book in “Elite Daily Lesson” Season 5, How to Get Big Things Done 1: Plan Slowly, Act Quickly.

[4] Kahneman, Daniel, and Dan Lovallo. 1993. “Timid Choices and Bold Forecasts: A Cognitive Perspective on Risk Taking.” Management Science 39 (1): 17–31.

[5] Flyvbjerg, Bent. 2008. “Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice.” European Planning Studies 16 (1): 3–21.

[6] Flyvbjerg, Bent. 2006. “From Nobel Prize to Project Management: Getting Risks Right.” Project Management Journal 37 (3): 5–15.

[7] Bi, Wenbin, and Qiusheng Zhang. 2021. “Forecasting Mergers and Acquisitions Failure Based on Partial-Sigmoid Neural Network and Feature Selection.” PLOS ONE 16 (11): e0259575.

[8] Flyvbjerg, Bent, Chi-keung Hon, and Wing Huen Fok. 2016. “Reference Class Forecasting for Hong Kong’s Major Roadworks Projects.” Proceedings of the Institution of Civil Engineers 169 (6): 17–24.

[9] King, David R., Dan R. Dalton, Catherine M. Daily, and Jeffrey G. Covin. 2004. “Meta-Analyses of Post-Acquisition Performance: Indications of Unidentified Moderators.” Strategic Management Journal 25 (2): 187–200.

[10] Einberg, Isak, and Arian Hanifi. 2023. Forecasting U.S. Movie Gross Revenues: A Random Forest Classifier Approach Based on Pre-production Data. Stockholm: KTH; de Souza, Thiago L. D., et al. 2023. “Revisiting Predictions of Movie Economic Success: Random Forest Applied to Profits.” PLOS ONE 18 (3).

[11] Brown, Chris, Hanni Lux, and James R. Cowan. 2024. “Reference Class Forecasting and Its Application to Fusion Power Plant Cost Estimates.” IEEE Transactions on Plasma Science 52 (9): 3628–33.