Cognitive Load Theory: Why "Bad Students Have Lots of Stationery" Is a Scientific Truth

The Laptop Parable: Hardware vs. Neural Processing

Imagine you are the Minister of Education in a developing country with a sudden budget surplus. You decide to use this money to improve the learning levels of children in remote areas. What should you do?

The Peruvian government’s approach was to buy computers. They implemented the “One Laptop per Child” (OLPC) project in rural areas. Computers are standard productivity tools in the modern world; poor children can’t afford them, so the government gives them away for free. Isn’t that great?

But some scholars don’t see it that way. Computers are good, but do they actually help with learning?

Peru’s project was part of a large-scale educational experiment. Between 2006 and 2012, nearly 10 million laptops were distributed across 20 countries in Latin America and the Caribbean. Economists and educators conducted large-scale randomized controlled trials for over a decade.

In 2025, the results were published [1]: whether in the short or long term, giving out laptops had no significant positive impact on children’s academic performance. There was no improvement in math or reading scores. In fact, the rate of students moving up to the next grade on time decreased by one percentage point on average.

The only skill that improved in this study was the skill of operating a computer.

I think this is a modern parable. Buying computers is very similar to how schools and parents understand education: you think learning is a macro problem, or even a hardware problem. Child not doing well? Spend money! Tutoring classes, better schools, upgraded learning environments, better equipment, more stationery… once the money is spent, you feel at ease.

But learning is actually a micro problem.

Learning is a process of neural remodeling. You must consider the problem at the level of neural processing: when a child sits at a desk, how does that thing called “knowledge” actually enter their brain?

What is Cognitive Load Theory?

As the first lecture in the “Education and Learning” section, let’s talk about the most hardcore mental model in modern educational psychology, and the one with the strongest engineering aesthetic: Cognitive Load Theory (CLT).

I bet most teachers, including some self-proclaimed education experts, have never heard of CLT. But it is the single most important theory that teachers and parents need to understand.

Once you understand this theory, you’ll know why most educational practices are just messing around.

✵

Cognitive Load Theory was first proposed by Australian educational psychologist John Sweller in the 1980s [2]. It has since been refined by Sweller and many other scholars and applied in various fields.

Memory Architecture: RAM vs. Hard Drive

Sweller’s question was: if someone can’t seem to learn, and we say their “brain isn’t up to it,” is it because their hard drive is bad, or because their RAM is insufficient?

First, there can’t be a problem with the hard drive. The capacity of human “long-term memory” is almost infinite. No matter how many books you read, words you memorize, skills you master, or movies you watch, you don’t need to worry about the hard drive being full.

The bottleneck of learning new things is mainly in the RAM. That RAM is the brain’s “working memory,” and its bandwidth is extremely limited. It’s generally believed that an average person can only handle 4 to 7 elements of information simultaneously. If you don’t write this information into the hard drive in time, it will be forgotten.

The essence of learning is processing scattered external information within the narrow working memory and then packaging it into long-term memory.
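The RAM analogy can be made concrete with a toy model. The sketch below is only an illustration, not a cognitive simulation: the capacity of 4 slots and the names `WorkingMemory`, `attend`, and `consolidate` are assumptions invented for this example.

```python
from collections import deque


class WorkingMemory:
    """Toy model: a small buffer in front of an unbounded long-term store."""

    def __init__(self, capacity=4):
        # Assumed capacity; the text cites a range of roughly 4 to 7 elements.
        self.slots = deque(maxlen=capacity)
        self.long_term = set()  # stands in for the effectively unlimited "hard drive"

    def attend(self, item):
        """Take in a new element; return whatever was pushed out, if anything."""
        dropped = None
        if len(self.slots) == self.slots.maxlen:
            dropped = self.slots[0]  # the oldest element is about to be displaced
        self.slots.append(item)      # a full deque discards its oldest item on append
        return dropped

    def consolidate(self):
        """Write everything currently held into long-term memory."""
        self.long_term.update(self.slots)
        self.slots.clear()


wm = WorkingMemory(capacity=4)
lost = [wm.attend(x) for x in ["a", "b", "c", "d", "e", "f"]]
wm.consolidate()

forgotten = [x for x in lost if x is not None]
print(forgotten)            # elements displaced before they were stored
print(sorted(wm.long_term)) # elements that reached long-term memory
```

Six elements arrive but only four slots exist, so the first two are displaced before `consolidate` runs. In this toy model, consolidating more often (or feeding in fewer elements at a time) is the only way to keep everything.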

The Three Types of Cognitive Load

The core insight of CLT is: learning failure isn’t because the brain’s hard drive can’t store it, but because new information gets stuck when passing through the narrow gate of working memory.

The total pressure on working memory when information pours into the brain is called “Cognitive Load.” It is generally divided into three types:

Intrinsic Load: The inherent complexity of the learning material, i.e., how many elements must be processed simultaneously.
Extrinsic Load (called “extraneous load” in the CLT literature): Irrelevant information that forces its way in, creating a load on the brain. This could be environmental noise, irrelevant content, or confusion caused by a teacher’s poor explanation.
Germane Load: The effective effort the brain makes to convert knowledge into long-term memory—the energy spent suppressing extrinsic load while operating on intrinsic load.

Modern understanding, including Sweller’s 2023 paper [3], suggests there’s no need to count “germane load” as a separate category. Simply put, to make learning effective, we must minimize extrinsic load and save bandwidth for intrinsic load.
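One rough way to write this budget down is the inequality below. It assumes the two loads simply add up, which is a simplification made here for illustration; the symbols $L$ and $C_{\text{WM}}$ are introduced only for this sketch and do not come from the cited papers.

```latex
L_{\text{intrinsic}} + L_{\text{extrinsic}} \le C_{\text{WM}}
```

Here $C_{\text{WM}}$ stands for working-memory capacity. If capacity is roughly fixed, every unit of extrinsic load removed is a unit freed for the material itself.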

CLT offers an explanation for why buying laptops for kids doesn’t improve grades: the computer itself is a distraction that increases “extrinsic load.”

So why do some children learn very fast while others can’t? Is it because their working memory size is different? Not really. The difference in working memory among people is not that large. CLT has another crucial concept: the “Schema.”

Schemas: The Brain’s Compressed Files

The key insight here is that “knowledge” stored in the brain’s long-term memory isn’t scattered fragments of information, but something called a “schema” [4].

A schema is a cognitive structure that packages multiple related elements into a single unit. You can think of it as a “knowledge chunk,” a “mental macro,” or a “compressed file.”

For example, to a child who has just started learning to read, each letter in “United States of America” is an independent element. If all these letters appear at once, they immediately fill up the child’s working memory. But for you, the phrase has long been packaged into a single schema that takes up only one “slot” in working memory.

Why can’t “bad students” learn while “top students” understand at a glance? Because bad students don’t have enough schemas in their heads; they see a lot of fragmented information, leading to high cognitive load. Top students, having relevant schemas, see information in large blocks, processing fewer elements and avoiding cognitive overload.

For instance, if a teacher writes an equation on the board:

3(x+2) = 15

A top student already has schemas for “distributive law” and “transposition.” They automatically call those schemas to solve it and might even find it boring.

But a bad student lacks those schemas. They see individual elements, and these elements instantly fill their working memory: What do the parentheses mean? Do I multiply by the 3? Or subtract the 2 first? Intrinsic load alone has already overloaded their working memory.
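Written out in full, the solution that a fluent student runs almost automatically takes only three steps, each one an application of a schema they already have. The step labels are added here for illustration.

```latex
\begin{align*}
3(x+2) &= 15 \\
3x + 6 &= 15 && \text{(distributive law)} \\
3x &= 9 && \text{(transposition: move the 6 across)} \\
x &= 3 && \text{(divide both sides by 3)}
\end{align*}
```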

In the view of CLT, how difficult a material is depends not on how much information it objectively contains, but on how much information it counts as in your head: if you have relevant schemas, the information volume is small; if not, it’s large, leading to overload.
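The idea that the same material "counts as" a different amount of information depending on the learner can be sketched in a few lines. This is a toy illustration under stated assumptions: the tokenization of the equation, the function name `count_elements`, and the choice to treat `3(x+2)` as a single known schema are all hypothetical.

```python
def count_elements(tokens, schemas):
    """Count working-memory elements, treating each known schema as one unit.

    `schemas` is a collection of token tuples the learner already knows.
    Matching is greedy: at each position the longest known schema wins.
    """
    known = sorted((tuple(s) for s in schemas), key=len, reverse=True)
    count, i = 0, 0
    while i < len(tokens):
        for schema in known:
            if schema and tuple(tokens[i:i + len(schema)]) == schema:
                i += len(schema)  # the whole chunk occupies a single slot
                break
        else:
            i += 1                # no schema applies: one slot per raw token
        count += 1
    return count


# Assumed tokenization of the equation 3(x+2) = 15.
equation = ["3", "(", "x", "+", "2", ")", "=", "15"]

novice = count_elements(equation, schemas=[])
expert = count_elements(equation, schemas=[("3", "(", "x", "+", "2", ")")])

print(novice)  # 8 separate elements
print(expert)  # 3 elements: the bracketed product, "=", and "15"
```

Under these assumptions the novice faces eight elements, more than the 4 to 7 that working memory is said to hold, while the learner with one relevant schema faces three.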

In other words, top students learn faster not because their brains are faster or bigger, but because they have more “compressed files” in their brains.

This is the “compound interest effect” of learning: the more schemas you know, the faster you learn new things, just as the more books you’ve read, the faster you read.

Therefore, learning is not about moving knowledge into the brain piece by piece, but about the brain compressing information into schemas and retrieving them.

The “Stationery Trap”: Why Bad Students Struggle

Why is there a folk saying that “bad students have lots of stationery”?

This student already lacks schemas, so intrinsic load alone overloads their working memory -> the teacher explains unclearly and even adds unnecessary “flair,” like drawing cute characters on the board with colorful chalk (extrinsic load) -> the father sees the child still doesn’t understand and starts shouting (more extrinsic load) -> the mother says the child needs help and buys a pile of fancy highlighters, mistake notebooks, and sticky notes -> all this stationery provides a false sense of control, and the child highlights the textbook in three colors as if that would make the knowledge enter their brain…

How can one learn like this?

One could say the “bad student” is a “bad student” precisely because there are too many sources of interference.

True learning requires a clean information environment—effective teaching should eliminate all extrinsic load and use the simplest explanations to reduce intrinsic load as much as possible, offloading working memory to pave the way for long-term memory.

From this perspective, many classrooms are not teaching; they are staging “performance art” that gives parents emotional reassurance.

Based on my research, I’ve distilled four mental methods for effective teaching from Cognitive Load Theory.

Method 1: Managing Intrinsic Load

The first method is Managing Intrinsic Load.

When facing “bad students,” a teacher’s instinct is to reduce difficulty—but it’s not a difficulty problem; it’s a bandwidth problem. Effective teaching shouldn’t be about leveling the mountain for the child, but about building a ladder for the child to climb, carefully arranging the pace at which schemas enter the brain.

If a student doesn’t have enough relevant schemas, don’t give them complex comprehensive problems right away. You must break down complex knowledge into small schemas and have them master those first.

Content should be segmented, prerequisite knowledge should be warmed up in advance, component skills should be practiced until automatic, and local structures should be taught before they are integrated [5]. Let the student stand firmly on one step before showing them the next level.

People often say the problem for bad students lies in their “basic skills,” which actually means their previous schemas were not firmly established. Schemas are the root.

Effective teaching is not about reducing difficulty, but about sequencing it tactically.

Method 2: The Power of Direct Instruction

The second method is Direct Instruction.

In recent decades, a romantic idea called “discovery-based learning” has been popular in education. It holds that teachers shouldn’t give answers directly but should let children explore and discover rules in real situations. However, a large body of research has repeatedly found that this method of “treating children like scientists” is extremely inefficient for novices [6].

Because the cognitive load is too high.

If you give a novice a problem to explore on their own, their brain has to do many things at once: focus on the problem, guess the goal, search for steps, work by trial and error, compare the current state with the goal state, and try to figure out what the teacher actually wants them to learn. Their RAM is consumed by searching for paths; how can they have spare capacity to summarize rules?

For the knowledge taught in schools, the most effective teaching method is direct instruction with explicit guidance.

Research points to worked-example learning as one of the most efficient methods [7]: on the left side of the board is an example with complete steps provided by the teacher; on the right is an exercise with the same structure but different numbers. The student follows the example to do the exercise. It’s that simple and “brute-force,” and it is among the fastest ways to write schemas into the brain.

Once the student has mastered the schema, you then give them flexibility and gradually move towards independence. This process is called “guidance fading.”
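The board layout and the fading process can be sketched as a small generator. This is a minimal illustration, not a prescribed procedure: the function names `worked_steps` and `faded` are hypothetical, the problem family a(x + b) = c with positive integers is chosen to match the earlier equation, and removing steps from the end first is just one possible fading order assumed here.

```python
def worked_steps(a, b, c):
    """Solution steps for a(x + b) = c, written out as strings.

    Assumes positive integers so the strings format cleanly.
    """
    x = (c - a * b) / a
    return [
        f"{a}(x + {b}) = {c}",
        f"{a}x + {a * b} = {c}",
        f"{a}x = {c - a * b}",
        f"x = {x:g}",
    ]


def faded(steps, blanks):
    """Show the problem and early steps; leave the last `blanks` steps to the student."""
    blanks = max(0, min(blanks, len(steps) - 1))  # the problem statement always stays
    shown = steps[:len(steps) - blanks]
    return shown + ["?"] * blanks


# Left side of the board: a fully worked example.
example = worked_steps(3, 2, 15)

# Right side: same structure, different numbers, all solution steps left blank.
exercise = faded(worked_steps(4, 1, 20), blanks=3)

print(example)
print(exercise)

# Guidance fading: hand over one more step each round.
for blanks in (1, 2, 3):
    print(faded(worked_steps(5, 3, 40), blanks))
```

Starting with `blanks=1` and increasing it over successive problems moves the student from following the example toward solving the whole problem independently.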

It’s not that children aren’t allowed to explore, but teaching is teaching. Lower the search cost first; raise the difficulty of what must be understood later.

Method 3: Eliminating Extrinsic Load

The third method is Eliminating Extrinsic Load.

People often say poor learning is due to a lack of concentration, but you must create the conditions for concentration. Chatting and playing on the computer are distractions, but parents who buy a lot of stationery, or who constantly bring food and check in, also create extrinsic load.

In the view of CLT, even the dramatic multimedia techniques teachers use in class are often ineffective or even harmful. Consider slides with too many decorations, animations flying around, diagrams on the left with explanations on the right, teachers speaking one sentence while the screen displays the exact same text, videos that can’t be paused, and assignment platforms with more buttons than questions. These are not educational innovations; they are cognitive pollution [8].

Isn’t the learning content itself enough for students to look at?

Don’t force students’ eyes to look left and then right. Combining an image with an oral explanation works better than an image with a screen full of text.

Teachers playing “tricks” can sometimes spark interest, but tricks are not teaching.

Method 4: Differentiated Teaching

The fourth method is Differentiated Treatment for Top and Bad Students.

This isn’t about labeling people, let alone saying bad students’ brains are inferior. Bad students can become top students in the future; right now, they simply lack sufficient schemas. A 2025 meta-analysis [9] concluded clearly: learners with low prior knowledge benefit more from high-support instruction, while those with high prior knowledge benefit more from low-support instruction.

For bad students, teachers must provide more explicit hints, more worked examples, and more practice of the same type, while controlling the pace, skipping fewer steps, and allowing less open-endedness and less independent searching.

For top students, who already have many schemas, you must supply enough intrinsic load, or they will feel bored. Use less repetition, faster fading, more varied problems, and more transfer, and give them space for exploration.

In this sense, having dozens of people sitting together listening to a teacher is an extremely inefficient way of education. Education should be personalized. What can we do?

You can use AI.

The Role of AI: Tutor vs. Doer

The bad news is that several studies have found that AI is hurting students’ learning. Many students use ChatGPT as a “Doer” for homework, and as a result their scores on surprise tests drop significantly [10].

This is because students outsource the process of constructing schemas—which should be “germane load”—directly to AI. The brain doesn’t experience the load, neurons don’t connect, and schemas don’t form.

The good news is that if you strictly limit AI’s role to that of a Tutor, one that breaks complex knowledge down into small schemas, eliminates irrelevant extrinsic load, and guides you step by step instead of giving answers directly, students can learn more than they do in an in-class active-learning lesson [11].

Conclusion: Programming the Brain

Many people think of learning as an ascetic practice. But learning is actually an engineering problem. Teaching is programming the brain.

Cognitive Load Theory tells us that the human brain is an extremely limited biological device. You cannot force-feed it; you must respect its input bandwidth and, step by step at a certain pace, help it turn knowledge into schemas.

Even if brain-computer interfaces become advanced in the future, I don’t quite believe people can directly “download” a skill, because neurons are made of flesh. This is the most important hard constraint of learning and education.

Notes

[1] Cueto, Santiago, Diether W. Beuermann, Julian P. Cristia, Ofer Malamud, and Francisco Pardo. 2025. “Laptops in the Long Run: Evidence from the One Laptop per Child Program in Rural Peru.” NBER Working Paper 34495. doi:10.3386/w34495.

[2] Sweller, John. 1988. “Cognitive Load During Problem Solving: Effects on Learning.” Cognitive Science 12 (2): 257–285.

[3] Sweller, John. 2023. “The Development of Cognitive Load Theory: Replication Crises and Incorporation of Other Theories Can Lead to Theory Expansion.” Educational Psychology Review 35 (4): 95.

[4] Sweller, John. 2024. “Cognitive Load Theory and Individual Differences.” Learning and Individual Differences 110: 102423.

[5] Sweller, John. 1994. “Cognitive Load Theory, Learning Difficulty, and Instructional Design.” Learning and Instruction 4 (4): 295–312.

[6] Kirschner, Paul A., John Sweller, and Richard E. Clark. 2006. “Why Minimal Guidance During Instruction Does Not Work.” Educational Psychologist 41 (2): 75–86.

[7] Barbieri, C. A., et al. 2023. “A Meta-analysis of the Worked Examples Effect on Mathematics Performance.” Educational Psychology Review 35: Article 11.

[8] Schroeder, N. L., and A. T. Cenkci. 2018. “Spatial Contiguity and Spatial Split-Attention Effects in Multimedia Learning Environments: A Meta-Analysis.” Educational Psychology Review 30: 679–701; Ginns, Paul. 2005. “Meta-analysis of the Modality Effect.” Learning and Instruction 15 (4): 313–331.

[9] Tetzlaff, Leonard, et al. 2025. “A Cornerstone of Adaptivity: A Meta-analysis of the Expertise Reversal Effect.” Learning and Instruction 98: 102142.

[10] Barcaui, A. 2025. “ChatGPT as a Cognitive Crutch: Evidence from a Randomized Controlled Trial on Knowledge Retention.” Social Sciences & Humanities Open 12: 102287.

[11] Kestin, Greg, Kelly Miller, Anna Klales, Timothy Milbourne, and Gregorio Ponti. 2025. “AI Tutoring Outperforms In-Class Active Learning.” Scientific Reports 15: 17458.