Leveling Up or Leveling Off? Understanding the Science Behind Skill Plateaus

If you’re going into surgery, you want the youngest operating surgeon available.

This is a slight exaggeration – you don’t want a doctor in their first year out of medical school.[1] After that, it’s less clear. One review found thirty-two studies indicating that the older a doctor was, the worse their medical outcomes; that review only found one study indicating that all outcomes got better with increasing age.[2] Other analyses suggest that middle-aged doctors might do better than younger doctors (though the effect is not statistically significant)[3], but older doctors are still clearly worse than middle-aged doctors.[4]

It’s not like doctors become terrible with more experience, but they are measurably worse. In one study, an extra twenty years of experience translated to about one additional elderly patient dying out of every hundred treated.[5] Why would twenty extra years of practice make a doctor worse at helping people?

Similarly, some research on famous painters, writers[6], and composers[7] found that it’s most common for these artists to produce their best work before the age of 45. A famous composer is more likely to have written their best piece of music in their 20s than their 40s, and they’re almost twice as likely to have written their best piece in their 30s instead of their 50s.[8]

Again, creatives are doing worse after a couple of extra decades of experience. This phenomenon is known as a skill plateau: performance stops improving after a relatively short time, and after that, putting in thousands more hours on the job won’t reliably make you much better.

What’s going on here?

Deliberate practice

Anders Ericsson thought he had the answer. Ericsson was a psychologist famous for his research on expertise and human performance. He claimed that most people practice a new skill only until they can perform it adequately for their purposes.

For example, someone might practice tennis until they’re good enough to play with their friends. At that point, they stop practicing to just enjoy the casual games. Many people assume that they’ll continue to slowly get better just by playing games. However, that doesn’t seem to be the case. If they just play games using the skills they already know, they stagnate or even get worse over time.[9]

Ericsson sought to study what separated the truly outstanding top performers from those who were merely good. He concluded that the key factor was a particular approach to improvement, which he called “deliberate practice”.

Deliberate practice is not just any focused practice.

Deliberate practice is breaking your work down into individual steps, finding the optimal technique for each step, and practicing each technique until you execute it automatically. It requires clear, rapid feedback so that you know what you’re doing right and what you need to work on (this feedback usually comes from a coach).

Say our amateur tennis player wanted to continue improving. They might get a tennis lesson. The coach would identify what is holding the player back -- perhaps they hold the racket poorly. Here the coach breaks tennis down into individual subskills and identifies which are making their performance worse.

So, the coach demonstrates a better way to grip the racket and marks their racket at the correct place to grip. Here the coach gives the player a better technique and a training method to practice it.

The player practices gripping the racket correctly, takes one swing, and then stops to check their grip. Here the player repeatedly practices the skill so that they learn it in their muscle memory. They get feedback by checking whether their hands align with the coach’s marks on the racket after each swing. They might get more feedback if the coach corrects their grip as they practice.

After repeating the drill a few dozen times, they have mastered the better grip and are ready to move on to the next improvement.

These repeated small loops are good for improving your skill effectively, but don’t try them during a game! During the practice loops, you practice one tiny skill over and over until it’s automatic. Then during a real game, you put all of the tiny skill improvements together to play better. Experts have often practiced hundreds or thousands of these individual subskills to build up their expertise.

In Ericsson’s mind, ceasing to deliberately practice causes the performance plateau. People stop learning better techniques. Their performance slowly worsens over time as their autopilot skills atrophy.

Does his theory hold up?

Does deliberate practice explain which people are successful?

There’s a good amount of evidence that deliberate practice can work well – and a lot of caveats.

There are plenty of studies supporting deliberate practice. However, I focused on the studies trying to disprove deliberate practice to see what the counterarguments were.

The result was kind of funny. The authors of one such paper wrote a New York Times piece called “Sorry, Strivers: Talent Matters” attempting to debunk deliberate practice by highlighting their finding that working memory determined 7% of success in their study. Meanwhile, their study also found that deliberate practice determined 45% of success.[10]

Another study found that deliberate practice “only” determined one third of success[11], and a meta-analysis found that deliberate practice only predicted 11%.[12] Even the naysayers agree that deliberate practice is important; they just don’t believe that it entirely explains success.

The skeptics could have made a stronger argument by pointing at the limited domains in which deliberate practice is strongest.

In the above studies, the ones finding that deliberate practice accounts for 45% and one third of variance in performance were only looking at highly developed fields like music and chess. Highly developed fields have objective methods of evaluating performance (such as music competitions or chess ratings) and effective training methods, usually developed over decades or centuries. Meanwhile, the meta-analysis (which found that deliberate practice explained 11% of the variance in performance) looked at multiple fields.

In highly developed fields like music and chess, this meta-analysis also found that deliberate practice was important, predicting more than 20% of success. Yet it found that deliberate practice explained less than 1% of the variance for professional performances such as computer coding or selling insurance.
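To get a feel for what these percentages mean, it can help to convert "variance explained" back into a correlation. Here is a minimal sketch, assuming each percentage is the square of a simple correlation between deliberate practice and performance. The figures are the ones quoted in footnote [12]; the professions value is reported only as "less than 1%", so the 0.01 used here is an upper bound, not a reported number.

```python
import math

# Share of variance in performance explained by deliberate practice,
# as quoted from the meta-analysis in footnote [12].
# "professions" is reported as "less than 1%"; 0.01 is an upper bound (assumption).
variance_explained = {
    "games": 0.26,
    "music": 0.21,
    "sports": 0.18,
    "education": 0.04,
    "professions": 0.01,
}

def implied_correlation(r_squared: float) -> float:
    """Correlation implied by a variance share, assuming a single predictor (r = sqrt(R^2))."""
    return math.sqrt(r_squared)

for domain, share in variance_explained.items():
    r = implied_correlation(share)
    print(f"{domain:<12} variance explained: {share:>4.0%}   implied correlation: {r:.2f}")
```

Under that assumption, 26% of the variance corresponds to a correlation of about 0.5, while 1% corresponds to a correlation of about 0.1. The gap between games and professions is large either way, but it looks less extreme as a correlation than as a share of variance.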

This feels a bit like the growth mindset debate. “Researcher becomes famous by discovering the secret of learning. Critics later claim that the effect is bogus. Fierce back and forth ensues.” Likely the result will be similar here: “Yes, the effect is probably real, just smaller than everyone initially assumed when reading glowing news articles.”

But why this discrepancy?

Why would deliberate practice work so well when learning chess, and so poorly in professional work?

Remember that deliberate practice is about finding better techniques.

If you have to invent these good techniques from scratch, this is really hard. If other people have put thousands of hours into identifying the best techniques for doing a task, and the best training methods for those techniques, then it is easy.

In one of the early deliberate practice studies, the study subject started out able to remember a string of seven numbers. Once he developed a system for grouping numbers, he could remember up to 18. But then he struggled to continue learning.

He had to develop a meta-system for remembering groups of numbers before he could break that plateau. It took him more than 230 hours of practice to develop the techniques to be able to remember nearly 80 digits.[13] That’s a lot of time to develop techniques from scratch for a relatively simple task.

Ericsson claims that running, swimming, piano, and countless other fields have far higher standards today than they did a hundred years ago – because we’ve learned better techniques and training methods.[14] It took Newton to invent calculus, but since then we’ve refined the field and discovered better teaching methods. Now it's a skill that can be learned by the average high school student.

Meanwhile, the meta-analysis lumped computer programming, military aircraft piloting, soccer refereeing, and insurance selling under “professional performances.” None of these fields have refined their optimal techniques or training methods as much as chess or piano has.

It seems possible that doctors and creatives stopped practicing because they couldn’t find new techniques to practice. Maybe we occasionally get a good, easy-to-use technique that lots of doctors adopt, like using checklists that reduce patient mortality by 23%.

But maybe most of the time, doctors aren’t sure what sub-skills would reduce mortality or improve their bedside manner. Furthermore, taking several hours each week to practice -- instead of producing immediate output -- is hard. So maybe the doctors don’t practice anything if they’re not certain what would make them better.

It’s not clear which elements of deliberate practice are essential for skill improvement. Doctors could find some bottlenecks, look up techniques, and practice them on their own. How would the results compare to deliberate practice? I’m not sure yet.

However, even in contexts like professional sports and music, where people are highly incentivized to deliberately practice and are coached on well-refined techniques, deliberate practice explains less than half of the variance in success. That should be our upper bound on how useful deliberate practice is.

What else explains why doctors don’t get better?

I think the people who say “Talent matters!” are pointing at a big chunk of the answer.[15] Fortunately, we can build a more detailed model of what it means to say someone is “talented”.

Psychiatrist Scott Alexander looked at med school data and found that medical students’ scores plateaued after about their third year. Students got steadily better in their first year. In year two, they learned a bit less but were still improving. But after year three, they barely increased their test scores. “Compare fourth-year and fifth-year surgeons, and it’s pretty close to 50-50 which of them will know more surgery.”

Scott says students in his program were still spending just as much time attempting to learn in their fourth and fifth years. They were far below the maximum score, and some students were doing better than other students. So why would their learning level off?

Because human minds can’t remember the entirety of medical knowledge.

A cognitive constraint is a limitation to how much we can learn that is imposed by the architecture and processing capabilities of the human brain.

One deliberate practice study found that working memory explained 7% of success at sight reading music. Working memory capacity is one cognitive constraint at the point of learning. Other constraints impact the ability to remember information over time or process new information efficiently.

These cognitive constraints differ from person to person, and can explain what we mean when we talk about “talent”. If someone is less limited by the constraints that bottleneck other people, they improve more quickly and seem naturally talented.

For example, people forget things over time unless they review the information at some gradually expanding intervals. (This is the widely accepted idea behind spaced repetition learning techniques.) Scott suggests that in order to remember a fact, people need to review that fact within different time intervals: some might need to review it after a week, others after a few months. This means that some people build up larger memory trees than others.[16] 

Scott writes: “Suppose that you forget any fact you haven’t reviewed in X amount of time (X might be shorter or longer depending on your intelligence/memory/talent). And suppose that an average doctor sees 5 diseases ~weekly, another 5 diseases ~monthly, and another 5 diseases ~yearly. A bad doctor might forget anything she sees less than once a week, a mediocre doctor might forget anything she sees less than once a month, and a great doctor might forget anything she sees less than once a year. So, the bad doctor will end up knowing about 5 diseases, the mediocre doctor 10, and the great doctor 15.”
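Scott’s example is simple enough to write down as a toy model. Here is a minimal sketch: the 5/5/5 split and the weekly/monthly/yearly cadence come from the quote, while the exact day counts and the names `DISEASE_GROUPS` and `diseases_retained` are my own assumptions for illustration.

```python
# Toy version of the forgetting example quoted above.
# Each group: (days between encounters, number of diseases in the group).
# The 5/5/5 split and the weekly/monthly/yearly cadence come from the quote;
# the exact day counts are assumptions.
DISEASE_GROUPS = [
    (7, 5),     # seen roughly weekly
    (30, 5),    # seen roughly monthly
    (365, 5),   # seen roughly yearly
]

def diseases_retained(forgetting_window_days: int) -> int:
    """Count diseases seen often enough to survive a given forgetting window."""
    return sum(
        count
        for days_between_encounters, count in DISEASE_GROUPS
        if days_between_encounters <= forgetting_window_days
    )

# Hypothetical forgetting windows (in days) for the three doctors in the quote.
doctors = {"bad": 7, "mediocre": 30, "great": 365}

for label, window in doctors.items():
    print(f"{label:<9} doctor retains {diseases_retained(window)} diseases")
```

This reproduces the 5, 10, and 15 diseases from the quote. The point of the model is that the same daily experience produces very different amounts of retained knowledge depending on a single personal parameter, the forgetting window.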

I’m guessing these constraints explain a lot of the rest of skill plateaus. Individuals have a certain range of ability to improve. For example, doctors have some limit to how many medical facts they can memorize. If medical students are already hitting their limit after three years, no wonder doctors don’t keep improving for decades.

Forget the old myth that humans only use 10% of their brains. If doctors are constantly operating at their max cognitive capacity, they’re already at 100%. If this is the case, we should expect any small cognitive decline as they age to translate directly into decreased performance. If the doctor was operating at 100% of their ability and then that ability goes down by 1%, we should expect them to only operate at 99% of their former capacity.

Can we tease apart whether cognitive constraints or lack of deliberate practice contributes more to skill plateaus?

To attempt to disentangle which factors are contributing to skill plateaus, we can check if a skill known for deliberate practice follows the same skill plateau pattern as doctors and creatives. Doctors and creatives possibly get better until their thirties or early forties, then they get worse.[17]

If deliberate practice can solve skill plateaus, we should expect to see chess players continue to improve their skills later in life. They have more skills to keep practicing, so they can continue to improve over time.

However, if cognitive constraints are the main factor, we would expect to see chess players following the same pattern of decline as doctors and creatives.

Do chess players also get steadily worse after their mid-30s?

Mostly yes, but there’s a twist.

Studies on chess players found that their performance declined after the age of 35 or 40.[18] Younger players got sharply better, then their performance tapered off as they got older. Additionally, if you look at the top chess players, almost all are in their twenties and thirties.[19] So, chess players fit a similar pattern of decline to doctors and creatives.

However, chess players also demonstrated a cohort shift.

Over a hundred years, players got better. Subsequent generations made a higher percentage of optimal moves than the previous generation. This pattern has persisted across four generations.

Additionally, younger players experienced faster improvement earlier in life compared to previous generations.

My guess is that there are two things going on here.

First, across a field, practitioners on average decline after approximately the age of 40. Doctors, creatives, and chess players all seem to get worse at their craft after about that age.

This could be consistent with a story where “just doing the job” actually improves performance for the first few decades, or with a different story where people do deliberate practice but run out of obvious techniques after 20 years.

However, I’m guessing this pattern is better explained by cognitive decline. We found similar results in each field regardless of whether the field tends to do deliberate practice, which makes me feel that this pattern is less likely explainable by deliberate practice.

Similarly, if raw talent just improves skills for a while, that doesn’t explain why that pattern would reverse and people start getting worse. The theory of deliberate practice tried to explain this by saying that skills decay without deliberate practice, but the “just doing the job” case doesn’t have that excuse.

If people are already hitting their cognitive constraints given the current techniques, then performance will go down with even slight cognitive decline. If that decline is fairly stable across age, then we’d expect to see a pattern like this. (Note: all of these examples indicated fairly gradual performance declines, so we probably don’t need to panic until at least our sixties.)
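One way to make this argument concrete is a toy model in which performance is whichever is lower: the skill accumulated from experience, or a cognitive ceiling that erodes slowly with age. Everything in this sketch is a made-up illustration. The function names, the starting age of 25, the learning rate, and the 0.5% yearly decline are all assumptions, not estimates from the studies above.

```python
import math

def learned_skill(years_of_experience: float, learning_rate: float = 0.3) -> float:
    """Skill from experience: rises quickly, then saturates toward 1.0 (assumed shape)."""
    return 1.0 - math.exp(-learning_rate * years_of_experience)

def cognitive_ceiling(age: float, decline_start: float = 25.0,
                      yearly_decline: float = 0.005) -> float:
    """Upper bound on performance: 1.0 until decline_start, then a slow linear drop."""
    return 1.0 - yearly_decline * max(0.0, age - decline_start)

def performance(age: float, career_start: float = 25.0) -> float:
    """Performance is capped by whichever is lower: learned skill or the ceiling."""
    experience = max(0.0, age - career_start)
    return min(learned_skill(experience), cognitive_ceiling(age))

# Print the performance curve at five-year intervals.
for age in range(25, 70, 5):
    print(f"age {age}: performance {performance(age):.3f}")
```

With these arbitrary parameters, the curve rises steeply, peaks in the mid-thirties once skill runs into the ceiling, and then drifts down gradually. The shape is the point, not the numbers: a steady, slight decline in capacity only shows up in performance after someone has learned enough to hit their limit.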

Second, within a field, practice probably explains part of success, though it might depend a lot on the field. Deliberate practice probably doesn’t explain why people start getting worse in their 40s, but it might explain which already-talented people become the best.

The original deliberate practice research focused on what separated the very best from the merely decent within a highly specialized field. If you compare two highly skilled practitioners within a highly developed field, it seems plausible that deliberate practice determines up to half of which is better.

The finding that the learning curve for chess was shifting earlier and earlier introduces another possibility. Perhaps regardless of where the max performance tops out, deliberate practice speeds up the learning curve. A chess player or doctor will learn better skills more quickly, so they have more time to use their skills.

What do we actually do with this?

I set out to investigate whether deliberate practice or cognitive constraints could better explain skill plateaus. Now, I’m guessing the answer is mostly cognitive constraints, at least for the original definition of skill plateaus. 

However, if I ask “which is more important to focus on?”, then deliberate practice matters more. We can’t yet prevent cognitive decline, but we can do something about deliberate practice.

When done well, deliberate practice – i.e. better techniques and teaching methods – makes a big difference in continued improvement. For the medical students, some practice did matter. Students’ scores definitely went up for the first few years, and I doubt Frank Abagnale[20] learned as much pretending to be a doctor as med students do in school.

Furthermore, cognitive constraints and deliberate practice aren’t totally independent.

My guess is that cognitive limitations definitely matter, but they are more like a multiplier for the techniques you’re using. Without deliberate practice, everyone just does whatever random technique they stumbled upon when they first learned to do the thing. If the common techniques are cognitively demanding, then cognitive constraints will stop most people from becoming really good at the skill.

However, if you give them less cognitively demanding frameworks or tools, they will keep getting better. So we should be able to mitigate cognitive constraints with better techniques or better training methods.

So, as long as the natural ability to memorize symptoms determines doctor quality, doctors will be limited by their cognitive constraints to a certain range of skill. If we start having doctors do spaced repetition or give them an AI tool that reminds the doctor of unusual options, that limit changes.

This would imply that we can raise the bar for entire fields, but we need specific skills to practice.

Maybe individuals should put more effort into deliberate practice. More importantly, societies need people researching and developing better techniques and training methods so that individuals can do deliberate practice.[21]



Footnotes:

[1] Association of Hospitalist Years of Experience With Mortality in the Hospitalized Medicare Population

Observed hospital mortality slightly improved after the first year (3.33% for patients cared for by first-year hospitalists vs 2.96% for second-year hospitalists), but didn’t change between the second year and subsequent years of experience.

[2] Systematic Review: The Relationship between Clinical Experience and Quality of Health Care

A systematic review of 62 studies found: “32 of the 62 (52%) evaluations reported decreasing performance with increasing years in practice for all outcomes assessed; 13 (21%) reported decreasing performance with increasing experience for some outcomes but no association for others; 2 (3%) reported that performance initially increased with increasing experience, peaked, and then decreased (concave relationship); 13 (21%) reported no association; 1 (2%) reported increasing performance with increasing years in practice for some outcomes but no association for others; and 1 (2%) reported increasing performance with increasing years in practice for all outcomes.”

[3] Association between surgeon age and postoperative complications/mortality: a systematic review and meta-analysis of cohort studies

In this meta-analysis of ten studies, older surgeons (above 50 or 60, depending on the study) had worse mortality outcomes than middle-aged (some subset of 40-60). It looks like young surgeons had slightly worse mortality than middle-aged (but not statistically significant).

Relative to patients of middle-aged surgeons (baseline 1.00), the mortality ratio was 1.02 for patients of young surgeons and 1.14 for patients of old-aged surgeons.

[4] Physician age and outcomes in elderly patients in hospital in the US: observational study

This one is a single study rather than a meta-analysis or systematic review but has a more easily understandable finding: patient mortality gets slightly but steadily worse for each decade of physicians’ age. Patients’ adjusted 30-day mortality rates were 10.8% for physicians aged <40, 11.1% for physicians aged 40-49, 11.3% for physicians aged 50-59, and 12.1% for physicians aged ≥60.

[5] Physician age and outcomes in elderly patients in hospital in the US: observational study

Same study as footnote [4]. Adjusted 30-day mortality was 10.8% for patients of physicians aged <40 and 12.1% for patients of physicians aged ≥60, a gap of 1.3 percentage points, or roughly one additional death per hundred patients.
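Here is a quick check of the "one additional patient per hundred" figure, using the adjusted rates listed in footnote [4]. Note that the study reports physician age bands rather than years of experience, so the "twenty years" framing is approximate; the function name is my own.

```python
# Adjusted 30-day mortality rates (percent) by physician age group,
# as quoted in footnote [4].
mortality_by_age = {
    "<40": 10.8,
    "40-49": 11.1,
    "50-59": 11.3,
    ">=60": 12.1,
}

def extra_deaths_per_hundred(younger: str, older: str) -> float:
    """Difference in mortality rate, in percentage points (deaths per 100 patients)."""
    return round(mortality_by_age[older] - mortality_by_age[younger], 1)

gap = extra_deaths_per_hundred("<40", ">=60")
print(f"Oldest vs youngest group: {gap} extra deaths per 100 patients")
```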

[6] When Did Nobel Prize Laureates in Literature Make Their Best Work?

An analysis of 189 modern art painters’ highest-priced works found that the painters were on average aged 41.9 when they created their best works. Meanwhile, 89 Nobel Prize laureates in literature wrote their most important work at the average age of 44.7. The distribution of ages when they completed their best piece of literature looks roughly normal.

[7] When Did Classic Composers Make Their Best Work?

An analysis of the most popular works by the 100 most popular classic composers found that the average age of peak creativity was around 39. The distribution of ages when they completed their best piece looks roughly normal.

[8] When Did Classic Composers Make Their Best Work?

Link to table with histogram.

[9] Ericsson, Anders; Pool, Robert. Peak: Secrets from the New Science of Expertise (pp. 10-13).

[10] Deliberate Practice Is Necessary but Not Sufficient to Explain Individual Differences in Piano Sight-Reading Skill: The Role of Working Memory Capacity

“It has even been suggested that deliberate practice is sufficient to account for expert performance. Less clear is whether basic abilities, such as working memory capacity (WMC), add to the prediction of expert performance, above and beyond deliberate practice.”

“For piano players, deliberate practice accounted for nearly half the variance (45.1%) in sight-reading performance. However, WMC accounted for a significant proportion of the variance (7.4%), above and beyond deliberate practice.”

[11] Deliberate practice: Is that all it takes to become an expert?

Deliberate practice accounted for about one-third of the reliable variance in performance in music and chess, leaving most of the variance explainable by other factors.

[12] Deliberate Practice and Performance in Music, Games, Sports, Education, and Professions: A Meta-Analysis

“More than 20 years ago, researchers proposed that individual differences in performance in such domains as music, sports, and games largely reflect individual differences in amount of deliberate practice, which was defined as engagement in structured activities created specifically to improve performance in a domain. This view is a frequent topic of popular science writing—but is it supported by empirical evidence? To answer this question, we conducted a meta-analysis covering all major domains in which deliberate practice has been investigated. We found that deliberate practice explained 26% of the variance in performance for games, 21% for music, 18% for sports, 4% for education, and less than 1% for professions. We conclude that deliberate practice is important, but not as important as has been argued.”

[13] Acquisition of a Memory Skill

“After more than 230 hours of practice in the laboratory, a subject was able to increase his memory span from 7 to 79 digits.”

[14] Ericsson, Anders; Pool, Robert. Peak: Secrets from the New Science of Expertise (pp. 6-8). HarperCollins. Kindle Edition.

[15] Another theory I saw was that people aren’t actually getting worse as they age, they just get busier and less motivated.

This could explain why creatives are more likely to produce their great works earlier. But doctors? Maybe I’m naive, but I don’t think doctors would get apathetic and start letting patients die if it was purely a matter of motivation. But if their cognitive capacity declines a bit? Sure, that makes sense.

[16] Skills Plateau Because Of Decay And Interference

I was originally inspired to look into skill plateaus because I wanted to understand how Scott’s memory constraints explanation and the deliberate practice explanation popularized in Cal Newport’s book “So Good They Can’t Ignore You” fit together. I ended up wanting to focus on the bigger picture of cognitive constraints and decline, but here’s the summary of Scott’s posts that I made along the way. It’s a good example of two cognitive constraints.

Scott Alexander writes a popular blog on medicine, neuroscience, rationality, and [insert topic of your choice]. He also daylights as a psychiatrist.

Scott proposes two cognitive constraints that together (he thinks) answer the dilemma of skill plateaus.

To understand these, it will be helpful to think of knowledge as trees of complex facts/concepts branching off from personal experience and things you learned. If I understand Scott’s idea correctly, the number of relevant branches you can remember determines how good you are at something. A doctor who can look at a list of ten symptoms and tell you only the most common possible diagnoses is a worse doctor than the one who can look at the list, tell you the most common diagnoses, and then also tell you all the uncommon ones if it turns out that the common ones are wrong.

Scott first proposes the Decay Hypothesis, which says that people forget things over time unless the idea is reviewed at some gradually expanding intervals. This is the widely accepted idea behind spaced repetition learning techniques. He suggests that people have different intervals over which they can review the fact, such that some people build up larger memory trees than others.

Secondly, Scott proposes the Interference Hypothesis, which says that memories that are too similar to each other are easier to “collapse” – e.g., it starts getting hard for our hypothetical doctor to recall which diseases this particular set of symptoms suggests, because a bunch of diseases have very similar sets of symptoms. Things that stand out in memory more vividly are easier to remember because they don’t get smushed into all the other similar memories.

Scott speculates that these constraints may explain overall skill plateaus, but he acknowledges that his evidence is mostly for more narrow learning, such as a constraint on how much you can learn per day (e.g. twenty words of Spanish vocabulary a day).

My guess is that Scott is pointing at two important ideas, but I’m not convinced they explain all or even a majority of the skill plateau that is due to cognitive constraints. I speculate that they are two among a large number of cognitive constraints.

For example, working memory seems like a cognitive constraint at the point of learning. That’s completely different from the interference or decay hypotheses, which impact the ability to remember information over time.

[17] The studies mentioned in the introduction indicate that the creatives probably got better for a bit before they got worse - e.g. composers were more likely to produce their most popular piece in their 30s rather than their 20s, then it sharply declines. The studies on doctors were ambiguous as to whether doctors in their 40s were better or worse than younger doctors, but clear that middle-aged doctors were better than older doctors.

[18] Life cycle patterns of cognitive performance over the long run

“This study presents evidence for the dynamics of life cycle patterns of cognitive performance over the past 125 y based on an analysis of data from professional chess tournaments. Individual move-by-move performance in more than 24,000 games is evaluated relative to an objective benchmark that is based on the respective optimal move suggested by a chess engine. This provides a precise and comparable measurement of individual performance for the same individual at different ages over long periods of time, exploiting the advantage of a strictly comparable task and a comparison with an identical performance benchmark. Repeated observations for the same individuals allow disentangling age patterns from idiosyncratic variation and analyzing how age patterns change over time and across birth cohorts. The findings document a hump-shaped performance profile over the life cycle and a long-run shift in the profile toward younger ages that is associated with cohort effects rather than period effects. This shift can be rationalized by greater experience, which is potentially a consequence of changes in education and training facilities related to digitization.”

[19] Ages of top 10 on the FIDE list in classical chess on May 25, 2023: 19, 27, 28, 29, 30, 30, 32, 32, 35, 53. Only two are not in their twenties or thirties, and only one is above 35.

[20] An infamous conman who claims to have illegally practiced medicine as an untrained teenager.

[21] If you've made it through all the footnotes, I feel like you deserve the caveats that I'm told make rationalist writing clunky and terrible. This post is my current best guess for how to interpret a bunch of data points. As with all grand theories explaining complex concepts, there's a decent chance I'll change my mind with new information.

  • Skills peak and start declining somewhere between 30 and 50. – 75% confidence. This seems to be true for doctors and creatives. Furthermore, the pattern held when I checked if chess players also peaked around those ages, leading me to think the trend is likely to generalize.
  • Deliberate practice explains between 5% and 50% of the variance in success between people in well-developed fields. – 50% confidence. This range is my best guess from the deliberate practice studies, but I wouldn’t be surprised to find new evidence that shifts this range.
  • Additional high-quality research would make me conclude that Ericsson’s specific deliberate practice process is much better than other forms of practice. – 30% confidence. It’s not currently clear to me how necessary Ericsson’s exact process is to effective practice. It’s possible that other practice strategies, like learning by addressing bottlenecks as they arise in everyday work, might also lead to significant improvement. My current guess is that Ericsson’s process is good when possible, but that you can change many elements and still get meaningful skill improvement. Further experimentation is needed here.
  • Cognitive constraints determine a big chunk of the variance in success between people – 90% confidence. Though I don’t have good estimates on how much, since ‘cognitive constraints’ is such a wide category. I would be surprised if it was less than half.
  • Deliberate practice shifts the learning curve earlier, making people become better faster. – 50% confidence. This seems intuitively likely, but I only have one study supporting the idea.
  • Better techniques improve fields – 90% confidence. This seems intuitively and almost tautologically true. Consistent with this idea, the fields of sports, music, and chess all demonstrated improvements over decades. Things like The Toyota Way in car manufacturing seem to indicate that fields like business can have similar improvements via better techniques, so I expect this to apply to all fields, at least in theory.
  • Deliberate practice improves fields above and beyond better techniques – 60% confidence. It seems true that deliberate practice would help people learn about and master the better techniques, but I’m not sure if it’s always necessary. I want to do more research on this.

Thanks to Amber Ace for editing.