Retraining at Scale: Lessons from the IBM Watson Era

In 2022, IBM sold the data and analytics assets of Watson Health to the private equity firm Francisco Partners for approximately $1 billion, a fraction of the more than $5 billion IBM had invested in the division through acquisitions and development since 2015. The sale closed a chapter that began in February 2011, when IBM's Watson supercomputer defeated two champions on Jeopardy! in a nationally televised event that the company immediately leveraged into an ambitious claim: this technology would transform healthcare, finance, customer service, and eventually the broader labor market, by taking on cognitive work that had previously required years of human training and expertise.

The Watson era's relevance to the current generative AI moment is not that the technologies are similar. They are not: Watson was built on a fundamentally different architecture from the large language models driving today's transformation. The relevance is that the Watson era represents the most recent large-scale case study in what happens when a major technology company makes sweeping public claims about AI's capacity to transform work, builds workforce strategy and public expectations around those claims, and then confronts the gap between the claims and the technology's actual capabilities. That gap, and how IBM responded to it both for its customers and for its own workforce, offers concrete lessons for the retraining and workforce development challenge that generative AI now presents at a vastly larger scale.

The Promise: AI as a Replacement for Expert Judgment

Watson's entry into healthcare in 2011 was built on a specific and ambitious premise: that a system capable of parsing natural language questions and retrieving relevant information from a vast corpus of text, the capability that won Jeopardy!, could be extended to parse medical literature and patient data to assist, and perhaps eventually replace, the diagnostic and treatment-planning judgment of oncologists. IBM partnered with MD Anderson Cancer Center and other leading institutions to develop Watson for Oncology, with the explicit goal of bringing expert-level cancer treatment recommendations to any physician, anywhere, regardless of their access to specialist colleagues.

The workforce implications of this premise, had it succeeded as initially framed, would have been significant. A tool that could provide oncologist-level treatment recommendations to general practitioners would have reduced the labor market premium on specialist training in oncology, at least for the diagnostic and treatment-planning components of that work, while simultaneously expanding access to that level of care in regions with specialist shortages. IBM's marketing in this period, described by industry analysts as having "led with marketing" ahead of what the technology could deliver, reinforced an expectation among healthcare institutions, policymakers, and the public that this transformation was imminent rather than aspirational.

The Reality Gap: What Watson Could and Could Not Do

The gap between Watson's marketed capabilities and its actual performance became visible gradually, through a series of specific institutional experiences rather than a single dramatic failure, and the pattern of that gradual revelation is itself instructive.

By 2018, more than a dozen of IBM's healthcare partners and clients had stopped or scaled back their oncology projects with Watson. The five-year partnership with MD Anderson, one of the highest-profile collaborations, ended with MD Anderson alleging that Watson had not provided safe and correct treatment recommendations during the engagement. Subsequent analysis identified a specific technical limitation that proved difficult to overcome: Watson's training relied substantially on synthetic and hypothetical patient cases developed in collaboration with a relatively small number of specialists, an approach that did not generalize well to the diversity of real-world patient presentations and treatment contexts that oncologists encounter across different healthcare systems, patient populations, and available treatments.

A 2026 analysis of the Watson Health failure identified what it called the "first major lesson" from the episode: the importance of starting small and iterating quickly rather than attempting to solve, as a first major application, one of the hardest problems in a domain. Watson's healthcare ambitions began with cancer treatment, arguably the most complex diagnostic and treatment-planning challenge in medicine, rather than with a narrower, lower-stakes application where the technology's actual capabilities and limitations could be established before broader deployment. The 2011 timeline, which sought to capitalize on the public goodwill generated by Jeopardy! while it remained fresh, created pressure to move quickly into high-visibility applications rather than methodically into well-suited ones.

What makes this lesson directly relevant to the current generative AI moment is that large language models, while substantially more capable and general-purpose than Watson's question-answering architecture, share the same fundamental characteristic that made Watson's healthcare ambitions difficult: they perform well on tasks that resemble their training data and degrade, sometimes unpredictably, on tasks that require judgment in genuinely novel situations. The difference in degree between Watson and current generative AI is enormous. The difference in kind, in terms of where the technology's limitations lie relative to its marketed capabilities, is smaller than the decade-plus gap between the two technologies might suggest.

What IBM Did for Its Own Workforce

While Watson Health's external story was one of overpromise and retrenchment, IBM's internal workforce response during the same period followed a different and, in retrospect, more durable trajectory, one that offers a more directly applicable model for the retraining challenge generative AI now presents.

IBM launched SkillsBuild, a free education program providing technology training to high school students, community college and university students, and adult learners, as part of a broader reskilling commitment that the company has stated exceeds $1 billion in investment. In its most recent reporting, IBM stated it had reached 16 million learners through SkillsBuild and related programs. In 2023 it also committed to training 2 million people globally in AI skills by the end of 2026, with a particular focus on higher education institutions and underrepresented communities.

The most concrete and measurable finding from IBM's internal reskilling experience concerns the time required for retraining, and it documents a trend that should inform expectations about the generative AI transition. A 2019 IBM survey found that closing a skills gap through training required an average of 36 days, compared to just 3 days in 2014, a twelvefold increase over five years. The survey's explanation for this increase is significant: the skills now most in demand, including behavioral and soft skills like communication, collaboration, and adaptability, alongside technical skills, take substantially longer to develop than the more narrowly technical skills that dominated retraining needs five years earlier. Behavioral skills, the 2019 survey noted, are best developed through experience rather than structured learning programs like webinars or short courses, a finding with direct implications for how retraining programs should be designed and how long they should be expected to take.
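The arithmetic behind the survey comparison can be checked in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the annualized figure assumes training time grew at a steady compound rate between 2014 and 2019, which the survey itself does not claim.

```python
def fold_increase(before_days: float, after_days: float) -> float:
    """How many times larger the later figure is than the earlier one."""
    return after_days / before_days


def compound_annual_growth(before_days: float, after_days: float, years: int) -> float:
    """Implied yearly growth rate, ASSUMING steady compound growth (an assumption)."""
    return (after_days / before_days) ** (1 / years) - 1


# Figures as reported in the text: 3 days in 2014, 36 days in 2019.
DAYS_2014 = 3
DAYS_2019 = 36
YEARS_BETWEEN = 2019 - 2014

fold = fold_increase(DAYS_2014, DAYS_2019)
annual = compound_annual_growth(DAYS_2014, DAYS_2019, YEARS_BETWEEN)

print(f"Fold increase: {fold:.0f}x")
print(f"Implied annual growth under the compound assumption: {annual:.0%}")
```

Under that steady-growth assumption, the time needed to close a skills gap would have grown by roughly two-thirds each year, which is one way to see why program designs sized to the 2014 figure would have fallen behind quickly.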

This finding has only become more relevant with generative AI. A 2025 IBM Institute for Business Value survey found that CEOs estimate 31 percent of their workforce will require retraining or reskilling within three years due to AI and automation, down somewhat from earlier estimates but still representing a substantial share of the workforce on a relatively short timeline. If the skills most in demand continue to be the behavioral and judgment-oriented skills that take longest to develop, rather than the narrowly technical skills that can be addressed through shorter structured training, the gap between the retraining need CEOs are identifying and the retraining capacity that can be delivered on a three-year timeline may be larger than the headline percentage suggests.

The Half-Life Problem

A related finding from IBM's internal research deserves more attention in workforce policy discussions than it typically receives: the concept of skill "half-life," the time it takes for half of a given skill's value to become obsolete. IBM's research has placed the general half-life of skills at approximately five years, with more technical skills at closer to two and a half years.

The half-life framing matters because it reframes retraining from a one-time response to a specific disruption into a continuous process that workforce systems need to be designed around from the outset, a point that connects directly to PPV's analysis of lifelong learning infrastructure. If technical skills have a half-life of two and a half years, a worker who completes a technical retraining program today should expect that roughly half of what they learned will be substantially less valuable within roughly thirty months, not because the worker failed to learn it well, but because the underlying technology and its applications will have continued to evolve. A workforce development system designed around discrete retraining events, triggered by displacement and intended to produce durable new competencies, is working against the grain of how technical skills actually depreciate in a generative AI economy. A system designed around continuous, lower-intensity skill renewal, integrated into ongoing work rather than treated as a separate activity, is better aligned with the half-life reality IBM's research describes.
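The half-life framing can be made concrete with a simple decay calculation. This is a minimal sketch that assumes skill value declines exponentially, the standard mathematical reading of "half-life"; IBM's research as described here does not specify a decay curve, and the function and constant names are hypothetical.

```python
def remaining_value(years_elapsed: float, half_life_years: float) -> float:
    """Fraction of a skill's original value left after years_elapsed.

    Assumes exponential decay: value halves every half_life_years.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years_elapsed / half_life_years)


# Half-lives as cited in the text.
TECHNICAL_HALF_LIFE = 2.5  # years, more technical skills
GENERAL_HALF_LIFE = 5.0    # years, skills in general

for years in (0, 2.5, 5, 10):
    technical = remaining_value(years, TECHNICAL_HALF_LIFE)
    general = remaining_value(years, GENERAL_HALF_LIFE)
    print(f"{years:>4} yr  technical {technical:.0%}  general {general:.0%}")
```

Under this assumption, a technical skill retains half its value after thirty months and a quarter after five years, so a one-time retraining event loses most of its value well within a decade unless it is followed by ongoing renewal.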

What the Watson Era Gets Wrong as an Analogy, and Why That Matters

It is worth being precise about where the Watson analogy breaks down, because overstating the similarity between the Watson era and the generative AI moment risks producing the wrong lessons.

Watson Health's failure was substantially a failure of a specific technical approach applied to a specific high-stakes domain, not a demonstration that AI broadly was overhyped relative to its eventual impact on work. In the years since Watson Health's sale, AI capabilities have advanced dramatically. The 2025 Harvard Business School research on job postings found measurable shifts in employer demand in the roughly two and a half years following ChatGPT's public release: a 13 percent decline in postings for occupations with high shares of automatable tasks, and a 20 percent increase in demand for occupations involving analytical, technical, or creative work that AI can augment. This is a real and rapid labor market effect, not a repeat of Watson's gap between promise and delivery.

The lesson from the Watson era is therefore not "AI will underdeliver again." It is narrower and, in some ways, more useful. Large technology companies have a demonstrated tendency to make workforce-relevant claims about AI capabilities that run ahead of the technology's actual, validated performance in specific applications, particularly in high-stakes domains. Workforce policy built on the marketed timeline rather than the validated timeline therefore carries two risks: over-preparing for transformations that arrive more slowly than promised, and under-preparing for transformations that are already underway and measurable, like the labor market shifts the Harvard research documents.

For workforce development institutions, the Watson era's most transferable lesson may be the one IBM itself eventually internalized in its own workforce strategy, even as its external healthcare ambitions struggled. The retraining model that has demonstrated durability has three features: it is structured around continuous renewal, it addresses behavioral and judgment skills alongside technical skills, and it is sized to the actual measured rate of skill depreciation rather than to the rate implied by the most dramatic claims about AI's transformative potential. The technology driving the current transition is different from, and more capable than, Watson's. The workforce response that the most relevant available case study points toward has not fundamentally changed.


The Workforce Intelligence Dispatch