AI and Scientific Discovery: How Foundation Models Are Reshaping the Research Frontier

By Moussa Rahmouni, 7 June 2026 (37 min read)

In July 2022, DeepMind and EMBL-EBI announced that AlphaFold 2, DeepMind's protein structure prediction system, had been used to generate structural predictions for virtually every known protein in the UniProt database: approximately 200 million structures, far more than had been experimentally determined in the entire prior history of structural biology. The announcement was received with a mixture of awe and confusion. Awe, because the scale of the achievement was genuinely unprecedented. Confusion, because it was not entirely clear what it meant: had science been accelerated, transformed, or replaced? Were researchers equipped to use this deluge of structural data, and to what problems could it most usefully be applied?

These questions—which have since been posed about AI's role in chemistry, materials science, genomics, climate modeling, mathematics, and an expanding range of other disciplines—go to the heart of one of the most significant transformations in the history of science. Foundation models, large-scale generative AI, and adjacent machine learning systems are not merely tools for doing existing science faster. They represent a potential restructuring of the epistemic processes through which scientific knowledge is generated: how hypotheses are formed, how experimental space is explored, how data is interpreted, and how findings are integrated across disciplines. Understanding this transformation—its genuine promise, its real limitations, and its strategic implications for research institutions, funders, and policymakers—is one of the most important intellectual tasks of the current decade.

The Structure of Scientific Discovery, and What AI Changes

Scientific discovery is not a uniform process. It varies enormously across disciplines, and the degree to which AI can accelerate or transform it depends critically on the specific epistemic structure of each domain.

The classic philosophical account distinguishes between the context of discovery—how hypotheses are generated—and the context of justification—how hypotheses are tested and confirmed. AI has historically been more useful in the context of discovery: it can identify patterns in large datasets, generate plausible candidate structures or sequences, and propose connections across bodies of literature that human researchers might miss. Its usefulness in the context of justification—in the design and interpretation of experiments that confirm or refute claims about the world—has been more limited, though this is changing rapidly in certain domains.

A more practically useful taxonomy for thinking about AI's role in science distinguishes three types of scientific work, each of which presents a different profile of opportunity and limitation:

Data-intensive pattern recognition — domains where the primary bottleneck to discovery is the ability to extract patterns from large, high-dimensional datasets. Genomics, structural biology, materials science, and astrophysics fit this description. In these domains, AI systems have already demonstrated capabilities that substantially exceed human performance on specific benchmark tasks. The transformation is most visible and most complete here.

Hypothesis-generation and experimental design — the creative and strategic work of deciding what questions to ask and how to test them. This remains substantially human, though AI is beginning to provide meaningful assistance in navigating large experimental search spaces and in suggesting experimental designs that efficiently discriminate between competing hypotheses.

Theoretical synthesis and conceptual integration — the work of building explanatory frameworks that make sense of empirical findings and connect them to broader theoretical structures. This is currently the domain most resistant to AI assistance, and it is also arguably the most scientifically valuable. The great theoretical advances of science—from Newtonian mechanics to Darwinian evolution to quantum mechanics to the structure of DNA—have been acts of conceptual synthesis that reorganized vast amounts of existing observation into frameworks with deep explanatory and predictive power. Nothing in the current AI landscape suggests that systems are close to making contributions at this level.

The prevailing excitement about AI in science is concentrated on the first type of work. The harder and more important question is what AI can do in the second and third types—and whether its assistance there is likely to accelerate or distort scientific progress. These questions do not have simple answers, and the scientific community would benefit from engaging with them more rigorously than it currently does.

AlphaFold and the New Biology

The story of AlphaFold is the most extensively documented case study of AI-driven scientific transformation, and it deserves careful analysis precisely because it illustrates both the genuine achievements and the genuine limitations of the current paradigm.

The protein structure prediction problem had been one of the central challenges of structural biology for half a century. The physical chemistry of protein folding is well understood in principle: a protein's three-dimensional structure is determined by its amino acid sequence and by the thermodynamics of folding in solution. But the computational problem of predicting structure from sequence turned out to be extraordinarily difficult, because the conformational search space is astronomically large and the energy landscape is highly complex, with many local minima that can trap optimization procedures short of the global minimum representing the correct native structure.

AlphaFold 2 effectively solved the structure prediction problem—achieving accuracy comparable to experimental methods for most single-chain proteins—by training a very large neural network on a database of known protein structures and learning to predict structure from sequence. The technical architecture combined attention mechanisms (adapted from large language models), multiple sequence alignments (using evolutionary information across species), and structural representations that encoded the geometry of folded proteins in a differentiable form amenable to gradient-based optimization.

The achievement was genuine and significant. For researchers who need structural information as an input to other work—understanding an enzyme's mechanism, designing a drug that binds a target protein, engineering a protein for industrial application—AlphaFold's predictions provide rapid access to information that previously required months or years of experimental work. The downstream productivity gains in structural biology, medicinal chemistry, and protein engineering have been real and substantial.

But the limitations of AlphaFold are equally instructive, and they define the frontier that subsequent generations of AI systems must address.

The dynamics limitation: AlphaFold predicts static structures, not dynamics. Proteins are not rigid objects but flexible molecules whose function depends on their conformational flexibility. Enzymes catalyze reactions by sampling specific conformational states; signaling proteins transmit information through conformational changes; many proteins function through intrinsically disordered regions that do not adopt stable three-dimensional structures. AlphaFold's predictions—however accurate for the static average structure—do not capture this dynamic dimension of protein function.

The novelty limitation: AlphaFold predicts structures with high accuracy for proteins similar to those in its training set, and with lower accuracy for highly novel proteins or protein complexes that are structurally distant from the training distribution. This is a standard machine learning limitation—performance degrades outside the training distribution—but it matters for specific scientific questions, particularly for the study of novel protein families or engineered proteins with non-natural properties.

The function gap: AlphaFold predicts structures, not function. The leap from knowing a protein's structure to understanding how it works, why it evolved, or how to modify it for a specific purpose remains a scientific problem that requires human expertise, biochemical experimentation, and creative thinking that current AI systems do not provide. The structure is a beginning, not an end.

AlphaFold solved the structure prediction problem and revealed how many unsolved problems were downstream of it. This is a general pattern with transformative scientific tools: they accelerate progress toward known goals and surface new challenges that had been hidden behind the old ones. The history of science is in part a history of instruments that revealed the next layer of complexity.

Generative AI in Drug Discovery: Promise and Limitation

The success of deep learning in structural biology has inspired a wave of analogous applications in drug discovery, which represents one of the largest potential markets for AI in science and has attracted disproportionate investment relative to other scientific domains.

The core drug discovery problem has two components: identifying a biological target—a protein, pathway, or cellular process—that contributes to a disease and can in principle be therapeutically modulated; and finding a small molecule or biologic that modulates that target in the desired way, with sufficient selectivity, safety, and pharmacokinetic properties to be clinically useful. AI has been deployed across both components, with varying degrees of success.

For target identification, AI systems trained on genomic, transcriptomic, proteomic, and clinical data can identify patterns of association that point toward novel therapeutic targets—genes or proteins whose dysfunction correlates with disease in ways that suggest causal involvement. This is valuable as a hypothesis-generating tool but requires extensive experimental validation, because correlation in biological datasets is often non-causal, confounded by the complexity of biological systems.

For molecule design, generative models are being used to propose candidate molecular structures with predicted activity against specific biological targets. The approach is attractive because drug-like chemical space is vast—estimates range from 10^23 to 10^60 potential small molecules—and systematic experimental exploration is impossible even with the most high-throughput screening technology. AI-generated proposals can narrow this space to candidate sets small enough for experimental validation. Several companies have emerged specifically to offer this as a service, proposing AI-generated candidates to pharmaceutical partners who then evaluate them experimentally.
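
The scale argument above can be made concrete with a rough back-of-the-envelope calculation. The sketch below assumes a hypothetical screening campaign that tests 100,000 compounds per day (an illustrative figure chosen for this example, not one taken from any particular lab) and asks how long exhaustive screening would take at both ends of the estimated range.

```python
# Back-of-the-envelope: how long would exhaustive screening of
# drug-like chemical space take? The throughput below is a
# hypothetical, illustrative assumption.
DAYS_PER_YEAR = 365.25
COMPOUNDS_PER_DAY = 100_000  # assumed screening throughput


def years_to_screen(n_molecules, compounds_per_day=COMPOUNDS_PER_DAY):
    """Years needed to test every molecule once at a fixed daily throughput."""
    return n_molecules / compounds_per_day / DAYS_PER_YEAR


low_estimate_years = years_to_screen(1e23)   # low end of the quoted range
high_estimate_years = years_to_screen(1e60)  # high end of the quoted range

print(f"10^23 molecules: {low_estimate_years:.1e} years")
print(f"10^60 molecules: {high_estimate_years:.1e} years")
```

Even at the low end of the range, the answer under this assumption is on the order of 10^15 years, vastly longer than the age of the universe, which is why narrowing the search computationally is so attractive.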

The results so far are mixed in instructive ways. Several AI-generated drug candidates have entered clinical trials, which is a genuine milestone, and some have produced positive early results. But the attrition rates in AI-augmented drug discovery pipelines have not, so far, been dramatically lower than in conventional pipelines. The AI systems are effective at exploring chemical space and proposing candidates with predicted target activity; they are less effective at solving the multi-parameter optimization problem that determines clinical success: the simultaneous optimization of activity, selectivity, metabolic stability, membrane permeability, solubility, and toxicity that defines a viable drug candidate.

The fundamental difficulty is that the training data available for drug discovery models is deeply non-representative of the full space of possibilities. The databases of compounds with known biological activity are heavily biased toward the kinds of molecules that have historically been made and tested—a tiny and non-random sample of possible drug-like chemical space. Models trained on this data tend to propose candidates that resemble known drugs, which is simultaneously a feature (they are likely to have drug-like properties) and a limitation (they may miss entirely novel chemotypes that would be more effective).

Materials Science: The Stability-Synthesis Gap

In materials science, analogous tools are being applied to the discovery of new battery materials, superconductors, catalysts, and structural materials. The scientific opportunity is substantial: the materials available to human civilization fundamentally constrain what technologies are possible, and the vast majority of inorganic materials that are thermodynamically stable have never been synthesized or characterized.

Google DeepMind's GNoME system, announced in late 2023, generated predictions for approximately 2.2 million new crystal structures, of which roughly 380,000 were identified as the most stable candidates, close to a tenfold increase in the number of known stable inorganic compounds. The scale of the achievement was genuinely impressive. The implications for materials discovery are potentially significant, though translating predicted stable structures into synthesizable, practical materials involves experimental challenges that the predictions do not resolve.

The core issue is what materials scientists call the stability-synthesis gap: thermodynamic stability predicts whether a material can in principle exist, but it says nothing about whether it can be made in practice. Many thermodynamically stable compounds require extreme synthesis conditions—pressures accessible only in diamond anvil cells, temperatures achievable only in specialized furnaces—that make them practically inaccessible. Others are stable only in particular atmospheric conditions, or decompose rapidly at ambient conditions despite being thermodynamically favored. And even materials that can be synthesized in principle may be synthesizable in only tiny quantities, using procedures that are not scalable.

The development pipeline from predicted stable material to practical application is therefore long and uncertain, and the AI contribution—predicting stability—addresses only the first step. The subsequent steps—synthesis development, property measurement, optimization, scale-up—remain human-intensive and expensive.

Domain | AI Capability | Current Limitation | Remaining Human Work
Drug Discovery | Candidate generation, target prediction | Multi-parameter optimization, distribution bias | Experimental validation, mechanism study
Materials Science | Stability prediction at scale | Synthesis gap, scalability | Synthesis development, property characterization
Protein Engineering | Structure prediction, fitness prediction | Dynamics, complex interactions | Function validation, in vivo testing
Chemical Synthesis | Reaction prediction, retrosynthesis | Novel chemistry, yield optimization | Experimental execution, process development

AI in Mathematics: Verification and the Limits of Discovery

The application of AI to mathematics offers a sharp contrast to its applications in empirical sciences, and a more philosophically interesting set of questions. Mathematics is, in a certain sense, the domain where AI should be most powerful: it is a formal system, proofs are verifiable, and the space of possible mathematical statements is in principle precisely defined. It is also a domain where, so far, AI has made meaningful but more limited contributions than its enthusiasts have sometimes claimed.

The distinction between verification and discovery in mathematics is crucial. AI systems built around proof assistants like Lean and Coq are increasingly capable of helping to verify that mathematical proofs are correct, with the proof assistant itself performing the final mechanical check. This is genuinely valuable, since the verification of complex proofs that human mathematicians find difficult to check is a real contribution, but it is not the same as discovering new mathematics.

DeepMind's AlphaTensor demonstrated that AI systems can discover novel algorithms for fundamental mathematical operations. For certain matrix sizes it found multiplication schemes that use fewer scalar multiplications than the best previously known ones, including a method for 4×4 matrices in modulo-2 arithmetic that needs 47 multiplications rather than the 49 obtained by applying Strassen's algorithm recursively, a record that had stood for more than fifty years. This is a genuine discovery in a narrow but important domain. The system was able to explore a large space of possible algorithms more systematically than human mathematicians, finding non-obvious improvements that had been missed by decades of human effort.
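
For readers unfamiliar with the baseline, Strassen's classic construction multiplies two 2×2 matrices with 7 scalar multiplications instead of the 8 used by the schoolbook method. The sketch below shows only that textbook scheme, as a reference point for the kind of object being searched for; it is not AlphaTensor's algorithm, and the function names are illustrative.

```python
def naive_2x2(A, B):
    """Schoolbook 2x2 matrix product: 8 scalar multiplications."""
    return [
        [A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
        [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]],
    ]


def strassen_2x2(A, B):
    """Strassen's 2x2 matrix product: 7 scalar multiplications (m1..m7)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine the seven products using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = strassen_2x2(A, B)
print(product)  # [[19, 22], [43, 50]]
```

Saving one multiplication looks trivial at this size, but applied recursively to matrix blocks it lowers the asymptotic cost of multiplication, which is why small improvements in such schemes attract attention.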

The harder question is whether AI can contribute to the more open-ended, conceptual work of mathematical discovery: identifying the right abstractions, formulating productive conjectures, recognizing when a promising line of inquiry is reaching a fundamental obstacle. This work involves the kind of mathematical intuition that is built through years of deep engagement with specific mathematical structures—an intuition that current AI systems, however capable at pattern recognition in well-defined spaces, do not appear to possess in any robust sense.

The most interesting current applications are at the boundary between verification and discovery. AI systems are beginning to suggest conjectures—mathematical statements that appear likely to be true based on patterns in the data—that human mathematicians then attempt to prove or refute. The conjecture generation step, which AI can assist with, and the proof discovery step, which remains primarily human, represent a productive division of cognitive labor that may prove to be the most natural mode of AI-human collaboration in mathematics.

The Formalization Bottleneck

One constraint on AI contributions to mathematics is the formalization bottleneck: most of the mathematical literature exists in informal natural language, not in formal systems that computers can reason about directly. The Lean and Coq communities have formalized only a small fraction of modern mathematics, and the translation from informal to formal is slow and painstaking.
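
As a small illustration of what "formal" means here, the following Lean 4 snippet states and proves commutativity of addition on the natural numbers by appealing to the core library lemma `Nat.add_comm`. The theorem name is made up for this example; the point is only that the statement and its proof are checked mechanically rather than read and trusted.

```lean
-- A statement together with a proof term that Lean checks mechanically.
-- `add_comm_example` is an illustrative name; `Nat.add_comm` is the
-- existing core lemma that does the work.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Research-level results require long chains of such statements, each built on previously formalized definitions and lemmas, which is a large part of why translating the informal literature is slow.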

Training AI systems on informal mathematics risks the hallucination problems that plague large language models in other domains. Plausible-sounding but incorrect mathematical reasoning is harder to detect than obviously false factual claims, and has caused significant problems in several high-profile cases where AI-generated proofs were initially accepted before errors were found. The formalization of the mathematical corpus, tedious and poorly rewarded as it is as a research task, is therefore strategically important for the long-run contribution of AI to mathematics.

The Genomics Revolution, Accelerated

Genomics offers perhaps the clearest case of genuine and substantial acceleration due to AI, in a domain where the data-intensive pattern recognition paradigm aligns well with the scientific problems at hand.

The fundamental question in genomics—how does variation in DNA sequence produce variation in biological phenotype?—involves statistical problems of extraordinary complexity. Human genomes vary at millions of positions; most phenotypes are influenced by thousands of genetic variants; the effects of variants interact in complex, context-dependent ways; and the biological mechanisms connecting genetic variation to phenotypic outcome involve multiple layers of molecular biology that are only partially understood.

Large-scale AI models, trained on genomic data at unprecedented scale, are beginning to produce meaningful contributions to several sub-problems:

Polygenic risk prediction: Polygenic risk scores, which aggregate the effects of many genetic variants to predict disease risk, have improved substantially with better statistical and machine learning methods applied to larger datasets. The clinical utility of these scores—their ability to identify individuals at elevated risk who might benefit from screening or preventive intervention—is beginning to be demonstrated in prospective studies.

Regulatory genome interpretation: The interpretation of non-coding genomic variation—which constitutes the large majority of disease-associated genetic variants but is not directly translated into protein sequence—is an area where deep learning methods are producing significant advances. Models trained to predict chromatin accessibility, transcription factor binding, and RNA splicing from DNA sequence are enabling the functional interpretation of variants that previously resisted interpretation.

Sequence design: Models like those from the Evo project demonstrate the ability to predict functional properties from genomic sequences across a wide range of organisms, enabling the design of novel biological sequences with predicted properties. This capability is being applied to the engineering of novel proteins, gene regulatory elements, and potentially entire metabolic pathways.
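
The polygenic risk score described above is, in its simplest additive form, a weighted sum: each variant's allele count (0, 1, or 2 copies) is multiplied by an estimated effect size and the products are added up. The sketch below uses made-up effect sizes and genotypes purely for illustration, and it ignores much of what makes real scores hard, such as correlated variants, ancestry differences, and calibration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive score: sum over variants of (allele count * effect size)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Hypothetical per-variant effect sizes (e.g. log odds ratios) for three variants.
effect_sizes = [0.10, -0.05, 0.20]

# Hypothetical individual: 0, 1, and 2 copies of the effect allele respectively.
dosages = [0, 1, 2]

score = polygenic_risk_score(dosages, effect_sizes)
print(round(score, 2))  # 0.35
```

Note that the score is purely associational: nothing in the weighted sum says how any variant acts biologically.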

The limitation here, again, is the gap between statistical association and mechanistic understanding. AI systems are extremely good at identifying patterns of association in genomic data. They are much weaker at providing the mechanistic explanations that are necessary for designing effective interventions. Understanding how a particular genetic variant contributes to disease risk through the intervening molecular biology is necessary for identifying the right therapeutic target, and this understanding requires experimental work that AI cannot substitute for.

Statistical association in genomics and mechanistic understanding are different things. AI has dramatically accelerated the generation of associations. It has not, so far, substantially accelerated the resolution of mechanisms. This asymmetry is the defining scientific challenge of AI-augmented genomics, and it will shape the pace of translation from genomic discovery to therapeutic application.

Climate Science and Earth Systems Modeling

Climate science presents a case where AI's contribution is real and important, but where the nature of the contribution is somewhat different from the pattern recognition paradigm dominant in structural biology and genomics.

The fundamental challenge in climate science is not primarily a data problem: the Earth system is thoroughly instrumented, and the volume of observational data is enormous. The fundamental challenge is a modeling problem. The Earth's climate involves coupled dynamics across atmosphere, ocean, land surface, and ice sheets, at scales ranging from turbulent eddies smaller than a meter to planetary circulation patterns spanning thousands of kilometers. Representing these multi-scale dynamics in numerical models requires approximations (parameterizations of sub-grid processes) whose uncertainties accumulate over simulation time.

AI contributions to climate science are concentrated in several specific areas:

Machine learning parameterizations — learned approximations for sub-grid processes like cloud formation, convection, and ocean mixing can substitute for expensive process-based parameterizations and in some cases capture dynamics that process-based approaches miss. The challenge is stability: ML parameterizations that perform well in testing can produce numerical instabilities when embedded in full climate models, and ensuring robust behavior across the full range of conditions encountered in long climate simulations is a difficult problem.

Neural network weather prediction — systems like GraphCast and Pangu-Weather have demonstrated weather forecasting accuracy comparable to or better than operational numerical weather prediction at a fraction of the computational cost. This is a significant practical achievement with genuine operational value. Its extension to longer-timescale climate projection is more limited, because climate projection is fundamentally different from weather prediction—it requires not skill at a specific task but the ability to simulate the statistics of the climate system under conditions substantially different from those in the training data.

Climate emulators — systems that learn to reproduce the output of expensive climate models at a fraction of the computational cost, enabling ensemble simulations at scale that would be impossible with full-physics models. This enables probability distributions over future climate states that are more useful for adaptation planning than single-trajectory projections, because they quantify the uncertainty that is inherent in long-range climate projection.

The limitation is the standard AI caution about distribution shift: climate projections require models to perform well in conditions—higher atmospheric CO2, warmer oceans, altered precipitation patterns—that are substantially outside the training distribution. Physics-based models have a claim to generalization beyond the training distribution because their dynamics are grounded in conservation laws and physical principles that hold under all conditions. Machine learning models lack this grounding, and their generalization properties under distribution shift are uncertain.
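
A toy example can make the distribution-shift concern tangible. The sketch below is not a climate model: it assumes a made-up "true" process, y = x², fits a straight line to samples drawn only from the range 0 to 1, and then queries the fitted line far outside that range. All names and numbers are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_process(x):
    """Stand-in for the real system (an assumption made for this toy)."""
    return x ** 2


# Training data covers only the range [0, 1].
train_x = [i / 10 for i in range(11)]
train_y = [true_process(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    return slope * x + intercept


# Worst error inside the training range versus error at a shifted input.
in_range_error = max(abs(predict(x) - true_process(x)) for x in train_x)
shifted_error = abs(predict(5.0) - true_process(5.0))

print(f"max error inside training range: {in_range_error:.2f}")
print(f"error at x = 5 (outside range):  {shifted_error:.2f}")
```

The fitted line is off by at most about 0.15 inside the range it was trained on, but by roughly 20 at x = 5. Good in-range skill says little about behavior under conditions the model never saw, which is the worry when learned components are asked to project a warmer climate.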

Epistemological Challenges of AI-Driven Science

The foregoing survey raises a set of cross-cutting epistemological challenges that apply broadly to AI-augmented scientific research. These challenges are not merely technical; they are philosophical and institutional, and they deserve engagement from the scientific community as a community rather than being left to individual researchers to navigate.

The Interpretability Imperative

Most powerful AI systems in science are black boxes: they produce predictions or outputs without providing mechanistic explanations for why they make those predictions. This creates a deep tension with the explanatory goals of science. A prediction, even a highly accurate one, is not an explanation. Science aims to understand why phenomena behave as they do, not merely to predict that they will behave in certain ways.

This tension is most consequential in domains where the scientific goal is not prediction but understanding—where the research is ultimately motivated by the desire to develop interventions, not just forecasts. A drug discovery model that predicts molecular binding affinities but does not explain the structural basis of binding is less scientifically valuable than one that does both, even if its predictive accuracy is identical. A genomic model that predicts disease risk but does not explain the biological mechanisms is less valuable for drug development than one that explains as well as predicts.

There are two responses to this tension. The first is pragmatic: accept that accurate predictions are valuable even without explanations, and use AI predictions as inputs to hypothesis-driven research that pursues mechanistic understanding by other means. The second is to invest in interpretable AI—methods that produce not just predictions but explanations that connect to the theoretical frameworks of the relevant science. Both responses are being pursued, but the pragmatic response is currently dominant, in part because interpretable AI systems are typically less accurate, and in part because the incentive structure of academic science rewards accuracy over interpretability.

The Reproducibility Dimension

Science's reproducibility crisis predates AI, but AI introduces new dimensions to it that deserve recognition. AI systems trained on large datasets can memorize rather than generalize: they can identify patterns in their training data that do not replicate in new data. Distinguishing genuine discovery from pattern overfitting requires careful experimental design and independent validation, which are not always implemented with the rigor they require in the competitive environment of academic science.

The computational reproducibility problem is more acute. In the physical sciences, reproducibility requires that other researchers can implement the same methods and obtain similar results. Reproducing the training of a large AI model requires computational resources—GPU clusters running for weeks or months, at costs reaching millions of dollars—that only a handful of institutions possess. This creates a reproducibility barrier that is qualitatively different from anything the scientific enterprise has previously confronted.

The emerging norm is to require publication of model weights and training data alongside results, enabling independent validation without requiring full retraining. This norm is being adopted unevenly: it is more established in natural language processing and computer vision than in domain-specific scientific AI, and there remain significant gaps between stated policy and actual practice at major journals.

The Homogenization Risk

A structural concern about the widespread deployment of foundation models in scientific research is homogenization: if many research groups are using the same underlying models, trained on the same datasets, their predictions will be correlated in ways that are not obvious from the surface diversity of research outputs. A systematic bias in the underlying model will propagate across many downstream applications, potentially introducing systematic error that is difficult to detect because it is not visible in any single study.
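
The statistical core of this concern can be shown with a small simulation. The sketch below compares two hypothetical scenarios: in one, each study relies on a different model with its own independent bias; in the other, every study relies on the same model and so inherits the same bias. The numbers (2,000 studies, a shared bias of 0.5, unit measurement noise) are arbitrary assumptions chosen only to make the effect visible.

```python
import random
import statistics


def pooled_estimate(true_value, model_biases, noise_sd, rng):
    """Average the estimates of many studies, one study per entry in model_biases."""
    estimates = [
        true_value + bias + rng.gauss(0.0, noise_sd) for bias in model_biases
    ]
    return statistics.fmean(estimates)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_studies = 2000
true_value = 1.0

# Scenario A: each group uses a different model; the biases are independent
# draws centered on zero, so they tend to cancel when studies are pooled.
diverse_biases = [rng.gauss(0.0, 0.5) for _ in range(n_studies)]

# Scenario B: every group uses the same underlying model and shares its bias.
shared_biases = [0.5] * n_studies

diverse_error = pooled_estimate(true_value, diverse_biases, 1.0, rng) - true_value
shared_error = pooled_estimate(true_value, shared_biases, 1.0, rng) - true_value

print(f"pooled error, diverse models: {diverse_error:+.3f}")
print(f"pooled error, shared model:   {shared_error:+.3f}")
```

Pooling many studies averages away independent errors, but it leaves a shared bias essentially untouched, and no individual study in the second scenario looks unusual on its own.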

This concern has historical precedent. The adoption of the same few statistical software packages in clinical research introduced correlated methodological assumptions—about model specification, handling of missing data, and multiple testing correction—that contributed to the reproducibility crisis in medicine. The same dynamics may emerge in AI-augmented science if a small number of foundation models become the dominant infrastructure for research across multiple fields.

The scientific community is only beginning to develop the institutional norms needed to manage this risk. It will require both methodological diversity—deliberate support for approaches that use different underlying models and assumptions—and institutional mechanisms for detecting and correcting systematic biases when they occur. Scientific journals, funding agencies, and research consortia all have roles to play in establishing these mechanisms.

Epistemological Risk | Examples | Mitigation
Black-box predictions without explanation | Protein function, genomic mechanism, drug activity | Interpretable AI, hybrid mechanistic-ML models
Overfitting to training data | Drug discovery, materials prediction | Rigorous external validation, prospective design
Computational irreproducibility | Foundation model results | Open weights, training transparency
Systematic homogenization bias | Multi-domain AI pipelines | Methodological diversity, adversarial evaluation
Distribution shift in projection | Climate AI, genomic risk | Physics-constrained models, uncertainty quantification

Institutional Implications: Reshaping the Research Enterprise

The transformation of scientific research by AI has profound implications for every institution that governs or supports science.

The Research University

Research universities face a particular strategic challenge. Their competitive position in science has historically rested on the quality of their faculty and graduate students—human capital that took decades to develop. AI systems that automate parts of the research process partially erode this competitive advantage in certain domains. But the capabilities that AI most struggles with—the formation of scientific judgment, the ability to ask productive questions, the conceptual creativity required for genuine theoretical advance—are exactly what research universities exist to develop.

The most durable competitive advantage of research universities in an AI-augmented science ecosystem lies in being the preeminent institutions for developing scientific judgment in human researchers. This requires rethinking curriculum and pedagogy: the rote computational and data-processing skills that graduate training has historically emphasized are increasingly automatable, and the time freed up by their automation should be reinvested in developing the critical, synthetic, and creative skills that AI cannot replicate.

Universities also face genuine challenges in data access and computational infrastructure. The most powerful AI applications in science require large, well-curated datasets and substantial computational resources—both of which are increasingly concentrated in a small number of corporate AI research labs. The policy question of how to ensure that academic researchers have access to the resources they need for competitive research is important and insufficiently discussed in policy forums.

The Crisis and Opportunity in Scientific Publishing

Scientific journals face a crisis that AI both exacerbates and potentially helps resolve. The peer review system, designed for a world of relatively modest publication volume, is under stress from rising submission rates, declining reviewer availability, and concerns about review quality. AI lowers the cost of generating plausible-sounding scientific text, increasing pressure on a system already under strain.

The deeper structural question is whether scientific publishing, as currently structured, is the right system for AI-accelerated science. The latency of peer review—months to years between submission and publication—is mismatched with the pace of AI-driven research. The preprint ecosystem has partially addressed this latency issue but creates its own challenges in terms of quality filtering. And the emphasis on novel findings over replication and synthesis creates incentives that are particularly dangerous in a domain where AI systems can generate novel-seeming results through sophisticated pattern recognition that fails under scrutiny.

Funding Architecture for the AI Era

Public science funding agencies face strategic choices about how to allocate resources in a world where some aspects of the research process are being automated and others are becoming more important and more expensive.

The case for redirecting funding toward research infrastructure—data curation and access, computational resources, open model development—is strong. The bottleneck in many AI-augmented research programs is not the AI itself but the quality and accessibility of the training data. Systematic public investment in high-quality, open scientific datasets—in genomics, structural biology, materials science, and climate science—would generate substantial public returns and ensure that the scientific community retains the ability to build and evaluate AI tools without depending entirely on commercial providers.

The case for sustained investment in fundamental research—theoretical frameworks, mechanistic understanding, conceptual advances—is equally strong and more important to protect precisely because it is most at risk of being crowded out by excitement about near-term AI applications. The history of science repeatedly demonstrates that applied research programs stall when the relevant basic science is incomplete, and that the basic science investments that ultimately prove most valuable are those whose importance was not predictable at the time they were made.

Geopolitics of Scientific AI

The development of AI for scientific discovery is not happening in a geopolitical vacuum. The United States and China are engaged in an intensifying competition for leadership in AI capabilities generally, and this competition extends to scientific AI specifically.

China's scientific AI program is substantial and well-funded. Chinese institutions have produced major contributions to protein structure prediction, materials discovery, and genomic AI. The Chinese government has explicitly identified AI for scientific research as a strategic priority, with funding commitments and institutional support that match or exceed those in the United States in specific domains.

The United States retains advantages in the underlying foundation model capabilities—the most powerful general-purpose AI systems remain primarily American in origin—and in the quality and global openness of its research university system. The openness of the American scientific ecosystem, which has historically drawn talent from around the world, remains a significant competitive advantage that restrictive policy could inadvertently erode.

The most consequential geopolitical question for scientific AI is not which country produces the best AI tools for science—that competition will produce simultaneous advances and is likely to be self-correcting over time—but which country builds the scientific capabilities necessary to use those tools most productively. The value of AlphaFold structures, of predicted material candidates, of AI-generated drug candidates, is realized only by the scientists who interpret them, design the experiments to validate them, and develop the mechanistic understanding that translates prediction into action. The human scientific infrastructure—the universities, the laboratories, the mentorship chains, the institutional culture—is ultimately what determines whether AI tools generate scientific value.

The Next Decade: A Sober Assessment

Projecting the specific scientific achievements that AI will enable over the next decade requires acknowledging deep uncertainty about the rate of progress in both AI capabilities and scientific applications. But certain structural dynamics are likely to shape the trajectory.

The integration of theory and machine learning will deepen in the most productive research programs. The most successful applications of AI in science are not those where AI replaces theoretical frameworks but those where the two are tightly coupled—where theoretical constraints inform model architecture, where AI-generated findings provide inputs to theoretical refinement, and where theory guides the design of experiments that validate AI predictions. Institutions and research programs that build this integration will be more productive than those that deploy AI as a black-box prediction service.

Multi-modal and multi-scale models will become more central to AI-augmented science. The most important scientific questions in biology and materials science involve phenomena that span multiple scales of description—from atomic interactions to macroscopic properties, from molecular mechanisms to organismal phenotypes. Single-modality models will continue to be useful for specific tasks but will be insufficient for the most important scientific questions. The development of systems that integrate information across modalities and scales is both technically demanding and scientifically essential.

Automated laboratory systems will increasingly close the loop between AI-generated hypotheses and experimental validation. Systems that can design experiments, execute them robotically, and feed results back into AI models for iterative refinement are already operational in pharmaceutical and materials research contexts. As they become more capable and more broadly deployed, the distinction between computational prediction and experimental validation will become less sharp, and the iterative optimization of both the model and the experimental design will become a continuous process rather than a sequential one.

The boundaries of AI competence will clarify. The past five years have been characterized by a mixture of genuine progress and exaggerated claims. As the technology matures and more systematic evaluations are completed, a clearer picture will emerge of which scientific problems AI can accelerate substantially, which it can contribute to modestly, and which remain essentially resistant to AI augmentation. This clarification will be valuable for resource allocation by research institutions and funders.
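
The closed loop between AI-generated hypotheses and experimental validation described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `run_experiment` is a hypothetical stand-in for a robotic assay (a noisy response curve invented for the example), and the proposal rule simply narrows the search window around the best result so far. Real systems typically rely on far more sophisticated experiment-selection methods.

```python
import random


def run_experiment(temperature, rng):
    """Hypothetical stand-in for a robotic assay: yield peaks near 70 degrees."""
    true_yield = 100.0 - 0.05 * (temperature - 70.0) ** 2
    return true_yield + rng.gauss(0.0, 0.5)  # simulated measurement noise


def closed_loop(low, high, rounds=8, batch=5, seed=0):
    """Alternate between proposing conditions and measuring them.

    Each round proposes `batch` evenly spaced conditions in the current
    window, runs them, and narrows the window around the best result so far.
    """
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(rounds):
        step = (high - low) / (batch - 1)
        proposals = [low + i * step for i in range(batch)]
        results = [(run_experiment(x, rng), x) for x in proposals]
        y, x = max(results)
        if y > best_y:
            best_x, best_y = x, y
        # Feed the result back: halve the window, centered on the best point so far.
        half_width = (high - low) / 4
        low, high = best_x - half_width, best_x + half_width
    return best_x, best_y


best_temperature, best_yield = closed_loop(20.0, 120.0)
```

The point of the sketch is the shape of the loop rather than the search rule: prediction and measurement alternate inside a single process, which is what makes the boundary between the two less sharp.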

Conclusion: A Transformation, Not a Revolution

The headline claim that AI is revolutionizing scientific discovery is simultaneously true and misleading. It is true in the sense that AI is genuinely transforming specific aspects of the scientific process in specific domains, with real and substantial implications for the pace and character of discovery. It is misleading in the sense that the transformation is uneven across domains, that the most important aspects of science—the formation of productive questions, the development of explanatory frameworks, the exercise of scientific judgment—remain primarily human, and that the institutional and epistemological challenges created by AI-augmented science are significant and largely unresolved.

The appropriate response to this transformation is neither uncritical enthusiasm nor reflexive conservatism. It is a serious, domain-specific analysis of what AI can contribute, what it cannot, and what the institutional prerequisites are for capturing the genuine value it offers while managing the genuine risks it creates.

Science is, ultimately, the process by which human beings extend their understanding of the world. AI is a powerful set of tools for certain aspects of that process. The challenge for the institutions and individuals responsible for the scientific enterprise is to deploy those tools wisely—to invest in the capabilities that AI cannot provide, to maintain the epistemic standards that distinguish science from sophisticated pattern matching, and to ensure that the compounding of scientific knowledge remains a genuinely human enterprise rather than becoming dependent on systems whose inner workings remain opaque to the scientists who use them.

The tools are new. The work of science remains human.


Astrophysics and the Survey Science Revolution

The application of AI to astrophysics offers a case study where the transformation is already well advanced and where the specific character of that transformation illuminates the broader patterns at play.

Modern astrophysics has become, in large measure, a data science problem. The progression from individual telescope observations to large-scale sky surveys, culminating in facilities like the Vera C. Rubin Observatory, which will image the entire sky visible from its site every few nights and generate petabytes of data per year, has created data volumes far beyond what traditional human inspection and analysis can handle. The universe contains hundreds of billions of galaxies; each night of observation from a major survey telescope produces data on millions of celestial objects; and interesting phenomena (gravitational wave sources, gamma-ray bursts, fast radio bursts, transient events of all kinds) occur unpredictably and require rapid response.

Machine learning systems have become essential infrastructure for this data environment. Classification algorithms that distinguish galaxies, stars, quasars, and transient events in survey data are routinely deployed at pipeline scale, processing the output of survey telescopes in near real time. Convolutional neural networks trained on labeled catalogs achieve classification accuracy that matches or exceeds human experts at a tiny fraction of the time and cost. And anomaly detection systems that identify objects or events that deviate from the patterns of the training distribution provide a systematic mechanism for finding the unexpected—the most scientifically valuable events that purely taxonomy-driven approaches might miss.
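
The anomaly detection idea above, flagging objects that deviate from the training distribution, can be sketched with a deliberately simple stand-in. The example below uses a robust z-score on a single invented brightness feature; the data, the feature, and the threshold are assumptions for illustration, and survey pipelines generally work with learned, high-dimensional representations rather than one number per object.

```python
from statistics import median


def fit_reference(values):
    """Summarize the training distribution by its median and MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("training values have zero spread")
    return center, mad


def anomaly_scores(values, center, mad):
    """Robust z-scores: distance from the median in units of scaled MAD."""
    scale = 1.4826 * mad  # scaling that makes MAD comparable to a Gaussian standard deviation
    return [abs(v - center) / scale for v in values]


def flag_anomalies(train, new, threshold=5.0):
    """Return indices of new observations that sit far outside the training distribution."""
    center, mad = fit_reference(train)
    scores = anomaly_scores(new, center, mad)
    return [i for i, score in enumerate(scores) if score > threshold]


# Invented magnitudes for illustration only.
training_magnitudes = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
tonight = [10.1, 9.9, 14.0, 10.2]
flagged = flag_anomalies(training_magnitudes, tonight)
```

Here only the third observation is flagged for follow-up. The median and MAD are used instead of the mean and standard deviation so that a few extreme objects in the reference sample do not mask later outliers.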

The gravitational wave discovery program at LIGO and Virgo provides a particularly instructive example. The detection of gravitational waves from binary black hole and neutron star mergers requires identifying signals that are deeply buried in detector noise—signals that are smaller in amplitude than the background by orders of magnitude. Machine learning methods have significantly improved the sensitivity of gravitational wave searches and the speed of candidate identification, enabling follow-up observations that are time-critical because the electromagnetic counterparts of gravitational wave events fade quickly.

The scientific significance of AI in astrophysics extends beyond detection to characterization and interpretation. Galaxy morphology classifications from deep learning have revealed statistical relationships between galaxy structure and environment at scales that would be impossible to study with human classification. Photometric redshift estimation—inferring the distance to a galaxy from its color without expensive spectroscopic measurement—has been substantially improved by machine learning, enabling the exploitation of large photometric surveys for cosmological analysis.
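
Photometric redshift estimation can be illustrated with a minimal nearest-neighbor sketch: galaxies with measured spectroscopic redshifts serve as a training set, and a new galaxy's redshift is estimated from the training galaxies with the most similar colors. The colors and redshifts below are invented for illustration, and k-nearest-neighbors is only one simple choice among the machine learning methods used for this task.

```python
import math


def knn_redshift(train, colors, k=3):
    """Estimate redshift as the mean redshift of the k nearest training galaxies in color space."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], colors))[:k]
    return sum(z for _, z in nearest) / len(nearest)


# Invented (g-r, r-i) colors paired with "spectroscopic" redshifts, for illustration only.
training_set = [
    ((0.40, 0.20), 0.10),
    ((0.45, 0.22), 0.12),
    ((0.80, 0.35), 0.30),
    ((0.85, 0.38), 0.32),
    ((0.90, 0.40), 0.34),
    ((1.40, 0.60), 0.55),
    ((1.45, 0.65), 0.58),
]

estimate = knn_redshift(training_set, (0.84, 0.37), k=3)
```

The estimate is only as good as the training set's coverage: a galaxy whose colors fall outside the range of the spectroscopic sample would still receive an answer, just not a trustworthy one.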

The astrophysics case illustrates a general truth about AI in data-intensive science: the transformation happens first at the data processing and pattern recognition layer, and then propagates upward to shape which scientific questions can be asked. The facilities produce the data; AI makes the data tractable; and the tractability of new data opens scientific questions that were previously inaccessible.

AI in Neuroscience: Mapping the Brain at Scale

Neuroscience represents one of the most ambitious frontiers for AI-assisted scientific discovery, and one of the most illustrative of both the possibilities and the limitations of the current paradigm.

The fundamental challenge in neuroscience is the problem of neural complexity: the human brain contains approximately 86 billion neurons connected by roughly 100 trillion synapses, operating at multiple levels of organization—molecular, cellular, circuit, network, and systems—that are all relevant for understanding behavior and disease. No existing experimental or analytical technique comes close to providing comprehensive observations at all these levels simultaneously, and the theoretical frameworks that would make sense of such observations are far from complete.

AI contributions to neuroscience are concentrated in areas where the pattern recognition problem is well-defined and where large datasets exist. Connectomics, the reconstruction of neural wiring diagrams from electron microscopy data, is an area where AI has been transformative. Manually tracing the connections between neurons in dense tissue sections is extraordinarily labor-intensive; automated segmentation methods trained on human-labeled data have accelerated the process by orders of magnitude and enabled the reconstruction of neural circuits at scales that would have been unthinkable without them. The complete wiring diagram of an adult fruit fly brain, a far larger and behaviorally richer nervous system than the nematode connectome mapped decades earlier, was reconstructed using AI-automated segmentation combined with human proofreading and represents a genuine scientific milestone.

For human brain research, functional MRI data analysis has benefited substantially from machine learning methods. Decoding approaches that predict cognitive states from patterns of brain activity—what the subject is thinking about, what category of stimulus they are viewing—have demonstrated that much more information is encoded in fMRI signals than conventional univariate analysis methods revealed. Large-scale brain imaging studies, enabled by international consortia that have combined data across thousands of participants, have identified reliable neural correlates of cognitive differences and psychiatric conditions that smaller studies could not detect.
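
Why pattern-based decoding can find information that univariate analysis misses is easiest to see in a toy case. In the sketch below, the four-voxel responses are invented so that both stimulus categories produce the same average activation; a nearest-centroid classifier, chosen here only for simplicity, still separates them because it uses the spatial pattern rather than the mean.

```python
import math


def centroid(patterns):
    """Average pattern across trials, voxel by voxel."""
    return [sum(column) / len(patterns) for column in zip(*patterns)]


def decode(pattern, centroids):
    """Return the label whose mean training pattern is closest to this pattern."""
    return min(centroids, key=lambda label: math.dist(pattern, centroids[label]))


def mean_activation(pattern):
    """The quantity a region-average univariate analysis would compare."""
    return sum(pattern) / len(pattern)


# Invented voxel responses for illustration only: same mean, different pattern.
training_trials = {
    "face": [[1.0, 0.1, 0.9, 0.0], [0.9, 0.0, 1.0, 0.1]],
    "house": [[0.1, 1.0, 0.0, 0.9], [0.0, 0.9, 0.1, 1.0]],
}
centroids = {label: centroid(trials) for label, trials in training_trials.items()}

new_face_trial = [0.8, 0.2, 0.9, 0.1]
new_house_trial = [0.2, 0.8, 0.1, 0.9]
```

Both new trials have a mean activation of about 0.5, so an analysis of the regional average sees no difference between them, while `decode` assigns each to the correct category.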

The limitations are equally instructive. The gap between measuring patterns of neural activity and understanding the computational principles that implement behavior remains large. The brain is not a lookup table that maps stimuli to responses; it is a dynamical system that implements computations through the temporal evolution of activity across circuits. AI methods that identify statistical associations in brain data are beginning to constrain theories of neural computation, but they are not substituting for the theoretical work of understanding what those computations are and how they are implemented.

Synthesis: The Epistemic Architecture of AI-Augmented Science

Drawing together the domain-specific analyses in this article, several general propositions about the epistemic architecture of AI-augmented science can be stated with reasonable confidence.

AI transforms the cost structure of science before it transforms its content. The primary near-term effect of AI on scientific research is to reduce the cost—in time, money, and human labor—of specific tasks: structure prediction, compound screening, literature synthesis, data classification, image segmentation. These cost reductions are valuable and their downstream effects on scientific productivity are real. But cost reductions at specific task levels do not automatically translate into changes in the rate of genuinely novel scientific insight, which depends on the conceptual creativity, theoretical synthesis, and experimental ingenuity that AI currently struggles to provide.

The most productive applications of AI in science are collaborative rather than autonomous. The pattern that emerges across domains is that AI-human collaboration—in which AI handles pattern recognition and hypothesis generation at scale while humans provide theoretical context, experimental judgment, and mechanistic interpretation—is more productive than either AI or humans working alone. The design of research environments, institutional incentives, and training programs that enable effective AI-human collaboration in science is therefore a priority for research institutions.

Reproducibility and rigor require active investment, not passive compliance. The reproducibility challenges introduced by AI-augmented science—from overfitting to distribution shift to computational irreproducibility—require active investment by scientific institutions in the infrastructure and norms that maintain rigor. Open data, open models, independent replication studies, and adversarial evaluation of AI-generated claims are not optional extras; they are the prerequisites for an AI-augmented scientific literature that is trustworthy.

The transformation will be uneven and the timeline uncertain. The most enthusiastic predictions about AI-driven scientific acceleration tend to extrapolate from successes in well-defined benchmark tasks to progress on the open-ended problems that define the frontier of scientific knowledge. These extrapolations have historically been overoptimistic, and they are likely to continue to be. The transformation is real; its pace is uncertain; and its distribution across scientific domains and across the types of scientific work described above will be highly uneven for the foreseeable future.
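
The distribution shift named among the reproducibility challenges above can be made concrete with a small numerical sketch. A straight line is fitted to data from a curved relationship over a narrow range; it validates well inside that range and fails badly outside it. The underlying function and the evaluation points are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, truth):
    """Average absolute gap between the fitted line and the true relationship."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - truth(x)) for x in xs) / len(xs)


def truth(x):
    """Assumed ground-truth relationship, unknown to the model."""
    return x ** 2


# Train only on the range 0 to 1, where the curve looks nearly linear.
train_x = [i / 10 for i in range(11)]
model = fit_line(train_x, [truth(x) for x in train_x])

in_range_error = mean_abs_error(model, [0.25, 0.5, 0.75], truth)
shifted_error = mean_abs_error(model, [2.0, 2.5, 3.0], truth)
```

The in-range error stays below 0.1 while the shifted error exceeds 3, even though nothing about the model changed. Held-out evaluation drawn from the training range would not have revealed the problem, which is why independent replication and adversarial evaluation on new conditions matter.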

The Literature Synthesis Problem

One dimension of AI in science that has received less attention than it merits is literature synthesis. The volume of scientific publications has grown exponentially over the past several decades, and the ability of any individual scientist to maintain comprehensive awareness of the relevant literature in their field has declined correspondingly. AI systems capable of reading, summarizing, and synthesizing large bodies of scientific literature offer potential relief from this information overload.

Large language models can now produce summaries of scientific papers, answer factual questions about published research, identify connections across bodies of literature, and generate literature review sections that synthesize existing knowledge. These capabilities are being actively deployed in research contexts and are genuinely useful for certain tasks—rapid orientation to an unfamiliar subfield, identification of potentially relevant papers, first-draft synthesis of well-established findings.

The risks are substantial and should not be minimized. Large language models hallucinate: they generate plausible-sounding but false claims about scientific findings, including fabricated citations to papers that do not exist. In a scientific context, this risk is particularly dangerous because the plausibility of a hallucinated claim can make it difficult to detect without independent verification. Scientists who use AI-generated literature reviews without systematic verification are incorporating uncertain and potentially false claims into their research, creating the conditions for error propagation through the scientific literature.

The normative implication is clear: AI-assisted literature synthesis should be treated as a starting point for human verification, not as a reliable endpoint. The value lies in acceleration—identifying candidates for human review—not in replacement of human judgment about the content and significance of primary sources.
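
Treating AI output as a starting point for verification can be partly mechanized. The sketch below triages citations from an AI-generated review against a trusted list of titles; `verified_index` is a hypothetical stand-in for a lookup in a real bibliographic database, and the titles are placeholders. A match shows only that a paper with that title exists, not that it supports the claim attributed to it, so matched entries still need human reading.

```python
import re


def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace so formatting differences do not matter."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return " ".join(cleaned.split())


def triage_citations(ai_citations, verified_index):
    """Split AI-supplied citations into those found in the trusted index and those needing review."""
    known = {normalize(title) for title in verified_index}
    found, needs_review = [], []
    for title in ai_citations:
        target = found if normalize(title) in known else needs_review
        target.append(title)
    return found, needs_review


# Placeholder titles for illustration only.
verified_index = ["Example Paper A", "Example Paper B"]
ai_citations = ["Example paper A.", "Example Paper Z"]
found, needs_review = triage_citations(ai_citations, verified_index)
```

Anything in `needs_review` is a candidate fabrication and should be checked by hand before it is cited.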

Citizen Science and AI: Democratizing Discovery

One underappreciated dimension of AI's transformation of science is its potential to democratize participation in scientific research through citizen science platforms. Historically, meaningful participation in scientific research required institutional access—a university affiliation, access to laboratory equipment, mentorship from established researchers. AI tools are beginning to reduce some of these barriers, enabling people outside traditional research institutions to make genuine contributions.

The Foldit game, which engaged citizen scientists in protein folding puzzles, produced genuine scientific contributions before AlphaFold largely solved the structure prediction problem through machine learning. Galaxy Zoo engaged hundreds of thousands of citizens in classifying galaxy morphologies from survey images, producing scientifically valuable classifications at a scale that professional astronomers could not achieve alone. Seismographic networks assembled from volunteer-operated sensors have contributed to earthquake early warning systems. In each case, automated methods, increasingly including machine learning, have been important for managing the data flows, quality-filtering contributions, and aggregating results in ways that make citizen contributions scientifically useful.
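
Aggregating volunteer contributions into usable labels can be sketched as a consensus rule. The thresholds below are assumptions for illustration and do not describe the pipeline of any particular project: an object receives a label only when enough volunteers have voted and a large enough share of them agree, and everything else is routed to an expert or a model.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=5, min_agreement=0.8):
    """Return (label, agreement); label is None when the object needs expert review."""
    if len(votes) < min_votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


clear_case = aggregate_votes(["spiral"] * 9 + ["elliptical"])
contested_case = aggregate_votes(["spiral", "elliptical", "spiral", "elliptical", "merger"])
too_few_votes = aggregate_votes(["spiral"] * 3)
```

The first object is labeled a spiral with 90 percent agreement, while the contested and under-sampled objects return no label. Contested objects are often the scientifically interesting ones, so the review queue is a product of the process and not only a cost.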

As AI tools become more capable of providing guidance and feedback to non-expert contributors, the range of scientific problems that citizen science can address is likely to expand. The implications for democratizing scientific knowledge creation—and for the relationship between scientific institutions and the broader public—deserve more serious attention than they have received.

Funding Innovation for AI-Augmented Science

The funding architecture for scientific research was designed for a model of research built on incremental progress through hypothesis-driven experiments. AI-augmented science has different resource requirements: large computational budgets, access to large proprietary datasets, and the ability to move quickly enough to iterate through many AI-generated hypotheses before competitors or circumstances render them irrelevant.

Traditional grant funding mechanisms are poorly suited to these requirements. Review processes that take six months to a year are incompatible with research cycles that AI is compressing to weeks or months. Grant budgets designed around experimental costs are misaligned with the computational costs that dominate AI-intensive research programs. And the hypothesis-by-hypothesis proposal format is incompatible with the exploratory, data-driven character of AI-assisted research where the most important discoveries may not have been hypothesized in advance.

Several funding innovations are being tried. Frontier research programs with large, flexible budgets and long time horizons—modeled on the Wellcome Sanger Institute and similar large-scale biology research programs—can accommodate the resource requirements and the unpredictability of AI-augmented science. Industry-academic partnerships, in which commercial AI capability is combined with academic scientific expertise, are producing significant results in several domains but raise governance questions about intellectual property and publication rights. And competitive prize mechanisms that reward specific scientific achievements—a solved structure, a synthesized material, a validated therapeutic hypothesis—provide incentives that are compatible with rapid AI-driven exploration.

No single funding innovation is sufficient, and the diversity of funding mechanisms available to researchers is itself valuable—it enables different research styles and risk profiles that collectively produce a more robust scientific ecosystem. The appropriate policy goal is not to standardize AI-augmented science funding but to ensure that the full range of valuable research approaches has access to funding mechanisms that fit their requirements.

Science Policy Implications: A Strategic Agenda

The transformation of scientific research by AI creates a specific set of policy challenges that are not being adequately addressed by existing science policy frameworks. The most important are the following.

Open infrastructure for scientific AI. The most powerful AI tools for scientific research are currently developed and controlled by a small number of large technology companies with commercial interests that do not always align with the public interest in open, reproducible science. Public investment in open-source AI infrastructure for science—foundation models for specific scientific domains, curated training datasets, evaluation benchmarks—would reduce this dependence and ensure that the scientific community retains the ability to build, evaluate, and improve the tools it uses. This is analogous to the public investment in scientific instrumentation and computing infrastructure that has historically been essential for enabling broad participation in frontier research.

Training the next generation of scientists for AI fluency. The scientists who will be most productive in an AI-augmented research environment are those who understand both their domain science deeply and AI methods well enough to use them critically—to know when AI outputs can be trusted, when they need to be verified, and when the AI approach is the wrong tool for the question at hand. Graduate training in most scientific fields does not yet systematically develop this dual competency. Updating graduate curricula to include meaningful AI training, while protecting the deep domain formation that remains essential, is a priority that research universities and funding agencies should address collaboratively.

International coordination on AI for scientific reproducibility. The reproducibility challenges that AI introduces in science—computational irreproducibility, overfitting, distribution shift—are global problems that cannot be adequately addressed by individual institutions or national policies alone. International coordination on standards for reporting AI-generated scientific results, requirements for open model publication, and mechanisms for cross-national independent replication would significantly improve the reliability of the AI-augmented scientific literature. The scientific community has coordinated on reproducibility standards before; the AI era requires updated standards that reflect the new dimensions of the challenge.

Protecting the space for fundamental research. In the current environment of excitement about near-term AI applications in science, the temptation to direct resources away from fundamental research and toward applied AI science programs is strong. Resisting this temptation requires explicit governance choices by funding agencies and university leadership to protect funding streams for basic research, theoretical development, and the training of scientists in foundational skills that may not be immediately applicable to AI-driven research programs but that underpin the long-run scientific capacity on which applied programs ultimately depend.

The organizations and governments that get these policy choices right will be better positioned to capture the genuine value that AI offers for scientific discovery while avoiding the risks that poorly managed AI deployment creates. Those that get them wrong may find themselves with powerful tools for generating plausible-seeming results and a degraded capacity to distinguish genuine scientific progress from sophisticated confabulation.

Moussa Rahmouni

Strategy & Program Manager — Founder of Stratelya & InekIA
