In humanity’s quest to understand itself and the universe, science does more than produce new knowledge: it continually helps people see a "larger and richer" world and grasp what kind of world we truly want to enter. What is fascinating about this process is that "science" may overturn itself at any moment, yet as a whole it reads like a "cosmic editorial story" woven by humans. With every new discovery and every new insight, we revise this story, and the world in our eyes changes accordingly—once again becoming comprehensible.
" Scientific discoveries are essentially narratives written by humans for humans; they are a form of storytelling... As the discovery process becomes automated by robotics and AI, the role of human scientists will shift to that of 'science curators'."
I appreciate the view above from this article. The knowledge humans possess about the universe and nature is merely the tip of the iceberg, and we always understand the world centered on ourselves. There is still nearly infinite space waiting to be explored by humans and AI scientists together. However, humans can take on a new role as "curators of meaning," becoming the irreplaceable bridge connecting the cold physical universe and the emotional human world. This allows us to move forward calmly into an "infinite beginning."
The author of the article shared today is Andrew White, co-founder and chief scientist of the innovative scientific research institution FutureHouse. In previous articles, I shared how FutureHouse accelerates the process of scientific discovery through AI scientists and automated laboratories.
Extended Reading: FutureHouse Founder: How to Create AI Scientists in Biology? What is the Future Work of Human Scientists?
Building AI Scientists: How FutureHouse Drives Large-Scale Scientific Intelligence | Exclusive Interview with Asimov Press
I hope today’s article inspires you.
URL: diffuse.one/infinite_discoveries
Reference Number: D2-001
Author: Andrew White
Status: Completed
Abstract: What exactly is a discovery? Are they becoming increasingly difficult to make? Here, I share some thoughts for and against the proposition of "whether discoveries are finite." I argue that scientific discoveries are endless, but they are morphing from new capabilities to narratives for human consumption.
What is a Discovery?
Here are some famous definitions of scientific discovery:
"A scientific discovery is the process or product of successful scientific inquiry. The objects of discovery can be things, events, processes, causes, properties, as well as theories, hypotheses, and their characteristics (such as their explanatory power)." — Stanford Encyclopedia of Philosophy (2022)
"Discovery is the production of new knowledge about nature, often by recognizing or identifying something new." — Oxford Reference (A Companion to the History of Modern Science)
"A scientific discovery is a complex event that requires both recognizing that a new phenomenon actually exists and understanding what it is—that is, it involves both observation and correct conceptualization." — Thomas Kuhn, The Structure of Scientific Revolutions (1962)
Here is my current working definition:
A scientific discovery is the cataloging of a principle about how the universe works. Examples include the discovery of the neutron or identifying the molecule responsible for the scent of pine trees.
The "principle" here does not necessarily mean simplifying our understanding of the universe. Often the opposite is true: early experiments in quantum mechanics, for instance, made our theoretical picture of the universe more complex. Nevertheless, a principle somehow reduces the number of unexplained or unaccounted-for physical phenomena we observe. Fundamental forces, natural selection, and chemical bonds all provide a better accounting of empirical observations. If we adopt this view, it should seem clear that science has some endpoint: the point at which we have enumerated all of these principles.
Of course, there may also be a long "tail." There could be a long list of testable hypotheses that either require insanely high costs (such as a 100 TeV particle accelerator) or lie at the end of a technical chain, requiring numerous engineering breakthroughs before they can be realized—for example, obtaining physical samples from pulsars. However, this long tail also means that we should expect a decline in the efficiency of scientific output. After we identify and catalog the easy discoveries, the remaining work will naturally take longer; this natural ordering is almost inevitable.
Discovery is Getting Harder
We can see an empirical fact from multiple independent sources: scientific productivity is declining.
The time between the completion of Nobel Prize-winning work and the award itself is lengthening, meaning it is increasingly difficult to identify truly groundbreaking research.1
The cost of drug development continues to rise, indicating a steady decline in the productivity of pharmaceutical research.2
Even after controlling for the number of papers and journal quality, the number of disruptive papers published each year is decreasing.3
In inflation-adjusted US dollars, the cost of major scientific discoveries roughly doubles every decade, far outpacing the growth of research funding.4
Papers rely increasingly on highly specialized technologies, and the average number of co-authors has grown from 1 to nearly 5 over the past 50 years.5
This view was systematically elaborated by John Horgan in his popular book The End of Science, and in recent years Bloom et al. have reached similar conclusions through various quantitative indicators.6
This seems like a quite reasonable hypothesis, and the evidence from multiple sources is quite clear.
I believe there is a decrease in consensus "major" discoveries.
Is Discovery Really Finite?
However, there is a counterargument. The universe contains many emergent complex systems that are difficult to reduce to basic principles for explanation. Take ant biology, for example. Understanding all components of ant behavior—how they collaborate, why army ants from the same nest do not attack other army ant nests but do attack non-army ant species—may require many years of basic research. Biology is filled with such complex phenomena, each of which could consume the efforts of hundreds of doctoral theses. Assuming no pause in life on Earth, evolution should continue to produce such systems as well.
Another example is mathematics—there should be an infinite number of theorems, and thus an infinite number of discoveries. Since most mathematics is created by humans rather than being the result of physical observations, discoveries in mathematics may be endless. New problems will always emerge in mathematics, stemming from humans creating new structures and exploring them. Of course, we can always calculate more digits of π!
I used to work on physical simulations of particle systems—such as Ising models or Potts models. These simulations can be completed with just a few hundred lines of computer code, yet thousands of papers have been published based on observations and principles from Ising model simulations, such as multiple parallel simulations at different temperatures or the use of complex sampling procedures.
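For readers who have not seen such a model, here is a minimal sketch of the kind of simulation described: a 2D Ising model sampled with the Metropolis algorithm. The lattice size, temperature, and step count are illustrative choices, not taken from the author's work.

```python
import math
import random

def ising_magnetization(n=16, beta=0.6, steps=20000, seed=0):
    """Metropolis sampling of a 2D Ising model with periodic boundaries."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        # Sum the four nearest neighbours (wrapping around the edges).
        nb = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
              + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        dE = 2 * spins[i][j] * nb  # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i][j] *= -1  # accept the flip
    return sum(sum(row) for row in spins) / n**2  # magnetization per spin

# Below the critical temperature (beta above ~0.44), domains tend to order.
m = ising_magnetization()
print(f"magnetization per spin: {m:+.3f}")
```

Even this tiny kernel supports endless variations—the parallel simulations at different temperatures and the complex sampling procedures mentioned above are elaborations of exactly this loop.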
Ants, mathematics, or Ising models may seem trivial to you. Then consider human diseases. The human genome contains 3 billion base pairs, and any individual has approximately 3–4 million differences compared to a hypothetical "reference" genome. The number of combinations of 3 million out of 3 billion is vastly greater than the number of atoms in the universe. Each variation may be associated with a disease; at least in current reality, everyone will eventually die of some disease. Identifying and curing all human diseases is a task with a clear endpoint: it will be completed when humans no longer die of diseases. However, a systematic cataloging of all the effects of these mutations may lead to an astronomical number of discoveries. Then, you can repeat this process for all living things on Earth; then for past life on Earth, or simulated life in Earth’s future; further still, extending to all hypothetical organisms in all hypothetical habitable planets.
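The combinatorial claim above is easy to check numerically. A sketch using the log-gamma function to count the digits of the binomial coefficient (the 3 billion and 3 million figures come from the text; everything else is just arithmetic):

```python
from math import lgamma, log

def log10_binom(n, k):
    """Base-10 logarithm of the binomial coefficient C(n, k) via log-gamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)

# Roughly 3 million variant sites chosen among ~3 billion base pairs:
digits = log10_binom(3_000_000_000, 3_000_000)
# digits comes out on the order of ten million, i.e. a number with
# millions of digits—versus an estimated ~10^80 atoms in the universe.
print(f"C(3e9, 3e6) is about 10^{digits:,.0f}")
```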
There exists this category of "labyrinth" tasks—such as simulating ant behavior, uncovering the principles of the Ising model, and describing the effects of all human genetic mutations—which provide an almost endless resource of discoveries. We are also constantly inventing new "labyrinths" by creating new complex systems, such as equivariant neural networks, normalizing flows, and Wordle—these are all recently defined topics by humans, with many discoveries still waiting to be made.
This leads me to a hypothesis: Scientific discoveries are fundamentally narratives written by humans for humans. The resources available for us to discover are infinite, and what we elevate to be a "novel scientific discovery" is a human preference decision. Discoveries are storytelling exercises. Of course, they must be supported by evidence, and it’s even better if they can be accompanied by concise equations and clear, elegant explanations.
Some scientific discoveries do help us better predict the universe—such as when the sun will rise or what effects fertilization will have. But another set only better describes some endless supply of complex systems that humans define—such as a branch of mathematics, the biology of orcas, or the properties of a new type of high-entropy alloy. The line between these two is blurry. I believe that what separates an amazing scientific discovery from a trivial restatement of an obvious fact of the universe is simply human judgment.
Information Theory
Clearly, there is a rich precedent for defining and measuring discoveries. Bayesian experimental design provides tools for evaluating hypotheses and experiments.
Note: Berger, James O. Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media, 2013. https://link.springer.com/book/10.1007/978-1-4757-4286-2
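As a toy illustration of the kind of tool Bayesian experimental design provides, here is a minimal expected-information-gain calculation on a discretized parameter grid. The coin-flip setup is a hypothetical example, not drawn from the reference above.

```python
import math

def expected_info_gain(prior, likelihoods):
    """Expected KL divergence from prior to posterior, averaged over outcomes.

    prior[i] = P(theta_i); likelihoods[y][i] = P(outcome y | theta_i).
    """
    eig = 0.0
    for lik in likelihoods:
        p_y = sum(p * l for p, l in zip(prior, lik))  # marginal P(y)
        if p_y == 0.0:
            continue
        posterior = [p * l / p_y for p, l in zip(prior, lik)]
        kl = sum(q * math.log(q / p)
                 for q, p in zip(posterior, prior) if q > 0)
        eig += p_y * kl
    return eig  # nats of information the experiment is expected to yield

# Toy experiment: one flip of a coin with unknown bias on a uniform grid.
thetas = [i / 20 for i in range(1, 20)]
prior = [1.0 / len(thetas)] * len(thetas)
flip = [[1 - t for t in thetas], [t for t in thetas]]  # outcomes y = 0, 1
print(f"expected gain: {expected_info_gain(prior, flip):.3f} nats")
```

Experiments can then be ranked by this quantity: an experiment whose outcome is independent of the hypothesis has zero expected gain.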
Solomonoff induction, related to Kolmogorov complexity, offers a nice thought experiment for articulating a discovery as a finite program.
Note: Rathmanner, Samuel, and Marcus Hutter. "A philosophical treatise of universal induction." Entropy 13.6 (2011): 1076-1136. https://www.mdpi.com/1099-4300/13/6/1076
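A crude rendering of that thought experiment: a "discovery" is a program far shorter than the observations it reproduces. (Illustrative only—true Kolmogorov complexity is uncomputable, and the data here are an arbitrary toy.)

```python
# The 'observations': a sequence with a hidden regularity.
data = [n * n for n in range(1000)]

literal = repr(data)                       # cost of merely listing the data
program = "[n * n for n in range(1000)]"   # a short generating 'theory'

assert eval(program) == data               # the theory reproduces the data
print(f"program: {len(program)} chars, literal: {len(literal)} chars")
```

The theory is a few dozen characters; the raw listing runs to thousands. Articulating the short program is the discovery.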
If you define discovery as a predictive model, there are many tools (such as Bayesian hierarchical modeling) to compare models. These methods are regularly used in specific fields, although their importance is usually secondary to empirical performance.
These tools are useful in certain constrained environments. For example, if you only sample from a specific distribution (such as observing a single black box function), or if you are faced with a finite set of discoveries that need to be ranked. But when the question becomes: Does a statistic on California bird populations in 1491 count as a new discovery, or merely a restatement of known population dynamics? Or: Does proving a dominant strategy in a competitive card game count as a scientific discovery due to its significance?—such methods either become uncomputable or lose their meaning.
Discussion
When I flip through recent research articles in Science magazine, almost all discoveries are examples of unknown utility that have aroused great interest among other scientists. Take this week (May 2025, Volume 388, Issue 6746) as an example: articles cover the mechanical properties of flowers, the decline of North American bird populations, observing how RNA binds multivalently via cryo-electron microscopy, and the characterization of a new type of tri-metallic cerium catalyst with high selectivity in propane dehydrogenation to propylene. These are all basic discoveries very much from the well of infinite discovery. They may be part of an ongoing path toward better control of RNA or better catalyst design, but their utility is unknown, and their rise to significance stems from the judgment of human scientists based on interest, not their expected utility for humanity.
Let’s compare this with Volume 30 from approximately 100 years ago (1909). That issue included a letter by Karl Pearson titled "The Determination of the Correlation Coefficient" (he invented the correlation coefficient), an article by Lewis (founder of modern acid-base theory), updates on the U.S. campaign against tuberculosis, and a description of a simple interferometer construction. I truly selected this issue at random, yet everything in it has astonishingly high utility. A simple counterargument is: The true value of today’s articles on cerium catalysts and the like will become apparent in 100 years. We certainly cannot prove this today, but the statistics mentioned earlier in this article strongly suggest that this will not be the case generally.
So how do we reconcile these contradictions? I believe that the rate of discovery has not decreased, nor will we run out of discoveries, but the utility of the average discovery is declining. Perhaps this is simply because we still try to fit discoveries into the framework of "one discovery per paper"; perhaps I am just cynical and nihilistic, and every generation of scientists feels that things are in decline.
Artificial Intelligence
As the discovery process becomes automated by robotics and AI, the role of human scientists will move towards that of "science curators." Our role is to decide what is interesting enough to report as a discovery.
I was deeply inspired by Akshay Venkatesh’s lecture on the evolution of the human role in mathematics.
Note: Akshay Venkatesh: (Re)imagining mathematics in a world of reasoning machines. https://www.youtube.com/watch?v=vYCT7cw0ycw
In my opinion, he correctly captures the future of science. We will become the deciders of where to orient research automation—what questions to ask. Therefore, if you want to build an AI Scientist, make sure you have a way to capture good human taste.
Scientist-Engineer
This really makes me think: Is there a more direct way to measure the "utility" of a discovery? I believe the concept of an AI scientist-engineer (an AI scientist that makes discoveries around engineering outcomes) is a more manageable and steerable problem. For example, identifying a better catalyst for converting carbon dioxide to aviation fuel requires a number of discoveries that can then be ranked according to their progress towards the desired engineering goal.
An AI scientist-engineer in human health is another example. Instead of pursuing any discovery about human biology, the AI would be steered towards curing specific diseases via discoveries. At this point, our evaluation criterion is no longer the subjective human assessment of the discovery itself, but whether they can become a key link in the technical chain leading to a therapeutic.
Reflection
I believe there are two paths forward for automating science.
Path 1: We do our best to sample intelligently from an infinite well of discovery. It is reasonable to build preference models, focus on interesting hypotheses, and judge the success of an AI scientist based on the number of high-impact papers.
Path 2: We choose a concrete goal. A measurable outcome or change in the universe we want to effect. For example, curing diseases, creating new materials, or catalyzing new chemical reactions. Then, we judge our discoveries based on their progress towards the goal. There is not an infinite well of discovery, but instead a set of specific discoveries we need to find.
Notes:
The Nobel Prize delay: https://arxiv.org/abs/1405.7136; the Nobel Prize time gap: https://www.nature.com/articles/s41599-022-01418-8
Eroom's law: https://en.wikipedia.org/wiki/Eroom%27s_law
Park, Michael, Erin Leahey, and Russell J. Funk. "Papers and patents are becoming less disruptive over time." Nature 613.7942 (2023): 138-144. https://www.nature.com/articles/s41586-022-05543-x
Estimating the cost of all discoveries: https://diffuse.one/p/d1-003
Analysis of 10.6084/m9.figshare.17064419.v3
Bloom, Nicholas, et al. "Are ideas getting harder to find?." American Economic Review 110.4 (2020): 1104-1144. https://doi.org/10.1257/aer.20180338