Randomness, reconstructed

Warfarin, long the world's most prescribed anticoagulant, was first sold as rat poison. Its anticoagulant precursor was isolated after cattle hemorrhaged to death from eating moldy sweet clover. Three of the most consequential inventions of the 2020s (GLP-1s, the transformer, and mRNA vaccines) were likewise the results of decades-long, multi-researcher chains of discovery. The origins of these innovations vary, but they share an appearance of randomness: the first discovery in each chain was aimed at a goal entirely separate from its most impactful result (to date). Yet when we trace these discoveries from their point of origin, we find instead a series of rational, nonrandom discoveries and repurposings of novel science that zig and zag into a final form.

The chain of events that led to the commercialization of GLP-1 drugs began in 1967, nearly half a century before the first GLP-1 was approved for weight loss in 2014. Rosalyn Yalow's radioimmunoassay made it possible to detect gut hormones, and Joel Habener's lab went on to discover GLP-1. That discovery, funded by the NIH and the Veterans Administration, emerged from gene sequencing done without any therapeutic target in mind. GLP-1 became a therapeutic target in the 1980s, when it was found to stimulate insulin secretion only when blood sugar is elevated, a natural safeguard against hypoglycemia.

In 1990, John Eng of the Bronx VA famously ordered Gila monster venom from a catalog, hypothesizing that an animal that eats only a few times a year must have evolved unusual metabolic peptides. Eng isolated exendin-4 (53% homologous to GLP-1), patented it himself when the VA declined to, and presented his findings as a conference poster. At the 1996 ADA Annual Meeting, Amylin Pharmaceuticals' Andrew Young saw that poster and licensed Eng's patent. The result was FDA approval of exenatide (Byetta) in April 2005, thanks to an Eli Lilly partnership that rescued the trials when Amylin neared bankruptcy. Exenatide marked the entrance of GLP-1s as a drug class, sharpened the commercial imperative for GLP-1 R&D at pharmaceutical companies, and surfaced a weight-loss signal in trials.

In parallel with Amylin, Novo Nordisk began decades of R&D once GLP-1 was validated as a therapeutic concept in the 1980s. Built atop human GLP-1 biology, Novo's internal program was a journey from liraglutide to semaglutide, and onward to Ozempic and Wegovy. Some forecasts now predict that as much as 9% of the population will eventually be on some GLP-1 variant. The downstream effects are near-impossible to quantify, as GLP-1 receptor agonists are now in trials for addiction and neurodegenerative disease.

When we consider this series of events, we find contingent ingenuity (contingent in the sense of dependent on what came before). We know of John Eng's Gila monster because of pathways discovered decades prior, though this takes nothing away from his brilliance. The same goes for Amylin: although the company did not win commercially, its success with exenatide created the conditions under which others widely commercialized GLP-1s. For these weight-loss superhero drugs, the initial funding source was governmental, and the necessary-but-insufficient pre-breakthrough was in measurement infrastructure (the ability to measure minute amounts of hormone in blood). GLP-1 itself was then discovered through gene sequencing, and everything that followed has been downstream of these critical, seemingly unrelated infrastructure investments. Ozempic required the convergence, over five decades, of four independently directed programs, none of which targeted weight loss.

The transformer, the foundation of modern generative AI, was built during an internal Google research project whose goal was to improve the quality and computational efficiency of Google Translate's neural machine translation. The transformer became an essential building block of the large language models that have irrevocably changed the world. The foundations of conventional machine learning were laid starting with the 1943 mathematical model of the neuron, followed by the perceptron, whose development was funded by the U.S. Navy. These early learning systems were the seed for what followed. In the decades after, researchers ebbed and flowed with the field, and key breakthroughs (backpropagation, Long Short-Term Memory, the AlexNet win, and Bahdanau attention) carried neural networks through and out of the AI winters. In 2017, Google Brain took up the question of "how do we make Google Translate faster?" and produced the transformer, an architecture that dispensed with recurrence in favor of multi-head self-attention. Its direct ancestor is Bahdanau attention. Ashish Vaswani and his team solved the business question, but "Attention Is All You Need" changed the world. OpenAI, Google DeepMind, and Anthropic all build their AI systems atop the transformer.
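For readers who want to see the mechanism itself, here is a minimal sketch of the scaled dot-product self-attention at the heart of that architecture. The NumPy implementation and toy input below are illustrative only; the function name and example data are ours, not taken from the paper's code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, the core operation of the transformer.

    Q, K, V: arrays of shape (sequence_length, dimension). Every position
    attends to every other position in a single matrix multiply, which is
    what lets the architecture drop recurrence entirely.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                               # weighted mixture of value vectors

# Toy "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V
print(out.shape)                                     # (4, 8)
```

Multi-head attention simply runs several of these in parallel over learned projections of the same input and concatenates the results.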

Once again we see contingent ingenuity: a team focused on a specific problem, with ripple effects that would have been near-impossible to predict at the time of publication. In this instance, the earliest origin involved public funding, but nearly all of the major advances were paid for out of the free-cash-flow surplus of 2010s-era Big Tech (namely, Google). Looking back at the history of the transformer, one may wonder when this technology would have been discovered absent the internal R&D budget of a private giant. The trouble with counterfactuals is that we will never know; what we do know is that, once again, directed research yielded a major surplus.

mRNA vaccines, including the COVID-19 vaccines, were likewise the result of decades of rejected grants, institutional demotion, and foundational biological research. In this case, it was one researcher's stubborn, heterodox conviction that pushed early mRNA research forward. Katalin Karikó's chance encounter with immunologist Drew Weissman at a University of Pennsylvania photocopier sparked their collaboration. In 2005 they published a paper in Immunity showing that substituting a modified nucleoside into synthetic mRNA blunts the immune system's rejection of it. The paper was largely ignored until a stem cell biologist, Derrick Rossi, saw platform potential in the technology and went on to co-found Moderna.

The internet was born of a Cold War-era proposal to chop messages into addressed blocks and route them through a distributed mesh that could survive nuclear attack. Penicillin was discovered when mold drifted onto a staphylococcus culture dish; Fleming's resulting paper lay mostly dormant until Florey and Chain's team picked it up in a literature search a decade later. Viagra initially targeted angina and hypertension. The microwave oven's origins trace to Raytheon mass-producing cavity magnetrons for Allied radar systems during World War II. Teflon came out of a refrigerant R&D program at DuPont. We see these apparently random walks in the background of our lives: vulcanized rubber, the cardiac pacemaker, X-rays, saccharin (the artificial sweetener), GPS, graphene, Kevlar, nylon, stainless steel, minoxidil, lithium-ion batteries, fMRI, and synthetic fertilizer.

If mapped, scientific progress would often resemble a random walk through an idea space (setting aside the question of dimensionality for the sake of this thought exercise). But a random walk is defined by independence: each step is drawn without regard to the steps that came before. The apparent randomness we trace in science instead emerges from many locally directed, high-intensity paths that collide. These collisions create discontinuous jumps in value, sometimes with extreme lag, and the objective is refined in the process. So, there is a randomness mirage occurring when directed research collides with directed research.
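As a toy illustration of the distinction (the setup and numbers here are ours, not drawn from any of the histories above), compare a genuinely random walk, whose steps are independent, with two locally directed paths whose intersection looks like chance but is produced by two deterministic agendas:

```python
import random

def random_walk(steps, seed=0):
    """A true random walk: each step is drawn independently of the path so far."""
    random.seed(seed)
    x = y = 0
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

def directed_paths_collide(steps):
    """Two locally directed paths: each walker moves straight toward its own goal.
    Neither path is random, yet the point where they cross was chosen by neither."""
    a = [(t, 0) for t in range(steps)]                         # walker A heads east along y = 0
    b = [(steps // 2, t - steps // 2) for t in range(steps)]   # walker B heads north through x = steps // 2
    return set(a) & set(b)

print(random_walk(10)[-1])          # the endpoint wanders unpredictably
print(directed_paths_collide(10))   # {(5, 0)}: a deterministic "chance" meeting
```

Seen only at the collision points, the directed system looks as erratic as the random one; traced path by path, it is anything but.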

In the examples we trace, the initial research target was specific (can we measure tiny amounts of hormone in blood? Can we improve Google Translate? Can we create stronger military materials?). Then the goal itself changes, and the research objective mutates into the form we recognize today. When we ask "why do we have synthetic fertilizer today?", the question assumes a single lineage that can be traced backward. A fuller answer comes from a different question: what came of wartime Germany's search, during World War I, for alternatives in its explosives supply chain? The objective of feeding the world in peacetime arrived later, after effective research had been done in a different direction.

These innovations can be the downstream result of public funding, private funding, decades of staccato work, and individual brilliance. If we could quantify it, the optimal outcome is a large multiplier on a small initial seed of public or private funding. In the best case, $181M in initial funding led to $15.8T in measured economic value; this is the case of the successful innovations tracked in this project. Such an ROI calculation is imperfect, however, as R&D is an amorphous, hard-to-track blob whose results are often failures, incremental advances, or merely interpreted as such for now.
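For scale, and using only the two figures quoted above, the implied multiple is on the order of 87,000x:

```python
initial_funding = 181e6      # $181M in initial funding, as tracked by this project
measured_value = 15.8e12     # $15.8T in measured economic value
multiplier = measured_value / initial_funding
print(f"{multiplier:,.0f}x")  # ~87,293x
```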

Let's consider the levers in our equation. To improve the likelihood that a system of scientific discovery creates these curious, multi-hop innovations, we can increase the number of journeys researchers embark upon and increase the likelihood that those journeys collide.
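One rough way to formalize those levers, as a back-of-the-envelope sketch under simple independence assumptions rather than a measured result of this project: if n directed research paths are active and any given pair of them collides with probability p, the expected number of collisions is roughly n(n-1)/2 * p.

```python
from math import comb

def expected_collisions(n_paths: int, p_collide: float) -> float:
    """Toy model: n directed paths, each pair colliding independently with probability p."""
    return comb(n_paths, 2) * p_collide

print(expected_collisions(100, 0.001))  # 4.95
print(expected_collisions(200, 0.001))  # 19.9  -> doubling the journeys ~quadruples collisions
print(expected_collisions(100, 0.002))  # 9.9   -> doubling collision odds only doubles them
```

In this toy model, collisions grow quadratically with the number of journeys and only linearly with the per-pair collision likelihood; both levers matter, but they do not pay off symmetrically.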

We can increase the quantity of reagents in the system (fund more science, aimed at a wider variety of objective functions). We can increase the number of hypothesis-testing grants with modest deliverables, since many of the pivotal experiments above were cheap.

We can incentivize collision. The current incentives in science discourage collision density: reproducibility, failure, and anomalies are all under-rewarded. What might the world look like if we encouraged the sharing of the scraps of science left on the cutting-room floor, within larger and larger groups? The bar could be lowered for publishing intermediate results and partial findings. Researchers could be rewarded when an anomalous finding of theirs is picked up by people in other fields. We could reward those who happen to sit upstream of a wonderful, apparently random walk.

Then, we can fund cross-pollination directly. Imagine heating up the reaction vat so that particles collide more often. The siloing of conference culture has moved us away from this kind of collision. Instead, we could subsidize cross-disciplinary exchanges and paid sabbaticals at institutions in adjacent fields. We can neither predict nor guarantee that two researchers will chat at a party, or that a scientist will excavate an abandoned paper during a literature review. But we can hope to induce the marginal collision, and with it bring to life a technology that might otherwise lie dormant for decades more.

Ara Varma is a designer based in San Francisco. She was an Associate at SOSV ($1.5b deep tech VC) and worked in sales at Loft Orbital. In these roles, she encountered many people on their own random walks, building interesting science.

This project catalogues and formalizes what many know to be true: funding research in one area often leads to value in unexpected areas. We document the positive unintended consequences of high-intensity research and seek to understand how we can support more of them.