How unfair is the coin?

I’m reviving my blog after some time away — it’s been an eventful 12 months. In February 2024, Reverie Labs, the startup I co-founded in 2017, was acquired by Ginkgo Bioworks. I’m now on leave from Ginkgo and I’ve joined Y Combinator as a Visiting Partner, giving me the chance to work with the next generation of companies. Especially in this new role, I’ve been thinking a bit about what worked, what didn’t work, and what lessons I can take forward.

We had quite the journey – 6+ years of building at the intersection of AI and drug discovery. We began as a machine learning driven software company selling SaaS tools and consulting services to pharma companies, and at acquisition we were a pharmaceutical company, developing our own in-house pipeline of drug assets and advancing them rapidly using our machine learning technology.

Reverie’s story parallels that of many other AI driven drug discovery companies from the 2015-2022 (i.e. pre-GPT) era. Many of these companies probably started as software companies founded by computer scientists. My co-founder Jonah and I entered this field because we saw the incredible advancements in machine learning in computer vision and natural language from AlexNet and LSTMs, and wanted to see those advancements applied to human health. With essentially no exceptions, we all became pharmaceutical companies. In other words, the product of the company wasn’t a software suite being sold to a pharma company, but a drug to be licensed or marketed directly to patients. Many companies had (and some still have) intermediate business models where they developed partnerships with pharmaceutical companies in which they advanced the pharma’s programs and took on milestone-based payments. But essentially all of them eventually launched their own wholly owned programs and advanced them.

So why did this happen? Ahead, a few-part reflection on selling to pharma R&D, venture capital in biotech, and the math of unfair coins.

One of the earliest lessons of Reverie — we started to experience this during our early days of YC — is that selling to pharma was a highly counterintuitive process for us outsiders. Our intuition was that pharma was a multi-trillion dollar industry that spends hundreds of billions on cutting-edge R&D, and so this (incorrectly) implied there would certainly be a massive software procurement budget across these companies. In reality, pharma — like other massive industries including video gaming — doesn’t buy much software, at least not compared to the tech companies we had better intuitions about. The vast majority of pharmaceutical companies are using relatively few pieces of externally purchased software, especially for the key tasks of designing compounds. For medicinal chemists, the primary pieces of software are tools to track molecules (Benchling, CDD Vault, Dotmatics, etc), which generally look like a single enterprise contract to a company, or other software to visualize/draw molecules, which usually has a free or cheap alternative. Software to design molecules, like Schrodinger’s suite, is largely sold to computational chemists, who are much fewer in number than medicinal chemists and have relatively low buying power at most companies. In sum, it turns out the image we had in our head that the design process was deeply computationally driven was mostly not true — it was tracked in computers, but most small molecules were designed in chemists’ heads.

As a result, software for designing molecules had challenging unit economics. Oversimplifying slightly, there were usually two parties that one could sell to in pharma: IT procurement teams that were largely used to paying ~$10,000s to $100,000s for enterprise software licenses for the whole company, or business development (BD) teams that don’t buy software at all and instead in-license drug assets. With this context, it was difficult to pitch the kind of pricing we would want (and frankly need to get a venture return). We wanted to build software that would be worth millions of dollars a year to our customers, but that pricing didn’t fit into either procurement model. Furthermore, the IT procurement teams could not decide themselves what design tools were needed (though they could for other tools like molecular tracking software), because that expertise lived in chemistry teams that don’t themselves procure much software. This created a painfully slow sales cycle in which it was hard to demonstrate value — BD people telling us they don’t buy software, the chemists we wanted to sell to not being empowered to buy software, the IT procurement teams being reliant on other teams to actually decide, and ultimately the budgets being very small.

So, this led to our first lesson: the importance of establishing a bottoms-up total addressable market (TAM) for a software product. In other words: how many buyers you can actually sell to, multiplied by how much each is willing to pay. This is in contrast to a top-down TAM calculation, which usually sounds something like “industry A spends $100B on R&D, so if I can make R&D 1% better, I can capture up to $1B in value”. Pharma looks really great from a top-down standpoint. From a bottoms-up standpoint, it might be more like ~10s of companies to sell $100,000s of software to, and ~100s of companies to sell $10,000s of software to. Suddenly, the TAM looks more like double-digit millions instead of billions. It’s hard to create a venture-scale return in an eight- or nine-figure market.
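To make that contrast concrete, here is a back-of-the-envelope version of the two calculations in Python. The buyer counts and price points below are illustrative placeholders in the spirit of the numbers above, not our actual market model.

```python
# Top-down vs bottoms-up TAM, with illustrative placeholder numbers.

# Top-down: "pharma spends ~$200B on R&D; capture 1% of that value."
top_down_tam = 200e9 * 0.01  # $2B -- sounds amazing

# Bottoms-up: who actually buys design software, and at what price?
segments = [
    {"buyers": 30,  "annual_price": 300_000},  # large pharma enterprise licenses
    {"buyers": 300, "annual_price": 30_000},   # smaller biotechs
]
bottoms_up_tam = sum(s["buyers"] * s["annual_price"] for s in segments)

print(f"Top-down TAM:   ${top_down_tam / 1e9:.1f}B")
print(f"Bottoms-up TAM: ${bottoms_up_tam / 1e6:.0f}M")  # double-digit millions
```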

In light of this, we made the first pivot in our company’s trajectory, and became a “molecules as a service” company, as I recall us stating in our demo day pitch at the end of YC. In other words, rather than selling software to IT procurement teams that don’t want to pay us millions of dollars, we decided to use that software to design molecules and sell those to BD teams that could pay millions to billions.

With this, we entered new arrangements with pharmaceutical companies that, at the time, were very exciting. We pulled off a deal with Roche/Genentech, and it was structured like most pharma partnership deals. Though the exact numbers are confidential, these deals usually involve an up-front payment to initiate work on specific drug discovery programs, plus further payments for achieving certain progress milestones. There are typically preclinical payments at critical stages of drug development, notably the start of lead optimization and the nomination of a candidate, followed by clinical-stage milestones at the start of Phases 1/2/3 and at approval. Generally, the math here looks like $5-25M up front, $5-50M preclinically, and up to 100s of millions once the molecules reach the clinic, plus royalties on sales.
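To make the structure concrete, here is a sketch of how a hypothetical deal in these ranges decomposes into the headline number versus the cash you actually have at signing. The figures are placeholders, not the Roche terms (which remain confidential).

```python
# Hypothetical milestone structure for one partnered program (placeholder numbers).
deal = {
    "upfront": 10e6,
    "preclinical": [5e6, 15e6],             # e.g. lead-op start, candidate nomination
    "clinical": [20e6, 40e6, 80e6, 150e6],  # Phase 1, Phase 2, Phase 3, approval
}

headline = deal["upfront"] + sum(deal["preclinical"]) + sum(deal["clinical"])
cash_at_signing = deal["upfront"]

print(f"Headline deal value: ${headline / 1e6:.0f}M (plus royalties)")  # the press-release number
print(f"Cash at signing:     ${cash_at_signing / 1e6:.0f}M")            # what funds the work today
```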

Sounds pretty good, right? In a sense it is. These types of deals look really good at face value. A company run by two 25-year-olds pulls off a deal with big pharma, 10s of millions of dollars are flowing in, and the downstream milestone numbers look big. It looks better than a software business.

But the full story is more challenging in terms of sales, headcount, and costs. First of all, it took 18 months to go from our first promising interaction to having this deal signed. This is a painful sales cycle, and it went cold several times along the way. Once we pulled it off, our company needed to expand headcount rapidly to support the deal. Our Roche deal had 3 drug discovery programs. This meant hiring medicinal chemists, computational chemists, and data scientists for each program, as well as expanding our core ML team to support our model development for all of them. Challenging, but not impossible, since we raised a $25M Series A and received upfront cash from Roche. Next, we had to actually ramp up our spending on small molecule chemistry to support these programs. That meant paying for significant contract research organization (CRO) teams that synthesized the molecules we designed. The amortized price per molecule we synthesized was ~$2000, plus several thousand dollars in biological assay costs (the experiments that measure each molecule’s properties).
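For a sense of the burn this implies, here is a rough per-program cost sketch. The molecule throughput and headcount assumptions are mine for illustration; only the ~$2,000 synthesis figure and the “several thousand” assay cost come from above.

```python
# Rough annual cost per partnered program (illustrative assumptions).
molecules_per_year = 500   # assumed design-make-test throughput
synthesis_cost = 2_000     # ~amortized CRO cost per molecule (from above)
assay_cost = 3_000         # "several thousand" in biological assays per molecule
team_cost = 6 * 250_000    # assumed ~6 FTEs per program, fully loaded

chemistry_spend = molecules_per_year * (synthesis_cost + assay_cost)
per_program = chemistry_spend + team_cost

print(f"Chemistry spend per program: ${chemistry_spend / 1e6:.1f}M/yr")
print(f"Total per program:           ${per_program / 1e6:.1f}M/yr")
print(f"Across 3 partnered programs: ${3 * per_program / 1e6:.1f}M/yr")
```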

These costs are important because even our best AI models were not good enough that we could blindly trust them when deciding which compounds to synthesize. Using the models was clearly advantageous, but there was a multitude of factors reflecting the realities of drug discovery that they could not fully capture. For example, while they were good at predicting binding and selectivity (hitting your target, not hitting an off-target), they were data-limited in predicting the full panel of ADMET properties that matter in drug discovery (properties like solubility, permeability, brain penetrance, etc). Notably, this was true even with access to Roche/Genentech’s giant combined ADMET dataset. I also want to make clear that as exciting as the recent models have been in small molecule drug discovery, fundamentally nothing has changed about this dynamic. It’s even easier to predict binding now (through models like AlphaFold, DiffDock, etc), but ADMET models have made only marginal improvements. Why does this matter? Because for every one of the ~10 programs we worked on in the life of Reverie, we had multiple potent, selective inhibitors within the first 2 months of starting the program, and then had to spend months to years optimizing ADMET profiles. It is not crazy to have programs with many picomolar (i.e. highly potent binder) chemical series that then spend half a decade in lead optimization because of ADMET. Good binding models help, but are not a panacea.

With a partnership deal, there can be additional challenges related to information asymmetry and incentive misalignment. Usually, with these types of deals, the big pharma companies bring the targets that they want the AI company to work on. Sometimes the AI company gets the ability to reject ones they don’t want to work on, but there is a massive information asymmetry that makes it hard to do this in practice. The pharma has probably had a startup-sized team study that target for multiple years before deciding to work on it. It’s hard to convince ourselves, let alone the counterparty, that we shouldn’t work on a target unless there is an IP conflict. This asymmetry creates another challenge that I’ve seen anecdotally confirmed by my other drug discovery startup friends: pharmas tend to propose their harder targets. It makes total sense from the pharma’s perspective — if you’re going to work with a third party, it might as well be something you can’t do yourself easily. This further strains the business math for the AI startups, for reasons that have nothing to do with the startup’s design tools. For example, if the target biological hypothesis is wrong, then no matter how good an inhibitor the startup designs, it may have little or no effect in the disease model, and demonstrating that effect is usually necessary to receive milestone payments.

This ultimately brought us to a critical juncture. As we proceeded with this partnership and began to gauge investor interest in these milestone-based programs ahead of our eventual Series B, it became clear that most biotech investors valued these programs at ~$0 beyond the reputational signal that we weren’t total clowns. This seemed strange at first, but made more sense as we understood the market dynamics. It was essentially assumed that our preclinical milestone payments would be offset by the various costs to achieve those milestones. The real money was in the much larger clinical-stage payments, but almost every aspect of clinical development was out of our hands. We had no control over how the trials would be designed, what the specific indications would be, where the trials would be run, what the competitive landscape would be, what the drug pricing regulation would be, and most importantly, whether the bigger pharma would choose to proceed with the molecule at all. And it’s worth noting that pharma companies decide not to proceed with molecules for all sorts of reasons that are not a straightforward mathematical function of measured drug properties, from competitive landscape to pricing signals to trial costs to patient recruitment challenges to whether some VP woke up on the wrong side of the bed.
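Here is a toy version of the math investors were implicitly doing. Every number is a made-up placeholder, but it shows why “preclinical payments minus costs, plus heavily discounted clinical payments you don’t control” rounds to roughly zero.

```python
# Toy risk-adjusted value of a partnered program (all numbers are placeholders).
preclinical_milestones = 20e6    # payments we could influence
cost_to_achieve_them   = 20e6    # CROs, headcount, assays over several years
clinical_milestones    = 300e6   # large, but contingent on the partner's choices
p_partner_advances     = 0.3     # partner actually takes the molecule into the clinic
p_clinical_success     = 0.1     # rough industry-wide odds once in humans

risk_adjusted = (preclinical_milestones - cost_to_achieve_them) \
    + clinical_milestones * p_partner_advances * p_clinical_success

print(f"Risk-adjusted value: ${risk_adjusted / 1e6:.0f}M")  # ~$9M, before a decade of time discounting
```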

So, we made the final pivot. Armed with cash, knowledge from running the Roche programs, and a ZIRP-era confidence about capital markets, we hit the gas on our previously fledgling wholly owned discovery programs. The math on these, if they work, looks really good. Potential blockbuster drugs in that era could be out-licensed for $10-100M up front prior to the initiation of a human trial, or even higher if you sell after starting a Phase 1. The TAM was in the trillions, and there was a credible path to making billions. Moreover, the biotech venture market had a well-trodden process of evaluating companies here, involving extensive technical diligence of the target biology and drug assets with their teams of PhD biologists (though notably pretty non-existent diligence of the AI/software tech that we built — more on that later).

We had the team and the technology to execute on these programs, and we made a ton of progress fast. Doing this for our own programs created a forcing function to be particularly careful about evaluating the quality of our models. We knew we only had a few shots on goal. We came up with clever ways to assess out-of-distribution generalization and used active learning methods to efficiently create our own ADMET datasets and models. Since we owned the entire software stack, we built an integrated tech environment rivaling any pure tech startup, arming chemistry teams with ML models at every step of the way: design, prioritization, synthesis, and assay analysis. I became an ML scientist, data scientist, database engineer, cloud infra engineer, backend developer, computational chemist, frontend developer, and then the manager and manager of managers for all of these spaces. It was incredibly fun, deeply fulfilling, and we built the best team I’ve ever worked with and might ever work with. Hopefully I’ll write about our tech at some point in the future, but for now most of it lives on at Ginkgo.
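For readers curious what “active learning to build ADMET datasets” looks like in the abstract, here is a minimal sketch of the general pattern: train a model, send the compounds it is least certain about to be measured, and retrain. This is a generic toy (random-forest ensemble variance on synthetic data), not our actual stack or selection strategy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Minimal active-learning loop: train, pick the compounds the model is least sure
# about, "measure" them (a synthetic oracle stands in for a real assay), retrain.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 32))                    # placeholder molecular features
y_pool = X_pool[:, 0] ** 2 + rng.normal(0, 0.1, 2000)   # synthetic "ADMET" endpoint

labeled = list(rng.choice(len(X_pool), size=50, replace=False))
for round_num in range(5):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty proxy: spread of predictions across the trees in the forest.
    per_tree = np.stack([tree.predict(X_pool) for tree in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    uncertainty[labeled] = -np.inf                      # don't re-select measured compounds
    batch = np.argsort(uncertainty)[-25:]               # the 25 most uncertain go "to assay"
    labeled.extend(batch.tolist())
    print(f"round {round_num}: {len(labeled)} labeled compounds")
```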

While this sounds great, it was a very challenging business position for a startup. We were, of course, trying to make great progress on our in-house assets. But our goal was not to be a “single asset” company. We wanted to show that we had a technology stack that enabled us to rapidly advance a class of assets repeatedly. We were, after all, technologists, and the goal was to create a totally new type of pharma company. These days, this is often called a “platform biotech” or a “techbio” company.

The analogy I use to explain why this is hard involves unfair coins — bear with me. In probability theory, an unfair coin is one that doesn’t flip heads at exactly 50% probability. Drug discovery is like an unfair coin, where heads is a universally convincing “good” outcome like good preclinical animal data in IND-enabling studies and good Phase 1 human safety data. The consensus methods of drug discovery create coins that flip heads with ~10% probability. Biotech venture firms are reasonably good at evaluating how good a coin is, and moreover invest in enough coins that a few of them flip heads and produce a return. Techbio/platform companies are in the business of making a better coin mint — i.e. a machine that makes these coins, where the coins flip heads with higher probability. We believed we had built a much better coin mint by building proprietary datasets, cutting-edge ML models, design tools, and an integrated process. Unfortunately, there is no convincing way to measure how good our coin mint is other than minting a coin and flipping it. And with the funding that most techbio companies have, we had the funds for essentially one coin flip.
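A quick calculation shows why one flip proves so little. Suppose the industry-standard mint produces coins that land heads ~10% of the time, and suppose (generously) that ours is twice as good at 20%; both rates are assumptions for illustration.

```python
# How distinguishable are a 10% mint and a 20% mint after n program "flips"?
def p_at_least_one_head(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

for n in [1, 3, 10, 30]:
    base = p_at_least_one_head(0.10, n)
    ours = p_at_least_one_head(0.20, n)
    print(f"n={n:>2} programs: P(>=1 success) baseline={base:.2f} vs better mint={ours:.2f}")

# With n=1 the two mints differ by only 10 percentage points of outcome probability,
# and each additional flip costs years and tens of millions of dollars.
```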

This phenomenon occurs due to a complex set of factors. As I alluded to earlier, at this stage of the company, most rounds are led by biotech investors and not tech investors, and biotech investors will spend nearly all of their time with a company diving into the molecular and biological data about its targets, assets, and clinical trial strategy. They will almost exclusively focus on the lead asset. Tech investors will usually tell you to go get a biotech investor to lead or co-lead the round. Biotech investors are very smart people and have historically generated great returns. But, put another way, they are in the business of evaluating and investing in coins, not evaluating coin mints. There are usually ~0 people at the average biotech VC who have a computer science background, let alone an AI background. That said, even if they had great technology backgrounds (and don’t get me wrong, some do), it remains the case that there are no convincing metrics for the quality of the coin mint other than minting a coin and evaluating it. How does one value a drug discovery company that can advance molecules fast but whose leaders consistently pick bad targets? Or what if it works great in one disease area and not another? Or on certain types of small molecules and not others? How would you even have the data to show that? Even worse, it may take 10-15 years to flip a coin and generate this data.

Let’s rewind a bit — a reasonable observer could say “sure, but the key problem is you didn’t have the evidence to show that your tech, meaning the coin mint, was actually worth millions of dollars. If the evidence were clear, the buyers would pay for it, whether they be investors or software procurers”. A fair point. So, why is it so hard to prove this? In short, there are no good benchmark eval sets for the modeling tasks that are actually bottlenecks. These would be (1) large, high quality ADMET benchmarks of molecules that look like late-stage lead-like molecules, and (2) a task that looks like “given existing molecules S and data X about them, make the optimal set of next molecules S’ that exceeds what my human chemists would come up with”. For the latter, it is not totally clear what optimal would even mean. I’d happily co-organize a workshop with people in the techbio community to try to establish a set of benchmarks that we all agree on as valuable. In the status quo, we have an assortment of low quality datasets/benchmarks that live in the public domain, largely advanced by academic labs out of convenience because they are freely available, and that essentially everyone I know in industry chuckles at as a proxy for real drug discovery tasks. I can think of nearly 0 instances where improving performance on an academic benchmark improved performance on our internal benchmarks or increased our willingness to use models in practice. I don’t blame academia here — they’ve got to work with what they have — the onus is on industry to establish the right tasks. This may require developing consortia, sharing data, or acquiring government funding.
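As one concrete example of what a more honest benchmark could look like, here is a sketch of a scaffold-style split, where entire families of molecules are held out so the test set is structurally unlike the training set. Everything here is a placeholder for illustration: synthetic clusters stand in for chemical scaffolds, and a random forest stands in for an ADMET model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Sketch of a "scaffold-style" out-of-distribution split vs a random split.
rng = np.random.default_rng(1)
n_clusters, n_per = 10, 300
centers = rng.normal(scale=3.0, size=(n_clusters, 16))             # one "scaffold" per cluster
cluster_id = np.repeat(np.arange(n_clusters), n_per)
X = centers[cluster_id] + rng.normal(size=(n_clusters * n_per, 16))
y = 0.3 * X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.1, len(X))  # synthetic endpoint

holdout = np.argsort(centers[:, 0])[-2:]              # the "scaffolds" least like the rest
ood_test = np.isin(cluster_id, holdout)               # held out entirely from training
rand_test = rng.random(len(X)) < ood_test.mean()      # random split of the same size

for name, test in [("random split", rand_test), ("scaffold holdout", ood_test)]:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[~test], y[~test])
    mae = mean_absolute_error(y[test], model.predict(X[test]))
    print(f"{name:>16}: MAE = {mae:.3f}")
# The gap between the two numbers is the generalization a random split hides.
```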

So where does this put us? We are in a seemingly exciting moment in AI for drug discovery. I see big valuations, lots of capital flowing, and excitement about a fundamentally enabling new set of technologies in large language models. If you’ve made it this far, you might think I’m skeptical of this moment given the above. Sort of. I absolutely think that there is a fundamentally exciting new technology paradigm that means we should revisit all of our assumptions about what is possible and what isn’t. Workflows relevant to drug discovery that were annoying or impossible in the past, like extracting data out of patents, might now be easy; organizing historical data that was in a messy form may now be possible; and the models themselves allow rich representations of sequential inputs, which is especially relevant to the design of biologics and DNA/RNA. I am very excited about our ability to pick better targets through automated information extraction from the literature, clinical data coming from companies like Tempus, or large-scale biological perturbation experiments that companies like Recursion and Ginkgo are doing (as well as therapeutics startups like Noetik, Insitro, Isomorphic, and Xaira). Target discovery, trial design, and recruiting the right patients are probably bigger problems than making drugs against known targets, especially given the biologics and RNA therapeutic modalities that are increasingly popular, and I buy that AI can make a difference in those spaces. For those less familiar, ~90% of drugs fail in the clinic after they succeed in efficacy and safety studies in lower species preclinically — this is usually some mix of bad translation of efficacy from rats/dogs/monkeys to humans, toxicity that we can’t model, and recruiting the wrong patients.

But largely due to the unavailability of high quality labeled data about ADMET properties (or, for biologics, “developability” data as it’s called), I don’t currently see a fundamental paradigm change in the dynamics that caused the previous companies to evolve in the way they did, at least for “drug design” companies. Don’t get me wrong, I’d love to be very wrong about this. I’m rooting for companies like Evolutionary Scale and Chai Discovery that are setting out to build software-driven businesses, as well as companies like Cradle.bio that seem to have some promising traction in selling software directly. Maybe one of them will figure out a business model I couldn’t or make a technical breakthrough that changes the business dynamics meaningfully. On the other hand, companies like Insitro, Xaira, and (probably) Isomorphic are capitalized to mint several coins before running out of cash. Perhaps they can actually prove that they built a better coin mint and unlock a better future. Regardless, it’s been motivating to see the incredible amount of open source science happening in this domain, and I strongly believe that will be a key component of everyone ultimately winning here.

So what’s next for me? At YC, I’m excited to work with companies across a wide variety of verticals. These days, most YC companies are AI companies, and I’m keen to see how some of the lessons I developed apply in other spaces. Of course, I’m also going to be working extra closely with all of YC’s healthcare and biology companies, who are working on some of the highest impact applications of technology.

I do think we are entering a golden era of advancement for humanity. We have a powerful set of tools to unlock productivity, search through the depths of humanity’s existing knowledge, and create new knowledge where it didn’t already exist. I want to see this also create a golden era of advancement for our understanding of science and human health, where the potential to improve people’s lives is incredibly high, and ultimately to deliver better therapies for patients. But, the road to achieve that won’t be easy. Good outcomes usually aren’t easy.


Thanks to Lucy Nam, Jonah Kallenbach, Kristin Tsuo, and Rahul Gupta for reading drafts of this.