Reimagining Drug Design in the Era of AI #emtechdigital

Daphne Koller is the founder and CEO of Insitro. The company will integrate cutting-edge machine learning techniques with the groundbreaking life-science innovations that enable the creation of large, high-quality data sets. It will collect and use a range of very large data sets to train machine learning (ML) models to help address key problems in the drug discovery and development process.

It is becoming steadily more challenging to develop new therapeutics: clinical trial success rates hover around the mid-single-digit percentage range; the pre-tax R&D cost to develop a new drug (once failures are incorporated) is estimated to be greater than $2.5B; and the rate of return on drug development investment has been decreasing linearly year by year, with some analyses estimating that it will hit 0% before 2020. One explanation for this phenomenon is that drug development is now intrinsically harder: many (perhaps most) of the "low-hanging fruit", meaning druggable targets that have a significant effect on a large population, have already been discovered. If so, the next phase of drug development will need to focus on drugs that are more specialized, whose effects may be context-specific and which apply only to a subset of patients.

Koller believes that the problem is one of prediction. Reducing the number of failed attempts would bring down the cost of drug discovery, because the $2.5 billion cost of a successful drug has to pay for the hundreds of failed attempts made along the way.
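The arithmetic behind this argument can be sketched in a few lines. This is only an illustration under simplifying assumptions: each attempt is treated as independent with the same cost and the same chance of success, and the per-attempt cost and success rates below are hypothetical numbers chosen so that the baseline lands near the $2.5 billion figure. They are not figures from the talk.

```python
def expected_attempts(success_rate: float) -> float:
    """Expected number of attempts per success, assuming independent
    attempts that each succeed with probability `success_rate`."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def cost_per_approved_drug(cost_per_attempt: float, success_rate: float) -> float:
    """Expected total spend per approved drug, failures included."""
    return cost_per_attempt * expected_attempts(success_rate)


# Hypothetical inputs (assumptions for illustration only).
COST_PER_ATTEMPT = 125e6   # dollars spent on each attempt
BASELINE_RATE = 0.05       # 5% of attempts succeed
IMPROVED_RATE = 0.10       # better prediction doubles the hit rate

baseline = cost_per_approved_drug(COST_PER_ATTEMPT, BASELINE_RATE)
improved = cost_per_approved_drug(COST_PER_ATTEMPT, IMPROVED_RATE)

print(f"baseline: ${baseline / 1e9:.2f}B per approved drug")
print(f"improved: ${improved / 1e9:.2f}B per approved drug")
```

Under these assumptions the cost per approved drug scales as one over the success rate, so doubling the share of attempts that succeed halves the expected cost.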

Insitro plans to collect and use a range of very large data sets to train ML models that will help address key problems in the drug discovery and development process. To enable the machine learning, the company will use high-quality data that has already been collected, but it will also invest heavily in creating its own datasets using high-throughput experimental approaches. These datasets will be designed explicitly with machine learning in mind from the very start. The resulting ML models will then help guide subsequent experiments, providing a tight integration of in silico and in vitro methods (an Insitro paradigm).
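The idea of models guiding the next round of experiments can be illustrated with a toy loop. This is a minimal sketch, not a description of Insitro's actual pipeline: `simulated_assay` is a hypothetical stand-in for a lab measurement, the "model" is a simple nearest-neighbor lookup, and the selection rule is purely greedy. Real systems would use far richer models and would balance exploration against exploitation.

```python
import random


def simulated_assay(x: float) -> float:
    """Hypothetical stand-in for a wet-lab measurement.
    The hidden response peaks at x = 0.7."""
    return -(x - 0.7) ** 2


def predict(x: float, measured: dict) -> float:
    """Toy model: predict x's response as that of its nearest measured neighbor."""
    nearest = min(measured, key=lambda m: abs(m - x))
    return measured[nearest]


def closed_loop(candidates, rounds=5, batch_size=3, seed=0):
    """Alternate between model-based ranking and (simulated) experiments."""
    rng = random.Random(seed)
    # Initial round: measure a few randomly chosen candidates.
    measured = {x: simulated_assay(x) for x in rng.sample(candidates, batch_size)}
    for _ in range(rounds):
        untested = [x for x in candidates if x not in measured]
        if not untested:
            break
        # In silico step: rank untested candidates by predicted response.
        ranked = sorted(untested, key=lambda x: predict(x, measured), reverse=True)
        # In vitro step (simulated here): measure the top-ranked batch
        # and add the results to the training data.
        for x in ranked[:batch_size]:
            measured[x] = simulated_assay(x)
    return measured


candidates = [i / 100 for i in range(101)]
results = closed_loop(candidates)
best = max(results, key=results.get)
print(f"measured {len(results)} of {len(candidates)} candidates; best so far: x = {best}")
```

Each pass measures only the candidates the current model ranks highest, so the data collected in one round shapes which experiments are run in the next.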

There are two big challenges. One is data and the other is people.

Training datasets for AI need to be very large, on the order of 100 million to more than 1 billion images or cases. Biological datasets, however, are often not large enough for this kind of training.

New technology is expanding the amount of biological data. Trends indicate there will be 2 billion genome sequences combined with rich phenotype data by 2025.

Daphne Koller also gave another recent presentation, which is available on YouTube.

SOURCES- Insitro, live reporting by Brian Wang at EmTech Digital 2019.