Translational Review

Artificial Intelligence in Drug Discovery: From Target to Clinical Signal

Machine learning now touches every stage of pharmaceutical research, from target identification to molecular design and trial readiness. This review maps where the methods carry evidentiary weight, where they remain investigational, and why governance that separates what a model may learn from what it may decide is the precondition for moving any computational result toward a patient-facing study.

Dr. Julian Borges, MD · June 30, 2026 · 7 minute read

The attrition problem AI is asked to solve

Computational methods are recruited against a pipeline defined by cost, duration, and failure concentrated late in development.

The economics of drug development frame the entire field. A new therapeutic typically requires more than a decade of work and a development cost measured in billions, and the large majority of candidates that enter clinical testing never reach approval. Most failures concentrate in two areas, insufficient efficacy and unanticipated toxicity, both of which reflect incomplete biological understanding at the moment a molecule is selected.

Machine learning enters precisely here, as a family of methods for extracting structure from data that is too large or too high-dimensional for manual analysis. A comprehensive survey in Nature Reviews Drug Discovery catalogued applications spanning target validation, prognostic biomarker discovery, and digital pathology, while remaining explicit about the constraints, namely the limited interpretability and repeatability of model-generated results and the persistent shortage of systematic high-dimensional data (Vamathevan and colleagues, 2019). That tension between demonstrated capability and disciplined validation organises the sections that follow, and it connects directly to my active research programme in governed clinical artificial intelligence.

Where machine learning enters drug discovery

A stage-by-stage view clarifies which tasks are mature and which remain investigational.

It is more accurate to speak of many narrow applications than of one general capability. Each pipeline stage poses a distinct computational task, with its own data regime and its own standard of proof. The table below maps the principal entry points.

Table 1. Machine learning tasks mapped to the drug discovery pipeline
Pipeline stage	Computational task	Representative output
Target identification	Association mining across genomic, transcriptomic, and clinical data	Prioritised, mechanistically plausible target hypotheses
Structure determination	Protein and complex structure prediction	Atomic models for proteins without an experimental structure
Hit generation	Generative chemistry and virtual screening	Novel candidate molecules with predicted activity
Lead optimisation	Property, selectivity, and toxicity prediction	Ranked, synthesisable analogues
Clinical development	Patient stratification and trial enrichment	Biomarker-defined subpopulations

Maturity varies sharply across these rows. Structure prediction has become a dependable instrument, whereas clinical stratification remains an area where most claims are still retrospective and await prospective confirmation.

Structure prediction and generative molecular design

The most consequential advances reframed structure and design from bottlenecks into routine inputs.

The clearest demonstration that deep learning can deliver atomic accuracy came when AlphaFold predicted three-dimensional protein structures from sequence alone at a level competitive with experimental methods, largely resolving for single chains a problem that had remained open for half a century (Jumper and colleagues, 2021). Its successor, AlphaFold 3, extended prediction to the joint structure of proteins together with small molecules, nucleic acids, and ions, precisely the interactions that matter for binding and selectivity (Abramson and colleagues, 2024).

Generation followed prediction. By adapting diffusion models to protein backbone geometry, RFdiffusion enabled de novo design of structures and binders, several of which were experimentally validated, including a designed binder whose cryogenic electron microscopy structure matched the computational model almost exactly (Watson and colleagues, 2023). Taken together, these methods shift the limiting factor from the availability of a structure toward the quality of the biological hypothesis being pursued.

Table 2. Representative deep learning methods for structure and design
Method	Year	Capability	Reference
AlphaFold	2021	Atomic-accuracy single-chain structure prediction from sequence	10.1038/s41586-021-03819-2
RFdiffusion	2023	De novo protein backbone and binder design	10.1038/s41586-023-06415-8
AlphaFold 3	2024	Joint structure of proteins with ligands, nucleic acids, and ions	10.1038/s41586-024-07487-w

From in silico hit to clinical signal

Two programmes mark the path from a computational prediction to evidence in living systems.

The decisive question is whether computational predictions survive contact with biology. An early and instructive case was the identification of halicin, where a neural network trained to predict antibacterial activity surfaced a structurally unconventional antibiotic from a repurposing library, subsequently shown to be active against resistant pathogens in murine models (Stokes and colleagues, 2020). The result was notable because the molecule was chemically distant from known antibiotics, a region that human intuition tends to overlook.

More consequential for human evidence is rentosertib, an inhibitor of TNIK developed for idiopathic pulmonary fibrosis, whose target was itself nominated by generative methods. In a randomised, blinded, placebo-controlled phase 2a trial (registration NCT05938920), the highest-dose arm recorded a mean forced vital capacity change of +98.4 mL (95% confidence interval 10.9 to 185.9) over 12 weeks, against −20.3 mL for placebo, with a tolerability profile comparable across arms (Xu, Ren and colleagues, 2025). The cohort is small and the readout preliminary, yet it is among the first controlled human results for a molecule and target pair originating in generative chemistry.
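
As a reading aid, the arithmetic behind those figures can be made explicit. The sketch below uses only the numbers quoted above. The assumption that the interval is symmetric and normal-based is an illustrative one and may not match the model used in the published analysis. As quoted, the interval describes the active arm's own change, so it is not a confidence interval for the difference between arms.

```python
from statistics import NormalDist

# Figures quoted in the text: change in forced vital capacity over 12 weeks, in mL.
high_dose_mean = 98.4
high_dose_ci = (10.9, 185.9)
placebo_mean = -20.3

# Unadjusted difference between the two arm means.
difference = high_dose_mean - placebo_mean

# Assumption: the 95% interval is symmetric and normal-based, so its
# half-width is roughly 1.96 standard errors of the arm mean.
z = NormalDist().inv_cdf(0.975)
half_width = (high_dose_ci[1] - high_dose_ci[0]) / 2
implied_se = half_width / z

print(f"difference in means: {difference:.1f} mL")
print(f"interval half-width: {half_width:.1f} mL")
print(f"implied standard error: {implied_se:.1f} mL")
```

The half-width is close to the size of the mean change itself, which is the numerical form of the caution that the cohort is small and the readout preliminary.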

Table 3. Two translational exemplars and their level of evidence
Programme	Computational modality	Stage of evidence	Source
Halicin	Discriminative screen over chemical libraries	Preclinical, efficacy in murine models	10.1016/j.cell.2020.01.021
Rentosertib	Generative chemistry, generatively nominated target	Randomised phase 2a, forced vital capacity signal	10.1038/s41591-025-03743-2

Governance, failure modes, and decision authority

The reliability of a computational result is inseparable from the controls placed around its use.

The recurring lesson is that predictive performance reported in a paper does not transfer automatically into a sound decision inside a live programme. Several failure modes are now well characterised. Data leakage inflates retrospective accuracy. Distribution shift degrades a model when the chemical or patient space at deployment differs from the training distribution. Automation bias leads teams to over-trust a ranked list. And irreproducibility, the constraint emphasised in the Nature Reviews Drug Discovery survey, undermines independent validation (Vamathevan and colleagues, 2019).
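
One common guard against data leakage is a temporal holdout, in which a model is evaluated only on records measured after everything it was trained on. The sketch below is a minimal illustration under stated assumptions: the `Record` type, its field names, and the example data are hypothetical, not taken from any pipeline described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical assay record; field names are illustrative only.
    compound_id: str
    measured_on: date
    active: bool


def temporal_split(records, cutoff):
    """Train on records measured before the cutoff, evaluate on later ones."""
    train = [r for r in records if r.measured_on < cutoff]
    later = [r for r in records if r.measured_on >= cutoff]
    # Drop evaluation records whose compound was already seen in training,
    # so a re-measured compound cannot leak across the cutoff.
    seen = {r.compound_id for r in train}
    test = [r for r in later if r.compound_id not in seen]
    return train, test


records = [
    Record("A", date(2021, 3, 1), True),
    Record("B", date(2022, 6, 1), False),
    Record("A", date(2023, 1, 15), True),   # re-measurement of a training compound
    Record("C", date(2023, 2, 1), False),
]
train, test = temporal_split(records, cutoff=date(2023, 1, 1))
print([r.compound_id for r in train], [r.compound_id for r in test])
```

A random split over the same records could place the two measurements of compound A on opposite sides, which is the kind of contamination that inflates retrospective accuracy.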

A governance posture that addresses these begins by separating learning authority from decision authority. A model may rank, predict, and propose. The decision to synthesise a compound, to dose a participant, or to advance a candidate remains with accountable human review operating behind explicit stage gates, with traceable logs at each transition. This separation is the organising principle of the externally governed learning architecture under development in this research programme, and it is what allows a computational pipeline to be audited rather than merely trusted.
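
The separation can be pictured as a small data structure. The sketch below is a hypothetical illustration, not the architecture under development: every class, field, and identifier is invented for the example. The model side can only create a `Proposal`, while a `StageGate` records a decision only when a named reviewer supplies a written rationale.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Proposal:
    # What a model is allowed to produce: a ranked suggestion, nothing more.
    candidate_id: str
    model_version: str
    score: float


@dataclass(frozen=True)
class Decision:
    proposal: Proposal
    gate: str
    approved: bool
    reviewer: str
    rationale: str
    decided_at: str


@dataclass
class StageGate:
    name: str
    log: list = field(default_factory=list)

    def decide(self, proposal, *, approved, reviewer, rationale):
        # Decision authority requires an accountable human and a stated reason.
        if not reviewer.strip() or not rationale.strip():
            raise ValueError("a named reviewer and a written rationale are required")
        decision = Decision(proposal, self.name, approved, reviewer, rationale,
                            datetime.now(timezone.utc).isoformat())
        self.log.append(decision)  # append-only record for later audit
        return decision


gate = StageGate("synthesis")
proposal = Proposal("CAND-001", "ranker-0.1", 0.87)
decision = gate.decide(proposal, approved=False, reviewer="reviewer-A",
                       rationale="Outside applicability domain; assay data requested.")
print(decision.gate, decision.approved, len(gate.log))
```

Rejections are logged alongside approvals, so the record shows what was declined and why, not only what advanced.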

Table 4. Failure modes and corresponding governance controls
Failure mode	Mechanism	Governance control
Data leakage	Contamination between training and evaluation data	Temporal holdout validation and provenance logging
Distribution shift	Deployment space differs from the training distribution	Applicability domain checks and continuous monitoring
Automation bias	Uncritical trust in model rankings	Human decision gate with recorded rationale
Irreproducibility	Unstable, undocumented, or unshared pipelines	Versioned artefacts and reproducible execution
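
The applicability domain control in Table 4 can be illustrated with a nearest-neighbour similarity check. This is a minimal sketch under stated assumptions: molecules are represented as plain sets of integer feature identifiers standing in for a real fingerprint, and the 0.4 threshold is an arbitrary illustrative value, not a recommended setting.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_domain(query, training_set, threshold=0.4):
    """Return (flag, nearest) where nearest is the best similarity to training data."""
    nearest = max((tanimoto(query, t) for t in training_set), default=0.0)
    return nearest >= threshold, nearest


# Hypothetical feature sets standing in for molecular fingerprints.
training_set = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
near_query = frozenset({1, 2, 3})
far_query = frozenset({7, 8, 9})

print(in_domain(near_query, training_set))
print(in_domain(far_query, training_set))
```

A query that falls below the threshold is not necessarily a poor candidate. The check only signals that the model's prediction for it rests on extrapolation and should reach the human decision gate flagged as such.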

None of this need slow discovery. It makes the output defensible, which is the only form of speed that matters once a candidate approaches a human study.

Research Context

Connection to the DrugSynthAI research line

This note documents the conceptual foundation of an active pre-seed research line. The governance model described here corresponds to United States provisional patent 63/975,551 (Externally Governed Learning Systems), and the generative discovery pipeline corresponds to United States provisional patent 64/018,624 (DrugSynthAI Discovery), a multi-agent architecture for de novo molecular design directed at genetic disease, together with the associated USPTO trademark DrugSynthAI in Class 042. The broader publication record is indexed under ORCID 0009-0001-9929-3135. This is pre-seed research and development output. It is not a commercial offering and not clinical guidance.

Drug Discovery · Generative Chemistry · Clinical AI · AI Governance · Translational Medicine

Dr. Julian Borges, MD, is a board-certified endocrinologist and clinician scientist, Editor-in-Chief of an international peer-reviewed journal, with a research focus on governed clinical artificial intelligence and genomic medicine. Full credentials are on the About page.

Last updated: June 30, 2026