Translational Review

Artificial Intelligence in Drug Discovery: from target to clinical signal

Machine learning now touches every stage of pharmaceutical research, from target identification to molecular design and trial readiness. This review maps where the methods carry evidentiary weight and where they remain investigational. It also argues that governance separating what a model may learn from what it may decide is the precondition for moving any computational result toward a patient-facing study.

I

The attrition problem AI is asked to solve

Computational methods are recruited against a pipeline defined by cost, duration, and failure concentrated late in development.

The economics of drug development frame the entire field. A new therapeutic typically requires more than a decade of work and a development cost measured in billions, and the large majority of candidates that enter clinical testing never reach approval. Most failures concentrate in two areas, insufficient efficacy and unanticipated toxicity, both of which reflect incomplete biological understanding at the moment a molecule is selected.

Machine learning enters precisely here, as a family of methods for extracting structure from data that is too large or too high-dimensional for manual analysis. A comprehensive survey in Nature Reviews Drug Discovery catalogued applications spanning target validation, prognostic biomarker discovery, and digital pathology, while remaining explicit about the constraints: the limited interpretability and repeatability of model-generated results and the persistent shortage of systematic high-dimensional data (Vamathevan and colleagues, 2019). That tension between demonstrated capability and disciplined validation organises the sections that follow, and it connects directly to my active research program in governed clinical artificial intelligence.

II

Where machine learning enters drug discovery

A stage-by-stage view clarifies which tasks are mature and which remain investigational.

It is more accurate to speak of many narrow applications than of one general capability. Each pipeline stage poses a distinct computational task, with its own data regime and its own standard of proof. The table below maps the principal entry points.

Table 1. Machine learning tasks mapped to the drug discovery pipeline
Pipeline stage | Computational task | Representative output
Target identification | Association mining across genomic, transcriptomic, and clinical data | Prioritised, mechanistically plausible target hypotheses
Structure determination | Protein and complex structure prediction | Atomic models for proteins without an experimental structure
Hit generation | Generative chemistry and virtual screening | Novel candidate molecules with predicted activity
Lead optimisation | Property, selectivity, and toxicity prediction | Ranked, synthesisable analogues
Clinical development | Patient stratification and trial enrichment | Biomarker-defined subpopulations

Maturity varies sharply across these rows. Structure prediction has become a dependable instrument, whereas clinical stratification remains an area where most claims are still retrospective and await prospective confirmation.

III

Structure prediction and generative molecular design

The most consequential advances reframed structure and design from bottlenecks into routine inputs.

The clearest demonstration that deep learning can deliver atomic accuracy came when AlphaFold predicted three-dimensional protein structures from sequence alone at a level competitive with experimental methods, closing a problem that had remained open for half a century (Jumper and colleagues, 2021). Its successor, AlphaFold 3, extended prediction to the joint structure of proteins together with small molecules, nucleic acids, and ions, the precise interactions that matter for binding and selectivity (Abramson and colleagues, 2024).

Generation followed prediction. By adapting diffusion models to protein backbone geometry, RFdiffusion enabled de novo design of structures and binders, several of which were experimentally validated, including a designed binder whose cryogenic electron microscopy structure matched the computational model almost exactly (Watson and colleagues, 2023). Taken together, these methods shift the limiting factor from the availability of a structure toward the quality of the biological hypothesis being pursued.

Table 2. Representative deep learning methods for structure and design
Method | Year | Capability | Reference
AlphaFold | 2021 | Atomic-accuracy single-chain structure prediction from sequence | 10.1038/s41586-021-03819-2
RFdiffusion | 2023 | De novo protein backbone and binder design | 10.1038/s41586-023-06415-8
AlphaFold 3 | 2024 | Joint structure of proteins with ligands, nucleic acids, and ions | 10.1038/s41586-024-07487-w

IV

From in silico hit to clinical signal

Two programmes mark the path from a computational prediction to evidence in living systems.

The decisive question is whether computational predictions survive contact with biology. An early and instructive case was the identification of halicin, where a neural network trained to predict antibacterial activity surfaced a structurally unconventional antibiotic from a repurposing library, subsequently shown to be active against resistant pathogens in murine models (Stokes and colleagues, 2020). The result was notable because the molecule was chemically distant from known antibiotics, a region that human intuition tends to overlook.

More consequential for human evidence is rentosertib, an inhibitor of TNIK, a target itself nominated by generative methods, developed for idiopathic pulmonary fibrosis. In a randomised, blinded, placebo-controlled phase 2a trial (registration NCT05938920), the highest dose arm recorded a mean forced vital capacity change of +98.4 mL (95% confidence interval 10.9 to 185.9) over 12 weeks, against -20.3 mL for placebo, with a tolerability profile comparable across arms (Xu, Ren and colleagues, 2025). The cohort is small and the readout preliminary, yet it is among the first controlled human results for a molecule and target pair originating in generative chemistry.
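A short worked calculation helps show why the readout is described as preliminary. The sketch below uses only the three figures quoted above. The variable names are illustrative, and the gap between arm means is a plain subtraction, not the trial's own treatment-effect estimate.

```python
# Figures as quoted in the text for the phase 2a trial (NCT05938920).
high_dose_mean_ml = 98.4               # mean FVC change, highest dose arm
ci_low_ml, ci_high_ml = 10.9, 185.9    # 95% CI quoted for that arm
placebo_mean_ml = -20.3                # mean FVC change, placebo arm

# Raw gap between arm means. This is simple arithmetic on the quoted means;
# it is NOT a reported effect estimate and carries no interval of its own.
raw_gap_ml = round(high_dose_mean_ml - placebo_mean_ml, 1)

# The quoted interval is symmetric about the mean, so its half-width shows
# how large the uncertainty is relative to the point estimate.
ci_midpoint_ml = round((ci_low_ml + ci_high_ml) / 2, 1)
ci_half_width_ml = round((ci_high_ml - ci_low_ml) / 2, 1)
relative_half_width = round(ci_half_width_ml / high_dose_mean_ml, 2)

print(raw_gap_ml, ci_midpoint_ml, ci_half_width_ml, relative_half_width)
# prints: 118.7 98.4 87.5 0.89
```

The half-width of 87.5 mL is close to 90 percent of the point estimate. The interval excludes zero but remains wide, which is consistent with a small cohort.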

Table 3. Two translational exemplars and their level of evidence
Programme | Computational modality | Stage of evidence | Source
Halicin | Discriminative screen over chemical libraries | Preclinical, efficacy in murine models | 10.1016/j.cell.2020.01.021
Rentosertib | Generative chemistry, generatively nominated target | Randomised phase 2a, forced vital capacity signal | 10.1038/s41591-025-03743-2

V

Governance, failure modes, and decision authority

The reliability of a computational result is inseparable from the controls placed around its use.

The recurring lesson is that predictive performance reported in a paper does not transfer automatically into a sound decision inside a live programme. Several failure modes are now well characterised. Data leakage inflates retrospective accuracy. Distribution shift degrades a model when the chemical or patient space at deployment differs from the training distribution. Automation bias leads teams to overtrust a ranked list. And irreproducibility, the constraint emphasised in the Nature Reviews Drug Discovery survey, undermines independent validation (Vamathevan and colleagues, 2019).
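One common guard against data leakage is to split by time instead of at random, so that evaluation data postdate everything the model saw in training. The following is a minimal sketch of that idea, not a description of any specific pipeline. `AssayRecord`, `temporal_holdout`, the compound identifiers, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AssayRecord:
    # Hypothetical record type, introduced only for this illustration.
    compound_id: str
    measured_on: date
    active: bool


def temporal_holdout(records, cutoff):
    """Train on records measured before `cutoff`; evaluate on later ones."""
    train = [r for r in records if r.measured_on < cutoff]
    test = [r for r in records if r.measured_on >= cutoff]
    # A compound measured on both sides of the cutoff would leak into
    # evaluation, so compounds already seen in training are dropped from test.
    seen = {r.compound_id for r in train}
    test = [r for r in test if r.compound_id not in seen]
    return train, test


records = [
    AssayRecord("cmpd-001", date(2021, 3, 1), True),
    AssayRecord("cmpd-002", date(2021, 9, 15), False),
    AssayRecord("cmpd-001", date(2022, 2, 10), True),   # re-test of cmpd-001
    AssayRecord("cmpd-003", date(2022, 6, 5), False),
]
train, test = temporal_holdout(records, cutoff=date(2022, 1, 1))
train_ids = [r.compound_id for r in train]
test_ids = [r.compound_id for r in test]
print(train_ids, test_ids)
# prints: ['cmpd-001', 'cmpd-002'] ['cmpd-003']
```

The later re-test of cmpd-001 is excluded from evaluation because the compound already appears in training. A purely random split would not catch that overlap.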

A governance posture that addresses these begins by separating learning authority from decision authority. A model may rank, predict, and propose. The decision to synthesise a compound, to dose a participant, or to advance a candidate remains with accountable human review operating behind explicit stage gates, with traceable logs at each transition. This separation is the organising principle of the externally governed learning architecture under development in this research program, and it is what allows a computational pipeline to be audited rather than merely trusted.
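The separation can be made concrete in a few lines. The sketch below is an illustration under stated assumptions and does not describe the architecture referred to above. `Proposal`, `StageGate`, `propose`, and every identifier and rationale string are hypothetical. The model side can only produce ranked proposals, and advancement requires a named reviewer and a recorded rationale.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Proposal:
    candidate_id: str
    model_version: str
    score: float  # model output, advisory only


@dataclass
class StageGate:
    name: str
    log: list = field(default_factory=list)

    def decide(self, proposal, reviewer, approve, rationale):
        # Decision authority: a named human and a written rationale are
        # required before any outcome is recorded.
        if not reviewer.strip() or not rationale.strip():
            raise ValueError("a named reviewer and a rationale are required")
        self.log.append({
            "gate": self.name,
            "candidate_id": proposal.candidate_id,
            "model_version": proposal.model_version,
            "model_score": proposal.score,
            "reviewer": reviewer,
            "approved": approve,
            "rationale": rationale,
            "decided_at": datetime.now(timezone.utc).isoformat(),
        })  # append-only by convention in this sketch
        return approve


def propose(scores, model_version):
    """Learning authority: rank candidates. Nothing here can advance one."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [Proposal(cid, model_version, s) for cid, s in ranked]


proposals = propose({"cand-A": 0.91, "cand-B": 0.42}, model_version="v0.1")
gate = StageGate("synthesis")
# The reviewer may disagree with the ranking; both outcomes are logged.
gate.decide(proposals[0], "reviewer-1", False, "outside validated chemical space")
gate.decide(proposals[1], "reviewer-1", True, "supported by orthogonal assay")
advanced = [e["candidate_id"] for e in gate.log if e["approved"]]
print(advanced)
# prints: ['cand-B']
```

Here the top-ranked candidate is declined and the second is advanced. The log records both the model score and the human rationale, so the decision can be audited later.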

Table 4. Failure modes and corresponding governance controls
Failure mode | Mechanism | Governance control
Data leakage | Contamination between training and evaluation data | Temporal holdout validation and provenance logging
Distribution shift | Deployment space differs from the training distribution | Applicability domain checks and continuous monitoring
Automation bias | Uncritical trust in model rankings | Human decision gate with recorded rationale
Irreproducibility | Unstable, undocumented, or unshared pipelines | Versioned artefacts and reproducible execution
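An applicability domain check can take many forms. One simple version, sketched here as an assumption and not as a prescribed method, flags a query molecule whose nearest training neighbour falls below a similarity threshold. Molecules are represented as hypothetical sets of fingerprint bit indices, and the threshold of 0.4 is arbitrary.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def in_domain(query, training_set, threshold=0.4):
    """Return (is_in_domain, nearest_similarity) for a query fingerprint."""
    nearest = max((tanimoto(query, t) for t in training_set), default=0.0)
    return nearest >= threshold, nearest


# Toy fingerprints; in practice these would come from a cheminformatics toolkit.
training = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
near = frozenset({1, 2, 3, 5})   # shares 3 of 5 distinct bits with the first entry
far = frozenset({7, 8, 9})       # shares nothing with the training set

print(in_domain(near, training))
print(in_domain(far, training))
# prints: (True, 0.6) then (False, 0.0)
```

A prediction for the out-of-domain molecule would be routed to review and not ranked alongside in-domain candidates.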

None of this slows discovery. It makes the output defensible, which is the only form of speed that matters once a candidate approaches a human study.

Research Context

Connection to the DrugSynthAI research line

This note documents the conceptual foundation of an active pre-seed research line. The governance model described here corresponds to United States provisional patent 63/975,551 (Externally Governed Learning Systems), and the generative discovery pipeline corresponds to United States provisional patent 64/018,624 (DrugSynthAI Discovery), a multi-agent architecture for de novo molecular design directed at genetic disease, together with the associated USPTO trademark DrugSynthAI in Class 042. The broader publication record is indexed under ORCID 0009-0001-9929-3135. This is pre-seed research and development output. It is not a commercial offering and not clinical guidance.