
First Ascent by

Timothy Le

Difficulty Grade: TBD by climber

Approved

Mpro Hunter

Drug Discovery / Virology

The Proposed Route

A focused drug discovery route targeting the SARS-CoV-2 main protease. The climber navigates 2,000 compounds through a fingerprint-based ML pipeline, asking which molecular shapes look like good inhibitors. A Tanimoto similarity anchor to a known inhibitor (Nirmatrelvir) provides a chemical compass for the ascent.
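As a rough illustration of the similarity anchor, the sketch below computes Tanimoto similarity between sets of fingerprint on-bit indices and ranks a toy library against a reference. The bit sets, compound names, and the `tanimoto` helper are hypothetical stand-ins; in the actual pipeline the fingerprints would presumably come from a cheminformatics toolkit such as RDKit, with Nirmatrelvir as the reference.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical on-bit indices; real ones would come from a fingerprinting tool.
reference_fp = {1, 4, 9, 16, 25, 36}  # stand-in for the known inhibitor
library = {
    "cmpd_A": {1, 4, 9, 16, 25, 49},
    "cmpd_B": {2, 3, 5, 7, 11},
    "cmpd_C": {1, 4, 9},
}

# Rank library compounds from most to least similar to the reference.
ranked = sorted(
    ((name, tanimoto(reference_fp, fp)) for name, fp in library.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Ranking by similarity alone gives a baseline that the ML model should beat; if it does not, the model may be adding little beyond nearest-neighbor lookup.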

🧗 The Crux

Class imbalance is the crux pitch: many more inactive compounds than active ones will bias any naive model toward predicting everything as inactive. IC50 values from different labs may be inconsistent. The cross-viral stretch goal (SARS-CoV-1 / MERS) is ambitious for CHEM 169 and should only be attempted after the core pipeline is solid.
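One possible way to handle inconsistent IC50 values is to convert each measurement to pIC50, take the median per compound, and flag compounds whose repeat measurements disagree by more than a chosen tolerance. The sketch below assumes IC50 values in nanomolar; the measurements, compound names, and the 1.0 log-unit tolerance are illustrative assumptions, not values from the proposal or the dataset.

```python
import math
from statistics import median


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return 9.0 - math.log10(ic50_nm)


# Hypothetical repeat measurements (nM) for the same compounds from different labs.
measurements_nm = {
    "cmpd_A": [100.0, 120.0, 90.0],
    "cmpd_B": [50.0, 5000.0],
}

MAX_SPREAD = 1.0  # assumed tolerance, in log units

consolidated = {}  # compound -> median pIC50
flagged = []       # compounds whose measurements disagree too much

for name, values in measurements_nm.items():
    pic50s = [pic50_from_nm(v) for v in values]
    consolidated[name] = median(pic50s)
    if max(pic50s) - min(pic50s) > MAX_SPREAD:
        flagged.append(name)

print(consolidated)
print("inconsistent:", flagged)
```

Flagged compounds can then be inspected or dropped before the active/inactive threshold is applied, so that a single outlying assay does not decide a label.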

⚠️ Pre-Climb Checklist

⚠️ Address class imbalance explicitly: use class weights or SMOTE, and report PR-AUC, not just ROC-AUC.
✅ Tanimoto similarity baseline is a nice chemical intuition check.
⚠️ The cross-viral stretch is a full second project; scope it as optional only.
✅ TDCommons dataset is well-curated and accessible.
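To make the first checklist item concrete, here is a minimal sketch of the class-weight option and of PR-AUC computed as average precision. The weighting uses the common "balanced" heuristic, n_samples / (n_classes * class_count); the labels and scores are made-up toy values, and the `average_precision` helper assumes no tied scores. SMOTE is not shown.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def average_precision(labels, scores):
    """Mean precision at the rank of each active (labels: 1 = active)."""
    n_pos = sum(labels)
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision when this active is recovered
    return total / n_pos


# Toy data: 2 actives among 10 compounds, with hypothetical model scores.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

weights = balanced_class_weights(labels)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)  # PR-AUC expected from random ranking

print("class weights:", weights)
print(f"average precision: {ap:.2f} (chance level is about {prevalence:.2f})")
```

Reporting the prevalence next to PR-AUC matters because, unlike ROC-AUC, the chance level of PR-AUC depends on how rare the actives are.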

Guidance

  • Don't just report ROC-AUC; dig into the confusion matrix
  • Ask: which molecules get misclassified? Are false negatives structurally similar to true positives?
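The two questions above could be approached along the lines of the sketch below: tally the confusion matrix, then, for each false negative, find its most similar true positive by Tanimoto similarity. All compound names, labels, predictions, and fingerprint bit sets are hypothetical, and the sketch assumes at least one true positive exists.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical records: (name, true label, predicted label, on-bit indices)
records = [
    ("cmpd_A", 1, 1, {1, 4, 9, 16}),
    ("cmpd_B", 1, 0, {1, 4, 9, 25}),
    ("cmpd_C", 1, 0, {2, 3, 5, 7}),
    ("cmpd_D", 0, 0, {6, 8, 10}),
    ("cmpd_E", 0, 1, {1, 4, 12}),
    ("cmpd_F", 0, 0, {11, 13}),
]

# Tally the four confusion-matrix cells.
confusion = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
for _, y_true, y_pred, _ in records:
    key = ("T" if y_true == y_pred else "F") + ("P" if y_pred == 1 else "N")
    confusion[key] += 1

true_positives = [(n, fp) for n, t, p, fp in records if t == 1 and p == 1]

# For each false negative, record its nearest true positive and the similarity.
fn_report = {}
for name, y_true, y_pred, fp in records:
    if y_true == 1 and y_pred == 0:
        fn_report[name] = max(
            ((tp_name, tanimoto(fp, tp_fp)) for tp_name, tp_fp in true_positives),
            key=lambda pair: pair[1],
        )

print(confusion)
for name, (nearest, sim) in fn_report.items():
    print(f"{name}: nearest true positive {nearest}, similarity {sim:.2f}")
```

A missed active that closely resembles a correctly predicted one may point to an activity cliff or a noisy label, while a missed active with no similar true positive may belong to a chemotype the model has barely seen. Both readings are hypotheses to check, not conclusions.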

Source proposal: Timothy_Le_Chem169FinalProposal.pdf


CHEM 169/269 · Applied AI & Machine Learning for Biochemistry