First Ascent by Timothy Le
Mpro Hunter
Drug Discovery / Virology
The Proposed Route
A focused drug discovery route targeting the SARS-CoV-2 main protease (Mpro). The climber navigates 2,000 compounds through a fingerprint-based ML pipeline, asking which molecular substructures mark a compound as a likely inhibitor. A Tanimoto similarity anchor to a known inhibitor (Nirmatrelvir) provides a chemical compass for the ascent.
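The Tanimoto "compass" compares fingerprints by the fraction of on-bits they share: shared bits divided by the total number of distinct on-bits across both. The sketch below is a minimal, dependency-free illustration that assumes fingerprints are stored as Python sets of on-bit indices. The bit sets and compound IDs (`cmpd_A` and so on) are hypothetical stand-ins. In a real pipeline the fingerprints would come from a cheminformatics toolkit; RDKit, for example, provides `DataStructs.TanimotoSimilarity` for its bit vectors.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity: shared on-bits / total distinct on-bits (binary fingerprints as sets)."""
    union = len(fp_a | fp_b)
    # Two empty fingerprints share no features; treat their similarity as 0
    return len(fp_a & fp_b) / union if union else 0.0


def rank_by_similarity(reference: set[int], library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Score every compound against the reference and sort from most to least similar."""
    scores = [(cid, tanimoto(reference, fp)) for cid, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# HYPOTHETICAL on-bit sets; real ones would come from a fingerprinting toolkit
reference_fp = {1, 2, 3, 4}  # stand-in for the reference inhibitor's fingerprint
library = {
    "cmpd_A": {1, 2, 3, 5},
    "cmpd_B": {7, 8},
    "cmpd_C": {1, 2, 3, 4},
}
for cid, sim in rank_by_similarity(reference_fp, library):
    print(f"{cid}: {sim:.2f}")
# prints cmpd_C: 1.00, then cmpd_A: 0.60, then cmpd_B: 0.00
```

Ranking the whole library this way gives a model-free reference point: if a trained classifier cannot beat "most similar to the known inhibitor first," the learned features are adding little.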
The Crux
Class imbalance is the crux pitch: with many more inactive compounds than active ones, any naive model will be biased toward predicting everything as inactive. IC50 values reported by different labs may also be inconsistent, which can add noise to any activity labels derived from them. The cross-viral stretch goal (SARS-CoV-1 / MERS) is ambitious for CHEM 169 and should be attempted only after the core pipeline is solid.
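To see why imbalance is the crux, score a "model" that simply calls every compound inactive. The 1,900 / 100 split below is a hypothetical ratio chosen for illustration, not the real dataset's class balance, and `accuracy_and_recall` is a helper written for this sketch.

```python
def accuracy_and_recall(labels: list[int], predictions: list[int]) -> tuple[float, float]:
    """Overall accuracy, plus recall on the active class (label 1)."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    n_active = sum(labels)
    accuracy = correct / len(labels)
    # Recall is undefined when the set contains no actives at all
    recall = true_pos / n_active if n_active else float("nan")
    return accuracy, recall


# HYPOTHETICAL split, not the real dataset's ratio: 1,900 inactives, 100 actives
labels = [0] * 1900 + [1] * 100
naive_predictions = [0] * len(labels)  # calls every compound "inactive"

accuracy, recall = accuracy_and_recall(labels, naive_predictions)
print(f"accuracy={accuracy:.2f}, recall on actives={recall:.2f}")
# prints: accuracy=0.95, recall on actives=0.00
```

A 95% accuracy that finds zero inhibitors is exactly the failure mode to guard against, which is why accuracy alone is a poor headline metric here.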
⚠️ Pre-Climb Checklist
- ⚠️ Address class imbalance explicitly: use class weights or SMOTE, and report PR-AUC, not just ROC-AUC.
- ✅ Tanimoto similarity baseline is a nice chemical intuition check.
- ⚠️ The cross-viral stretch is a full second project; scope it as optional only.
- ✅ TDCommons dataset is well-curated and accessible.
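The two imbalance tools from the checklist can be sketched in plain Python. `balanced_class_weights` follows the weighting formula scikit-learn documents for `class_weight="balanced"` (n_samples / (n_classes × count of the class)). `average_precision` is a step-wise PR-AUC that, assuming no tied scores, should agree with scikit-learn's `average_precision_score`. Both function names and the toy data are illustrative assumptions, not part of the proposal.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """w_c = n_samples / (n_classes * n_c): rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n_c) for cls, n_c in counts.items()}


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Walks down the ranking from highest to lowest score and averages the
    precision observed at each active (label 1). Assumes no tied scores.
    """
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("average precision is undefined without any actives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_active


# HYPOTHETICAL toy data: 8 inactives, 2 actives
print(balanced_class_weights([0] * 8 + [1] * 2))  # {0: 0.625, 1: 2.5}
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

The reason to prefer PR-AUC here: a no-skill ranker scores about 0.5 ROC-AUC regardless of balance, but its expected average precision is roughly the fraction of actives. PR-AUC therefore makes weak performance on a rare active class much harder to hide.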
Guidance
- Don't just report ROC-AUC; dig into the confusion matrix
- Ask: which molecules get misclassified? Are false negatives structurally similar to true positives?
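One way to act on both guidance points is to tally the confusion matrix, then check how close each false negative sits to the nearest correctly found active. The sketch assumes fingerprints are Python sets of on-bit indices. The helper names and the five toy compounds are hypothetical.

```python
def confusion_counts(labels: list[int], predictions: list[int]) -> dict[str, int]:
    """Tally the 2x2 confusion matrix with 1 = active, 0 = inactive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(labels, predictions):
        if y == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def false_negative_neighbors(fingerprints, labels, predictions):
    """Map each false negative to (index of nearest true positive, Tanimoto similarity)."""
    pairs = list(zip(labels, predictions))
    true_pos = [i for i, (y, p) in enumerate(pairs) if y == 1 and p == 1]
    false_neg = [i for i, (y, p) in enumerate(pairs) if y == 1 and p == 0]
    report = {}
    for i in false_neg:
        if not true_pos:
            report[i] = (None, 0.0)  # no true positive to compare against
            continue
        nearest = max(true_pos, key=lambda j: tanimoto(fingerprints[i], fingerprints[j]))
        report[i] = (nearest, tanimoto(fingerprints[i], fingerprints[nearest]))
    return report


# HYPOTHETICAL toy fingerprints, labels, and predictions
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {1, 2, 3, 9}, {5, 6}]
labels = [1, 1, 0, 1, 0]
preds = [1, 0, 0, 0, 1]
print(confusion_counts(labels, preds))  # {'TP': 1, 'FP': 1, 'TN': 1, 'FN': 2}
print(false_negative_neighbors(fps, labels, preds))  # {1: (0, 0.5), 3: (0, 0.75)}
```

A false negative that is highly similar to a true positive (0.75 here) may point to an activity cliff or a noisy label, such as an inconsistent IC50. A false negative with low similarity to every true positive suggests a chemotype the model has rarely seen.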
Source proposal: Timothy_Le_Chem169FinalProposal.pdf
CHEM 169/269 · Applied AI & Machine Learning for Biochemistry