
First Ascent by Eve Zhang

Difficulty Grade: TBD by climber · Approved

The BACE-1 Face

Drug Discovery / Medicinal Chemistry

The Proposed Route

A clean, well-scoped drug discovery problem: predict BACE-1 inhibitor activity from Morgan fingerprints, comparing a logistic regression baseline with a Random Forest. The scaffold-based split stretch goal adds a chemically meaningful generalization test: can the model predict activity for molecules built on scaffolds it never saw during training?

🧗 The Crux

BACE-1 is a well-trodden benchmark, so the danger is producing a result that looks good on paper (high ROC-AUC on a random split) but doesn't generalize. The scaffold-split stretch goal is where this route gets interesting and difficult. Class imbalance is a secondary crux.
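For the class-imbalance crux, one standard option is to weight each class inversely to its frequency. This minimal sketch reproduces the formula scikit-learn documents for `class_weight="balanced"`, `n_samples / (n_classes * count)`. The helper names `balanced_class_weights` and `sample_weights` and the toy labels are hypothetical, and whether reweighting actually helps on BACE is something to check empirically.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes get proportionally larger weights, so each class
    contributes the same total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def sample_weights(labels):
    """Expand the per-class weights into one weight per training example."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]


# Toy labels (hypothetical, not BACE data): 6 inactives (0), 2 actives (1).
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
print(balanced_class_weights(toy_labels))  # the rarer class 1 gets weight 2.0
```

In scikit-learn itself, passing `class_weight="balanced"` to `LogisticRegression` or `RandomForestClassifier` applies the same weights, so the helper mainly makes the arithmetic visible.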

โš ๏ธ Pre-Climb Checklist

✅ Dataset is publicly available and well-curated (MoleculeNet).
✅ Pipeline is well-scoped for CHEM 169.
⚠️ Random-split ROC-AUC will likely be high; don't stop there.
⚠️ If doing the scaffold split, use RDKit's MurckoScaffold decomposition and report the performance drop explicitly.
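The scaffold split in the checklist can be sketched as follows, assuming scaffolds are computed with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`. The grouping step is a common greedy scheme (largest scaffold families go to training, rarer ones to test), not necessarily the exact splitter MoleculeNet or the proposal uses. `murcko_scaffold`, `scaffold_split`, and the toy scaffold strings are hypothetical illustrations.

```python
from collections import defaultdict


def murcko_scaffold(smiles):
    """Bemis-Murcko scaffold SMILES for one molecule (requires RDKit)."""
    # Imported lazily so the split logic below runs without RDKit installed.
    from rdkit.Chem.Scaffolds import MurckoScaffold
    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles, includeChirality=False)


def scaffold_split(scaffolds, train_frac=0.8):
    """Split example indices so that no scaffold appears in both train and test.

    Molecules are grouped by scaffold; the largest groups fill the training set
    first, and any group that would overflow it goes to the test set instead.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold families first; ties broken by scaffold string for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    train_cutoff = train_frac * len(scaffolds)
    train_idx, test_idx = [], []
    for _, members in ordered:
        if len(train_idx) + len(members) <= train_cutoff:
            train_idx.extend(members)
        else:
            test_idx.extend(members)
    return sorted(train_idx), sorted(test_idx)


# Toy scaffolds (hypothetical): 5 benzene, 3 pyridine, 2 cyclohexane molecules.
toy_scaffolds = ["c1ccccc1"] * 5 + ["c1ccncc1"] * 3 + ["C1CCCCC1"] * 2
train, test = scaffold_split(toy_scaffolds, train_frac=0.8)
print(train, test)
```

For the real dataset, `scaffolds = [murcko_scaffold(s) for s in smiles_list]` (with `smiles_list` standing in for the BACE SMILES column) feeds the same function. Training the same model on both the random and the scaffold split then yields the two AUCs whose difference the checklist asks you to report.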

Guidance

  • The scaffold-based split is the most interesting part: prioritize it
  • The gap between random-split and scaffold-split AUC indicates how much the model memorizes rather than generalizes
  • That gap is a finding worth reporting: lead with it in the writeup
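The gap in the bullets above is the difference between two ROC-AUC values for the same model, trained and evaluated once on each split. The dependency-free sketch below computes ROC-AUC from its probabilistic definition (the chance that a randomly chosen active is scored above a randomly chosen inactive, with ties counted as half), which should agree with scikit-learn's `roc_auc_score`. `roc_auc`, `generalization_gap`, and the toy scores are hypothetical, not results from this project.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as P(score of a random active > score of a random inactive).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    ties count as half. Pairwise O(n_pos * n_neg), fine for a few thousand molecules.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def generalization_gap(random_split, scaffold_split):
    """Return (random-split AUC, scaffold-split AUC, gap).

    Each argument is a (y_true, scores) pair from that split's test set.
    """
    auc_random = roc_auc(*random_split)
    auc_scaffold = roc_auc(*scaffold_split)
    return auc_random, auc_scaffold, auc_random - auc_scaffold


# Toy test-set predictions (invented numbers, not project results).
random_test = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2])
scaffold_test = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
print(generalization_gap(random_test, scaffold_test))  # (1.0, 0.75, 0.25)
```

A positive gap means the model does noticeably worse on unseen scaffolds than on a random hold-out, which is the memorization signal the writeup should lead with.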

Source proposal: Zhang_Eve_CHEM169_Final_Project_Proposal.pdf

โ† View all First Ascents

CHEM 169/269 · Applied AI & Machine Learning for Biochemistry