First Ascent by

William An K Do

Difficulty Grade: TBD by climber · Approved

The MD→ML Energy Predictor

Molecular Dynamics / Machine Learning

The Proposed Route

A two-pitch route connecting OpenMM molecular dynamics to machine learning. Pitch 1: run MD simulations on xenon cubes, dipeptides, and small peptides, extracting energy data across simulation frames. Pitch 2: train an ML model to predict peptide energy from structure, using dipeptide simulation data for training and small peptides for testing. The route asks: can we learn a surrogate model for molecular energy?

🧗 The Crux

The ML model must learn meaningful structure-energy relationships from limited dipeptide data. Key challenges: (1) feature engineering: how to represent peptide structure for the model; (2) generalization: whether a model trained on dipeptides can predict energies for longer peptides; (3) data pipeline: extracting and formatting simulation frames as training examples.
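
The data-pipeline challenge can be sketched as follows. This is a minimal illustration, assuming each saved frame has already been reduced to a list of element symbols plus a potential energy in kJ/mol. The `Frame` record, the `featurize` and `build_dataset` helpers, the element order, and all numbers are hypothetical and do not come from the proposal.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record for one saved simulation frame."""
    peptide: str          # label for the simulated molecule
    elements: list        # element symbol of every atom in the frame
    energy_kj_mol: float  # potential energy reported by the simulation


# Assumed, fixed feature order so every row has the same layout.
ELEMENTS = ("C", "H", "N", "O", "S")


def featurize(frame):
    """Turn one frame into a simple composition vector (atom counts)."""
    counts = Counter(frame.elements)
    return [counts.get(symbol, 0) for symbol in ELEMENTS]


def build_dataset(frames):
    """Return (X, y): one feature row and one energy target per frame."""
    X = [featurize(f) for f in frames]
    y = [f.energy_kj_mol for f in frames]
    return X, y


# Placeholder frames for illustration only, not simulation output.
frames = [
    Frame("demo", ["C", "C", "H", "H", "H", "N", "O"], -120.5),
    Frame("demo", ["C", "C", "H", "H", "H", "N", "O"], -118.2),
]
X, y = build_dataset(frames)
```

Composition counts are identical for every frame of the same peptide, so on their own they can only separate one peptide from another. Explaining frame-to-frame energy variation needs features that depend on the coordinates.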

⚠️ Pre-Climb Checklist

✅ OpenMM tested on Colab: simulations run successfully.
✅ ML task now clearly defined: predict energy from structure.
⚠️ Have a local Anaconda fallback ready in case Colab becomes unstable.
⚠️ Start with simple features (atom counts, bond counts) before attempting 3D coordinates.
⚠️ Track the train/test split carefully: dipeptides for training, small peptides for testing.
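
One way to keep the train/test split honest is to split by peptide length rather than by random frame. The sketch below assumes each record carries a one-letter sequence string, a feature vector, and an energy. The `split_by_length` helper and every value in `records` are hypothetical placeholders.

```python
# Hypothetical records: (one-letter sequence, feature vector, energy in kJ/mol).
# All values are placeholders for illustration, not simulation output.
records = [
    ("GA", [5, 10, 2, 3], -210.0),
    ("AG", [5, 10, 2, 3], -208.5),
    ("AA", [6, 12, 2, 3], -230.1),
    ("GAG", [7, 13, 3, 4], -305.7),
    ("AAA", [9, 17, 3, 4], -340.2),
]


def split_by_length(records, train_length=2):
    """Put peptides of train_length residues in train, longer ones in test."""
    train = [r for r in records if len(r[0]) == train_length]
    test = [r for r in records if len(r[0]) > train_length]
    return train, test


train, test = split_by_length(records)

# Guard against leakage: no sequence may appear on both sides of the split.
train_sequences = {seq for seq, _, _ in train}
test_sequences = {seq for seq, _, _ in test}
assert train_sequences.isdisjoint(test_sequences)
```

Splitting on sequence length keeps every frame of a given peptide on the same side, so the test score measures generalization to longer peptides and not memorization of frames already seen.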

Guidance

  • The simulation→ML pipeline is exactly what this course is about
  • Feature choice is critical: start simple (composition, size) then add structural features
  • Compare predictions to OpenMM ground truth — report MAE and R²
  • Stretch goal comparison (frames vs peptides) is a nice ablation study
  • Tripeptide extension tests generalization beyond training distribution
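
The two metrics named in the guidance can be computed with a few lines of plain Python. This is a minimal sketch using the standard definitions of MAE and R²; the function names and the example energies are illustrative assumptions, not results from the project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted energies."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder energies in kJ/mol, for illustration only.
reference = [-100.0, -110.0, -120.0, -130.0]   # stands in for OpenMM values
predicted = [-102.0, -108.0, -121.0, -131.0]   # stands in for model output

mae = mean_absolute_error(reference, predicted)
r2 = r_squared(reference, predicted)
print(f"MAE = {mae:.2f} kJ/mol, R^2 = {r2:.3f}")
```

R² can be negative on a held-out set when the model predicts worse than the mean of the reference energies, which is worth watching for when testing on peptides longer than those in the training data.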

Source proposal: William_AnKDo_CHEM_269_FinalProposal_v2.pdf

CHEM 169/269 · Applied AI & Machine Learning for Biochemistry