First Ascent by
William An K Do
The MD→ML Energy Predictor
Molecular Dynamics / Machine Learning
The Proposed Route
A two-pitch route connecting OpenMM molecular dynamics to machine learning. Pitch 1: run MD simulations on xenon cubes, dipeptides, and small peptides, extracting energy data across simulation frames. Pitch 2: train an ML model to predict peptide energy from structure, using dipeptide simulation data for training and small peptides for testing. The route asks: can we learn a surrogate model for molecular energy?
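One way the hand-off between the two pitches could look is sketched below. It assumes the energies were logged to a text file by OpenMM's `StateDataReporter` with `potentialEnergy=True` and the default comma separator. The function name `read_potential_energies` and the sample values are hypothetical and only illustrate the format.

```python
import csv
import io


def read_potential_energies(log_text, column="Potential Energy (kJ/mole)"):
    """Return (step, energy) pairs from a StateDataReporter-style CSV log."""
    reader = csv.reader(io.StringIO(log_text))
    header = next(reader)
    # The header row starts with '#', which leaves the first name wrapped in
    # literal quotes after CSV parsing; strip both before the lookup.
    header = [name.lstrip("#").strip().strip('"') for name in header]
    step_idx = header.index("Step")
    energy_idx = header.index(column)

    frames = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        frames.append((int(row[step_idx]), float(row[energy_idx])))
    return frames


# Synthetic values, for illustration only (not simulation output).
sample_log = (
    '#"Step","Potential Energy (kJ/mole)","Temperature (K)"\n'
    "100,-120.5,298.1\n"
    "200,-118.25,301.4\n"
)
frames = read_potential_energies(sample_log)
```

Each pair is one simulation frame, so the list can be joined with per-frame structural features to form training rows.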
🧗 The Crux
The ML model must learn meaningful structure-energy relationships from limited dipeptide data. Three challenges stand out: (1) feature engineering, i.e. how to represent peptide structure for the model; (2) generalization, i.e. whether a model trained on dipeptides can predict energies for longer peptides; and (3) the data pipeline, i.e. extracting and formatting simulation frames as training examples.
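A minimal sketch of challenges (2) and (3), assuming each simulation frame becomes one training example tagged with its peptide's residue count. The `FrameExample` record and `split_by_length` helper are hypothetical names, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class FrameExample:
    peptide: str      # hypothetical label, e.g. a sequence string
    n_residues: int   # 2 for a dipeptide, 3 for a tripeptide, ...
    features: list    # structural features for this frame
    energy: float     # potential energy from the simulation (kJ/mol)


def split_by_length(examples, train_length=2):
    """Put dipeptide frames in training and hold out longer peptides."""
    train = [ex for ex in examples if ex.n_residues == train_length]
    test = [ex for ex in examples if ex.n_residues > train_length]
    return train, test
```

Splitting on residue count rather than at random keeps every frame of a held-out peptide out of training, so the test score measures generalization to longer peptides rather than to new frames of a peptide the model has already seen.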
⚠️ Pre-Climb Checklist
✅ OpenMM tested on Colab: simulations run successfully.
✅ ML task now clearly defined: predict energy from structure.
⚠️ Have local Anaconda fallback ready if Colab becomes unstable.
⚠️ Start with simple features (atom counts, bond counts) before attempting 3D coordinates.
⚠️ Track train/test split carefully: dipeptides for training, small peptides for testing.
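The simple-feature starting point could be sketched as follows. The code assumes the element symbols and bond pairs have already been read from the structure; with OpenMM they could come from the topology's `atoms()` and `bonds()` iterators. The element vocabulary and the function name `simple_features` are illustrative assumptions.

```python
from collections import Counter

# Assumed vocabulary: elements expected in unmodified peptides.
ELEMENTS = ("H", "C", "N", "O", "S")


def simple_features(elements, bonds):
    """Build a fixed-length vector: per-element counts, atom count, bond count.

    elements: list of element symbols, one per atom
    bonds: list of (atom_index, atom_index) pairs
    """
    counts = Counter(elements)
    unknown = set(counts) - set(ELEMENTS)
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {sorted(unknown)}")
    return [counts[e] for e in ELEMENTS] + [len(elements), len(bonds)]
```

These features are the same for every frame of a given peptide, so on their own they can only predict a per-peptide average energy. Frame-to-frame variation needs coordinate-based features.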
Guidance
- The simulation→ML pipeline is exactly what this course is about
- Feature choice is critical: start simple (composition, size) then add structural features
- Compare predictions to OpenMM ground truth — report MAE and R²
- The stretch-goal comparison (frames vs. peptides) would make a nice ablation study
- The tripeptide extension tests generalization beyond the training distribution
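For the MAE and R² reporting suggested above, a dependency-free sketch is given below. The function names are illustrative, and a library such as scikit-learn provides equivalent metrics.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted energies."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1 - ss_res / ss_tot
```

Here `y_true` holds the OpenMM energies and `y_pred` the model outputs for the same frames. R² can be negative on held-out peptides, which means the model does worse than predicting the mean test energy. Reporting MAE alongside it keeps the error in energy units.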
Source proposal: William_AnKDo_CHEM_269_FinalProposal_v2.pdf
CHEM 169/269 · Applied AI & Machine Learning for Biochemistry