First Ascent by
Siddhartha Gupta
The Disordered Wall
Structural Biology / Protein ML
The Proposed Route
A route that ventures off the well-mapped face of ordered protein domains and into the loose, unpredictable terrain of intrinsically disordered regions (IDRs). The climber asks whether the handholds that work on solid rock (structure-derived features, pLDDT confidence) become useless on the crumbling IDR face, and whether mutation effect scores follow different distributions in disorder.
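One way to make the "different distributions" question concrete is a two-sample comparison of mutation effect scores in ordered versus disordered positions. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) using only the standard library. The function name `ks_statistic` and the toy scores are illustrative assumptions, not values from the proposal or from ProteinGym, and a real analysis would also need a p-value and a correction for per-protein effects.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|.

    Both empirical CDFs are right-continuous step functions, so the maximum
    gap is always reached at one of the observed values.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Toy scores, invented for illustration only.
ordered_scores = [-2.0, -1.5, -1.0, -0.5]
disordered_scores = [-0.4, -0.2, 0.0, 0.1]
print(ks_statistic(ordered_scores, disordered_scores))
```

A statistic near 0 means the two score distributions look alike; a value near 1 means they barely overlap.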
The Crux
The dataset intersection between ProteinGym and DisProt is unknown: it could be small, messy, or both. The ML task needs sharper definition: is this a classification problem (IDR vs. ordered), a regression (mutation effect score), or a comparative analysis? Without a clear predictive target, the route lacks a summit.
⚠️ Pre-Climb Checklist
⚠️ Identify the ProteinGym × DisProt intersection size before committing to the full pipeline; if overlap is <50 proteins, scope down early.
⚠️ Define the ML task explicitly: what is being predicted, and with what labels?
✅ Familiarity with Chai/Boltz/AlphaFold is a genuine asset for the stretch goal.
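The first checklist item can be checked in a few lines once both resources are reduced to lists of protein accessions. This is a minimal sketch that assumes both lists use the same identifier scheme (for example UniProt accessions). The function name `overlap_report`, the placeholder accessions, and the way the lists are obtained are assumptions for illustration; the 50-protein threshold comes from the checklist.

```python
def overlap_report(proteingym_ids, disprot_ids, min_overlap=50):
    """Count proteins shared by two accession lists and flag a small overlap."""
    # Normalize whitespace and case so trivial formatting differences
    # do not hide a real match; drop empty entries.
    pg = {acc.strip().upper() for acc in proteingym_ids if acc.strip()}
    dp = {acc.strip().upper() for acc in disprot_ids if acc.strip()}
    shared = pg & dp
    return {
        "n_proteingym": len(pg),
        "n_disprot": len(dp),
        "n_shared": len(shared),
        "shared": sorted(shared),
        # True means the overlap is below the threshold: scope down early.
        "scope_down": len(shared) < min_overlap,
    }


# Placeholder accessions, not real database entries.
toy_proteingym = ["ACC1", "acc2 ", "ACC3"]
toy_disprot = ["ACC2", "ACC3", "ACC4"]
report = overlap_report(toy_proteingym, toy_disprot)
print(report["n_shared"], report["scope_down"])
```

Counting shared proteins is only the first pass; the number of assayed mutations that actually fall inside annotated disordered regions may be a tighter constraint.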
Guidance
- Nail down before coding: what exactly is the model predicting?
- Current framing is exploratory; needs a predictive task
- Formulate one clean supervised task (e.g., predict DMS fitness from mutation + region type)
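To illustrate what "one clean supervised task" could look like, the sketch below turns (mutation, fitness) pairs into labeled examples with a region-type feature, then fits the simplest possible baseline: predict the mean fitness of the region type. Everything here is a hypothetical illustration: the mutation string format (`A10V`), the 1-based inclusive disorder intervals, the toy fitness values, and the helper names are assumptions, not the proposal's pipeline or any database's schema.

```python
import re
from statistics import mean

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation):
    """Split a substitution such as 'A10V' into (wild_type, position, mutant)."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def region_type(position, disordered_regions):
    """Label a position using 1-based inclusive (start, end) disorder intervals."""
    for start, end in disordered_regions:
        if start <= position <= end:
            return "disordered"
    return "ordered"


def build_examples(records, disordered_regions):
    """Turn (mutation, fitness) pairs into feature dicts with a fitness label."""
    examples = []
    for mutation, fitness in records:
        wild_type, position, mutant = parse_mutation(mutation)
        examples.append({
            "wild_type": wild_type,
            "position": position,
            "mutant": mutant,
            "region": region_type(position, disordered_regions),
            "fitness": fitness,  # regression target
        })
    return examples


def fit_region_mean_baseline(examples):
    """Baseline model: the predicted fitness is the mean for that region type."""
    by_region = {}
    for example in examples:
        by_region.setdefault(example["region"], []).append(example["fitness"])
    return {region: mean(values) for region, values in by_region.items()}


# Toy data, invented for illustration only.
toy_records = [("A10V", -0.5), ("G12D", -1.5), ("S50P", -0.2), ("T55A", 0.0)]
toy_disorder = [(40, 60)]
examples = build_examples(toy_records, toy_disorder)
baseline = fit_region_mean_baseline(examples)
print(baseline)
```

Any richer model (sequence embeddings, structure-derived features) would then have to beat this region-mean baseline on held-out proteins to show that it learned something beyond region type.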
Source proposal: Siddhartha_Gupta_Research_Proposal_final.pdf
CHEM 169/269 · Applied AI & Machine Learning for Biochemistry