
First Ascent by Siddhartha Gupta

Difficulty Grade: TBD by climber · Approved

The Disordered Wall

Structural Biology / Protein ML

The Proposed Route

A route that ventures off the well-mapped face of ordered protein domains and into the loose, unpredictable terrain of intrinsically disordered regions (IDRs). The climber asks whether the handholds that work on solid rock (structure-derived features, pLDDT confidence) become useless on the crumbling IDR face, and whether mutation effect scores follow different distributions in disordered regions than in ordered ones.
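If the route is framed as a comparative analysis, one simple way to ask whether mutation effect scores are distributed differently in disorder is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest gap between the two empirical CDFs. The sketch below computes it in plain Python. The score lists are invented for illustration, and `ks_statistic` is a hypothetical helper, not part of ProteinGym or DisProt; `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed values,
    # so checking every observed value is enough to find the maximum gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented DMS-style scores for illustration only.
ordered_scores = [-2.0, -1.2, -0.3, -0.1]
idr_scores = [-0.1, -0.02, 0.05, 0.1]
print(ks_statistic(ordered_scores, idr_scores))  # 0.75
```

Because mutations from the same protein are not independent, a p-value from pooled scores would likely overstate significance; comparing IDR and ordered scores within each protein is one possible safeguard.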

🧗 The Crux

The size of the intersection between ProteinGym and DisProt is unknown; it could be small, messy, or both. The ML task needs a sharper definition: is this a classification problem (IDR vs. ordered), a regression (mutation effect score), or a comparative analysis? Without a clear predictive target, the route lacks a summit.

โš ๏ธ Pre-Climb Checklist

โš ๏ธ Identify the ProteinGym ร— DisProt intersection size before committing to the full pipeline โ€” if overlap is <50 proteins, scope down early. โš ๏ธ Define the ML task explicitly: what is being predicted, and with what labels? โœ… Familiarity with Chai/Boltz/AlphaFold is a genuine asset for the stretch goal.

Guidance

  • Nail down before coding: what exactly is the model predicting?
  • The current framing is exploratory; it needs a predictive task
  • Formulate one clean supervised task (e.g., predict DMS fitness from mutation + region type)
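One way to make the suggested supervised task concrete is to label each substitution by region type and fit the simplest possible baseline: predict a mutation's DMS score as the mean training score for its region type. This is a sketch under stated assumptions: single substitutions written like `A123G` with 1-based positions, disordered regions given as 1-based inclusive `(start, end)` intervals (for example, parsed from a DisProt annotation), and hypothetical helper names and scores. A model with richer features should beat this baseline to justify itself.

```python
import re
from collections.abc import Callable
from statistics import mean

# Assumed single-substitution format: wild-type residue, 1-based position, mutant residue.
MUTANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def region_type(mutant: str, disordered: list[tuple[int, int]]) -> str:
    """Label a substitution 'IDR' if its position falls in any disordered interval.

    `disordered` holds 1-based, inclusive (start, end) intervals (assumed format).
    """
    match = MUTANT_RE.match(mutant)
    if match is None:
        raise ValueError(f"unsupported mutant string: {mutant!r}")
    pos = int(match.group(2))
    return "IDR" if any(start <= pos <= end for start, end in disordered) else "ordered"


def fit_region_baseline(train: list[tuple[str, float]]) -> Callable[[str], float]:
    """Fit a per-region-type mean DMS score; unseen labels fall back to the global mean."""
    by_region: dict[str, list[float]] = {}
    for label, score in train:
        by_region.setdefault(label, []).append(score)
    means = {label: mean(scores) for label, scores in by_region.items()}
    fallback = mean(score for _, score in train)

    def predict(label: str) -> float:
        return means.get(label, fallback)

    return predict


# Hypothetical protein with residues 1-40 annotated as disordered; scores are invented.
disordered = [(1, 40)]
raw = [("M1A", -0.1), ("S20P", 0.1), ("L60P", -2.0), ("W75A", -1.0)]
train = [(region_type(m, disordered), s) for m, s in raw]
predict = fit_region_baseline(train)
print(predict(region_type("G30D", disordered)))  # 0.0
print(predict(region_type("A50V", disordered)))  # -1.5
```

In a real evaluation the train/test split should probably be made by protein rather than by mutation, so the model is not tested on proteins it has already seen during training.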

Source proposal: Siddhartha_Gupta_Research_Proposal_final.pdf

โ† View all First Ascents

CHEM 169/269 · Applied AI & Machine Learning for Biochemistry