Sent!
You made it to the top. Submit your work above!
Submission
Deliverables
Submit your completed notebook (.ipynb) with:
- Automated pipeline script run for at least 3 cycles, ideally 5 (Exercise 1)
- ipTM and pLDDT plots across all cycles (Exercise 1)
- Final metrics table (Exercise 2)
- Amino acid composition shift plot (Exercise 2)
- Reflection answers (Exercise 2)
File naming: lastname_firstname_R025.ipynb
Exercise 2: Summit Log and Reflection
Goal: Tie it all together. Present your results and reflect on what you built.
A. Final Metrics Table
Fill in your complete results:
| Cycle | Model Used | ipTM | pTM | pLDDT (B) | Sequence Changed? |
|---|---|---|---|---|---|
| 0 | Initial (poly-G) | | | | N/A |
| 1 | MPNN → Boltz | | | | Yes |
| 2 | SolMPNN → Boltz | | | | Yes |
| 3 | ... | | | | Yes |
| 4 | ... | | | | Yes |
| 5 | ... | | | | Yes |
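If your per-cycle results already live in a list of dictionaries, a table in the layout above can be generated rather than typed by hand. The following is a minimal sketch, not a required format: the field names (`cycle`, `model`, `iptm`, `ptm`, `plddt`, `sequence`) are assumptions chosen for illustration, so rename them to match your own log.

```python
def _fmt(value):
    """Format a metric to two decimals; leave the cell empty if it is missing."""
    return "" if value is None else f"{value:.2f}"


def build_metrics_table(records):
    """Render per-cycle records as a Markdown table.

    Each record is a dict with the (hypothetical) keys:
    cycle, model, iptm, ptm, plddt, sequence.
    """
    lines = [
        "| Cycle | Model Used | ipTM | pTM | pLDDT (B) | Sequence Changed? |",
        "|---|---|---|---|---|---|",
    ]
    previous_sequence = None
    for rec in records:
        if previous_sequence is None:
            changed = "N/A"  # the first record has nothing to compare against
        else:
            changed = "Yes" if rec["sequence"] != previous_sequence else "No"
        previous_sequence = rec["sequence"]
        lines.append(
            f"| {rec['cycle']} | {rec['model']} | {_fmt(rec['iptm'])} | "
            f"{_fmt(rec['ptm'])} | {_fmt(rec['plddt'])} | {changed} |"
        )
    return "\n".join(lines)
```

Computing the "Sequence Changed?" column from the logged sequences, instead of assuming "Yes", catches cycles where the redesign step silently returned the same binder.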
B. Amino Acid Composition Shift
Create your own plot showing how amino acid composition changed across cycles.
Minimum requirements:
- Include at least cycle 0, cycle 1, and your final cycle
- Normalize by sequence length (percent or fraction)
- Label axes and cycle names clearly
- Save the figure as aa_composition_shift.png
You may use any plotting style (stacked bars, grouped bars, heatmap, etc.) as long as it is readable.
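One possible way to meet the minimum requirements is sketched below: count residues per sequence, divide by sequence length, and draw grouped bars. The function names are made up for this example, the grouped-bar layout is just one of the allowed styles, and matplotlib is assumed to be installed in your notebook environment.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_fractions(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Non-standard characters (for example X) are counted in the length
    but not reported, so the fractions may then sum to less than 1.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    counts = Counter(sequence)
    return {aa: counts.get(aa, 0) / len(sequence) for aa in AMINO_ACIDS}


def plot_composition_shift(sequences_by_cycle, out_path="aa_composition_shift.png"):
    """Grouped bars: one group per amino acid, one bar per cycle label.

    sequences_by_cycle maps a label such as "Cycle 0" to a binder sequence.
    """
    import matplotlib.pyplot as plt  # imported here so aa_fractions works without it

    labels = list(sequences_by_cycle)
    width = 0.8 / len(labels)
    fig, ax = plt.subplots(figsize=(10, 4))
    for i, label in enumerate(labels):
        fractions = aa_fractions(sequences_by_cycle[label])
        positions = [x + i * width for x in range(len(AMINO_ACIDS))]
        ax.bar(positions, [fractions[aa] for aa in AMINO_ACIDS],
               width=width, label=label)
    # Center each tick under its group of bars.
    centers = [x + width * (len(labels) - 1) / 2 for x in range(len(AMINO_ACIDS))]
    ax.set_xticks(centers)
    ax.set_xticklabels(list(AMINO_ACIDS))
    ax.set_xlabel("Amino acid")
    ax.set_ylabel("Fraction of sequence")
    ax.legend(title="Cycle")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return out_path
```

A poly-G starting sequence should show a single bar at G for cycle 0, which makes a quick sanity check before you interpret the later cycles.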
C. Reflection Questions
Answer in your notebook (~10–15 sentences total):
- How did the automated pipeline compare to the manual process? What was easier/harder?
- Did metrics improve monotonically, or did you see fluctuations? What might explain the pattern?
- What was the biggest engineering challenge in automating the loop?
- If you had more compute time, what would you change about your pipeline?
- Protein Hunter uses the phrase "diffusion hallucination." In your own words, what does it mean for a model to "hallucinate" a protein structure?
- How does this connect to what you learned in earlier routes (functions, data manipulation, pandas)?
Success check:
- Final metrics table is complete
- Amino acid composition plot shows changes across cycles
- Reflection questions are answered
Exercise 1: Build the Automated Pipeline
Goal: Build a fully automated Protein Hunter-style pipeline using open-source tools on free GPU compute.
The Big Picture
In the manual route, you did:
poly-G → AF3 → ProteinMPNN → Boltz-2 → SolubleMPNN → AF3 → ...
Now automate it. Your pipeline should:
- Start from a poly-G binder sequence of a chosen length
- Predict the complex structure using an open-source diffusion model
- Redesign the binder sequence using ProteinMPNN/SolubleMPNN
- Re-predict the structure with the new sequence
- Repeat for N cycles (aim for 5)
- Log ipTM, pLDDT, and sequence at each cycle
- Visualize the improvement trajectory
Important: For the first prediction step, you must use a diffusion-based structure prediction model. Non-diffusion models like ESMFold lack the diffusion denoising process that enables hallucination from placeholder tokens.
Step 1: Choose Your Platform
| Platform | Free GPU | Time Limit | Best For |
|---|---|---|---|
| Google Colab | T4 (15 GB) | ~12 hrs/day | Quick prototyping |
| Kaggle | P100 / T4×2 (16 GB) | 30 hrs/week | Longer runs |
| Lightning AI | Various | 22 hrs/month free | Studio environment |
Connect to a GPU in Colab: Go to Runtime → Change runtime type → Hardware accelerator → GPU (T4)
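Before installing anything, it can help to confirm that the session actually has a GPU attached. The helper below is a small sketch that shells out to `nvidia-smi`, which is normally present on NVIDIA GPU runtimes; the function name is made up for this example, and it returns `None` when the tool is missing or fails.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the GPU name and total memory reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(describe_gpu() or "No GPU detected; check the runtime type.")
```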
Step 2: Open-Source Diffusion-Based Predictors
| Model | Install | Notes |
|---|---|---|
| Boltz-1 | pip install boltz (GitHub) | MIT license; open-source |
| OpenFold 3 | GitHub | Apache 2.0; AF3 reproduction |
| Chai-1 | GitHub | Academic license |
Step 3: The Pipeline Script
Build this yourself from scratch. Do not copy full scripts from the route page.
Required structure (pseudocode level):
initialize poly-G binder
for each cycle:
    predict structure for target + current binder (diffusion model)
    redesign binder sequence from predicted structure (MPNN/SolMPNN)
    choose next binder sequence
    log cycle metrics (ipTM, pTM, pLDDT, sequence)
generate summary plots + final table
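To make the control flow concrete, here is one way the loop could be organized with the tool calls left as plug-in functions. This is a skeleton under stated assumptions, not a working pipeline: `predict_fn`, `redesign_fn`, and `choose_fn` are hypothetical wrappers that you still have to write around your chosen predictor and ProteinMPNN/SolubleMPNN, and the dictionary keys they return are invented for this example.

```python
def run_cycles(target_seq, binder_length, n_cycles, predict_fn, redesign_fn,
               choose_fn=lambda candidates: candidates[0]):
    """Run the structure -> sequence loop and return one record per cycle.

    Assumed (hypothetical) interfaces:
      predict_fn(target_seq, binder_seq) -> dict with keys
          "iptm", "ptm", "plddt", "structure_path"
      redesign_fn(structure_path) -> list of candidate binder sequences
      choose_fn(candidates) -> the sequence to carry into the next cycle
          (default: take the first candidate, a placeholder policy)
    """
    def make_record(cycle, binder, result):
        return {
            "cycle": cycle,
            "sequence": binder,
            "iptm": result["iptm"],
            "ptm": result["ptm"],
            "plddt": result["plddt"],
        }

    # Cycle 0: hallucinate a structure from the poly-G placeholder.
    binder = "G" * binder_length
    result = predict_fn(target_seq, binder)
    records = [make_record(0, binder, result)]

    # Cycles 1..N: redesign from the latest structure, then re-predict.
    for cycle in range(1, n_cycles + 1):
        candidates = redesign_fn(result["structure_path"])
        binder = choose_fn(candidates)
        result = predict_fn(target_seq, binder)
        records.append(make_record(cycle, binder, result))
    return records
```

Passing the stages in as functions lets you test the loop with fake predictors on CPU before spending GPU time, and swap predictors later without touching the loop itself.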
Your code must include:
- A reusable function for each stage (prediction, redesign, logging)
- A loop that runs at least 3 cycles (target 5)
- A persistent log written to disk (.csv or .json)
- Clear error handling for at least one failure mode (missing file, timeout, OOM, etc.)
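For the persistent log, appending one row per cycle means a crashed session still leaves earlier cycles on disk. The sketch below uses only the standard library; the column names in `LOG_FIELDS` and both function names are assumptions for illustration. It also shows one explicit failure mode, a missing log file, handled with a clear message.

```python
import csv
from pathlib import Path

# Hypothetical column set; adjust to whatever your pipeline records.
LOG_FIELDS = ["cycle", "model", "iptm", "ptm", "plddt", "sequence"]


def append_log_row(log_path, row):
    """Append one cycle's metrics to a CSV file, writing the header if needed."""
    log_path = Path(log_path)
    is_new = not log_path.exists() or log_path.stat().st_size == 0
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)  # raises ValueError on keys not in LOG_FIELDS


def read_log(log_path):
    """Load all logged cycles as a list of dicts (values are strings)."""
    log_path = Path(log_path)
    if not log_path.exists():
        raise FileNotFoundError(
            f"No log found at {log_path}; run at least one cycle first."
        )
    with log_path.open(newline="") as handle:
        return list(csv.DictReader(handle))
```

Because the file is opened in append mode on every call, the log survives a runtime disconnect as long as it is stored somewhere persistent for your platform.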
Step 4: Implementation Tips
Refer to the official docs for installation and command syntax (the code repositories are listed under References).
Recommended workflow:
- Ask your chatbot to generate a first draft pipeline from your own pseudocode
- Run one cycle first and debug it
- Scale to 3–5 cycles only after the single-cycle run works
- Keep a short debugging log in your notebook
Memory issues? HSA is a large protein (609 aa). If you run into out-of-memory errors:
- Reduce BINDER_LENGTH to 70–80 residues
- Use ESMFold for intermediate validation (faster, less memory)
- Split the work across sessions
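The first suggestion can also be automated: catch the out-of-memory error and retry with a shorter binder. The sketch below assumes a hypothetical `predict_fn(target_seq, binder_seq)` wrapper that raises an exception on OOM. Which exception that is depends on how you call the predictor, so it is passed in as `oom_errors`. Recent PyTorch versions raise `torch.cuda.OutOfMemoryError` when called in-process, while a predictor launched as a subprocess would need its exit code or stderr checked instead.

```python
def predict_with_fallback(predict_fn, target_seq, binder_seq,
                          min_length=70, step=10, oom_errors=(MemoryError,)):
    """Call predict_fn, trimming the binder each time an OOM error is raised.

    Returns (binder_used, prediction). Re-raises the OOM error once the
    binder cannot be shortened further without dropping below min_length.
    predict_fn and the exception types in oom_errors are assumptions.
    """
    current = binder_seq
    while True:
        try:
            return current, predict_fn(target_seq, current)
        except oom_errors:
            new_length = len(current) - step
            if new_length < min_length:
                raise  # nothing left to trim; surface the original error
            print(f"Out of memory at length {len(current)}; retrying at {new_length}.")
            current = current[:new_length]
```

Trimming is harmless for the poly-G starting sequence but changes a redesigned binder, so this fallback fits best at cycle 0, with the shortened length then kept fixed for the rest of the run.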
Success check:
- I automated the structure → sequence → structure loop
- I ran at least 3–5 full cycles
- I plotted ipTM and pLDDT across all cycles
- I documented my metric trends (improving, flat, or fluctuating)
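When documenting trends, a small helper can give a consistent first-pass label for each metric series before you write your interpretation. This is a sketch only: the function name and the `tolerance` default are arbitrary choices for illustration, and the label is no substitute for looking at the plot.

```python
def describe_trend(values, tolerance=0.01):
    """Label a per-cycle metric series.

    Changes smaller than `tolerance` (an arbitrary threshold) are treated
    as noise. Returns 'improving', 'declining', 'flat', or 'fluctuating'.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles to describe a trend")
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    ups = sum(d > tolerance for d in deltas)
    downs = sum(d < -tolerance for d in deltas)
    if ups == 0 and downs == 0:
        return "flat"
    if downs == 0:
        return "improving"
    if ups == 0:
        return "declining"
    return "fluctuating"
```

Note that ipTM and pLDDT are on different scales depending on the tool (0 to 1 versus 0 to 100 is common), so a single tolerance will not suit both; pick one per metric.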
References
- Cho, Y., Rangel, G., Bhardwaj, G., & Ovchinnikov, S. (2025). Protein Hunter. bioRxiv. https://doi.org/10.1101/2025.10.10.681530
- Dauparas, J. et al. (2022). ProteinMPNN. Science, 378(6615), 49–56. https://doi.org/10.1126/science.add2187
- ProteinMPNN code: https://github.com/dauparas/ProteinMPNN
- Boltz-1 code: https://github.com/jwohlwend/boltz
- Protein Hunter code: https://github.com/yehlincho/Protein-Hunter
Why this route exists
In The Hallucination Ascent (Manual), you ran each step of the Protein Hunter pipeline by hand: AlphaFold Server, ProteinMPNN, Boltz-2, SolubleMPNN. You understand what each tool does and why the cycling works.
Now it's time to automate. Real protein design campaigns don't stop at 2-3 manual cycles. They run dozens or hundreds of iterations, testing different starting points, temperatures, and redesign strategies. That requires automation.
This route is about engineering: taking the conceptual loop you understand and turning it into code that runs end-to-end without manual intervention.
By the end, you can:
- Build an automated structure ↔ sequence cycling pipeline
- Use open-source structure prediction tools (Boltz-1, OpenFold 3)
- Run ProteinMPNN programmatically from Python
- Log and visualize design metrics across many cycles
- Debug GPU memory and API rate-limit issues
Route: The Hallucination Ascent (Automated)
- RouteID: R025
- Wall: Protein Design (W07)
- Grade: 5.12a
- Routesetter: Abhiram
- Time: ~2-3 hours
- Prerequisites: The Hallucination Ascent (Manual)
This is a project-level route. You will write real automation code, deal with GPU memory limits, and debug API calls. Multi-session work is expected.
Base Camp
Start here and climb your way up!