Sent!
You made it to the top. Submit your work above!
Submission
Deliverable
Submit a Colab notebook containing your work for this route. Your notebook will likely include preprocessing, JSON generation for AF3 batch upload, score parsing, and more; that's all great.
At minimum, your notebook must have:
- Your `pairwise_scores.csv` loaded and displayed (sorted by ipTM, descending)
- Two visualizations comparing positive vs decoy pairs (one for ipTM, one for PAE)
- Answers to the interpretation questions from Exercise 5 (in a markdown cell)
Your CSV should have 12 rows (6 positive + 6 decoy pairs) with columns: pair_id, type, iptm, ptm, pae_off_diag_avg.
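A minimal sketch of the "loaded and displayed" requirement, using only Python's standard library. It assumes `pairwise_scores.csv` has exactly the five columns listed above; `parse_scores` and `load_scores` are hypothetical helper names, not part of any provided starter code.

```python
import csv

# Column names required by the deliverable.
EXPECTED_COLUMNS = ["pair_id", "type", "iptm", "ptm", "pae_off_diag_avg"]
NUMERIC_COLUMNS = ("iptm", "ptm", "pae_off_diag_avg")


def parse_scores(lines):
    """Parse CSV lines, check the columns, and sort rows by ipTM (highest first)."""
    reader = csv.DictReader(lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for key in NUMERIC_COLUMNS:
            row[key] = float(row[key])  # CSV values arrive as strings
        rows.append(row)
    if len(rows) != 12:
        print(f"Warning: expected 12 rows (6 positive + 6 decoy), found {len(rows)}")
    return sorted(rows, key=lambda r: r["iptm"], reverse=True)


def load_scores(path="pairwise_scores.csv"):
    """Open the score table from disk and return it sorted by ipTM."""
    with open(path, newline="") as f:
        return parse_scores(f)
```

In Colab, `pd.read_csv("pairwise_scores.csv").sort_values("iptm", ascending=False)` gives the same ordering as a pandas DataFrame, which displays more nicely in a notebook cell.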
Exercise 5: Visualization and Interpretation
Visualizations
Create two plots comparing positive vs decoy pairs:
- ipTM comparison: bar chart or box plot showing ipTM by pair type
- PAE comparison: bar chart or box plot showing PAE (off-diagonal) by pair type
Your chatbot can help with the plotting code (seaborn or matplotlib work great).
Interpretation
In a markdown cell, answer:
- What's the average ipTM for positive pairs vs decoy pairs?
- What's the average PAE (off-diagonal) for positive pairs vs decoy pairs?
- Is there clear separation between the two groups on both metrics?
- Which pair scored best (high ipTM, low PAE)? Which scored worst?
- Based on these results, do ipTM and PAE seem useful for distinguishing real interactions from random pairs?
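One way to compute the numbers these questions ask for, sketched with the standard library. It assumes rows shaped like the `pairwise_scores.csv` table, with numeric `iptm` and `pae_off_diag_avg` values. "Clear separation" is interpreted here strictly (every positive beats every decoy on that metric); that definition and the `summarize` helper are assumptions, not part of the assignment.

```python
from statistics import mean


def summarize(rows):
    """Group means, a strict separation check, and best/worst pair IDs."""
    pos = [r for r in rows if r["type"] == "positive"]
    dec = [r for r in rows if r["type"] == "decoy"]
    # Rank by ipTM first; lower PAE breaks ties (hence the minus sign).
    rank_key = lambda r: (r["iptm"], -r["pae_off_diag_avg"])
    return {
        "mean_iptm_pos": mean(r["iptm"] for r in pos),
        "mean_iptm_dec": mean(r["iptm"] for r in dec),
        "mean_pae_pos": mean(r["pae_off_diag_avg"] for r in pos),
        "mean_pae_dec": mean(r["pae_off_diag_avg"] for r in dec),
        # Strict separation: no overlap at all between the two groups.
        "iptm_separated": min(r["iptm"] for r in pos) > max(r["iptm"] for r in dec),
        "pae_separated": max(r["pae_off_diag_avg"] for r in pos)
        < min(r["pae_off_diag_avg"] for r in dec),
        "best": max(rows, key=rank_key)["pair_id"],
        "worst": min(rows, key=rank_key)["pair_id"],
    }
```

If ipTM and PAE disagree about which pair is best, it is worth saying so in your answer rather than relying only on this ipTM-first ranking.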
Exercise 4: Parse All Scores
Now that you have all 12 job outputs, build your pairwise score table.
Your goal: Create a CSV with one row per pair containing:
- `pair_id` (extracted from filename)
- `type` (positive or decoy)
- `iptm`
- `ptm`
- `pae_off_diag_avg` (average of the off-diagonal values in `chain_pair_pae_min`)
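Starter code (parsing one JSON file):

```python
import json

# Load one model-0 summary file (rename to match your actual filename)
with open('pos_01_summary.json') as f:
    d = json.load(f)

iptm = d['iptm']  # interface predicted TM-score, 0–1
ptm = d['ptm']
pae_matrix = d['chain_pair_pae_min']  # 2×2 matrix of minimum PAE per chain pair
# Off-diagonal entries [0][1] and [1][0] describe inter-chain confidence
pae_off_diag_avg = (pae_matrix[0][1] + pae_matrix[1][0]) / 2
```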
Your job: loop through all 12 files and build a table.
🅰️ Path A (Colab): Upload your summary JSON files to Colab. Write Python code to loop through them, parse each one, and build a DataFrame. Your chatbot can help.
🅱️ Path B (Local agent): This is where coding agents really shine. Point Claude Code (or similar) at your extracted folder and ask it to find all `*_summary_confidences_0.json` files, extract the scores, and output a merged CSV. Done in seconds, with no Colab upload needed.
Save your final table as pairwise_scores.csv.
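A sketch of that loop, assuming the pair ID (e.g. `pos_01`, matched case-insensitively) appears somewhere in each summary filename and that the folder contains only the model-0 summary files. `parse_summary` and `build_table` are hypothetical names; adjust the regex if your job names follow a different pattern.

```python
import csv
import json
import re
from pathlib import Path

# Assumption: filenames contain "pos_NN" or "dec_NN" (any case).
PAIR_RE = re.compile(r"(pos|dec)_(\d{2})", re.IGNORECASE)
COLUMNS = ["pair_id", "type", "iptm", "ptm", "pae_off_diag_avg"]


def parse_summary(filename, summary):
    """Turn one parsed summary JSON dict into a row of the score table."""
    match = PAIR_RE.search(filename)
    if match is None:
        raise ValueError(f"No pair ID found in {filename!r}")
    prefix, number = match.group(1).upper(), match.group(2)
    pae = summary["chain_pair_pae_min"]
    return {
        "pair_id": f"{prefix}_{number}",
        "type": "positive" if prefix == "POS" else "decoy",
        "iptm": summary["iptm"],
        "ptm": summary["ptm"],
        "pae_off_diag_avg": (pae[0][1] + pae[1][0]) / 2,  # mean of the inter-chain entries
    }


def build_table(folder, out_csv="pairwise_scores.csv"):
    """Parse every *_summary_confidences_0.json under folder and write one CSV."""
    rows = []
    for path in sorted(Path(folder).rglob("*_summary_confidences_0.json")):
        with open(path) as f:
            rows.append(parse_summary(path.name, json.load(f)))
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Usage: `build_table("summaries")`, where `"summaries"` is whatever folder holds your 12 files. The glob pattern skips the `_1` to `_4` model files automatically.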
Exercise 3: Batch Submission
You've submitted 2 jobs by hand. Now let's submit the remaining 10 more efficiently.
AlphaFold Server lets you upload multiple job requests at once using a JSON file. Instead of manually entering sequences one by one, you'll:
- Convert your CSV data into the JSON format AF3 expects
- Upload that JSON file to create all 10 jobs as drafts
- Submit each draft (still requires clicking, but no more copy-pasting sequences)
📺 Video walkthrough: Coming soon. Until then, follow the written steps below (see the batch upload section).
Step 1: Build the JSON file
Write Python code to convert your CSV into a JSON file that AF3 Server accepts.
Resources:
- AlphaFold Server JSON format docs
- The `*_job_request.json` from your manual jobs: use it as a template!
Each job in the JSON needs: a name, two protein chains with sequences, and the required dialect/version fields. Use your chatbot to help write the conversion code.
Important: Exclude the 2 pairs you already submitted manually.
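A minimal sketch of the conversion, under several assumptions: the run sheet has a `pair_id` column holding IDs like `POS_01` (check the real header), and the JSON field names (`name`, `modelSeeds`, `sequences`, `proteinChain`, `count`, `dialect`, `version`) match the AlphaFold Server JSON format as documented. Compare the output against your own `*_job_request.json` before uploading. `ALREADY_SUBMITTED`, `make_job`, `build_batch`, and `write_batch` are hypothetical, and job names here use Rv IDs because the UniProt column name isn't given; swap it in if your CSV has one.

```python
import csv
import json

# The 12 pairs used in this route.
ROUTE_PAIRS = {f"POS_{i:02d}" for i in range(1, 7)} | {f"DEC_{i:02d}" for i in range(1, 7)}
# Hypothetical: the 2 pairs you already submitted by hand (edit to match yours).
ALREADY_SUBMITTED = {"POS_01", "DEC_01"}


def make_job(row):
    """One AF Server job for a two-chain protein complex (field names assumed)."""
    return {
        "name": f"{row['pair_id']}_{row['rv_a']}_{row['rv_b']}",
        "modelSeeds": [],  # assumption: an empty list lets the server choose a seed
        "sequences": [
            {"proteinChain": {"sequence": row["sequence_a"], "count": 1}},
            {"proteinChain": {"sequence": row["sequence_b"], "count": 1}},
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


def build_batch(rows):
    """Keep the route's pairs, drop the manual ones, and build job dicts."""
    return [
        make_job(row)
        for row in rows
        if row["pair_id"] in ROUTE_PAIRS and row["pair_id"] not in ALREADY_SUBMITTED
    ]


def write_batch(csv_path="r037A_pairwise_jobs_with_sequences.csv", out_path="af3_batch.json"):
    """Read the run sheet and write all remaining jobs into one JSON list."""
    with open(csv_path, newline="") as f:
        jobs = build_batch(csv.DictReader(f))
    with open(out_path, "w") as f:
        json.dump(jobs, f, indent=2)
    print(f"Wrote {len(jobs)} jobs to {out_path}")  # expect 10
    return jobs
```

Calling `write_batch()` should report 10 jobs; if it reports a different number, check the ID column and the `ALREADY_SUBMITTED` set.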
Step 2: Submit each draft
Important: Uploading a JSON file creates drafts, not running jobs. You must submit each draft individually:
- Upload your JSON file to AF3 Server (this creates 10 drafts)
- Click on each draft → Continue → Preview Job → Confirm and Submit
- Repeat for all 10 drafts
- Monitor progress on the AF3 Server dashboard
- Download results when jobs complete
Yes, this is tedious. Click, click, click, click...
Annoyed yet? Route 37C (worth 8 routes of extra credit) challenges you to build a browser automation tool that handles this clicking for you. If the repetition is driving you crazy, channel that frustration into engineering.
Downloading and organizing your outputs
When you select completed jobs on AF3 Server and click Download, you get one big zip containing all selected jobs. Each job folder inside has MSA files (huge), templates, and outputs for multiple models.
The problem: these zips are 50–150 MB each, mostly MSA files you don't need.
Choose your path:
🅰️ Path A: Colab (familiar, guided)
- Download the zip to your local machine
- Extract it (double-click on Mac/Windows)
- You'll see folders like `pw_001_pos_01_.../`, `pw_002_pos_02_.../`, etc.
- Inside each folder, find `*_summary_confidences_0.json` (ignore `_1`, `_2`, `_3`, `_4`; we only need model 0)
- Copy just those 12 small JSON files (~350 bytes each) to a folder
- Upload that folder to Colab
- Write Python code to parse the filenames and extract scores (your chatbot can help)
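The "find and copy" steps can be scripted if you'd rather not click through every folder. This sketch assumes the zip is already extracted locally; `collect_summaries` and the `summaries` folder name are hypothetical choices.

```python
import shutil
from pathlib import Path


def collect_summaries(extracted_dir, dest_dir="summaries"):
    """Copy every *_summary_confidences_0.json (model 0 only) into one flat folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # rglob searches all job subfolders; the "_0" suffix excludes models 1-4.
    for path in sorted(Path(extracted_dir).rglob("*_summary_confidences_0.json")):
        target = dest / path.name
        shutil.copy2(path, target)  # copy2 also preserves file timestamps
        copied.append(target)
    print(f"Copied {len(copied)} files (expected 12)")
    return copied
```

Run it on your machine (for example `collect_summaries("path/to/extracted_zip")`), then upload the resulting folder to Colab.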
🅱️ Path B: Local coding agent (powerful, real-world)
Use Claude Code, Cursor, or another AI coding agent to do the extraction locally:
- Download and extract the zip
- Point your agent at the folder and ask it to:
  - Find all `*_summary_confidences_0.json` files
  - Extract `iptm`, `ptm`, and `chain_pair_pae_min` from each
  - Parse the pair ID from the filename
  - Output a merged CSV
This is how professionals handle messy data wrangling: the agent does it in seconds.
New to coding agents? Check out the AI-Assisted Coding wall (W11) for setup guides.
Exercise 2: Explore AF3 Outputs
Once your 2 manual jobs complete, download the results and learn what's inside before scaling up.
What's in each job folder?
AF3 produces 5 models (numbered 0–4) per job. These are independent predictions with slightly different results. For this route, we'll use model 0 only to keep things simple.
| File pattern | What it contains | Size |
|---|---|---|
| `*_summary_confidences_0.json` | Key metrics: ipTM, pTM, `chain_pair_pae_min` | ~350 bytes |
| `*_full_data_0.json` | Detailed data: full PAE matrix, per-atom pLDDT | ~500 KB |
| `*_model_0.cif` | 3D structure file | ~165 KB |
| `*_job_request.json` | Your input (useful as a batch template) | ~1 KB |
| `msas/` folder | Multiple sequence alignments | Huge (MB each) |
| `templates/` folder | Structural templates | Large |
You only need `*_summary_confidences_0.json`; ignore the `_1`, `_2`, `_3`, `_4` variants and all the other files.
Get the summary files
- Download your completed jobs from AF3 Server (you'll get one zip file)
- Extract it locally β you'll see folders for each job
- Inside each folder, find `*_summary_confidences_0.json` (ignore `_1`, `_2`, etc.; we only need model 0)
- For these 2 test files, you can manually copy them to Colab or explore locally
Find the key metrics
Open each JSON file and locate:
- `iptm`: the interface predicted TM-score (0–1). Higher = more confident in the predicted interface between the chains.
- `chain_pair_pae_min`: a 2×2 matrix of minimum PAE between chains. The off-diagonal values (`[0][1]` and `[1][0]`) show inter-chain confidence. Lower = better.
Compare positive vs decoy
Look at your two pairs:
- Does the positive pair have higher ipTM than the decoy?
- Does the positive pair have lower off-diagonal PAE?
This comparison gives you intuition before you scale up to all 12 pairs.
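If you want to check these two questions in code rather than by eye, here is a small sketch. It assumes you have already loaded each summary file into a dict; `load_summary`, `off_diag_pae`, and `compare_pair` are hypothetical helpers.

```python
import json


def load_summary(path):
    """Read one *_summary_confidences_0.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def off_diag_pae(summary):
    """Average of the two inter-chain entries of the 2x2 chain_pair_pae_min matrix."""
    m = summary["chain_pair_pae_min"]
    return (m[0][1] + m[1][0]) / 2


def compare_pair(pos_summary, dec_summary):
    """Answer the two questions above for one positive and one decoy summary."""
    return {
        "positive_has_higher_iptm": pos_summary["iptm"] > dec_summary["iptm"],
        "positive_has_lower_pae": off_diag_pae(pos_summary) < off_diag_pae(dec_summary),
    }
```

Usage: `compare_pair(load_summary(<positive file>), load_summary(<decoy file>))`, with your two real filenames in place of the placeholders.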
Note: The full PAE matrix lives in `full_data_0.json` under the `pae` key, if you want to dig deeper later.
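For that deeper dive, one option is the mean PAE over every inter-chain token pair. This sketch assumes `full_data_0.json` has a square `pae` matrix plus a `token_chain_ids` list giving the chain label of each row/column; verify both keys in your own file. `mean_interchain_pae` is a hypothetical helper.

```python
import json


def load_full_data(path):
    """Read a *_full_data_0.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def mean_interchain_pae(full_data):
    """Mean PAE over all token pairs (i, j) that lie on different chains."""
    pae = full_data["pae"]
    chains = full_data["token_chain_ids"]  # assumed key: one chain label per token
    n = len(chains)
    values = [pae[i][j] for i in range(n) for j in range(n) if chains[i] != chains[j]]
    return sum(values) / len(values)
```

Because this averages over all inter-chain pairs instead of taking a minimum, it will usually come out higher than the `chain_pair_pae_min` off-diagonal values, so compare it only against the same metric computed for other pairs.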
Exercise 1: Manual Submission
Submit 2 jobs by hand to learn the AlphaFold Server interface.
📺 Video walkthrough: Coming soon. Until then, follow the written steps below.
Steps
- Go to AlphaFold Server
- Pick one positive pair (e.g., `POS_01`) and one decoy pair (e.g., `DEC_01`) from your CSV
- Grab the sequences from `sequence_a` and `sequence_b`, either from your Colab notebook or by opening the CSV in Excel
- For each pair: create a new job with 2 protein chains and paste the sequences
- Name your jobs clearly using UniProt IDs (e.g., `POS_01_P9WHU1_P9WHT9`)
- Submit and wait for results (~5–10 min)
Exercise 0: Setup and Files
You are given one CSV file for this route: `r037A_pairwise_jobs_with_sequences.csv`.
This is your run sheet: a table of protein pairs to submit to AlphaFold3 in pairwise mode. Each row is one AF3 job: two proteins that you'll predict as a complex. Sequences are included, so there's no need to fetch them from UniProt.
What are Rv IDs? The rv_a and rv_b columns contain Rv numbers (e.g., Rv2109c). These are systematic gene identifiers for Mycobacterium tuberculosis H37Rv β the reference strain used in TB research. Every gene in the genome gets an "Rv" number based on its position. Think of them like street addresses for Mtb genes.
The file contains 22 pairs split into two categories:
- `POS_*` rows (10 pairs): Known interactors, i.e. proteins with strong evidence of interaction (high combined score) in the STRING database
- `DEC_*` rows (12 pairs): Random decoys, i.e. proteins with no known interaction, serving as negative controls
For this route, you will submit 12 pairs: POS_01 through POS_06 and DEC_01 through DEC_06. This keeps the workload manageable while still giving you a mix of positives and negatives to compare.
Your goal is to see whether AF3's ipTM scores can distinguish real interactions from background noise.
How was this dataset built?
The positive pairs come from the STRING database for Mycobacterium tuberculosis (taxid 83332). We selected pairs with combined_score ≥ 900 (STRING scores range 0–1000, so 900+ indicates high-confidence interactions). This gave us ~1,857 unique pairs total, from which we sampled for this exercise.
Decoy pairs are random pairs of proteins that are not expected to interact in real biology. We selected proteins with low experimental evidence (≤100 in STRING) and paired them randomly from the Mtb proteome. These serve as negative controls: if AF3 gives them high scores, that's a false positive.
Dataset curation: Sarah Veskimägi
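To make the selection rules above concrete, here is an illustrative sketch; it is not the curator's actual pipeline. It assumes STRING `protein.links`-style input (space-separated `protein1 protein2 combined_score` with a header row) and a pre-filtered list of low-evidence proteins for decoys. STRING lists each link in both directions, which is why "unique pairs" are stored as frozensets. `high_confidence_pairs` and `random_decoys` are hypothetical names.

```python
import random


def high_confidence_pairs(lines, min_score=900):
    """Unique undirected pairs with combined_score >= min_score."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[2] == "combined_score":  # skip blanks and the header
            continue
        a, b, score = fields[0], fields[1], int(fields[2])
        if score >= min_score:
            pairs.add(frozenset((a, b)))  # A-B and B-A collapse to one pair
    return pairs


def random_decoys(candidates, known_pairs, n, seed=0, max_attempts=100_000):
    """Randomly pair proteins from `candidates` (a list), skipping known interactors.

    Assumes `candidates` was already filtered to low-evidence proteins.
    """
    rng = random.Random(seed)
    decoys = set()
    for _ in range(max_attempts):
        if len(decoys) == n:
            break
        pair = frozenset(rng.sample(candidates, 2))
        if pair not in known_pairs:
            decoys.add(pair)
    return decoys
```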
What to do right now
- Download `r037A_pairwise_jobs_with_sequences.csv` and load it in a Google Colab notebook
- Explore the columns; sequences are in `sequence_a` and `sequence_b`
- Filter to the 12 pairs you'll submit: `POS_01`–`POS_06` and `DEC_01`–`DEC_06`
- Move on to Exercise 1
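A minimal sketch of the load-and-filter step with the standard library. The name of the column holding IDs like `POS_01` isn't given here, so `id_column="pair_id"` is a guess you should check against the real header; `select_route_pairs` and `load_run_sheet` are hypothetical helpers.

```python
import csv

# The 12 pairs for this route, in a fixed order: POS_01..POS_06, then DEC_01..DEC_06.
ROUTE_PAIRS = [f"POS_{i:02d}" for i in range(1, 7)] + [f"DEC_{i:02d}" for i in range(1, 7)]


def select_route_pairs(rows, id_column="pair_id"):
    """Return the 12 route rows in ROUTE_PAIRS order, failing loudly if any are missing."""
    by_id = {row[id_column]: row for row in rows}
    missing = [p for p in ROUTE_PAIRS if p not in by_id]
    if missing:
        raise KeyError(f"Pairs not found in run sheet: {missing}")
    return [by_id[p] for p in ROUTE_PAIRS]


def load_run_sheet(path="r037A_pairwise_jobs_with_sequences.csv", id_column="pair_id"):
    """Read the run sheet and keep only this route's pairs."""
    with open(path, newline="") as f:
        return select_route_pairs(csv.DictReader(f), id_column)
```

With pandas, the equivalent filter is `df[df["pair_id"].isin(ROUTE_PAIRS)]`, again assuming that column name.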
Route goal: Build a clear pairwise baseline of ipTM scores. In Route 37B, you will compare this baseline against pooled-AF3 results.
Intro
In this route, you will explore a core biological question:
Which proteins physically interact with each other?
We have talked a lot about proteins as individual molecules, but in real cells most functions come from protein-protein interactions (PPIs):
- enzyme + regulator complexes
- transport and signaling assemblies
- multi-subunit molecular machines
For this exercise, you will use AlphaFold3 as a computational tool to test candidate PPIs in pairwise mode (one pair at a time).
These two routes (37A/37B) are inspired by:
- Todor et al. (2026), Predicting the protein interaction landscape of a free-living bacterium with pooled-AlphaFold3 DOI: 10.1038/s44320-026-00189-7
Concept sketch
```
Protein A + Protein B --(AF3 prediction)--> predicted complex
                                                    |
                                                    v
                                       interface confidence (ipTM)
```
If a pair gets high interface confidence, that is evidence supporting interaction. But pairwise runs can also produce spurious high scores, so this route is intentionally a baseline.
In Route 37B, you will compare this baseline against pooled competition (real pairs mixed with decoys), which is usually more realistic and stricter.
The goal is not to blindly trust one score, but to build scientific judgment under noisy evidence.
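To make "high interface confidence" concrete: a commonly quoted rule of thumb from AlphaFold Server's guidance treats ipTM above 0.8 as a confident prediction, below 0.6 as a likely failed interface prediction, and 0.6–0.8 as a grey zone. The hypothetical `iptm_band` helper below encodes those cutoffs as adjustable defaults; a "high" band is still not proof of a biological interaction, which is exactly why the decoys are there.

```python
def iptm_band(iptm, high=0.8, low=0.6):
    """Bucket an ipTM score using the commonly quoted 0.8 / 0.6 cutoffs."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM should lie between 0 and 1")
    if iptm > high:
        return "high"
    if iptm < low:
        return "low"
    return "grey zone"
```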
Suggested prompt exploration
"I am new to protein-protein interactions. Explain PPIs in simple terms, then explain what AF3 pairwise prediction is measuring, what ipTM means, and why high ipTM does not automatically prove a real biological interaction."
"Teach me the minimum PPI background I need for this route: what true positives and decoys are, why we run both, and what kinds of false positives can appear in pairwise AF3 results."
"I uploaded the pooled-AF3 paper (Todor et al., 2026). Summarize it for a CHEM 169 student, focusing on: (1) pairwise vs pooled strategy, (2) why pooled runs reduce false positives, (3) the size-bias correction formula, and (4) why performance is good but imperfect."
Route 037A: Pairwise AF3 Baseline
- RouteID: 037A
- Wall: Protein-Protein Interactions (W10)
- Grade: 5.11a
- Routesetter: Adrian + Sarah V.
- Time: ~40-60 minutes (+ queue time)
- You'll need: Instructor-provided pair files and an AF3 account.
Base Camp
Start here and climb your way up!