Sent!
You made it to the top. Submit your work above!
Submission
Deliverable
Submit a Colab notebook containing your work for this route. Your notebook will likely include pooled job JSON generation, score extraction, size correction calculations, and more. That's all great.
At minimum, your notebook must have:
- Pooled scores table with size-corrected ipTM for each pair
- Pairwise vs pooled comparison table showing how scores/ranks shift between the two methods
- Heatmap or matrix visualization of pooled interaction scores
- Answers to the interpretation questions from Exercise 5 (in a markdown cell)
You'll need your pairwise_scores.csv from Route 37A for the comparison.
Exercise 5: Visualization and Interpretation
Visualizations
Create these plots in your notebook:
- Heatmap of size-corrected pooled ipTM scores (interaction matrix)
- Comparison plot showing pairwise vs pooled scores for shared pairs (scatter plot or bar chart)
- Distribution plot comparing positive pairs vs background (do positives score higher?)
Interpretation
In a markdown cell, answer:
- Why can pairwise-only runs inflate false positives?
- How does pooling change interaction confidence patterns?
- Why do we apply size correction to ipTM?
- Looking at your results: do the known positive pairs score higher than background under pooled scoring?
- Did any pairs score high in pairwise but drop under pooling? What might that mean?
The Beta:
AF3 metrics (ipTM, minPAE) don't perfectly distinguish true PPIs from non-interactors. Even among known positives, scores span a wide range: some true interactions score low, some non-interactions score high. This is why we teach evaluation under uncertainty, not binary classification. The goal isn't a perfect classifier; it's learning to interpret noisy evidence.
Exercise 4: Evaluate Performance and Imperfection
Using known positives + background pairs:
- Choose one or more thresholds
- Report recovery of known positives
- Report number of called positives total
- Discuss at least one false positive and one missed positive
Important: this is not a perfect classifier. The goal is evaluation and interpretation under noisy scores.
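The threshold report described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `evaluate`, the protein IDs, the scores, and the 0.3 cutoff are all made-up placeholders. Background pairs are not confirmed negatives, so pairs called above threshold that are not known positives are labeled "false positive candidates".

```python
def evaluate(scores, positives, threshold):
    """Summarize calls at one threshold against a set of known positive pairs."""
    called = {pair for pair, score in scores.items() if score >= threshold}
    recovered = called & positives
    return {
        "threshold": threshold,
        "n_called": len(called),
        "n_positives_recovered": len(recovered),
        "recovery": len(recovered) / len(positives) if positives else float("nan"),
        # called but not a known positive: candidates to discuss, not proven errors
        "false_positive_candidates": sorted(called - positives),
        "missed_positives": sorted(positives - called),
    }

# Toy size-corrected scores (placeholder values, not real results)
scores = {
    ("protA", "protB"): 0.62,
    ("protA", "protC"): 0.31,
    ("protA", "protD"): 0.05,
    ("protB", "protC"): 0.02,
}
positives = {("protA", "protB"), ("protA", "protD")}

report = evaluate(scores, positives, threshold=0.3)
```

Running `evaluate` over several thresholds and stacking the resulting dicts gives the table for this exercise; `false_positive_candidates` and `missed_positives` are natural starting points for the discussion item.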
Exercise 3: Pairwise vs Pooled Comparison
Merge scores for pairs observed in both modes.
Required columns:
- protein_a
- protein_b
- iptm_pairwise
- iptm_pooled_observed
- iptm_pooled_corrected
- delta_pairwise_vs_pooled
- rank_pairwise
- rank_pooled_corrected
Tasks:
- Compute score/rank shifts
- Identify pairs that drop strongly under pooling
- Identify pairs that remain robust under pooling
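One way to approach the merge and the score/rank shifts is sketched below. It rests on several assumptions: each pair is keyed by a tuple with the two protein IDs in a consistent (sorted) order, `delta_pairwise_vs_pooled` is taken as pairwise ipTM minus observed pooled ipTM (the route does not pin down this definition), and ranks are computed only among shared pairs. All values are toy placeholders.

```python
def rank_desc(scores):
    """Map each key to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {key: rank for rank, key in enumerate(ordered, start=1)}

def compare(pairwise, pooled_observed, pooled_corrected):
    """Build one row per pair that was scored in both modes."""
    shared = sorted(set(pairwise) & set(pooled_observed) & set(pooled_corrected))
    rank_pw = rank_desc({k: pairwise[k] for k in shared})
    rank_pool = rank_desc({k: pooled_corrected[k] for k in shared})
    rows = []
    for key in shared:
        rows.append({
            "protein_a": key[0],
            "protein_b": key[1],
            "iptm_pairwise": pairwise[key],
            "iptm_pooled_observed": pooled_observed[key],
            "iptm_pooled_corrected": pooled_corrected[key],
            # assumed definition: positive delta means the score dropped under pooling
            "delta_pairwise_vs_pooled": round(pairwise[key] - pooled_observed[key], 4),
            "rank_pairwise": rank_pw[key],
            "rank_pooled_corrected": rank_pool[key],
        })
    return rows

# Toy inputs (placeholder values, not real results)
pairwise = {("protA", "protB"): 0.80, ("protA", "protC"): 0.65, ("protB", "protC"): 0.20}
pooled_observed = {("protA", "protB"): 0.71, ("protA", "protC"): 0.12, ("protB", "protC"): 0.09}
pooled_corrected = {("protA", "protB"): 0.622, ("protA", "protC"): 0.010, ("protB", "protC"): 0.019}

rows = compare(pairwise, pooled_observed, pooled_corrected)
```

In this toy data the protA/protC pair has a large positive delta and slips a rank, which is the "drops strongly under pooling" pattern, while protA/protB stays on top in both modes.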
Exercise 2: Pooled Scores and Size Correction
Larger proteins tend to score higher on ipTM just by chance. To fairly compare interactions, we apply size correction.
The formula (from Todor et al.)
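import numpy as np
# Protein lengths (amino acids)
len_a = 350 # example
len_b = 420 # example
# Observed pooled ipTM for this pair (read from the AF3 summary file)
observed_iptm = 0.62 # example
# Expected ipTM for a pair of this combined length
expected_iptm = -0.036 + 0.00447 * np.sqrt(len_a + len_b)
# Size-corrected score: observed minus the size-dependent baseline
size_corrected_iptm = observed_iptm - expected_iptm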
Tasks
- Extract observed pooled ipTM for each protein pair
- Look up protein lengths (from the input CSV or UniProt)
- Compute size-corrected scores using the formula above
- Build a corrected score table
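The tasks above can be strung together roughly as follows. This is a sketch using only the standard library; the helper names, protein IDs, lengths, and ipTM values are illustrative placeholders, and the output column names are borrowed from Exercise 3.

```python
import math

def expected_iptm(len_a, len_b):
    """Size-dependent baseline ipTM, using the formula above."""
    return -0.036 + 0.00447 * math.sqrt(len_a + len_b)

def build_corrected_table(observed, lengths):
    """Return one row per pair, sorted by size-corrected ipTM (highest first)."""
    rows = []
    for (a, b), iptm in observed.items():
        baseline = expected_iptm(lengths[a], lengths[b])
        rows.append({
            "protein_a": a,
            "protein_b": b,
            "iptm_pooled_observed": iptm,
            "expected_iptm": round(baseline, 4),
            "iptm_pooled_corrected": round(iptm - baseline, 4),
        })
    return sorted(rows, key=lambda r: r["iptm_pooled_corrected"], reverse=True)

# Toy inputs (placeholder lengths and scores, not real data)
lengths = {"protA": 350, "protB": 420, "protC": 150}
observed = {
    ("protA", "protB"): 0.71,
    ("protA", "protC"): 0.12,
    ("protB", "protC"): 0.09,
}

table = build_corrected_table(observed, lengths)
```

The list of dicts can be written out with `csv.DictWriter` or passed straight to `pandas.DataFrame(table)` if you prefer working with data frames.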
Visualization
Create a heatmap showing size-corrected ipTM scores between proteins. This gives you an interaction matrix, where bright spots indicate predicted interactions.
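A heatmap needs a square matrix, not a list of pairs. The sketch below shows one way to reshape pair scores into a symmetric matrix, assuming each pair appears once; if a pair was scored in several pools you would first need to aggregate it (for example by averaging). Names and values are placeholders.

```python
def to_matrix(pair_scores):
    """Turn {(a, b): score} into (labels, symmetric matrix); unscored cells are NaN."""
    labels = sorted({protein for pair in pair_scores for protein in pair})
    index = {protein: i for i, protein in enumerate(labels)}
    n = len(labels)
    matrix = [[float("nan")] * n for _ in range(n)]
    for (a, b), score in pair_scores.items():
        # fill both triangles so the heatmap is symmetric
        matrix[index[a]][index[b]] = score
        matrix[index[b]][index[a]] = score
    return labels, matrix

# Toy size-corrected scores (placeholder values)
corrected = {
    ("protA", "protB"): 0.62,
    ("protA", "protC"): 0.06,
    ("protB", "protC"): 0.02,
}

labels, matrix = to_matrix(corrected)
```

`matrix` can then be handed to a plotting call such as matplotlib's `imshow`, with `labels` used for the axis ticks. The diagonal stays NaN because a protein is not scored against itself here.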
Exercise 1: Run Pooled AF3 Jobs
You'll submit up to 10 pooled jobs using instructor-prepared pool files. Each pool contains multiple proteins; AF3 will predict which ones interact.
What's a pooled job?
Unlike pairwise (2 proteins), a pooled job has 5-10 proteins competing for interactions. Real interactors should still find each other, but spurious hits get diluted.
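To make the difference concrete, here is a rough sketch of how a pool might be written as an AlphaFold Server batch job: one protein chain entry per pool member instead of two. The field names (`proteinChain`, `modelSeeds`, `dialect`, `version`) reflect the AlphaFold Server JSON format as we understand it; treat them as assumptions and check them against the instructor-provided pool JSON files, which are the source of truth. The job name, protein IDs, and sequences are made-up placeholders.

```python
import json

def make_pooled_job(job_name, sequences):
    """Build one job dict with a single-copy protein chain per pool member."""
    return {
        "name": job_name,
        "modelSeeds": [],  # assumed: an empty list lets the server choose the seed
        "sequences": [
            {"proteinChain": {"sequence": seq, "count": 1}}
            for seq in sequences
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }

# Hypothetical pool of 5 proteins (toy sequences, not real data)
pool = {
    "protA": "MKTAYIAKQR",
    "protB": "MSLNDEVRAA",
    "protC": "MTQPELLKRG",
    "protD": "MAGSVDLTPE",
    "protE": "MRKIEELLGA",
}

job = make_pooled_job("pool_01", list(pool.values()))
batch_json = json.dumps([job], indent=2)  # a batch file is a list of jobs
```

Keep track of the order in which proteins are written into the job: you will need it later to map chains in the output back to protein IDs.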
Submitting pools
Video walkthrough: Coming soon. Follow the written steps below.
- Use the pool JSON files provided in the input folder
- Upload to AlphaFold Server via batch upload
- Submit each draft (yes, still requires clicking through each one)
Annoyed yet? Route 37C (worth 8 routes of extra credit) challenges you to build a browser automation tool that handles this clicking for you. If the repetition is driving you crazy, channel that frustration into engineering.
Downloading and organizing outputs
Same process as Route 37A:
- Download completed jobs (you'll get a big zip)
- Extract locally
- Find *_summary_confidences_0.json in each job folder (model 0 only)
- For pooled jobs, you'll extract pairwise scores between all proteins in the pool
Path A (Colab): Copy summary files to a folder, upload to Colab, parse with Python.
Path B (Local agent): Point Claude Code at the extracted folder and let it extract/merge the scores.
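For Path A, the parsing step might look like the sketch below. It assumes the summary file contains a `chain_pair_iptm` matrix with one row and column per chain, and that chains appear in the same order as the proteins in the submitted job; verify both against your own files before relying on it. The embedded JSON is a toy stand-in for one file's contents.

```python
import json
from itertools import combinations

def pair_scores_from_summary(summary, protein_ids):
    """Return {(protein_a, protein_b): ipTM} for every chain pair in one job."""
    matrix = summary["chain_pair_iptm"]  # assumed key: N x N chain-pair ipTM
    if len(matrix) != len(protein_ids):
        raise ValueError("chain count does not match the protein list")
    scores = {}
    for i, j in combinations(range(len(protein_ids)), 2):
        # average both directions in case the matrix is not perfectly symmetric
        scores[(protein_ids[i], protein_ids[j])] = (matrix[i][j] + matrix[j][i]) / 2
    return scores

# Toy stand-in for the contents of one *_summary_confidences_0.json file
summary = json.loads("""
{"chain_pair_iptm": [[0.80, 0.72, 0.11],
                     [0.70, 0.85, 0.09],
                     [0.13, 0.09, 0.78]]}
""")

scores = pair_scores_from_summary(summary, ["protA", "protB", "protC"])
```

For real files, replace the `json.loads(...)` call with `json.load` on each opened summary file and supply the protein IDs for that pool in submission order.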
Exercise 0: Inputs and Dependencies
Required inputs:
- r037B_pooled_jobs_long_with_sequences.csv: protein assignments per pool, sequences included
- r037B_pool_files (folder): one CSV per pool
- r037B_positive_pair_pool_coverage.csv: which pools contain each positive pair
- r037_positive_pairs_exp_gt800_with_sequences.csv: known positive pairs, sequences included
- Your pairwise baseline scores from Route 37A (pairwise_scores.csv)
Where does this data come from?
Same source as Route 37A: positive pairs from the STRING database for Mycobacterium tuberculosis (taxid 83332), filtered to combined_score ≥ 900.
The pooled design places each positive pair into multiple pools alongside decoy proteins. This creates competition: if a pair scores high even when surrounded by decoys, that's stronger evidence than pairwise alone.
Dataset curation: Sarah Veskimägi
Intro
This route compares two strategies:
- Pairwise AF3 (from Route 37A)
- Pooled AF3 with decoy competition
You will test whether pooled scoring reduces spurious high-confidence calls and how size correction affects interpretation.
The inspiration
These two routes (37A/37B) are inspired by:
- Todor et al. (2026), Predicting the protein interaction landscape of a free-living bacterium with pooled-AlphaFold3 DOI: 10.1038/s44320-026-00189-7
The key insight from Todor et al.: when you run AF3 on a pool of proteins (not just a pair), real interactors still find each other, but spurious pairwise hits get diluted by competition. Pooling also reveals a size bias in ipTM scores (larger proteins tend to score higher), which is why we apply size correction.
Route 037B: Pooled AF3 and Pairwise-vs-Pooled Comparison
- RouteID: 037B
- Wall: Protein-Protein Interactions (W10)
- Grade: 5.11c
- Routesetter: Adrian + Sarah V.
- Time: ~60-90 minutes
- You'll need: Instructor pooled files + lengths table + positive controls + R037A outputs.
Base Camp
Start here and climb your way up!