🎉 Sent!
You made it to the top. Submit your work above!
Submission
Deliverable
Submit your exploration answers and extracted data. Your submission might include additional analysis, screenshots, or experiments — that's all great.
At minimum, your submission must have:
- Answers to the 5 codebase exploration questions (Exercise 2)
- The merged CSV from your data extraction challenge (Exercise 4)
- The prompts you used for both tasks
Mission checklist
- Explored a real codebase with Claude Code
- Answered exploration questions
- Extracted and merged data from multiple files
- Documented your prompts
Exercise 4: Data Extraction Challenge
Now apply everything to extract real data from messy files.
Option A: Practice dataset (standalone)
Download the practice files: sample_experiment_data.zip
This zip contains 10 JSON files with experiment results in a nested format. Your task is to prompt the agent along these lines:
> I have experiment results in /path/to/extracted/folder.
> Each JSON file has: sample_id, metadata.date, results.measurement, results.error
> Please:
> 1. Find all JSON files
> 2. Extract sample_id, date, measurement, and error from each
> 3. Output a merged CSV sorted by date
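For reference, the script the agent writes for this usually boils down to something like the sketch below. It assumes the field layout described in the prompt (`sample_id`, `metadata.date`, `results.measurement`, `results.error`); the function name `merge_experiment_json` is just illustrative.

```python
import csv
import glob
import json
import os

def merge_experiment_json(folder, out_csv):
    """Read every JSON file in `folder`, flatten the nested fields,
    and write one CSV sorted by date."""
    rows = []
    for path in glob.glob(os.path.join(folder, "*.json")):
        with open(path) as f:
            data = json.load(f)
        rows.append({
            "sample_id": data["sample_id"],
            "date": data["metadata"]["date"],
            "measurement": data["results"]["measurement"],
            "error": data["results"]["error"],
        })
    rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["sample_id", "date", "measurement", "error"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Reading a sketch like this is also a good way to verify the agent's output: you can check that the sort key and field names match what you asked for.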
Option B: AlphaFold3 outputs (if you've done Routes 37A/37B)
If you have AF3 results from the protein-protein interactions routes:
> I have AF3 results in /path/to/extracted/folder.
> Each job folder contains a summary_confidences_0.json file (ignore _1, _2, _3, _4).
> Please:
> 1. Find all the model 0 summary files
> 2. Extract iptm, ptm, and chain_pair_pae_min from each
> 3. Parse the pair ID from the folder name (look for pos_XX or dec_XX)
> 4. For pae_off_diag_avg, average the off-diagonal values [0][1] and [1][0]
> 5. Output a merged CSV with: pair_id, type, iptm, ptm, pae_off_diag_avg
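The agent's solution to the AF3 task typically looks like the sketch below. It assumes the layout described in the prompt: one `summary_confidences_0.json` per job folder, a `pos_XX`/`dec_XX` token in the folder name, and a 2x2 `chain_pair_pae_min` matrix (two chains); `summarize_af3` is a hypothetical helper name.

```python
import glob
import json
import os
import re

def summarize_af3(results_dir):
    """Build one row per AF3 job from its model-0 summary file.
    Sketch only: assumes two chains, so chain_pair_pae_min is 2x2."""
    rows = []
    for path in glob.glob(
        os.path.join(results_dir, "*", "summary_confidences_0.json")
    ):
        with open(path) as f:
            s = json.load(f)
        folder = os.path.basename(os.path.dirname(path))
        m = re.search(r"(pos|dec)_\d+", folder)  # pair ID encoded in folder name
        pair_id = m.group(0) if m else folder
        pae = s["chain_pair_pae_min"]
        off_diag_avg = (pae[0][1] + pae[1][0]) / 2  # average the off-diagonal entries
        rows.append({
            "pair_id": pair_id,
            "type": pair_id.split("_")[0],
            "iptm": s["iptm"],
            "ptm": s["ptm"],
            "pae_off_diag_avg": off_diag_avg,
        })
    return rows
```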
Why this matters
What would take 30+ minutes of manual work becomes a 30-second conversation. The agent:
- Finds the right files automatically
- Handles nested JSON structures
- Parses metadata from filenames
- Outputs clean, analysis-ready data
Trust but verify: Spot-check a few rows manually to confirm the extraction worked correctly.
Exercise 3: Data Extraction Patterns
Beyond navigation, agents excel at extracting data from messy file structures.
Pattern 1: Extract from multiple files
> I have JSON files in /path/to/data/. Each file has "sample_id",
> "measurement", and "timestamp". Extract these from all files
> and create a single CSV.
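For this pattern the agent generally produces a loop over `glob` results plus a `csv.DictWriter`, roughly like this sketch (field names taken from the prompt above; `extract_to_csv` is an illustrative name):

```python
import csv
import glob
import json

def extract_to_csv(folder, out_csv,
                   fields=("sample_id", "measurement", "timestamp")):
    """Pull the same top-level fields from every JSON file into one CSV."""
    rows = []
    for path in sorted(glob.glob(f"{folder}/*.json")):
        with open(path) as f:
            record = json.load(f)
        rows.append({k: record[k] for k in fields})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```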
Pattern 2: Parse filenames for metadata
Often the filename itself contains important info:
> The filenames are like "exp_2024_03_15_sample_A_trial_1.json".
> Parse the date, sample name, and trial number from each filename
> and include them as columns in the output.
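Under the hood this is a regular expression over the filename. A sketch for the hypothetical convention in the example above (named groups make each metadata column explicit):

```python
import re

# Matches names like "exp_2024_03_15_sample_A_trial_1.json".
FILENAME_RE = re.compile(
    r"exp_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})"
    r"_sample_(?P<sample>[^_]+)_trial_(?P<trial>\d+)\.json"
)

def parse_filename(name):
    """Return the metadata encoded in a filename, or None if it doesn't match."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        return None
    return {
        "date": f"{m['year']}-{m['month']}-{m['day']}",
        "sample": m["sample"],
        "trial": int(m["trial"]),
    }
```

Returning `None` for non-matching names makes it easy to skip stray files in the same folder.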
Pattern 3: Navigate nested structures
> The JSON files have nested structure: results.scores.accuracy
> and results.scores.precision. Extract both values from each file.
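A dotted path like `results.scores.accuracy` is just a sequence of dictionary lookups, which a small helper can generalize (a sketch; `get_path` is an illustrative name):

```python
from functools import reduce

def get_path(record, dotted):
    """Walk a nested dict by a dotted path such as 'results.scores.accuracy'."""
    return reduce(lambda d, key: d[key], dotted.split("."), record)
```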
Pattern 4: Filter while extracting
> Extract all rows where status is "completed" and score > 0.8.
> Skip any files that don't have a valid score field.
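The filtering logic the agent writes for a prompt like this is typically a guard clause per condition, along these lines (a sketch using the status and score fields named above):

```python
def keep_completed(records, threshold=0.8):
    """Keep records that completed with score above the threshold;
    skip any record whose score is missing or non-numeric."""
    kept = []
    for r in records:
        score = r.get("score")
        if not isinstance(score, (int, float)):
            continue  # no valid score field: skip this record
        if r.get("status") == "completed" and score > threshold:
            kept.append(r)
    return kept
```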
When to use agents for data work
✅ Good for:
- Extracting from many files
- Parsing messy formats
- Format conversions
- Quick exploratory analysis
⚠️ Be careful with:
- Complex statistical analysis (verify the approach)
- Production pipelines (review the code)
- Anything where subtle bugs would be hard to notice
Exercise 2: Exploration Challenge
Use Claude Code to answer these questions about your repository:
- Structure: "What are the main directories and what does each contain?"
- Entry point: "Where does the code start executing? What's the main file?"
- Dependencies: "What external libraries does this project use?"
- Specific search: "Find all functions that handle [topic relevant to the repo]"
- Understanding: "Explain what [specific file] does in simple terms"
Tips for better exploration
Be specific about what you're looking for:
> Find all functions that make HTTP requests
> Which files import the database module?
> Show me the error handling patterns in this codebase
Ask for explanations at the right level:
> Explain this like I'm new to Python
> Give me a technical deep-dive on how the caching works
> Summarize in 2-3 sentences
Record your prompts and answers — you'll submit these.
Exercise 1: Understanding Project Structure
When you open a new codebase, the first question is: "What's going on here?"
Start with the big picture
> Give me an overview of this project. What does it do and how is it organized?
The agent will:
- Read the README
- Look at the directory structure
- Scan key files
- Give you a coherent summary
Drill down
> What's in the src/ directory?
> Explain the purpose of each file in src/
> What are the main entry points?
Search for specific things
> Find all Python files that contain the word "database"
> Where is the configuration file?
> Show me all test files
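Behind prompts like these, the agent runs grep-style searches over the repository. The same keyword search looks roughly like this in Python (a sketch; `files_containing` is an illustrative name):

```python
from pathlib import Path

def files_containing(root, word, suffix=".py"):
    """Scan files under `root` and return those containing a keyword."""
    return sorted(
        p for p in Path(root).rglob(f"*{suffix}")
        if word in p.read_text(errors="ignore")
    )
```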
Understand code
> Read src/utils.py and explain what each function does
> How does the authentication flow work?
> What design patterns does this codebase use?
This is dramatically faster than manually opening files and trying to understand them.
Exercise 0: Setup
Prerequisites
- Claude Code installed and working — complete R038A first
- Git — check with `git --version`
Clone a test repository
Pick something to explore:
```bash
# Option A: A popular Python library
git clone https://github.com/psf/requests
cd requests

# Option B: A web framework
git clone https://github.com/pallets/flask
cd flask

# Option C: Any project you're curious about
git clone https://github.com/[owner]/[repo]
cd [repo]
```
Start Claude Code
```bash
claude
```
You're now ready to explore.
What you'll learn
This route covers two core "reading" skills:
- Codebase navigation — understanding project structure, finding files, searching code
- Data extraction — pulling structured data from messy file collections
Both use the same underlying ability: directing the agent to read, interpret, and summarize information.
Intro
Coding agents are exceptionally good at reading — understanding codebases, finding information, and extracting data from messy files.
Instead of manually opening files, grepping for patterns, and piecing together how things work, you can just ask:
"What does this project do and how is it organized?"
"Find all the JSON files and extract the scores into a CSV."
The agent reads files, understands structure, and gives you coherent answers or clean data.
Real-world application: In Routes 37A/37B, you download messy AlphaFold3 outputs with dozens of files per job. Instead of manually hunting for the right JSON files, you can ask the agent: "Find all summary_confidences_0.json files and extract the ipTM scores." What takes 30 minutes manually becomes a 30-second conversation.
Note: These skills work with any coding agent (Cursor, Copilot, Aider) — not just Claude Code.
Route 038B: Reading — Explore and Extract
- RouteID: 038B
- Wall: AI-Assisted Coding (W11)
- Grade: 5.9
- Routesetter: Adrian
- Time: ~40-50 minutes
- You'll need: Claude Code installed (R038A), a repository to explore, practice data files
🧗 Base Camp
Start here and climb your way up!