
Reading: Explore Codebases and Extract Data

Route ID: R038B • Wall: W11 • Released: Mar 9, 2026




Deliverable

Submit your exploration answers and extracted data. Your submission might include additional analysis, screenshots, or experiments — that's all great.

At minimum, your submission must have:

  1. Answers to the 5 codebase exploration questions (Exercise 2)
  2. The merged CSV from your data extraction challenge (Exercise 4)
  3. The prompts you used for both tasks

Mission checklist

  • Explored a real codebase with Claude Code
  • Answered exploration questions
  • Extracted and merged data from multiple files
  • Documented your prompts

Exercise 4: Data Extraction Challenge

Now apply everything to extract real data from messy files.

Option A: Practice dataset (standalone)

Download the practice files: sample_experiment_data.zip

This zip contains 10 JSON files with experiment results in nested format. Your task:

> I have experiment results in /path/to/extracted/folder.
> Each JSON file has: sample_id, metadata.date, results.measurement, results.error
> Please:
> 1. Find all JSON files
> 2. Extract sample_id, date, measurement, and error from each
> 3. Output a merged CSV sorted by date
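A prompt like this typically leads the agent to write and run a short script. A minimal sketch of what that script might look like, assuming the field names from the prompt above (the folder path is whatever you pass in):

```python
import csv
import glob
import json
import os

FIELDS = ["sample_id", "date", "measurement", "error"]

def extract_record(data):
    """Pull the four fields of interest out of one parsed JSON file."""
    return {
        "sample_id": data["sample_id"],
        "date": data["metadata"]["date"],
        "measurement": data["results"]["measurement"],
        "error": data["results"]["error"],
    }

def merge_to_csv(folder, out_path="merged.csv"):
    """Find all JSON files, flatten each, and write one CSV sorted by date."""
    rows = []
    for path in glob.glob(os.path.join(folder, "*.json")):
        with open(path) as f:
            rows.append(extract_record(json.load(f)))
    rows.sort(key=lambda r: r["date"])  # ISO-style dates sort correctly as strings
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Seeing the shape of the generated code makes it much easier to review what the agent actually did.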

Option B: AlphaFold3 outputs (if you've done Routes 37A/37B)

If you have AF3 results from the protein-protein interactions routes:

> I have AF3 results in /path/to/extracted/folder.
> Each job folder contains a summary_confidences_0.json file (ignore _1, _2, _3, _4).
> Please:
> 1. Find all the model 0 summary files
> 2. Extract iptm, ptm, and chain_pair_pae_min from each
> 3. Parse the pair ID from the folder name (look for pos_XX or dec_XX)
> 4. For pae_off_diag_avg, average the off-diagonal values [0][1] and [1][0]
> 5. Output a merged CSV with: pair_id, type, iptm, ptm, pae_off_diag_avg
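For reference, the agent's solution to the AF3 prompt might look roughly like this. This is a sketch under assumptions: that `chain_pair_pae_min` is a 2×2 matrix for a two-chain job, and that `pos`/`dec` in the folder name map to a "positive"/"decoy" type column (check against your own Route 37 naming):

```python
import csv
import glob
import json
import os
import re

def parse_pair_id(folder_name):
    """Pull e.g. 'pos_03' or 'dec_12' out of a job folder name (assumed naming scheme)."""
    m = re.search(r"(pos|dec)_(\d+)", folder_name)
    if m is None:
        return None, None
    return m.group(0), {"pos": "positive", "dec": "decoy"}[m.group(1)]

def summarize(data, pair_id, pair_type):
    """Flatten one summary_confidences_0.json into a CSV row."""
    pae = data["chain_pair_pae_min"]  # assumed 2x2 for a two-chain prediction
    return {
        "pair_id": pair_id,
        "type": pair_type,
        "iptm": data["iptm"],
        "ptm": data["ptm"],
        "pae_off_diag_avg": (pae[0][1] + pae[1][0]) / 2,
    }

def merge_af3(root, out_path="af3_merged.csv"):
    """Collect all model-0 summaries under root into one CSV."""
    rows = []
    for path in glob.glob(os.path.join(root, "*", "summary_confidences_0.json")):
        pair_id, pair_type = parse_pair_id(os.path.basename(os.path.dirname(path)))
        with open(path) as f:
            rows.append(summarize(json.load(f), pair_id, pair_type))
    with open(out_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["pair_id", "type", "iptm", "ptm", "pae_off_diag_avg"])
        w.writeheader()
        w.writerows(rows)
    return rows
```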

Why this matters

What would take 30+ minutes of manual work becomes a 30-second conversation. The agent:

  • Finds the right files automatically
  • Handles nested JSON structures
  • Parses metadata from filenames
  • Outputs clean, analysis-ready data

Trust but verify: Spot-check a few rows manually to confirm the extraction worked correctly.
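One quick way to do that spot-check: reload a source file and compare it against its row in the merged CSV. A sketch assuming the Option A field layout (paths are placeholders for your own files):

```python
import csv
import json

def row_matches(csv_row, src):
    """Compare one CSV row (all strings) against its source JSON record."""
    return (
        csv_row["sample_id"] == src["sample_id"]
        and csv_row["date"] == src["metadata"]["date"]
        and float(csv_row["measurement"]) == src["results"]["measurement"]
        and float(csv_row["error"]) == src["results"]["error"]
    )

def spot_check(csv_path, json_path):
    """Confirm one source file's values made it into the merged CSV intact."""
    with open(json_path) as f:
        src = json.load(f)
    with open(csv_path, newline="") as f:
        rows = {row["sample_id"]: row for row in csv.DictReader(f)}
    return row_matches(rows[src["sample_id"]], src)
```

Note the `float(...)` conversions: CSV values always come back as strings, which is a common source of false alarms when spot-checking.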


Exercise 3: Data Extraction Patterns

Beyond navigation, agents excel at extracting data from messy file structures.

Pattern 1: Extract from multiple files

> I have JSON files in /path/to/data/. Each file has "sample_id",
> "measurement", and "timestamp". Extract these from all files
> and create a single CSV.
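The script behind this pattern can be very short. A minimal sketch assuming the flat field names in the prompt above:

```python
import csv
import glob
import json

FIELDS = ["sample_id", "measurement", "timestamp"]

def pick(data):
    """Keep only the fields we want from one parsed JSON object."""
    return {k: data[k] for k in FIELDS}

def collect(folder, out_path="combined.csv"):
    """Read every JSON file in folder and write one combined CSV."""
    rows = [pick(json.load(open(path))) for path in glob.glob(f"{folder}/*.json")]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```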

Pattern 2: Parse filenames for metadata

Often the filename itself contains important info:

> The filenames are like "exp_2024_03_15_sample_A_trial_1.json".
> Parse the date, sample name, and trial number from each filename
> and include them as columns in the output.
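Under the hood this is a regular expression with named groups. A sketch for the example filename format above (the exact pattern depends on how consistent your filenames really are):

```python
import re

# Matches names like "exp_2024_03_15_sample_A_trial_1.json"
FILENAME_RE = re.compile(
    r"exp_(?P<date>\d{4}_\d{2}_\d{2})_sample_(?P<sample>\w+?)_trial_(?P<trial>\d+)\.json"
)

def parse_filename(name):
    """Extract date, sample name, and trial number from a filename."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    return {
        "date": m.group("date").replace("_", "-"),  # normalize to 2024-03-15
        "sample": m.group("sample"),
        "trial": int(m.group("trial")),
    }
```

Raising on unexpected names (rather than silently skipping) is a deliberate choice: it surfaces files that don't follow the convention instead of quietly dropping data.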

Pattern 3: Navigate nested structures

> The JSON files have nested structure: results.scores.accuracy
> and results.scores.precision. Extract both values from each file.
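When some files are missing a nested key, hard-coded `data["results"]["scores"]["accuracy"]` chains raise errors. A small defensive helper (a common pattern, not a standard-library function) handles this:

```python
def dig(data, path, default=None):
    """Follow a dotted path like 'results.scores.accuracy' through nested dicts."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data
```

For example, `dig(record, "results.scores.accuracy", default="NA")` returns `"NA"` instead of crashing when a file lacks that branch.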

Pattern 4: Filter while extracting

> Extract all rows where status is "completed" and score > 0.8.
> Skip any files that don't have a valid score field.
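The filtering logic the agent writes for a prompt like this usually boils down to one predicate per row. A sketch, assuming rows are dicts with the `status` and `score` fields named above:

```python
def keep(row):
    """Filter rule from the prompt: completed runs with score above 0.8."""
    score = row.get("score")
    if not isinstance(score, (int, float)):
        return False  # skip rows without a valid numeric score
    return row.get("status") == "completed" and score > 0.8
```

Applied with a comprehension such as `[r for r in rows if keep(r)]`, this also quietly satisfies the "skip invalid" requirement, since missing or non-numeric scores fail the type check.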

When to use agents for data work

Good for:

  • Extracting from many files
  • Parsing messy formats
  • Format conversions
  • Quick exploratory analysis

⚠️ Be careful with:

  • Complex statistical analysis (verify the approach)
  • Production pipelines (review the code)
  • Anything where subtle bugs would be hard to notice

Exercise 2: Exploration Challenge

Use Claude Code to answer these questions about your repository:

  1. Structure: "What are the main directories and what does each contain?"
  2. Entry point: "Where does the code start executing? What's the main file?"
  3. Dependencies: "What external libraries does this project use?"
  4. Specific search: "Find all functions that handle [topic relevant to the repo]"
  5. Understanding: "Explain what [specific file] does in simple terms"

Tips for better exploration

Be specific about what you're looking for:

> Find all functions that make HTTP requests
> Which files import the database module?
> Show me the error handling patterns in this codebase

Ask for explanations at the right level:

> Explain this like I'm new to Python
> Give me a technical deep-dive on how the caching works
> Summarize in 2-3 sentences

Record your prompts and answers — you'll submit these.


Exercise 1: Understanding Project Structure

When you open a new codebase, the first question is: "What's going on here?"

Start with the big picture

> Give me an overview of this project. What does it do and how is it organized?

The agent will:

  1. Read the README
  2. Look at the directory structure
  3. Scan key files
  4. Give you a coherent summary

Drill down

> What's in the src/ directory?
> Explain the purpose of each file in src/
> What are the main entry points?

Search for specific things

> Find all Python files that contain the word "database"
> Where is the configuration file?
> Show me all test files

Understand code

> Read src/utils.py and explain what each function does
> How does the authentication flow work?
> What design patterns does this codebase use?

This is dramatically faster than manually opening files and trying to understand them.


Exercise 0: Setup

Prerequisites

  • Claude Code installed and working — complete R038A first
  • Git — check with git --version

Clone a test repository

Pick something to explore:

# Option A: A popular Python library
git clone https://github.com/psf/requests
cd requests

# Option B: A web framework
git clone https://github.com/pallets/flask
cd flask

# Option C: Any project you're curious about
git clone https://github.com/[owner]/[repo]
cd [repo]

Start Claude Code

claude

You're now ready to explore.

What you'll learn

This route covers two core "reading" skills:

  1. Codebase navigation — understanding project structure, finding files, searching code
  2. Data extraction — pulling structured data from messy file collections

Both use the same underlying ability: directing the agent to read, interpret, and summarize information.


Intro

Coding agents are exceptionally good at reading — understanding codebases, finding information, and extracting data from messy files.

Instead of manually opening files, grepping for patterns, and piecing together how things work, you can just ask:

"What does this project do and how is it organized?"

"Find all the JSON files and extract the scores into a CSV."

The agent reads files, understands structure, and gives you coherent answers or clean data.

Real-world application: In Routes 37A/37B, you download messy AlphaFold3 outputs with dozens of files per job. Instead of manually hunting for the right JSON files, you can ask the agent: "Find all summary_confidences_0.json files and extract the ipTM scores." What takes 30 minutes manually becomes a 30-second conversation.

Note: These skills work with any coding agent (Cursor, Copilot, Aider) — not just Claude Code.


Route 038B: Reading — Explore and Extract

  • RouteID: 038B
  • Wall: AI-Assisted Coding (W11)
  • Grade: 5.9
  • Routesetter: Adrian
  • Time: ~40-50 minutes
  • You'll need: Claude Code installed (R038A), a repository to explore, practice data files

🧗 Base Camp

Start here and climb your way up!