Start Here
Scroll down to complete this route
Route 005: The Gear Shop (Imports and Installs)
- RouteID: 005
- Wall: Getting Comfortable with Python
- Grade: 5.7
- Routesetter: Srikar + Adrian
- Time: ~30-45 minutes
Why this route exists
Python comes with a "standard backpack" of tools, but for specialized science (like simulating proteins or training a machine learning model) you need to explicitly fetch some extra gear.
In Python, this gear comes in Libraries (or Packages).
- Standard Gear (Built-in): random, math, datetime. (You have these already).
- Pro Gear (Must Install): RDKit (Chemistry), Pandas (Data), Biopython.
You don't write complex code from scratch. You go to the shop (PyPI), buy the gear (pip install), and put it in your backpack (import) to start using it.
You'll learn that you can either put a whole other backpack (collection) of tools into your backpack (i.e., an entire module, submodule, or sub-submodule), or be very specific about exactly which tool (i.e., function) you need and fetch only that one.
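A minimal sketch of those two import styles, using the built-in math module because it needs no install. Both styles reach the same sqrt function; only the name you type afterwards differs.

```python
# Style 1: pack the whole toolbox, then reach inside it with a dot.
import math

whole_toolbox = math.sqrt(16)

# Style 2: fetch only the one tool you need.
from math import sqrt

single_tool = sqrt(16)

print(whole_toolbox, single_tool)  # 4.0 4.0
```

Style 1 keeps it obvious where each tool came from; Style 2 saves typing when you use one function many times.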
What you'll build
You will set up a complete Cheminformatics environment.
- Simulate a DNA mutation using a built-in tool.
- Crash your code on purpose (to recognize a missing library).
- Install the RDKit chemistry library.
- Visualize a molecule of Caffeine to prove your gear works.
Exercise 0: The Mutation Simulator (Built-in Gear)
Goal: Use the random library to simulate a real mutation event (where the base actually changes).
Concept: What is a Library? A library is a toolbox of pre-built functions. Instead of writing a random number generator from scratch (math heavy!), you import random and use the tools inside.
The Beta (Tools you need):
- import random: Unlocks the toolbox.
- random.randint(a, b): Returns a random integer between a and b, with both ends included. Use this to pick an index.
  - Example: index = random.randint(0, 10)
- random.choice(list): Returns a random item from a list. Use this to pick a base.
- String slicing/f-strings: Remember seq[:pos] + new_char + seq[pos+1:]?
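To see these tools working together before you touch the DNA, here is a small sketch on an unrelated word. The word "CLIMB" and the replacement letters are made up for illustration; the seed line is optional and only makes the picks repeatable.

```python
import random

random.seed(0)  # optional: makes the "random" picks repeatable while you experiment

word = "CLIMB"

# randint includes BOTH endpoints, so the last valid index is len(word) - 1.
pos = random.randint(0, len(word) - 1)

# choice picks one item from a list.
new_char = random.choice(["x", "y", "z"])

# Strings can't be edited in place, so build a new one from three pieces:
# everything before pos + the replacement + everything after pos.
new_word = word[:pos] + new_char + word[pos + 1:]

print(f"Replaced index {pos}: {word} -> {new_word}")
```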
Do: Write a script that does the following steps (convert this pseudocode to Python):
- Import the random library.
- Define a DNA string dna_seq = "ATGCGTACGTTAGC".
- Use random.randint(...) to programmatically pick a random index pos.
  - Hint: The range must be from 0 to len(dna_seq)-1, because Python starts counting from zero, not from one!
- Identify the original_base at that position.
- Pick a new_base from ["A", "C", "T", "G"].
  - Crux: Make sure new_base is NOT the same as original_base. (Hint: You can use a while loop: "While new is same as old, pick again").
- Construct the mutated_seq.
- Print three things:
  - The Original Sequence.
  - The Mutated Sequence.
  - The summary: "Mutation at index X: [Old] -> [New]"
Belay Check: Run your cell 5 times. Ensure you never see a "mutation" like A → A.
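Five manual runs can miss a rare bug. One possible way to automate the check (try the exercise yourself first, since this contains a solution) is to wrap the steps in a function and run it many times. The function name mutate, its return values, and the BASES constant are hypothetical choices for this sketch, not part of the route.

```python
import random

BASES = ["A", "C", "T", "G"]

def mutate(seq):
    """Return (mutated_seq, pos, old_base, new_base) with exactly one base changed."""
    pos = random.randint(0, len(seq) - 1)
    old_base = seq[pos]
    new_base = random.choice(BASES)
    while new_base == old_base:  # reroll until the base really changes
        new_base = random.choice(BASES)
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    return mutated, pos, old_base, new_base

dna_seq = "ATGCGTACGTTAGC"

# Run the simulator 1000 times; any silent "A -> A" would trip an assert.
for _ in range(1000):
    mutated_seq, pos, old_base, new_base = mutate(dna_seq)
    assert old_base != new_base
    assert mutated_seq != dna_seq

print(f"Mutation at index {pos}: {old_base} -> {new_base}")
```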
Exercise 1: The Crash (Controlled Fall)
Goal: Try to use a tool that you don't have yet. In this course we'll want to draw, analyze, and manipulate small molecules. The standard Python library for this is RDKit, but Colab doesn't come with RDKit pre-installed.
Do: Run this code:
import rdkit
The Result: You should see a red error along the lines of: ModuleNotFoundError: No module named 'rdkit'
Stop & Look: Memorize this error. You will likely run into it in the future. It doesn't mean your code is wrong. It means you need to go to the shop. You cannot import what you haven't installed.
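If you want to study the error without a red crash, you can catch it. A minimal sketch, assuming a made-up package name that should not exist on any machine:

```python
import importlib

# A made-up package name (an assumption: nobody has installed anything called this).
missing_name = "gear_you_never_bought_12345"

try:
    # Same effect as writing: import gear_you_never_bought_12345
    importlib.import_module(missing_name)
except ModuleNotFoundError as err:
    error_message = str(err)
    print(f"ModuleNotFoundError: {error_message}")
    print("Your code is fine. The gear just isn't installed yet.")
```

The message names the missing package, which is usually also what you need to install (the pip name and the import name can differ for some libraries).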
Exercise 2: The Purchase (pip install)
Goal: Fix the error by installing the library.
The Beta:
- pip: The Python package manager (the shop clerk).
- ! (The Exclamation Point): In Google Colab, putting ! before a command tells the computer "This is a system command, not Python code." We use it to install software.
- -q: "Quiet mode." Keeps the output clean.
Do:
- Run this cell to install RDKit:
print("Installing RDKit... this takes a few seconds.")
!pip install -q rdkit
print("Install complete!")
- (Wait for the green checkmark/completion).
- The Belay Check: Try the import again.
import rdkit
print(f"RDKit version: {rdkit.__version__}")
- Success: No error. It prints a version number (e.g., '2023.03.1').
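You can also check whether a package is in your backpack before importing it. This sketch uses importlib.util.find_spec from the standard library; the helper name is_installed is hypothetical, and it is only meant for top-level names like rdkit, not dotted ones like rdkit.Chem.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find the top-level package, without importing it."""
    return importlib.util.find_spec(package_name) is not None

for name in ["random", "rdkit"]:
    status = "ready to import" if is_installed(name) else "missing: run pip install first"
    print(f"{name}: {status}")
```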
Exercise 3: The Mansion (Submodules & Wayfinding)
Goal: Learn how to find a specific tool inside a massive library when the obvious way fails.
The Trap: You installed rdkit. It is in your backpack. But if you try to use the "Make Molecule" function directly from the front door, it fails.
Do:
- Run this code:
# Try to use the tool directly
mol = rdkit.MolFromSmiles("C")
Result: You should see AttributeError: module 'rdkit' has no attribute 'MolFromSmiles'
The Beta: How do I find the tool? You might be thinking: "How on earth was I supposed to know where that function lives?" Some libraries are like mansions. rdkit is just the lobby. The actual tools are hidden in specific rooms (submodules).
In 2026, we don't memorize the floor plan. We use two strategies:
- The "Flashlight" (Tab Autocomplete):
  - In Colab, type rdkit. and wait (or hit Tab). A list pops up.
  - Type rdkit.Ch and hit Tab. You'll see rdkit.Chem. That looks promising.
- The "Guide" (Ask the AI):
  - This is the fastest way. Click the Gemini/Chat button in your notebook and ask:
  - "How do I create a molecule from a SMILES string using rdkit in Python?"
  - The AI will instantly tell you: "You need to import Chem from rdkit."
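There is also a flashlight you can use from code: the built-in dir() function lists the names inside a module. A small sketch on the random module (chosen here only because it is always available):

```python
import random

# dir() lists the names currently defined inside a module.
# Names starting with "_" are internal, so we hide them.
public_tools = [name for name in dir(random) if not name.startswith("_")]

# Narrow the list the way typing "random.ch" + Tab would.
matches = [name for name in public_tools if name.startswith("ch")]
print(matches)
```

One caveat: dir() only shows what is already loaded, so a submodule that has not been imported yet may not appear in the list. Tab autocomplete is usually more thorough for finding rooms.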
Do (The Fix): Use the hint above to fix your code.
- Import the Chem submodule from rdkit (from rdkit import Chem).
- Use mol = Chem.MolFromSmiles("CC(=O)O") to create an acetic acid molecule. (That's called a SMILES string. It represents a compound.)
- Use display(mol) to prove it worked.
Belay Check:
- Takeaway: If you get an AttributeError on a library you know you installed, you are probably in the lobby, not the kitchen. Ask the AI where the tool lives.
Exercise 3a: The Mansion (Entering the Room)
Goal: Understand that big libraries have "rooms" (submodules) and you have to enter them to find tools.
The Trap: You installed rdkit. But if you try to use it from the front door, it fails.
import rdkit
mol = rdkit.MolFromSmiles("C") # Error!
Result: AttributeError. The main lobby doesn't have the chemistry tools.
The Beta (Structure): RDKit is a mansion.
- rdkit is the Estate.
- rdkit.Chem is the Kitchen (where the chemistry happens).
- rdkit.Chem.Draw is the Art Studio (where visualization happens).
Do (Enter the Room): To fix the error, you need to import the Kitchen specifically.
- Run: from rdkit import Chem
- Now use the tool inside the kitchen: mol = Chem.MolFromSmiles("C")
- Print type(mol) to prove it worked.
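The lobby-versus-room behavior is not special to RDKit. The sketch below builds a tiny throwaway package on disk and reproduces the same AttributeError, then the same fix. The names mansion, kitchen, and cook are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: mansion/ is the lobby, kitchen.py is a room inside it.
root = Path(tempfile.mkdtemp())
pkg = root / "mansion"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # an empty lobby: no tools here
(pkg / "kitchen.py").write_text("def cook():\n    return 'dinner'\n")

sys.path.insert(0, str(root))   # tell Python where the new package lives
importlib.invalidate_caches()

import mansion

try:
    mansion.cook()               # looking for the tool in the lobby
except AttributeError as err:
    lobby_error = str(err)
    print(f"AttributeError: {lobby_error}")

from mansion import kitchen     # enter the room explicitly

result = kitchen.cook()
print(result)
```

Importing the lobby did not open the kitchen door; importing the room by name did.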
Exercise 3b: The Tool Belt (Precision Imports)
Goal: Learn to reach deeper into the library to grab specific tools.
The Concept: Sometimes the "Room" (Chem) is still too big. You don't want to drag the whole kitchen around just to use the Blender. You can import specific tools from inside a room.
The Syntax:
- Level 1 (Whole Estate): import rdkit (Too vague).
- Level 2 (The Room): from rdkit import Chem (Better).
- Level 3 (The Specific Tool): from rdkit.Chem import Descriptors (Best for specific tasks).
The "Dot" Logic: When you write from rdkit.Chem, the dot means "inside."
- rdkit.Chem means: "Go to rdkit, then go inside Chem."
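The same three levels can be tried on the built-in os library, which needs no install. This is only an analogy for the syntax: unlike rdkit, os happens to open its path room automatically, so Level 1 works there too.

```python
# Level 1: the whole estate.
import os

# Level 2: one room.
from os import path

# Level 3: one specific tool from inside that room.
from os.path import join

# All three routes lead to the same function object.
print(os.path.join is path.join is join)  # True

# The Level 3 import lets you call the tool by its short name.
print(join("gym", "route_005"))
```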
Do: We need the Math Lab (Descriptors) and the Sketchpad (Draw) for the next challenge.
- Import the specific tools using the dot syntax:
from rdkit.Chem import Descriptors
from rdkit.Chem import Draw
- Belay Check: Verify you grabbed them:
print(Descriptors)
print(Draw)
(You should see output saying <module 'rdkit.Chem.Descriptors' ...>).
Exercise 4: The Send (Analysis)
Goal: Combine all your imports to Scrape, Draw, and Analyze a molecule.
Step 1: The Scavenger Hunt
- Go to PubChem.
- Find a molecule (e.g., Capsaicin, Penicillin).
- Copy its SMILES string.
Step 2: The Code
You have your Tool Belt loaded from Exercise 3b. Now use it.
Do: Write a script that:
- Uses Chem.MolFromSmiles("YOUR_STRING") to create the molecule. Assign it to a variable called "mol".
- Uses Descriptors.MolWt(mol) to calculate its molecular weight.
- Uses Draw.MolToImage(mol) to show the picture.
Step 3: The Bonus Move (Explore the Gear)
You now have the Descriptors tool belt. What else is in there?
- In your code cell, type Descriptors. (don't forget the dot) and hit Tab.
- A list of available calculations should pop up.
- Pick one that sounds cool (e.g., HeavyAtomCount, NumValenceElectrons, TPSA).
- Add a line to your code to calculate and print this new property.
The Beta (Docs):
- Hint: Most descriptor functions work just like MolWt: they just take (mol) as input.
- Example: n_atoms = Descriptors.HeavyAtomCount(mol)
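One possible shape for the analysis part, sketched with the acetic acid SMILES from Exercise 3 rather than your PubChem molecule. The helper name describe and the dictionary keys are hypothetical. The imports sit inside the function only so the sketch fails gently if rdkit is missing; in your notebook, top-of-cell imports are the normal choice. Drawing is left out because it needs a notebook display.

```python
def describe(smiles):
    """Return a few descriptors for a SMILES string (a sketch; needs rdkit installed)."""
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None (rather than raising) when it cannot parse the string.
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    return {
        "mol_wt": Descriptors.MolWt(mol),
        "heavy_atoms": Descriptors.HeavyAtomCount(mol),
        "tpsa": Descriptors.TPSA(mol),
    }

try:
    for name, value in describe("CC(=O)O").items():  # acetic acid
        print(f"{name}: {value}")
except ModuleNotFoundError:
    print("rdkit is missing: go back to the shop (Exercise 2).")
```

The None check matters in practice: a typo in a pasted SMILES string otherwise surfaces later as a confusing error inside a descriptor call.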
Belay Check:
- Did you get the image?
- Did you get the mass?
- Did you successfully calculate a third "mystery" property?
Deliverables:
Please submit the following two items:
1. A completed Jupyter notebook (.ipynb)
- The notebook should run top-to-bottom without errors.
- It should include your code and any brief comments you added while working.
- Please follow this file naming convention: lastname_firstname_RID_005_code.ipynb
- The RID stands for "Route ID". This would be route #005.
How to download from Google Colab:
- In Colab, click File → Download → Download .ipynb
- This will save the notebook to your computer.
2. A short logbook entry (plain text, ~5-10 sentences):
- Briefly describe:
- what was tricky or confusing
- what helped you get unstuck
- one thing you learned about working with real data
- File naming convention: lastname_firstname_RID_005_logbook.txt
- Focus on clarity and completeness.
Submission
Submit your files by uploading them to this Google form: SUBMIT LINK
Please upload both:
- your .ipynb notebook
- your logbook file
Make sure filenames follow the naming conventions above.
We will fine-tune our submission system as the course moves along. Thank you for your patience as a valued member of the CHEM 169/269 Climbing Gym.
Route Complete!
Great work!