CHEM 169/269 Climbing Gym

Route 005: The Gear Shop (Imports and Installs)

RouteID: 005
Wall: Getting Comfortable with Python
Grade: 5.7
Routesetter: Srikar + Adrian
Time: ~30-45 minutes

Why this route exists

Python comes with a "standard backpack" of tools, but for specialized science (like simulating proteins or training a machine learning model) you need to explicitly fetch some extra gear.

In Python, this gear comes as Libraries (or Packages).

Standard Gear (Built-in): random, math, datetime. (You have these already).
Pro Gear (Must Install): RDKit (Chemistry), Pandas (Data), Biopython.

You don't write complex code from scratch. You go to the shop (PyPI), buy the gear (pip install), and put it in your backpack (import) to start using it.

You’ll learn that you can either put a whole other backpack of tools into your backpack (i.e., an entire module, sub-module, or sub-sub-module), or be very specific about exactly which tool (i.e., function) you need and fetch only that one.
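Here are both styles side by side in a minimal sketch. It uses the built-in math module only because it needs no install. The first style brings in the whole toolbox, and you reach inside it with a dot. The second grabs one function by name.

```python
# Style 1: bring the whole toolbox (module) into your backpack
import math

# Style 2: fetch exactly one tool (function) from that toolbox
from math import sqrt

# With Style 1 you name the toolbox, then the tool, joined by a dot
whole_box = math.sqrt(16)

# With Style 2 the tool is already in hand, so no prefix is needed
one_tool = sqrt(16)

print(whole_box, one_tool)  # prints: 4.0 4.0
```

Both lines call the very same function. The only difference is how you refer to it.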

What you'll build

You will set up a complete Cheminformatics environment.

Simulate a DNA mutation using a built-in tool.
Crash your code on purpose (to recognize a missing library).
Install the RDKit chemistry library.
Visualize a molecule of Caffeine to prove your gear works.

Exercise 0: The Mutation Simulator (Built-in Gear)

Goal: Use the random library to simulate a real mutation event (where the base actually changes).

Concept: What is a Library? A library is a toolbox of pre-built functions. Instead of writing a random number generator from scratch (math heavy!), you import random and use the tools inside.

The Beta (Tools you need):

import random: Unlocks the toolbox.
random.randint(a, b): Returns a random integer between a and b, including both a and b. Use this to pick an index.
- Example: index = random.randint(0, 10)
random.choice(list): Returns a random item from a list. Use this to pick a base.
String slicing and f-strings: Remember seq[:pos] + new_char + seq[pos+1:]? It swaps out the single character at index pos.
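The warm-up below exercises each tool on a non-DNA string, so the mutation script itself is still yours to write. The word "CLIMB" and the replacement letters are purely illustrative.

```python
import random

word = "CLIMB"

# randint includes BOTH endpoints, so the last valid index is len(word) - 1
pos = random.randint(0, len(word) - 1)

# choice picks one item from a list
new_letter = random.choice(["X", "Y", "Z"])

# Slicing: everything before pos, then the new letter, then everything after pos
swapped = word[:pos] + new_letter + word[pos + 1:]

# An f-string drops variables straight into the text
print(f"Index {pos}: {word[pos]} -> {new_letter}")
print(swapped)
```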

Do: Write a script that does the following steps (convert this pseudocode to Python):

Import the random library.
Define a DNA string dna_seq = "ATGCGTACGTTAGC".
Use random.randint(...) to programmatically pick a random index pos.
- Hint: The range must be from 0 to len(dna_seq)-1, because Python starts counting from zero, not from one!
Identify the original_base at that position.
Pick a new_base from ["A", "C", "T", "G"].
- Crux: Make sure new_base is NOT the same as original_base. (Hint: You can use a while loop: "While new is same as old, pick again").
Construct the mutated_seq.
Print three things:
- The Original Sequence.
- The Mutated Sequence.
- The summary: "Mutation at index X: [Old] -> [New]"
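The "pick again" idea in the Crux is a general pattern. The illustration below applies it to a different problem, so the DNA version is still yours to write: a six-sided die is re-rolled until it lands on a new face.

```python
import random

previous_roll = random.randint(1, 6)

# Roll once, then keep re-rolling for as long as we get the same face again
new_roll = random.randint(1, 6)
while new_roll == previous_roll:
    new_roll = random.randint(1, 6)

print(f"{previous_roll} -> {new_roll}")
```

The loop only exits once the two values differ, which is exactly the guarantee the Crux asks for.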

Belay Check: Run your cell 5 times. Ensure you never see a "mutation" like A → A.
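You can also let Python do the checking. A small helper compares the two strings position by position. The name differing_positions is our own and is not part of any library. A real point mutation should differ at exactly one index.

```python
def differing_positions(original, mutated):
    """Return the indices where two equal-length sequences disagree."""
    if len(original) != len(mutated):
        raise ValueError("Sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(original, mutated)) if a != b]


# Example with a hand-made mutation at index 2 (G -> T)
diffs = differing_positions("ATGC", "ATTC")
print(diffs)  # prints: [2]
```

After your script runs, assert len(differing_positions(dna_seq, mutated_seq)) == 1 should pass every time. That only holds if the new base really differs from the old one.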

Exercise 1: The Crash (Controlled Fall)

Goal: Try to use a tool that you don't have yet. In this course we’ll want to draw, analyze, and manipulate small molecules. The go-to Python library for this is RDKit, but Colab doesn't come with RDKit pre-installed.

Do: Run this code:

import rdkit

The Result: → You should see a red error along the lines of: ModuleNotFoundError: No module named 'rdkit'

Stop & Look: Memorize this error. You will likely run into it in the future. It doesn't mean your code is wrong. It means you need to go to the shop. You cannot import what you haven't installed.
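You can also ask Python whether a library is installed without crashing. The minimal sketch below uses the standard library's importlib. The helper name is_installed is our own. It also shows how to catch the exact error from this exercise.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Check a built-in library and a library that may still need installing
for name in ["random", "rdkit"]:
    status = "in your backpack" if is_installed(name) else "go to the shop (pip install)"
    print(f"{name}: {status}")

# Or attempt the import and catch the ModuleNotFoundError if it happens
try:
    import rdkit
except ModuleNotFoundError as err:
    print(f"Missing gear: {err.name}")
```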

Exercise 2: The Purchase (pip install)

Goal: Fix the error by installing the library.

The Beta:

pip: The Python package manager (the shop clerk).
! (The Exclamation Point): In Google Colab, putting ! before a command tells the computer "This is a system command, not Python code." We use it to install software.
-q: "Quiet mode." Keeps the output clean.

Do:

Run this cell to install RDKit:

print("Installing RDKit... this takes a few seconds.")
!pip install -q rdkit
print("Install complete!")

(Wait for the green checkmark/completion).
The Belay Check: Try the import again.

import rdkit
print(f"RDKit version: {rdkit.__version__}")

Success: No error. It prints a version number (e.g., '2023.03.1').
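rdkit.__version__ only works after a successful import. Alternatively, the standard library's importlib.metadata reads the version that pip recorded. If the package is missing, it raises an error you can catch. In the sketch below, installed_version is our own helper name. The name you pass in is the one you used with pip install.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


print("rdkit:", installed_version("rdkit"))
```

Before the install this prints rdkit: None. Afterwards it should print a version string.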

Exercise 3: The Mansion (Submodules & Wayfinding)

Goal: Learn how to find a specific tool inside a massive library when the obvious way fails.

The Trap: You installed rdkit. It is in your backpack. But if you try to use the "Make Molecule" function directly from the front door, it fails.

Do:

Run this code:

import rdkit

# Try to use the tool directly from the lobby
mol = rdkit.MolFromSmiles("C")

You should see: AttributeError: module 'rdkit' has no attribute 'MolFromSmiles'

The Beta: How do I find the tool? You might be thinking: "How on earth was I supposed to know where that function lives?" Some libraries are like mansions. rdkit is just the lobby. The actual tools are hidden in specific rooms (submodules).

In 2026, we don't memorize the floor plan. We use two strategies:

The "Flashlight" (Tab Autocomplete):
- In Colab, type rdkit. and wait (or hit Tab). A list pops up.
- Type rdkit.Ch and hit Tab. You'll see rdkit.Chem. That looks promising.
The "Guide" (Ask the AI):
- This is the fastest way. Click the Gemini/Chat button in your notebook and ask:
- "How do I create a molecule from a SMILES string using rdkit in Python?"
- The AI will likely tell you something like: "You need to import Chem from rdkit."
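There is also a plain-Python flashlight that works outside Colab: the standard library's pkgutil can list the rooms (submodules) of any package without importing them. The sketch below runs on the built-in json package, so it works anywhere. With RDKit installed, calling list_rooms(rdkit) should include "Chem" among the results. The name list_rooms is our own helper.

```python
import json
import pkgutil


def list_rooms(package):
    """List the submodules ("rooms") inside a package without importing them."""
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


print(list_rooms(json))
```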

Do (The Fix): Use the hint above to fix your code.

Import the Chem submodule from rdkit (from rdkit import Chem).
Use mol = Chem.MolFromSmiles("CC(=O)O") to create an acetic acid molecule. (The text "CC(=O)O" is called a SMILES string. It represents a compound as a single line of text.)
Use display(mol) to prove it worked.

Belay Check: A drawing of the acetic acid molecule should appear below the cell.

Takeaway: If you get an AttributeError on a library you know you installed, you are probably in the lobby, not the kitchen. Ask the AI where the tool lives.

Exercise 3a: The Mansion (Entering the Room)

Goal: Understand that big libraries have "rooms" (submodules) and you have to enter them to find tools.

The Trap: You installed rdkit. But if you try to use it from the front door, it fails.

import rdkit
mol = rdkit.MolFromSmiles("C")  # Error!

Result: AttributeError. The main lobby doesn't have the chemistry tools.

The Beta (Structure): RDKit is a mansion.

rdkit is the Estate.
rdkit.Chem is the Kitchen (where the chemistry happens).
rdkit.Chem.Draw is the Art Studio (where visualization happens).

Do (Enter the Room): To fix the error, you need to import the Kitchen specifically.

Run: from rdkit import Chem
Now use the tool inside the kitchen: mol = Chem.MolFromSmiles("C")
Print type(mol) to prove it worked.

Exercise 3b: The Tool Belt (Precision Imports)

Goal: Learn to reach deeper into the library to grab specific tools.

The Concept: Sometimes the "Room" (Chem) is still too big. You don't want to drag the whole kitchen around just to use the Blender. You can import specific tools from inside a room.

The Syntax:

Level 1 (Whole Estate): import rdkit (Too vague).
Level 2 (The Room): from rdkit import Chem (Better).
Level 3 (The Specific Toolbox): from rdkit.Chem import Descriptors (Best for specific tasks. Descriptors is itself a module packed with calculation functions.)

The "Dot" Logic: When you write from rdkit.Chem, the dot means "inside."

rdkit.Chem means: "Go to rdkit, then go inside Chem."
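The same levels exist in Python's own standard library. As an RDKit-free sketch you can run anywhere, here is the built-in json package climbed one dot at a time. In this analogy json stands in for rdkit and json.decoder for rdkit.Chem.

```python
# Level 1: the whole estate (a package)
import json

# Level 2: one room inside it (a submodule)
from json import decoder

# Level 3: one specific tool inside that room (a class)
from json.decoder import JSONDecoder

# Every dot in a module's full name means "inside"
print(decoder.__name__)  # prints: json.decoder

# All three routes lead to the very same tool
print(JSONDecoder is decoder.JSONDecoder is json.decoder.JSONDecoder)  # prints: True
```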

Do: We need the Math Lab (Descriptors) and the Sketchpad (Draw) for the next challenge.

Import the specific tools using the dot syntax:

from rdkit.Chem import Descriptors
from rdkit.Chem import Draw

Belay Check: Verify you grabbed them:

print(Descriptors)
print(Draw)

(You should see output like <module 'rdkit.Chem.Descriptors' ...> and <module 'rdkit.Chem.Draw' ...>.)

Exercise 4: The Send (Analysis)

Goal: Combine all your imports to Find, Draw, and Analyze a molecule.

Step 1: The Scavenger Hunt

Go to PubChem.
Find a molecule (e.g., Capsaicin, Penicillin).
Copy its SMILES string.

Step 2: The Code

You have your Tool Belt loaded from Ex 3b. Now use it.

Do: Write a script that:

Uses Chem.MolFromSmiles("YOUR_STRING") to create the molecule. Assign it to a variable called mol.
Uses Descriptors.MolWt(mol) to calculate its molecular weight.
Uses Draw.MolToImage(mol) to show the picture. (Make it the last line of the cell, or wrap it in display(...), so Colab renders it.)

Step 3: The Bonus Move (Explore the Gear)

You now have the Descriptors tool belt. What else is in there?

In your code cell, type Descriptors. (don't forget the dot) and hit Tab.
A list of available calculations should pop up.
Pick one that sounds cool (e.g., HeavyAtomCount, NumValenceElectrons, TPSA).
Add a line to your code to calculate and print this new property.

The Beta (Docs):

Hint: Most descriptor functions work just like MolWt: they just take mol as input.
Example: n_atoms = Descriptors.HeavyAtomCount(mol)
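Because these functions share the same input, you can look them up by name with Python's built-in getattr and run several in a loop. The sketch below uses math, whose chosen functions each take one number, so it runs without RDKit. Swapping in Descriptors and mol, with names like "MolWt" or "TPSA", should follow the same pattern. We have not checked every descriptor name this way.

```python
import math

value = 100.0
results = {}

# Each of these math functions takes a single number, so we can fetch them by name
for name in ["sqrt", "log10", "floor"]:
    tool = getattr(math, name)  # same as writing math.sqrt, math.log10, ...
    results[name] = tool(value)
    print(f"{name}({value}) = {results[name]}")
```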

Belay Check:

Did you get the image?
Did you get the mass?
Did you successfully calculate a third "mystery" property?

Deliverables:

Please submit the following two items:

1. A completed Jupyter notebook (.ipynb)

The notebook should run top-to-bottom without errors.
It should include your code and any brief comments you added while working.
Please follow this file naming convention → lastname_firstname_RID_005_code.ipynb
- The RID stands for "Route ID". This would be route #005.

How to download from Google Colab:

In Colab, click File → Download → Download .ipynb
This will save the notebook to your computer.

2. A short logbook entry (plain text, ~5-10 sentences):

Briefly describe:
- what was tricky or confusing
- what helped you get unstuck
- one thing you learned about working with real data
File naming convention → lastname_firstname_RID_005_logbook.txt
Focus on clarity and completeness.

Submission

Submit your files by uploading them to this Google form: SUBMIT LINK

Please upload both:

your .ipynb notebook
your logbook file

Make sure filenames follow the naming conventions above.
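A tiny helper can assemble both filenames for you and avoid typos. This is a hypothetical convenience function, not part of the course tooling. The convention does not say whether to capitalize names, so it keeps whatever you pass in.

```python
def submission_filenames(last_name, first_name, route_id="005"):
    """Build both deliverable filenames following the route's naming convention."""
    stem = f"{last_name}_{first_name}_RID_{route_id}"
    return f"{stem}_code.ipynb", f"{stem}_logbook.txt"


notebook_name, logbook_name = submission_filenames("Franklin", "Rosalind")
print(notebook_name)  # prints: Franklin_Rosalind_RID_005_code.ipynb
print(logbook_name)   # prints: Franklin_Rosalind_RID_005_logbook.txt
```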

We will fine-tune our submission system as the course moves along. Thank you for your patience as a valued member of the CHEM 169/269 Climbing Gym.