
Morgan the Finger Printer

Route ID: R010 • Wall: W04 • Released: Jan 5, 2026

5.10c


Route: Morgan the Finger Printer

  • RouteID: R010
  • Wall: Small Molecule Representations
  • Grade: 5.10c
  • Route-setter: Shivansh
  • Time: ~40 minutes
  • Dataset: steroid_dataset.csv

Why this route exists

Before we can cluster, search, or model chemical space, we need a way to represent molecules numerically. Cheminformatics solves this using fingerprints: fixed-length binary vectors that encode structural features. Different fingerprint families capture different chemistry — circular neighborhoods, key patterns, or topological paths — and those differences matter for downstream tasks.

In this route, we'll compute several fingerprints and visualize how they change molecular relationships.

What you'll build

During this climb, you will:

  1. Generate Morgan, MACCS, and Topological fingerprints for a set of molecules.
  2. Compare molecules using Tanimoto similarity matrices.
  3. Visualize similarity patterns as heatmaps.
  4. Inspect bits from Morgan fingerprints to reveal the substructures that activate them.

These skills form the foundations of chemical similarity search, QSAR models, and representation learning.


Exercise 0: The Knot Check (Setup & Syntax)

Goal: Make sure our tools are clipped in correctly before climbing.

Checklist:

  • Install RDKit in Colab
  • Import required libraries
  • Load your list of SMILES strings from the dataset (steroid_dataset.csv)
  • Convert SMILES → RDKit Mol objects
  • Display molecules to verify parsing

Belay Check:

  • All SMILES parse to Mol objects (no None)
  • Molecule count matches your dataset

If something fails here, don't continue upward.
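
One way to run the knot check is sketched below. It assumes RDKit is already installed (in Colab, typically with `pip install rdkit`) and uses a few simple stand-in SMILES instead of the steroid dataset, because the dataset's column names are not given here. The helper name `parse_smiles` is hypothetical.

```python
from rdkit import Chem


def parse_smiles(smiles_list):
    """Convert SMILES strings to RDKit Mol objects, recording any failures."""
    mols, failed = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # returns None if the SMILES cannot be parsed
        if mol is None:
            failed.append(smi)
        else:
            mols.append(mol)
    return mols, failed


# Stand-in molecules for illustration only; replace with the SMILES
# read from steroid_dataset.csv.
smiles_list = ["CCO", "Oc1ccccc1", "Cc1ccccc1", "OC(=O)c1ccccc1"]
mols, failed = parse_smiles(smiles_list)

# Belay check: nothing failed to parse and the counts match.
print(f"parsed {len(mols)} of {len(smiles_list)} molecules; failed: {failed}")
```

In a notebook, `Draw.MolsToGridImage(mols)` from `rdkit.Chem.Draw` renders the parsed molecules in a grid for a quick visual check.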


Exercise 1: The Estimator (Fingerprint Generation)

Goal: Compute several kinds of fingerprint for each molecule.

We will compute:

  1. Morgan (circular) fingerprints
    • radii: 1, 2, 3
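    • size: 1024 bits (set with nBits)
    • pass a bitInfo dict to record which atom environments set each bit (used in Exercise 3)
  2. MACCS Keys
    • fixed key-based fingerprint: 166 keys, which RDKit returns as a 167-bit vector (bit 0 is unused)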
  3. RDKit Topological
    • encodes path-based graph features

Route Beta (Tools):

AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits=1024, bitInfo=bitInfo)
MACCSkeys.GenMACCSKeys(m)
Chem.RDKFingerprint(m)

Belay Check:

Confirm for at least one molecule:

  • Fingerprints are bit vectors (ExplicitBitVect), not count or sparse vectors
  • Bit length matches expectations
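
The three fingerprint families can be generated for a single molecule roughly as follows. This is a minimal sketch using one stand-in molecule (benzoic acid) and the default settings of `Chem.RDKFingerprint`; the dictionary names `fingerprints` and `bit_infos` are illustrative choices, not part of the route.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, MACCSkeys

mol = Chem.MolFromSmiles("OC(=O)c1ccccc1")  # stand-in molecule, not from the dataset

fingerprints = {}
bit_infos = {}  # one bitInfo dict per radius, kept for later introspection

# Morgan fingerprints at radii 1, 2, and 3, folded to 1024 bits.
for radius in (1, 2, 3):
    info = {}
    fingerprints[f"morgan_r{radius}"] = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius, nBits=1024, bitInfo=info
    )
    bit_infos[radius] = info

# MACCS keys and the RDKit path-based (topological) fingerprint.
fingerprints["maccs"] = MACCSkeys.GenMACCSKeys(mol)
fingerprints["topological"] = Chem.RDKFingerprint(mol)

# Belay check: report the type, length, and number of set bits.
for name, fp in fingerprints.items():
    print(name, type(fp).__name__, fp.GetNumBits(), "bits,", fp.GetNumOnBits(), "on")
```

Recent RDKit releases may print a deprecation warning for `GetMorganFingerprintAsBitVect` and point to the `rdFingerprintGenerator` module. The call still matches the tools listed above.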

Exercise 2: The Classifier (Tanimoto Similarity)

Goal: Compare structural similarity using Tanimoto similarity, defined for bit vectors as:

T(A, B) = |A ∩ B| / |A ∪ B|
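
To make the formula concrete, here is a small worked example in plain Python that treats each fingerprint as the set of its on-bit indices. The function name `tanimoto` and the bit sets are made up for illustration, and returning 0.0 for two empty vectors is a convention chosen for this sketch.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two collections of on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for this sketch when neither vector has bits set
    return len(a & b) / len(union)


A = {1, 2, 3, 5}
B = {2, 3, 4}

# Shared bits: {2, 3}; distinct bits overall: {1, 2, 3, 4, 5}.
print(tanimoto(A, B))  # 2 / 5 = 0.4
```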

Task:

For each fingerprint family:

  • Compute an N × N pairwise similarity matrix

where N = number of molecules.

Belay Check:

  • Similarity matrix is square
  • Diagonal entries = 1.0
  • Values in [0,1]

If diagonals are not 1.0, recheck your vector types (some users accidentally feed count vectors!)
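
A pairwise matrix can be built one row at a time with RDKit's `DataStructs.BulkTanimotoSimilarity`. The sketch below assumes NumPy is available, uses MACCS keys on stand-in molecules, and wraps the loop in a hypothetical helper named `tanimoto_matrix`; the same helper applies to any list of bit vectors.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys


def tanimoto_matrix(fps):
    """Return the N x N Tanimoto similarity matrix for a list of bit vectors."""
    n = len(fps)
    sim = np.zeros((n, n))
    for i, fp in enumerate(fps):
        # Compare fingerprint i against the whole list in one call.
        sim[i, :] = DataStructs.BulkTanimotoSimilarity(fp, fps)
    return sim


# Stand-in molecules for illustration only.
smiles_list = ["CCO", "Oc1ccccc1", "Cc1ccccc1", "OC(=O)c1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles_list]
fps = [MACCSkeys.GenMACCSKeys(m) for m in mols]
sim = tanimoto_matrix(fps)

# Belay check: square, unit diagonal, values within [0, 1].
assert sim.shape == (len(mols), len(mols))
assert np.allclose(np.diag(sim), 1.0)
assert sim.min() >= 0.0 and sim.max() <= 1.0
```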


Exercise 3: The Parser (Bit-Level Introspection)

Goal: Inspect Morgan fingerprint bits and discover which substructures caused them to fire.

This step makes fingerprints interpretable: each bit traces back to a concrete substructure instead of staying a black box.

Beta (Tools):

Morgan calls can optionally populate:

bitInfo = {}
AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits, bitInfo=bitInfo)

bitInfo maps each set bit to a tuple of (atom_idx, radius) pairs, one pair per atom environment that set the bit.
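
To see what the populated dictionary looks like, the following sketch prints it for one stand-in molecule (benzoic acid, radius 2, 1024 bits). The variable names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("OC(=O)c1ccccc1")  # stand-in molecule, not from the dataset

bit_info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024, bitInfo=bit_info)

# Each key is a set bit; each value is a tuple of (atom_idx, radius) pairs,
# one pair for every atom environment that set that bit.
for bit, environments in sorted(bit_info.items()):
    print(bit, environments)
```

The keys of `bit_info` are exactly the on bits of `fp`, so the dictionary doubles as a list of which bits fired.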

Tasks:

  1. Identify 3–5 bits that appear frequently across molecules.
  2. Use bitInfo to retrieve atom environments for those bits.
  3. Use RDKit drawing utilities to visualize each substructure.
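
The first two tasks might be approached as in the sketch below, which counts how many molecules set each bit and then recovers one atom environment per frequent bit as a fragment SMILES. It uses stand-in molecules, and `environment_smiles` is a hypothetical helper built on `Chem.FindAtomEnvironmentOfRadiusN` and `Chem.PathToSubmol`.

```python
from collections import Counter

from rdkit import Chem
from rdkit.Chem import AllChem

RADIUS, N_BITS = 2, 1024

# Stand-in molecules for illustration only.
smiles_list = ["CCO", "Oc1ccccc1", "Cc1ccccc1", "OC(=O)c1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

# One bitInfo dict per molecule.
bit_infos = []
for mol in mols:
    info = {}
    AllChem.GetMorganFingerprintAsBitVect(mol, RADIUS, nBits=N_BITS, bitInfo=info)
    bit_infos.append(info)

# Count in how many molecules each bit is set, then keep the most frequent.
bit_counts = Counter(bit for info in bit_infos for bit in info)
top_bits = [bit for bit, _ in bit_counts.most_common(3)]


def environment_smiles(mol, atom_idx, radius):
    """Fragment SMILES for the environment behind one (atom_idx, radius) entry."""
    if radius == 0:
        # A radius-0 environment is just the central atom.
        return mol.GetAtomWithIdx(atom_idx).GetSymbol()
    bond_ids = Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom_idx)
    submol = Chem.PathToSubmol(mol, bond_ids)
    return Chem.MolToSmiles(submol)


for bit in top_bits:
    # Use the first molecule that sets this bit as the example.
    idx = next(i for i, info in enumerate(bit_infos) if bit in info)
    atom_idx, radius = bit_infos[idx][bit][0]
    print(bit, bit_counts[bit], environment_smiles(mols[idx], atom_idx, radius))
```

For the drawing task, `Draw.DrawMorganBit(mol, bit, bit_info)` from `rdkit.Chem.Draw` is one RDKit utility that renders the environment behind a bit in a notebook. Because fingerprints are folded to a fixed length, unrelated substructures can share a bit, so it is worth checking more than one example per bit.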

Belay Check:

For each inspected bit:

  • The drawn substructure appears without errors

Exercise 4: The Send (Similarity Heatmaps + Interpretation)

Goal: Visualize molecular relationships and compare fingerprints.

Tasks:

  1. Convert each similarity matrix into a heatmap.
  2. Label axes with molecule names or SMILES.
  3. Compare clustering patterns across:
    • Morgan (r=1)
    • Morgan (r=2)
    • Morgan (r=3)
    • MACCS
    • Topological
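
A labeled heatmap can be produced with matplotlib along these lines. The sketch assumes matplotlib and NumPy are installed; the helper name `plot_similarity_heatmap`, the labels, and the matrix values are invented for illustration and are not results from the dataset.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_similarity_heatmap(sim, labels, title):
    """Draw a labeled heatmap of a square similarity matrix."""
    fig, ax = plt.subplots(figsize=(5, 4))
    # A fixed 0-1 color scale keeps heatmaps comparable across fingerprints.
    image = ax.imshow(sim, vmin=0.0, vmax=1.0, cmap="viridis")
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticklabels(labels)
    ax.set_title(title)
    fig.colorbar(image, ax=ax, label="Tanimoto similarity")
    fig.tight_layout()
    return fig, ax


# Illustrative values only; substitute a matrix computed from real fingerprints.
sim = np.array([
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.6],
    [0.1, 0.6, 1.0],
])
labels = ["mol_a", "mol_b", "mol_c"]
fig, ax = plot_similarity_heatmap(sim, labels, "Morgan (r=2)")
```

Calling the helper once per fingerprint family, with the same molecule order and color scale each time, makes the five heatmaps directly comparable.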

Submission

Submit your files by uploading them to the following Google Form:

SUBMIT LINK

Please upload both:

  • your .ipynb notebook
  • your logbook file

Make sure filenames follow the naming conventions above.

🎉 Route Complete!

Great work!