
The Sweet Path

Route ID: R012 • Wall: W04 • Released: Jan 5, 2026

5.10b

🧗 Start Here


Route 012: The Sweet Path

  • RouteID: R012
  • Wall: Small Molecule Representations
  • Grade: 5.10b
  • Routesetter: Erika + Abhijit + Sarah
  • Time: 30 min (?)
  • Dataset: Hexose sugars (provided)

Why this project exists

Stereochemistry is one of the most important and most subtle aspects of molecular structure. Two molecules can share the same molecular formula and connectivity yet behave very differently because of how their atoms are arranged in three-dimensional space.

Sugars are a clean and intuitive system for exploring this idea. In this project, you’ll work with simple six-carbon sugars (hexoses) to understand how stereochemistry is encoded, visualized, and sometimes lost when working with computational tools like RDKit.

Learning goals

By the end, you will be able to:

  • Interpret SMILES representations of stereoisomers
  • Convert SMILES strings into RDKit molecule objects
  • Visualize sugars in both 2D and 3D
  • Compare D/L stereoisomer pairs
  • Identify limitations of RDKit for representing stereochemistry
  • Reason about which molecular properties change (or do not change) between enantiomers

Background

Sugars are small organic molecules built from carbon chains or rings decorated with hydroxyl (–OH) groups. The number of carbons defines their class:

  • Pentoses have five carbons (e.g., ribose)
  • Hexoses have six carbons (e.g., glucose, galactose)
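As a quick illustration of the carbon-count rule, the sketch below counts carbon atoms with RDKit. The SMILES strings are open-chain aldoses written without stereochemistry (an assumption made for brevity), and `classify_by_carbons` is a hypothetical helper name, not an RDKit function.

```python
from rdkit import Chem

def classify_by_carbons(smiles):
    """Hypothetical helper: name the sugar class from its carbon count."""
    mol = Chem.MolFromSmiles(smiles)
    n_carbons = sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 6)
    classes = {5: "pentose", 6: "hexose"}
    return n_carbons, classes.get(n_carbons, "other")

# Open-chain aldoses; stereochemistry omitted, so these stand for any aldopentose/aldohexose
print(classify_by_carbons("O=CC(O)C(O)C(O)CO"))      # e.g. ribose  → (5, 'pentose')
print(classify_by_carbons("O=CC(O)C(O)C(O)C(O)CO"))  # e.g. glucose → (6, 'hexose')
```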

The orientation of hydroxyl groups in three-dimensional space is crucial. Molecules that share the same molecular formula and the same atom-to-atom connectivity, but differ in the spatial arrangement of their atoms, are called stereoisomers.

A classic example is D-glucose versus L-glucose. Both have the formula C₆H₁₂O₆, but every chiral carbon in L-glucose has the opposite configuration from the corresponding carbon in D-glucose, so the two molecules are non-superimposable mirror images (enantiomers). This difference leads to dramatically different biological behavior: D-glucose is a primary energy source for living cells, while L-glucose is far less biologically active.

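Because enantiomers are exact mirror images, any property that does not depend on handedness (molecular weight, logP, hydrogen-bond donor and acceptor counts, and other achiral scalar properties) is identical for a true D/L enantiomer pair. Differences appear only in chiral or 3D context-dependent properties, such as: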

  • Optical rotation
  • Interactions with enzymes and other chiral proteins
  • Some signed or environment-dependent 3D shape descriptors
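These definitions can be checked directly in RDKit. The sketch below uses open-chain SMILES for D- and L-glucose, written here for illustration (the provided dataset may use ring forms instead). It shows that the pair shares a molecular formula and connectivity, and that the difference survives only in the stereo-aware (isomeric) SMILES. Writing SMILES with `isomericSmiles=False` is also one way stereochemistry gets silently lost.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Open-chain forms; every @/@@ mark is inverted in the L form
d_glucose = Chem.MolFromSmiles("O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO")
l_glucose = Chem.MolFromSmiles("O=C[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)CO")

for name, mol in [("D-glucose", d_glucose), ("L-glucose", l_glucose)]:
    print(name,
          rdMolDescriptors.CalcMolFormula(mol),          # same formula
          Chem.MolToSmiles(mol, isomericSmiles=False),   # stereo dropped: same string
          Chem.MolToSmiles(mol))                         # stereo kept: strings differ

same_connectivity = (Chem.MolToSmiles(d_glucose, isomericSmiles=False)
                     == Chem.MolToSmiles(l_glucose, isomericSmiles=False))
same_stereo = Chem.MolToSmiles(d_glucose) == Chem.MolToSmiles(l_glucose)
print(same_connectivity, same_stereo)  # → True False
```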

Exercise 1: Load the Dataset

Goal: Set up your environment and load the sugar data.

  1. Download the dataset of hexose sugars - LINK.
  2. Install and import the RDKit library.
  3. Load the dataset into your notebook.

Success check:

  • RDKit imports without errors.
  • The dataset loads correctly and is visible in your notebook.
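A minimal loading sketch. The dataset's file name and format are not given here, so this assumes a CSV file with a header row containing `name` and `smiles` columns. `load_sugars` is a hypothetical helper, and the inline two-row CSV only stands in for the real file. RDKit can be installed with `pip install rdkit`; `pandas.read_csv` would work equally well for loading.

```python
import csv
import io

import rdkit
from rdkit import Chem

print("RDKit version:", rdkit.__version__)  # confirms the import works

def load_sugars(handle):
    """Hypothetical helper: read (name, smiles) pairs from a CSV file object.

    Assumes a header row with 'name' and 'smiles' columns; rename the keys
    to match the real dataset.
    """
    return [(row["name"], row["smiles"]) for row in csv.DictReader(handle)]

# Stand-in for open("hexoses.csv", newline=""); the file name is assumed
demo_csv = io.StringIO(
    "name,smiles\n"
    "D-glucose (open chain),O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO\n"
    "D-glucitol,C([C@H]([C@H]([C@@H]([C@H](CO)O)O)O)O)O\n"
)
sugars = load_sugars(demo_csv)

# Sanity check: every SMILES should parse into an RDKit molecule
for name, smiles in sugars:
    print(name, "parsed OK" if Chem.MolFromSmiles(smiles) else "FAILED to parse")
```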

Exercise 2: Explore SMILES Representations

Goal: Understand how sugars are encoded as SMILES strings.

  1. Print the SMILES strings for each molecule in the dataset.

QUESTION:

  • What is a SMILES string, and how does it encode molecular structure?
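One way to see how a SMILES string encodes structure is to parse a short one and list each atom with its bonded neighbors: atoms are written as element symbols, adjacent symbols are bonded, parentheses open a branch, and hydrogens are usually implicit. The sketch uses glycerol (`OCC(O)CO`) only because it is small; the same loop works on any sugar in the dataset.

```python
from rdkit import Chem

smiles = "OCC(O)CO"  # glycerol: a 3-carbon chain with one OH on each carbon
mol = Chem.MolFromSmiles(smiles)

# Atom indices follow the order in which atoms appear in the string
for atom in mol.GetAtoms():
    neighbors = [n.GetIdx() for n in atom.GetNeighbors()]
    print(atom.GetIdx(), atom.GetSymbol(), "bonded to", neighbors,
          "implicit H count:", atom.GetTotalNumHs())
```

The middle carbon (index 2) is bonded to three heavy atoms because `(O)` is a branch hanging off it, while the chain continues with `CO` afterwards.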

Exercise 3: Convert to RDKit Molecule Objects and Visualize Molecules

Goal: Translate SMILES strings into RDKit molecules and compare sugars visually in 2D and 3D.

  1. Parse each SMILES string into an RDKit Mol object.
  2. Draw each molecule in 2D.
  3. Render each molecule in 3D.
    1. For each molecule, display:
      1. Name
      2. Structure
      3. SMILES string
    2. Show D/L stereoisomer pairs side-by-side in 3D.
    3. Set `force_mirror=True` where appropriate.

QUESTIONS:

  • Does RDKit draw Fischer projections?
  • Why are Fischer projections used?
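A sketch of the parsing, 2D drawing, and 3D steps, using an assumed `(name, smiles)` list re-declared here so the block runs on its own. 2D drawing uses `Draw.MolsToGridImage` with the names as legends; for 3D, hydrogens are added, one conformer is generated with `AllChem.EmbedMolecule` (RDKit's distance-geometry embedder), and it is relaxed with MMFF. Interactive 3D display needs a separate viewer and is not shown, and the `force_mirror` option mentioned above is not part of this sketch.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Draw

# Assumed (name, smiles) pairs; in the notebook, use the loaded dataset instead
sugars = [
    ("D-glucose (open chain)", "O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"),
    ("L-glucose (open chain)", "O=C[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)CO"),
]

mols = [Chem.MolFromSmiles(smi) for _, smi in sugars]
for name, smi in sugars:
    print(name, smi)  # name and SMILES string for each molecule

# 2D: one panel per molecule, labelled with its name (SVG avoids needing an image backend)
svg = Draw.MolsToGridImage(
    mols,
    molsPerRow=2,
    subImgSize=(300, 250),
    legends=[name for name, _ in sugars],
    useSVG=True,
)

# 3D: add explicit hydrogens, embed one conformer, then relax it with MMFF
mols_3d = []
for mol in mols:
    mol_h = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol_h, randomSeed=42) == 0:  # 0 = success, -1 = failure
        AllChem.MMFFOptimizeMolecule(mol_h)
        mols_3d.append(mol_h)
print(len(mols_3d), "3D structures generated")
```

Fixing `randomSeed` makes the embedded geometry reproducible between notebook runs.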

Exercise 4: Stereochemistry in SMILES

Goal: Understand how stereochemistry is encoded textually.

  1. Examine the SMILES strings for chiral sugars.
  2. In your notebook, answer:
    1. How does SMILES encode stereochemistry?

Hint: look for `@` and `@@` symbols.
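After parsing, RDKit keeps the `@`/`@@` information as a per-atom chiral tag (clockwise or counterclockwise relative to the atom's neighbor order in the molecule object), which is not the same thing as an R/S label. The sketch prints both the tags and the CIP labels RDKit assigns, using the open-chain D-glucose SMILES as an assumed example input.

```python
from rdkit import Chem

smiles = "O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"  # open-chain D-glucose
mol = Chem.MolFromSmiles(smiles)

# Chiral tag stored on each atom that was written with @ or @@
for atom in mol.GetAtoms():
    tag = atom.GetChiralTag()
    if tag != Chem.ChiralType.CHI_UNSPECIFIED:
        print(atom.GetIdx(), atom.GetSymbol(), tag)

# CIP labels (R/S) for each stereocenter; '?' would mark an unassigned one
centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
print(centers)
```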

Exercise 5: Flip Stereochemistry

Goal: Manually manipulate stereochemical information.

  1. Take the SMILES string for D-Glucitol.
  2. Edit the stereochemical descriptors to generate L-Glucitol.
  3. Print:
    1. Both SMILES strings
    2. Both molecular structures

Success check:

  • The two molecules are mirror images in 3D.
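Because a mirror image inverts every stereocenter, an enantiomer can be written by swapping `@` and `@@` throughout the SMILES string. This sketch assumes the string uses only plain `@`/`@@` tetrahedral marks (no extended stereo notation). `mirror_smiles` and `signed_volume` are hypothetical helper names, and the D-glucitol SMILES is written here for illustration, so use the one from the dataset. To check the success criterion in 3D, it embeds both molecules and compares the sign of a signed volume at each stereocenter, which should flip between mirror images.

```python
import re

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def mirror_smiles(smiles):
    """Hypothetical helper: swap every '@' and '@@' tetrahedral mark."""
    return re.sub(r"@@?", lambda m: "@" if m.group() == "@@" else "@@", smiles)

def signed_volume(mol, idx):
    """Hypothetical helper: triple product of the vectors from atom idx to its
    three lowest-index heavy-atom neighbors (the sign encodes handedness)."""
    pos = mol.GetConformer().GetPositions()
    nbrs = sorted(n.GetIdx() for n in mol.GetAtomWithIdx(idx).GetNeighbors()
                  if n.GetAtomicNum() > 1)[:3]
    a, b, c = (pos[n] - pos[idx] for n in nbrs)
    return float(np.dot(a, np.cross(b, c)))

d_smiles = "C([C@H]([C@H]([C@@H]([C@H](CO)O)O)O)O)O"  # D-glucitol (illustrative)
l_smiles = mirror_smiles(d_smiles)                  # L-glucitol
print("D:", d_smiles)
print("L:", l_smiles)

# Embed both in 3D; only the @ marks differ, so atom indices line up
d_mol, l_mol = (Chem.AddHs(Chem.MolFromSmiles(s)) for s in (d_smiles, l_smiles))
for mol in (d_mol, l_mol):
    AllChem.EmbedMolecule(mol, randomSeed=7)

centers = [idx for idx, _ in Chem.FindMolChiralCenters(d_mol)]
for idx in centers:
    print(idx, round(signed_volume(d_mol, idx), 2), round(signed_volume(l_mol, idx), 2))
```

Opposite signs at every stereocenter confirm the two structures are mirror images; a matching sign at any center would mean a mark was missed.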

Exercise 6: Analyze Molecular Properties

Goal: Compute simple properties from SMILES using RDKit and understand what changes, and what doesn’t, between stereoisomers.

  1. For each D/L sugar pair in the dataset:
    1. Compute at least 5 molecular properties for each molecule.
  2. Compare the values for each D/L pair side by side.
  3. Plot or tabulate the results clearly.
  4. In your notebook, answer:
    1. Can RDKit distinguish between D- and L-enantiomers using standard descriptors?
    2. In what real-world chemical or biological contexts would this limitation be important?

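Example code for accessing descriptors (with the imports it needs):

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors

# Replace with a molecule from the dataset (open-chain D-glucose shown here)
mol = Chem.MolFromSmiles('O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO')

descriptor_dict = {
    'CrippenLogP': Crippen.MolLogP(mol),       # Wildman-Crippen logP (not XLogP)
    'HBD': Lipinski.NumHDonors(mol),
    'HBA': Lipinski.NumHAcceptors(mol),
    'TPSA': rdMolDescriptors.CalcTPSA(mol),
    'MolWt': Descriptors.MolWt(mol),
    'ExactMolWt': rdMolDescriptors.CalcExactMolWt(mol),
    'RotBonds': Lipinski.NumRotatableBonds(mol),
    'RingCount': rdMolDescriptors.CalcNumRings(mol),
    'FormalCharge': Chem.GetFormalCharge(mol),  # net charge of the molecule
    'HeavyAtoms': rdMolDescriptors.CalcNumHeavyAtoms(mol),
    'FracCSP3': rdMolDescriptors.CalcFractionCSP3(mol),
    'IsomericSMILES': Chem.MolToSmiles(mol, isomericSmiles=True),  # keeps @/@@ marks
}
```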

Success check:

  • Numeric descriptor values are identical (or numerically indistinguishable) for each D/L pair; only stereo-aware outputs such as the isomeric SMILES should differ.
  • If any differences appear, explain why they might be artifacts or errors.
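A sketch of the side-by-side comparison for one pair, assuming the open-chain D/L-glucose SMILES below stand in for a pair from the dataset. It computes a subset of the descriptors from the example above for each molecule, prints them as a small table, and collects any entry that differs; the isomeric SMILES is the only one expected to.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors

pair = {
    "D-glucose": "O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO",
    "L-glucose": "O=C[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)CO",
}

def describe(mol):
    """A subset of the descriptors from the example above."""
    return {
        "CrippenLogP": Crippen.MolLogP(mol),
        "HBD": Lipinski.NumHDonors(mol),
        "HBA": Lipinski.NumHAcceptors(mol),
        "TPSA": rdMolDescriptors.CalcTPSA(mol),
        "MolWt": Descriptors.MolWt(mol),
        "RotBonds": Lipinski.NumRotatableBonds(mol),
        "IsomericSMILES": Chem.MolToSmiles(mol),
    }

results = {name: describe(Chem.MolFromSmiles(smi)) for name, smi in pair.items()}
d_vals, l_vals = results["D-glucose"], results["L-glucose"]

print(f"{'descriptor':<16}{'D':>14}{'L':>14}  same?")
differs = []
for key in d_vals:
    # Same atoms in the same order give identical floats here; use math.isclose
    # when comparing values computed from different inputs or tools
    same = d_vals[key] == l_vals[key]
    if not same:
        differs.append(key)
    if key != "IsomericSMILES":
        print(f"{key:<16}{d_vals[key]:>14.3f}{l_vals[key]:>14.3f}  {same}")

print("D:", d_vals["IsomericSMILES"])
print("L:", l_vals["IsomericSMILES"])
print("Differs:", differs)
```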

Submission

Submit your files by uploading them to this Google Form: SUBMIT LINK

Please upload both:

  • your .ipynb notebook
  • your logbook file

Make sure filenames follow the naming conventions above.

🎉 Route Complete!

Great work!