Start Here
Scroll down to complete this route
Route 012: The Sweet Path
- RouteID: R012
- Wall: Small Molecule Representations
- Grade: 5.10b
- Routesetter: Erika + Abhijit + Sarah
- Time: 30 min (?)
- Dataset: Hexose sugars (provided)
Why this project exists
Stereochemistry is one of the most important and most subtle aspects of molecular structure. Two molecules can have the same atomic formula and connectivity, yet behave very differently due to how atoms are arranged in three-dimensional space.
Sugars are a clean and intuitive system for exploring this idea. In this project, you'll work with simple six-carbon sugars (hexoses) to understand how stereochemistry is encoded, visualized, and sometimes lost when working with computational tools like RDKit.
Prerequisite
- Complete R009: From SMILES to Molecular Properties, which introduces SMILES and the core RDKit functions used here
By the end, you will be able to:
- Interpret SMILES representations of stereoisomers
- Convert SMILES strings into RDKit molecule objects
- Visualize sugars in both 2D and 3D
- Compare D/L stereoisomer pairs
- Identify limitations of RDKit for representing stereochemistry
- Reason about which molecular properties change (or do not change) between enantiomers
Background
Sugars are small organic molecules built from carbon chains or rings decorated with hydroxyl (-OH) groups. The number of carbons defines their class:
- Pentoses have five carbons (e.g., ribose)
- Hexoses have six carbons (e.g., glucose, galactose)
The orientation of hydroxyl groups in three-dimensional space is crucial. Molecules that share the same atomic formula but differ only in the spatial arrangement of their atoms are called stereoisomers.
A classic example is D-glucose versus L-glucose. Both have the formula C₆H₁₂O₆, but differ in the orientation of hydroxyl groups around chiral carbons. This difference leads to dramatically different biological behavior: D-glucose is a primary energy source for living cells, while L-glucose is far less biologically active.
For true D/L enantiomer pairs (mirror images), all achiral scalar properties, such as molecular weight, are identical. Differences appear only in chiral or 3D context-dependent properties, such as:
- Optical rotation
- Interactions with enzymes and other chiral proteins
- Some signed or environment-dependent 3D shape descriptors
Exercise 1: Load the Dataset
Goal: Set up your environment and load the sugar data.
- Download the dataset of hexose sugars - LINK.
- Install and import the RDKit library.
- Load the dataset into your notebook.
Success check:
- RDKit imports without errors.
- The dataset loads correctly and is visible in your notebook.
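One way to load the data without extra dependencies is Python's built-in `csv` module. The sketch below assumes the dataset is a CSV file with `name` and `smiles` columns. The column names, the helper `load_sugars`, and the two inline rows are hypothetical stand-ins for the provided file, so adjust them to match what you downloaded.

```python
import csv
import io

# Hypothetical stand-in for the provided dataset file; the column names
# ("name", "smiles") and the two rows are assumptions for illustration.
SAMPLE_CSV = """name,smiles
D-glucose,C([C@H]([C@H]([C@@H]([C@H](C=O)O)O)O)O)O
L-glucose,C([C@@H]([C@@H]([C@H]([C@@H](C=O)O)O)O)O)O
"""


def load_sugars(handle):
    """Read rows from an open CSV handle into a list of dicts."""
    return list(csv.DictReader(handle))


sugars = load_sugars(io.StringIO(SAMPLE_CSV))
for row in sugars:
    print(f"{row['name']}: {row['smiles']}")
```

For a file on disk, the same helper can be called as `load_sugars(open(path, newline=""))`, where `path` points at the downloaded dataset.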
Exercise 2: Explore SMILES Representations
Goal: Understand how sugars are encoded as SMILES strings.
- Print the SMILES strings for each molecule in the dataset.
QUESTION:
- What is a SMILES string, and how does it encode molecular structure?
Exercise 3: Convert to RDKit Molecule Objects and Visualize Molecules
Goal: Translate SMILES strings into RDKit molecules and compare sugars visually in 2D and 3D.
- Parse each SMILES string into an RDKit Mol object.
- Draw each molecule in 2D.
- Render each molecule in 3D.
- For each molecule, display:
  - Name
  - Structure
  - SMILES string
- Show D/L stereoisomer pairs side-by-side in 3D.
  - Set `force_mirror=True` where appropriate.
QUESTIONS:
- Does RDKit draw Fischer projections?
- Why are Fischer projections used?
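The parse, draw, and embed steps of this exercise could be sketched as follows for a single molecule. The SMILES string is an illustrative open-chain D-glucose, not necessarily the form used in the dataset, and the image size and random seed are arbitrary choices. Treat this as one possible approach rather than the required solution.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import rdMolDraw2D

# Illustrative open-chain D-glucose SMILES (an assumption, not from the dataset).
name = "D-glucose"
smiles = "C([C@H]([C@H]([C@@H]([C@H](C=O)O)O)O)O)O"

# Parse the SMILES; MolFromSmiles returns None if the string is invalid.
mol = Chem.MolFromSmiles(smiles)
if mol is None:
    raise ValueError(f"Could not parse SMILES for {name}")

# 2D depiction as SVG text (in a notebook, show it with IPython.display.SVG).
drawer = rdMolDraw2D.MolDraw2DSVG(300, 300)
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol, legend=name)
drawer.FinishDrawing()
svg_text = drawer.GetDrawingText()

# 3D coordinates: add explicit hydrogens, then embed one conformer.
mol_3d = Chem.AddHs(mol)
status = AllChem.EmbedMolecule(mol_3d, randomSeed=42)  # -1 signals failure
if status == -1:
    raise RuntimeError(f"3D embedding failed for {name}")
mol_block = Chem.MolToMolBlock(mol_3d)  # text format that 3D viewers can read

print(name, smiles, mol_3d.GetNumConformers())
```

The `mol_block` text can be passed to whichever 3D viewer your notebook uses. Looping the same steps over every row of the dataset gives the per-molecule name, structure, and SMILES display.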
Exercise 4: Stereochemistry in SMILES
Goal: Understand how stereochemistry is encoded textually.
- Examine the SMILES strings for chiral sugars.
- In your notebook, answer:
  - How does SMILES encode stereochemistry?
Hint: Look for `@` symbols.
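To follow the hint programmatically, the sketch below pulls the stereo tags out of a SMILES string with a regular expression. In SMILES, `@` and `@@` mark the two possible arrangements of neighbors around a tetrahedral center (counterclockwise and clockwise, viewed from the first-listed neighbor). The example string is an illustrative D-glucose SMILES, not taken from the dataset, and the pattern only covers the simple `@`/`@@` tags.

```python
import re

# Illustrative D-glucose SMILES (an assumption, not taken from the dataset).
smiles = "C([C@H]([C@H]([C@@H]([C@H](C=O)O)O)O)O)O"

# Match bracket atoms that carry a tetrahedral tag, capturing "@" or "@@".
STEREO_ATOM = re.compile(r"\[[^\]@]*(@{1,2})[^\]@]*\]")

tags = [m.group(1) for m in STEREO_ATOM.finditer(smiles)]
print(tags)  # ['@', '@', '@@', '@']
print(f"{len(tags)} stereocenters are specified")
```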
Exercise 5: Flip Stereochemistry
Goal: Manually manipulate stereochemical information.
- Take the SMILES string for D-Glucitol.
- Edit the stereochemical descriptors to generate L-Glucitol.
- Print:
  - Both SMILES strings
  - Both molecular structures
Success check:
- The two molecules are mirror images in 3D.
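Once you have tried the edit by hand, a small helper can check your result. The sketch below swaps every `@` with `@@` and vice versa, which inverts every tetrahedral center and so produces the mirror image. The function name `flip_stereo` is hypothetical and the D-glucitol SMILES is an illustrative string, so substitute the one from the dataset.

```python
import re


def flip_stereo(smiles: str) -> str:
    """Swap every '@' with '@@' (and vice versa) in a SMILES string."""
    return re.sub(r"@@|@", lambda m: "@" if m.group() == "@@" else "@@", smiles)


# Illustrative D-glucitol SMILES (an assumption; use the string from the dataset).
d_glucitol = "C([C@H]([C@H]([C@@H]([C@H](CO)O)O)O)O)O"
l_glucitol = flip_stereo(d_glucitol)

print(d_glucitol)
print(l_glucitol)
```

Every center has to be inverted to get the enantiomer. Flipping only some of the tags gives a diastereomer instead. The helper handles only the `@`/`@@` tags and leaves double-bond markers (`/`, `\`) untouched, since cis/trans geometry is unchanged by reflection.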
Exercise 6: Analyze Molecular Properties
Goal: Compute simple properties from SMILES using RDKit and understand what changes, and what doesn't, between stereoisomers.
- For each D/L sugar pair in the dataset:
  - Compute at least 5 molecular properties for each molecule.
  - Compare the values for each D/L pair side by side.
  - Plot or tabulate the results clearly.
- In your notebook, answer:
  - Can RDKit distinguish between D- and L-enantiomers using standard descriptors?
  - In what real-world chemical or biological contexts would this limitation be important?
Example code for accessing descriptors:

    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors

    def compute_descriptors(mol):
        # Return a dict of descriptors for one RDKit Mol object.
        return {
            'LogP': Crippen.MolLogP(mol),  # Wildman-Crippen estimate, not XLogP
            'HBD': Lipinski.NumHDonors(mol),
            'HBA': Lipinski.NumHAcceptors(mol),
            'TPSA': rdMolDescriptors.CalcTPSA(mol),
            'MolWt': Descriptors.MolWt(mol),
            'ExactMolWt': rdMolDescriptors.CalcExactMolWt(mol),
            'RotBonds': Lipinski.NumRotatableBonds(mol),
            'RingCount': rdMolDescriptors.CalcNumRings(mol),
            'FormalCharge': int(sum(a.GetFormalCharge() for a in mol.GetAtoms())),
            'HeavyAtoms': rdMolDescriptors.CalcNumHeavyAtoms(mol),
            'FracCSP3': rdMolDescriptors.CalcFractionCSP3(mol),
            # Isomeric SMILES keeps the stereo tags, unlike the scalars above.
            'IsomericSMILES': Chem.MolToSmiles(mol, isomericSmiles=True),
        }
Success check:
- Descriptor values are identical (or numerically indistinguishable) for each D/L pair.
- If any differences appear, explain why they might be artifacts or errors.
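A side-by-side comparison for one pair might look like the sketch below. The two SMILES strings are an illustrative D/L-glucose pair rather than dataset entries, the helper `describe` is hypothetical, and the five descriptors are an arbitrary selection. The final lines show that the canonical isomeric SMILES still tells the two molecules apart even when the scalar descriptors match.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors

# Illustrative enantiomer pair (assumed SMILES, not taken from the dataset).
pair = {
    "D-glucose": "C([C@H]([C@H]([C@@H]([C@H](C=O)O)O)O)O)O",
    "L-glucose": "C([C@@H]([C@@H]([C@H]([C@@H](C=O)O)O)O)O)O",
}


def describe(mol):
    """Return a few achiral scalar descriptors for one molecule."""
    return {
        "MolWt": Descriptors.MolWt(mol),
        "LogP": Crippen.MolLogP(mol),
        "TPSA": rdMolDescriptors.CalcTPSA(mol),
        "HBD": rdMolDescriptors.CalcNumHBD(mol),
        "RotBonds": rdMolDescriptors.CalcNumRotatableBonds(mol),
    }


mols = {name: Chem.MolFromSmiles(smi) for name, smi in pair.items()}
table = {name: describe(mol) for name, mol in mols.items()}

# Side-by-side comparison of each descriptor.
for key in table["D-glucose"]:
    d_val, l_val = table["D-glucose"][key], table["L-glucose"][key]
    same = abs(d_val - l_val) < 1e-9
    print(f"{key:10s} {d_val:10.3f} {l_val:10.3f} same={same}")

# Canonical isomeric SMILES keep the stereo tags, so the pair stays distinguishable.
canonical = {name: Chem.MolToSmiles(mol) for name, mol in mols.items()}
print(canonical["D-glucose"] != canonical["L-glucose"])
```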
Submission
Submit your files by uploading them to this Google Form: SUBMIT LINK
Please upload both:
- your .ipynb notebook
- your logbook file
Make sure filenames follow the naming conventions above.
Route Complete!
Great work!