First Ascent by
Bethany Ann Sawyer
Twenty Moves on Thin Holds
Organometallic Chemistry / Catalysis
The Proposed Route
A short, elegant problem on a small wall: 20 aryl halides, hand-built by the climber, interrogated for what their 2D fingerprint reveals about their reactivity with palladium. The route is intimate and exploratory: less about the summit, more about reading the rock.
🧗 The Crux
The dataset must be self-constructed: if the data is poorly sourced or inconsistently labeled, the route collapses before it begins. N=20 is the defining constraint: regression will have almost no statistical power, and any model that 'works' may be fitting noise.
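To make the noise-fitting risk concrete, here is a minimal Python sketch that uses only synthetic data. It draws 20 random "reactivity" values and a few hundred random fingerprint bits, then reports the best squared correlation that any single bit reaches purely by chance. Everything in it is an assumption for illustration: the function names `pearson_r` and `best_noise_r2`, the 256-bit width, and the Gaussian targets do not come from the proposal.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant bit (all 0 or all 1) carries no signal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def best_noise_r2(n_compounds=20, n_bits=256, seed=0):
    """Best single-bit R^2 found when targets AND bits are pure noise."""
    rng = random.Random(seed)
    # Synthetic targets: random numbers, NOT literature rate or barrier data.
    y = [rng.gauss(0.0, 1.0) for _ in range(n_compounds)]
    best = 0.0
    for _ in range(n_bits):
        # Synthetic fingerprint bit: on/off at random for each compound.
        bit = [rng.randint(0, 1) for _ in range(n_compounds)]
        best = max(best, pearson_r(bit, y) ** 2)
    return best


if __name__ == "__main__":
    for seed in range(5):
        print(f"seed={seed}  best chance R^2 = {best_noise_r2(seed=seed):.2f}")
```

If a real model's R² on the 20 compounds is not clearly above what this null search produces, the fit probably should not be trusted.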
⚠️ Pre-Climb Checklist
⚠️ Dataset does not yet exist: student must build it before coding begins. Ensure data sources are reliable (literature k_obs or ΔG‡ values).
⚠️ N=20 means R² will be noisy; this is expected and should be treated as a feature, not a bug. No baseline comparison model specified.
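Since no baseline is specified, one simple candidate is a mean predictor scored by leave-one-out: each compound is predicted from the average of all the others. This is a sketch of one possible baseline, not the proposal's method. The function names and the four toy numbers are placeholders, not literature values.

```python
def loo_mean_baseline(y):
    """Leave-one-out predictions: each value is predicted by the mean of the rest."""
    n = len(y)
    total = sum(y)
    return [(total - yi) / (n - 1) for yi in y]


def mean_abs_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Placeholder targets for illustration only (NOT measured k_obs or barrier values).
toy_y = [1.0, 2.0, 3.0, 4.0]
toy_pred = loo_mean_baseline(toy_y)
toy_mae = mean_abs_error(toy_y, toy_pred)
print("baseline predictions:", toy_pred)
print("baseline MAE:", toy_mae)
```

Any fingerprint model evaluated the same way would need to beat this error before its predictions say anything about chemistry.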
Guidance
- Small dataset is the point: explore ML limits with scarce data
- Build the 20-compound dataset carefully with reliable literature values
- Interrogate residuals: which compounds does the model get wrong, and why?
- The finding isn't the R²; it's the gap between chemical intuition and model output
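One way to interrogate residuals at this scale is to refit the model with each compound held out and rank the compounds by how badly they are missed. The sketch below does this for a straight-line fit on a single descriptor. The descriptor, the compound labels, the toy numbers, and the function names `fit_line` and `loo_residuals` are all hypothetical stand-ins, not part of the proposal.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # no spread in x: fall back to predicting the mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_residuals(names, xs, ys):
    """Hold out each compound, refit, and rank by absolute prediction error."""
    results = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predicted = slope * xs[i] + intercept
        results.append((names[i], ys[i] - predicted))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Toy values for illustration only; "C" is deliberately placed off the trend.
toy_names = ["A", "B", "C", "D", "E"]
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_obs = [0.0, 1.0, 5.0, 3.0, 4.0]
ranked = loo_residuals(toy_names, toy_x, toy_obs)
for name, residual in ranked:
    print(f"{name}: observed minus predicted = {residual:+.2f}")
```

The compounds at the top of the ranking are the ones worth a chemical explanation, for example an ortho substituent or a different leaving group that the descriptor cannot see.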
Source proposal: Bethany_Ann_Sawyer_Chem269_FinalProjectProposal.pdf
CHEM 169/269 · Applied AI & Machine Learning for Biochemistry