First Ascent by

Bryan Rinde

Difficulty Grade: TBD by climber

Approved

The Breathomics Ridge

Clinical Metabolomics / Tabular Deep Learning

The Proposed Route

A high-stakes route connecting the invisible chemistry of exhaled breath to the clinical reality of cystic fibrosis. The climber navigates hundreds of VOC features using Random Forest and TabNet (a deep learning architecture with built-in attention) to find the handful of biomarkers that distinguish infected lungs from healthy ones.

🧗 The Crux

The proxy dataset (fuel emissions profiles, Stamatis & Barsanti 2022) is a significant mismatch with the target application (CF breath), so results from it may not translate to clinical breath data. TabNet is also complex to implement correctly and to interpret clinically. The 'needle in a haystack' framing is accurate, but finding the needle requires that the haystack is the right one.

โš ๏ธ Pre-Climb Checklist

โš ๏ธ Be explicit in your notebook that the Stamatis dataset is a methodological stand-in, not a clinical proxy โ€” the pipeline is being validated, not the biomarkers. โš ๏ธ If own CF data is available and IRB-cleared, flag this to instructor immediately โ€” it changes the whole ascent. โœ… F-ratio + PCA + RF pipeline is well-structured. โš ๏ธ TabNet: use the pytorch-tabnet library and start with the tutorial notebook before integrating into your pipeline.

Guidance

  • Key contribution: RF vs TabNet comparison
  • Compare which VOCs each model highlights: do they agree?
  • Do TabNet attention weights recover the same features as RF importance?
  • That comparison is the scientific finding
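The agreement question in the guidance above can be made quantitative. The sketch below is one hedged way to do it, given two importance vectors over the same features: it reports the overlap of the top-k features and the Spearman rank correlation. The helper names (`average_ranks`, `compare_importances`) and the example numbers are hypothetical. In practice the two inputs would likely be `RandomForestClassifier.feature_importances_` and, as far as I understand the pytorch-tabnet library, the fitted `TabNetClassifier.feature_importances_`; that pairing is an assumption, not something the proposal specifies.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks inside each group of equal values by the group mean.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def compare_importances(imp_a, imp_b, k=10):
    """Compare two importance vectors defined over the same feature order."""
    imp_a = np.asarray(imp_a, dtype=float)
    imp_b = np.asarray(imp_b, dtype=float)
    if imp_a.shape != imp_b.shape:
        raise ValueError("importance vectors must have the same length")

    # Indices of the k largest importances in each vector.
    top_a = set(np.argsort(-imp_a)[:k].tolist())
    top_b = set(np.argsort(-imp_b)[:k].tolist())
    shared = sorted(top_a & top_b)
    jaccard = len(shared) / len(top_a | top_b)

    # Spearman correlation = Pearson correlation of the tie-averaged ranks.
    rho = np.corrcoef(average_ranks(imp_a), average_ranks(imp_b))[0, 1]
    return {"shared_top_k": shared, "jaccard": jaccard, "spearman": float(rho)}


# Made-up importances for five features, purely for illustration.
rf_imp = np.array([0.40, 0.30, 0.20, 0.10, 0.00])
tabnet_imp = np.array([0.35, 0.05, 0.45, 0.15, 0.00])
result = compare_importances(rf_imp, tabnet_imp, k=2)
print(result)
```

A high rank correlation with low top-k overlap, or the reverse, is itself informative: it shows whether the models disagree about the leading candidates or only about the long tail.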

Source proposal: Bryan_Rinde_CHEM269_Proposal.docx


CHEM 169/269 · Applied AI & Machine Learning for Biochemistry