Director-level data leader building enterprise master data, ontology, and Agentic AI governance systems for regulated R&D. From FAIR strategy and knowledge graphs to AI-ready data products across the discovery-to-commercial lifecycle.
I’m a Chemical Engineer turned digital leader with a Ph.D. from Queen’s University and 10+ years bridging pharmaceutical R&D, data science, and enterprise AI.
I lead FAIR Data Strategy and Digital Connectivity at Takeda Pharmaceutical — defining master data, ontology, and data catalog strategy across R&D. I’ve deployed production Agentic AI systems for ontology curation, built knowledge-graph-aligned data models for cell therapy, and own FAIR Studio, an industry-alliance product (Pistoia Alliance) that operationalizes FAIR assessment for pharmaceutical R&D organizations.
My focus has evolved from process modeling and digital twin development into AI strategy and data governance — building the enterprise-grade data foundations that make AI trustworthy in regulated environments. That means master data management, semantic layers, governance frameworks, and the Agentic AI systems that keep them current and auditable at scale.
Three roles across pharma R&D, biotech, and academic research.
Public repositories focused on scientific ML, drug discovery, and pharmaceutical AI.
A 4-agent LangGraph pipeline that autonomously curates biomedical ontology terms from pharmaceutical R&D documents. Extracts named entities with GPT-5.2, maps them across 10 ontologies (BioPortal + EBI OLS4), detects conflicts, and routes governance decisions to named domain stewards.
Python package and 5-article series implementing the RDA FAIR Maturity Model (41 indicators) and Pistoia Alliance FAIR Maturity Matrix (L0–L5 × 7 dimensions) for pharmaceutical R&D. Includes manual assessment, gap analysis, and remediation roadmaps.
PINNs, data-driven dynamics, hybrid physics-ML models, and uncertainty quantification for biopharmaceutical process development.
AI-powered drug discovery: molecular property prediction, SMILES encoding, and graph neural networks for molecular design.
NLP fundamentals to fine-tuned BERT models on scientific text, with applications in pharmaceutical literature mining.
LangChain-powered system for summarizing and querying multiple PDFs — applicable to regulatory document analysis.
Industry alliances, academic partnerships, and open-source contributions to the pharmaceutical data governance community.
Product owner and driving contributor for FAIR Studio, an industry-wide platform developed under the Pistoia Alliance for operationalizing FAIR data assessments at scale. Member pharma R&D organizations use it to embed governance into digital workflows.
Forged strategic research partnerships with MIT, BYU, Brown University, and Purdue University to advance Physics-Informed Neural Networks (PINNs), mechanistic modeling, and in-silico–first development capabilities for pharmaceutical R&D.
Author and maintainer of open-source tools for the pharma AI community: OntoCurator Agent (multi-agent ontology curation), FAIR Data Toolkit (automated FAIR assessment), and ScientificML (PINNs and mechanistic modeling).
Four pillars spanning data governance, AI, engineering science, and cloud infrastructure.
Thoughts on master data, knowledge graphs, FAIR data, Agentic AI, and pharmaceutical R&D data strategy.
A practitioner’s view on why the row-and-column gold record breaks down for medicinal products — and what to build instead. Covers ISO IDMP, IDMP-O, bitemporal graphs, agentic entity resolution, and a deployable reference architecture.
Why the next generation of pharma data architecture treats meaning as a first-class artifact. Covers OWL, SHACL, SKOS, R2RML, data mesh federation, production exemplars from Bayer COLID and Roche EDIS, and a five-step migration pattern for moving your hub to a read model.
How to design an audit-ready operating model where agents make bounded governance proposals, human stewards adjudicate the risky residue, and PROV-O provenance records every decision. Covers Article 14 oversight, GAMP 5 v2, NIST AI RMF, and a deployable curation pipeline.
A practical implementation pattern for pharma AI risk that connects ISO/IEC 23894 to ICH Q9(R1), ISO/IEC 42001, NIST AI RMF crosswalks, and FAIR plus ontology curation controls. Focuses on evidence-ready operations over governance theater.
A practical 90-day playbook for data leaders to baseline FAIR maturity, identify one high-leverage quick win, and ship a defensible 12-month roadmap before the first quarterly review.
How a four-agent LangGraph pipeline automates biomedical term extraction, ontology mapping, conflict detection, and governance routing — while preserving human oversight at every critical decision point.
A practical guide to embedding physical laws into neural network architectures for process development in pharmaceutical continuous manufacturing.
Article in Progress: A practitioner’s deep-dive into FAIR data maturity frameworks. Combines the RDA FAIR Maturity Model (41 indicators) and Pistoia Alliance FAIR Maturity Matrix (L0–L5 × 7 dimensions) with a complete walkthrough on a CAR-T viability dataset and a semi-automated Python scorer.
A decade after their publication, the FAIR principles have quietly become the backbone of modern data strategy. Covers what they actually mean, why they matter more than ever in the age of AI, and how to put them to work, with a 5-step starter plan and a full ecosystem map.
Open to collaboration on scientific ML, Agentic AI, and pharmaceutical data science projects.