From Ontology to MDM: How Semantic Layers Are Replacing Hub-and-Spoke Master Data Architectures in Pharma

01 — The Problem

The Gap Between MDM-as-Sold and MDM-as-Needed

Pharma has mature, well-funded MDM capability. And almost nobody downstream trusts it completely.

Walk into almost any large pharma’s enterprise architecture function today and you will find a mature, well-funded Master Data Management capability. Customer, product, organization, site, study, asset — each has a hub, a steward, a workflow, an integration layer, and a roadmap of “domains still to onboard.” It looks like a solved problem.

Then ask anyone downstream — a clinical data manager, a regulatory writer, a pharmacovigilance analyst, an R&D data scientist — whether they trust it. The answer is almost always the same: for some things, in some systems, sometimes.

The gap is not a tooling failure. The hubs do what hubs do. The gap is architectural: the row-and-column “golden record” is the wrong unit of mastering for an industry whose data exchange surface is a stack of evolving, machine-processable, semantically rich standards. Per Gartner research cited via Dataversity, around 75% of MDM programs fail to meet their business objectives — a figure that has reportedly worsened since 2015.¹

In pharma, the failure is usually not “we couldn’t deduplicate customers.” It’s “we built a hub, and then we still had to maintain a parallel translation layer for every regulatory submission, every clinical standard, every research consumer.” That parallel translation layer is the de facto semantic layer. The question facing pharma enterprise architecture in 2026 is whether to keep treating it as an integration tax — or to promote it to the primary architectural primitive and demote the hub to a derived view.

The core argument: Stop maintaining translation tiers. Promote them. The semantic layer is not a new technology: every piece of the stack is a stable W3C Recommendation, the newest (SHACL) dating from 2017 and the rest from 2012 or earlier. The production exemplars (Bayer COLID since 2019, Roche EDIS since 2017) have been running long enough to call them proven. What is new is the willingness to make the semantic layer the actual master.

02 — Definition

Semantic Layer: A Working Definition

A semantic layer is logical, not physical. It holds the meaning of the data, not the data itself — and that distinction is the whole architectural argument.

Strip it back to the architectural definition: “A semantic layer is a piece of enterprise data architecture designed to simplify interactions between complex data storage systems and business users… The semantic layer provides an intuitive interface that converts that data into meaningful business terms.”²

The crucial property is logical, not physical. A semantic layer does not hold the data; it holds the meaning of the data, expressed as a model that downstream consumers — humans, BI tools, agents, regulators — query against. The data itself can live in a lake, a warehouse, a relational system, a graph store, or all of those at once.

The lineage is older than most pharma teams realize. The first commercial semantic layers were the BusinessObjects “Universe” and the MicroStrategy “Semantic Graph” in the 1990s.³ Looker’s LookML (2012) made “semantics as code” — Git-versioned, peer-reviewed model definitions — into a mainstream practice.³ Today’s iteration is the universal or headless semantic layer: a tool-agnostic platform (dbt Semantic Layer, Cube, AtScale, Power BI semantic models) that defines logic once and serves it through APIs to every downstream consumer.³ Gartner explicitly positions semantic layers as a structural component of AI-era analytics architecture.⁴

Primitive	Holds	Optimized for	Failure mode
MDM hub	Reference records (gold)	Operational reconciliation	Lossy denormalization, brittle survivorship
Data warehouse / lake	Facts and history	Analytics throughput	Schema sprawl, no shared meaning
Knowledge graph	Entities + typed edges + provenance	Compositional reasoning	Governance scale, query latency
Semantic layer	Definitions, mappings, constraints	Consistent interpretation across consumers	Drift if not version-controlled

The key insight: The semantic layer is not a substitute for the others — it is the contract that binds them. It says: “Wherever this entity lives, this is what it means, these are the rules it must satisfy, and these are the names it answers to across the standards we care about.”
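As a minimal sketch of what such a contract could look like as data, the following assumes a hypothetical `SemanticContract` record. The class name, its fields, the two rules, and the alias choices are illustrative only and are not taken from any product or standard.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SemanticContract:
    """Hypothetical contract entry: meaning, rules, and cross-standard names."""
    concept: str                                   # canonical name in the model
    definition: str                                # what the entity means
    rules: dict[str, Callable[[dict], bool]]       # rule name -> predicate on a record
    aliases: dict[str, str] = field(default_factory=dict)  # standard -> local name

    def violations(self, record: dict) -> list[str]:
        """Return the names of the rules that the record fails."""
        return [name for name, rule in self.rules.items() if not rule(record)]


# Illustrative entry; the alias mapping is an assumption for the example.
substance = SemanticContract(
    concept="Substance",
    definition="A defined ingredient of a medicinal product.",
    rules={
        "has_name": lambda r: bool(r.get("name")),
        "has_identifier": lambda r: bool(r.get("identifier")),
    },
    aliases={"CDISC": "EXTRT", "OMOP": "drug_concept_id"},
)

print(substance.violations({"name": "acetylsalicylic acid"}))
```

The point of the shape is that definition, rules, and names travel together, wherever the underlying record is stored.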

03 — The Pharma Case

Why Pharma Is Unusually Suited to Semantic Layers

Almost every industry has standards. Pharma has a stack of them, each authoritative for a different slice of the same entity, each maintained by a different body, each updated on its own cadence.

The multi-standard reality of pharma is not a problem to be solved — it is the reason a semantic layer outperforms a hub. A hub-and-spoke MDM has to either flatten all of this into a denormalized row (and lose the semantics that make the standards regulatorily binding) or replicate the relationships in custom mapping tables that drift over time. A semantic layer treats every standard as a named, versioned view onto a common underlying model.

The same molecule, viewed through the CDISC lens, appears as STUDYID.SUBJID.EXTRT; viewed through the IDMP lens, as an MPID with linked substances; viewed through the OMOP lens, as a concept in the Drug Exposure domain. One model. Many projections.
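A rough sketch of "one model, many projections" in plain Python follows. The canonical record, its identifier values, and the three projection functions are invented for illustration; only the variable names STUDYID, SUBJID, EXTRT and the OMOP `drug_concept_id` column follow the standards named in the text.

```python
# One canonical record (all values are made-up placeholders).
canonical = {
    "substance": "acetylsalicylic acid",
    "study_id": "STUDY-001",
    "subject_id": "SUBJ-0042",
    "mpid": "MPID-EXAMPLE",
    "omop_concept_id": 0,  # placeholder, not a real OMOP concept id
}


def as_cdisc(rec: dict) -> dict:
    """CDISC-style view: study, subject, treatment name."""
    return {"STUDYID": rec["study_id"], "SUBJID": rec["subject_id"], "EXTRT": rec["substance"]}


def as_idmp(rec: dict) -> dict:
    """IDMP-style view: a product identifier with linked substances."""
    return {"MPID": rec["mpid"], "substances": [rec["substance"]]}


def as_omop(rec: dict) -> dict:
    """OMOP-style view: a concept in the Drug Exposure domain."""
    return {"table": "drug_exposure", "drug_concept_id": rec["omop_concept_id"]}


PROJECTIONS = {"CDISC": as_cdisc, "IDMP": as_idmp, "OMOP": as_omop}


def project(rec: dict, lens: str) -> dict:
    """Render the canonical record through one named standard."""
    return PROJECTIONS[lens](rec)


for lens in PROJECTIONS:
    print(lens, project(canonical, lens))
```

Adding a standard here means adding one function to `PROJECTIONS`; the canonical record does not change.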

Standard	Governing Body	Scope	Regulatory binding
CDISC SDTM / ADaM / CDASH	CDISC	Clinical study data	FDA, PMDA (required)⁵
OMOP CDM	OHDSI	Observational / RWD	Open community standard⁸
HL7 SPL	HL7 / FDA	Product labeling	FDA-binding¹⁰
EMA eAF	EMA	Regulatory applications	Mandatory for CAPs from Sep 1, 2026¹¹
ISO IDMP	ISO / EMA	Product identification	EMA SPOR services
MedDRA / RxNorm / NCIt / UNII	ICH / NLM / NCI / FDA	Terminologies	Overlay every layer

Historical precedent — BRIDG Model (ISO 14199, Dec 2024): The Biomedical Research Integrated Domain Group model — jointly governed by CDISC, HL7, ISO, NCI, and FDA — demonstrated that one harmonized semantic model can underpin many surface standards.¹² Its real legacy is proof of concept: one model, many standards. NCI/CBIIT now notes the model is “no longer actively maintained” as a living artifact and should be treated as a foundational reference.¹³

04 — The Stack

The W3C Stack That Does the Actual Work

Four W3C Recommendations, published between 2009 and 2017 and stable since, give you everything needed to build a declarative, version-controllable, queryable semantic layer from end to end.

OWL 2

Web Ontology Language

Provides formal class semantics: subClass, equivalentClass, disjointness, property characteristics, and the reasoning substrate behind any nontrivial enterprise ontology. OWL defines what entities exist.

W3C Recommendation — 11 December 2012¹⁴
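To make the subclass semantics concrete, here is a small plain-Python stand-in for one inference an OWL reasoner performs: subClassOf is transitive, so an individual asserted as a member of a class is also a member of every ancestor class. The class names under the `ex:` prefix are hypothetical, and a real reasoner covers far more of OWL 2 than this.

```python
# Hypothetical class hierarchy: child -> set of direct parents.
SUBCLASS_OF = {
    "ex:Vaccine": {"ex:BiologicalProduct"},
    "ex:BiologicalProduct": {"ex:MedicinalProduct"},
    "ex:SmallMolecule": {"ex:MedicinalProduct"},
}


def superclasses(cls: str, subclass_of: dict = SUBCLASS_OF) -> set:
    """All ancestors of cls, following subClassOf transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted_type: str) -> set:
    """Types an individual holds once subclass inference is applied."""
    return {asserted_type} | superclasses(asserted_type)


print(sorted(inferred_types("ex:Vaccine")))
```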

SKOS

Simple Knowledge Organization System

Represents controlled vocabularies, thesauri, and taxonomies in RDF. The right tool for MedDRA-style hierarchies and value sets. SKOS defines how entities are labeled and arranged.

W3C Recommendation — 18 August 2009¹⁵
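A sketch of the two things a SKOS vocabulary gives a consumer, resolving any label to a concept and walking the broader hierarchy, might look like this. The dictionary layout mirrors `skos:prefLabel`, `skos:altLabel`, and `skos:broader`; the concept identifiers are invented and are not real MedDRA codes.

```python
# Hypothetical concept scheme keyed by concept id.
CONCEPTS = {
    "ex:C1": {"prefLabel": "Cardiac disorders", "altLabels": [], "broader": None},
    "ex:C2": {"prefLabel": "Cardiac arrhythmias", "altLabels": ["Arrhythmia"], "broader": "ex:C1"},
    "ex:C3": {"prefLabel": "Atrial fibrillation", "altLabels": ["AF", "AFib"], "broader": "ex:C2"},
}


def find_concept(label: str, concepts: dict = CONCEPTS):
    """Resolve a preferred or alternative label (case-insensitive) to a concept id."""
    wanted = label.lower()
    for concept_id, entry in concepts.items():
        labels = [entry["prefLabel"], *entry["altLabels"]]
        if wanted in (candidate.lower() for candidate in labels):
            return concept_id
    return None


def broader_path(concept_id: str, concepts: dict = CONCEPTS) -> list:
    """Preferred labels from the concept up to the top of its hierarchy."""
    path = []
    while concept_id is not None:
        path.append(concepts[concept_id]["prefLabel"])
        concept_id = concepts[concept_id]["broader"]
    return path


print(broader_path(find_concept("afib")))
```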

SHACL

Shapes Constraint Language

Validates RDF graphs against declarative constraints. In a semantic layer, SHACL is what makes “this CDISC dataset is conformant” or “this IDMP submission is complete” into a query, not a code path. SHACL defines what is true of entities.

W3C Recommendation — 20 July 2017¹⁶
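The idea of conformance as data rather than a code path can be sketched without a SHACL engine. The snippet below is a plain-Python stand-in that checks only cardinality (the counterparts of `sh:minCount` and `sh:maxCount`) for nodes of a target class; the shape, the `ex:` names, and the triples are assumptions for the example, and a real deployment would use an actual SHACL processor.

```python
# A shape expressed as data: which class it targets and what each property must satisfy.
PRODUCT_SHAPE = {
    "target_class": "ex:MedicinalProduct",
    "properties": [
        {"path": "ex:mpid", "min_count": 1, "max_count": 1},
        {"path": "ex:hasSubstance", "min_count": 1},
    ],
}


def validate(triples: list, shape: dict) -> list:
    """Return (node, path, violated constraint) for every cardinality violation."""
    report = []
    targets = sorted(
        s for s, p, o in triples if p == "rdf:type" and o == shape["target_class"]
    )
    for node in targets:
        for prop in shape["properties"]:
            count = sum(1 for s, p, _ in triples if s == node and p == prop["path"])
            if count < prop.get("min_count", 0):
                report.append((node, prop["path"], "minCount"))
            if "max_count" in prop and count > prop["max_count"]:
                report.append((node, prop["path"], "maxCount"))
    return report


triples = [
    ("ex:p1", "rdf:type", "ex:MedicinalProduct"),
    ("ex:p1", "ex:mpid", "MPID-1"),
    ("ex:p1", "ex:hasSubstance", "ex:s1"),
    ("ex:p2", "rdf:type", "ex:MedicinalProduct"),
    ("ex:p2", "ex:mpid", "MPID-2"),
    ("ex:p2", "ex:mpid", "MPID-2b"),  # second identifier violates max_count
]

for violation in validate(triples, PRODUCT_SHAPE):
    print(violation)
```

Changing what "complete" means is then an edit to `PRODUCT_SHAPE`, which can be versioned and reviewed like any other model artifact.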

R2RML

RDB to RDF Mapping Language

Projects existing relational systems into the semantic layer without rebuilding them. This is the key to incremental migration: your MDM hub becomes one input among several. R2RML defines how relational data projects into the model.

W3C Recommendation — 27 September 2012¹⁷
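The projection idea can be illustrated with a declarative mapping applied to relational rows. This is a plain-Python imitation of the concept, not R2RML syntax: the mapping dictionary, the column names, and the `ex:` terms are hypothetical. As in R2RML, a NULL column value produces no triple.

```python
# Hypothetical mapping: how one hub table projects into subject/predicate/object triples.
PRODUCT_MAPPING = {
    "subject_template": "ex:product/{PRODUCT_ID}",
    "class": "ex:MedicinalProduct",
    "predicate_object": {"ex:name": "PRODUCT_NAME", "ex:mpid": "MPID"},
}


def rows_to_triples(rows, mapping):
    """Yield triples for each row according to the declarative mapping."""
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        yield (subject, "rdf:type", mapping["class"])
        for predicate, column in mapping["predicate_object"].items():
            value = row.get(column)
            if value is not None:  # NULL columns generate no triple
                yield (subject, predicate, str(value))


hub_rows = [{"PRODUCT_ID": 7, "PRODUCT_NAME": "Example Product", "MPID": None}]

for triple in rows_to_triples(hub_rows, PRODUCT_MAPPING):
    print(triple)
```

The hub table is left untouched; only the mapping is new, which is what makes the migration incremental.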

What this means in practice: OWL defines what entities exist. SKOS defines how they are labeled and arranged. SHACL defines what is true of them. R2RML defines how to project the relational systems you already have. The same query can resolve an entity across a clinical warehouse (via R2RML), a regulatory submission (via IDMP shapes), and a SharePoint document library (via SKOS tagging). There is no comparable native primitive in any hub-and-spoke MDM product.

05 — Operating Model

The Federation Angle: Data Mesh as the Operating Model

A semantic layer owned by a central team and consumed by everyone else is just a centralized warehouse with a fancier query language. The pattern that holds up under pharma’s organizational scale is a federation.

Zhamak Dehghani’s data mesh thesis articulates the operating model in four principles: “domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance.”¹⁸,¹⁹

For pharma, the mapping is direct. Each functional domain — discovery, translational, clinical, regulatory, manufacturing, commercial, safety — owns its slice of the model. Each publishes data products with explicit contracts. A central platform team operates the substrate (the graph store, the mapping engine, the SHACL validator, the SKOS service). A federated governance body (semantic council, ontology board — call it what you like) arbitrates cross-domain alignment.
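One way to picture "data products with explicit contracts" under federated governance is a platform-side check that every concept a domain publishes is defined in the shared model. The sketch below assumes a hypothetical `DataProduct` record and an invented set of shared concepts; it shows the shape of the check, not any particular platform's API.

```python
from dataclasses import dataclass

# Hypothetical concepts agreed by the governance body.
SHARED_CONCEPTS = frozenset({"ex:Substance", "ex:MedicinalProduct", "ex:Study", "ex:Site"})


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned data product and the concepts its contract exposes."""
    name: str
    owner_domain: str
    exposes: frozenset


def undefined_concepts(product: DataProduct, shared=SHARED_CONCEPTS) -> list:
    """Concepts the product exposes that the shared model does not define."""
    return sorted(product.exposes - shared)


clinical = DataProduct(
    name="study-enrolment",
    owner_domain="clinical",
    exposes=frozenset({"ex:Study", "ex:Site", "ex:Enrolment"}),
)

print(undefined_concepts(clinical))
```

A non-empty result is a prompt for the domain and the council to either add the concept to the shared model or map it to an existing one.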

Architectural primitive	Is about…	Role in the federated model
Data mesh	Ownership and contract	Who owns which slice and what they guarantee
Semantic layer	Meaning	What entities mean and how they relate across domains
MDM hub	Reference-record reconciliation	A derived asset serving legacy operational systems — not the master

The mistake the last decade made was treating MDM as if it could do all three at once. These three primitives are complementary, not alternative. The hub does not disappear — it just stops being the master.

06 — Production Evidence

Pharma Exemplars Already in Production

This is not theoretical. Three pharma-authored implementations are openly documented and worth looking at.

Bayer

COLID — Corporate Linked Data

Fully operational across all Bayer divisions since January 2019. Provides persistent, globally unique URIs for corporate metadata assets, an RDF data model, and a SPARQL endpoint for consumers. Published under BSD-3-Clause license — to date, the most concrete example of a major pharma replacing a traditional MDM-style metadata registry with a semantic-layer architecture.²⁰

In production since Jan 2019 • Open source (BSD-3-Clause)

Roche

EDIS + Lynx

EDIS (Enhanced Data and Insight Sharing) launched in 2017 as a company-wide program to transform Roche’s data management strategy. The Roche Dataset Portal’s metamodel is “entirely specified using FAIR standards and community vocabularies.”²¹ The companion Lynx system is a knowledge-graph engine for reference data integration across Roche’s semantic ecosystem.²²

EDIS launched 2017 • Lynx: SEMANTiCS 2021

Pistoia Alliance

FAIR Implementation Project

Cross-pharma project with deliverables including the FAIR Toolkit, the FAIR Maturity Matrix (v1.1, March 2025), and FAIR-aligned submission frameworks for in vitro pharmacology and bioassay metadata.²³ The FAIR Maturity Matrix provides a defensible target state for a semantic-layer program and lets you measure progress without inventing metrics from scratch.

FAIR Maturity Matrix v1.1 — March 2025

MELLODDY (IMI) — the federation proof of concept: The European IMI MELLODDY project (Bayer, GSK, Novartis, Janssen, and six other pharmas) built an industry-scale federated machine-learning platform for drug discovery without sharing the underlying data. The architectural lesson is the same one the semantic-layer pattern enforces: the unit of sharing is contracts and meaning, not raw rows.

07 — Migration Pattern

Migration Pattern: How a Hub Becomes a Read Model

The hardest question is not “should we build a semantic layer?” — that is a settled bet. It’s “what happens to the MDM hub we already have?”

The wrong answers are “rip it out” (politically and operationally untenable) and “ignore the new architecture” (which is how parallel-stack rot starts). The pattern that works, in roughly this order:

Step 1: Stand up the semantic layer alongside the hub

Pick one anchor domain — products and substances is usually the right starting point, given the IDMP-O foundation already exists. Define the ontology in OWL, the value sets in SKOS, the constraints in SHACL. Use R2RML to project the existing MDM hub into the semantic layer, so the hub becomes one source among several rather than the source of truth.

OWL ontology • SKOS value sets • SHACL constraints • R2RML mapping

Step 2: Make the semantic layer authoritative for new uses

New downstream consumers (regulatory submission generators, AI agents, FAIR data products, cross-domain analytics) read from the semantic layer, never the hub directly. This is the moment the architectural center of gravity actually moves.

Regulatory submission generators • AI agents • FAIR data products

Step 3: Demote the hub to a derived read model

The hub continues to serve legacy operational systems that need a flat API. But the hub’s content is now derived from the semantic layer — facts asserted in the graph project down into hub rows, not the other way around. The survivorship rules that used to be the hub’s secret sauce become declarative SHACL constraints, version-controlled and auditable.

Legacy system bridge • SHACL replaces survivorship rules
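Deriving hub rows from graph facts can be sketched as a pivot from triples to flat records. The predicate-to-column mapping and the sample triples below are hypothetical, and the function deliberately keeps one value per column, which is the lossy flattening a legacy flat API expects.

```python
def project_to_hub_rows(triples: list, columns: dict) -> list:
    """Pivot graph facts into flat rows, one row per subject.

    columns maps a predicate to a hub column name; predicates not listed
    are ignored, and a later value for the same column overwrites an earlier one.
    """
    rows = {}
    for subject, predicate, obj in triples:
        if predicate in columns:
            row = rows.setdefault(
                subject, {"id": subject, **{name: None for name in columns.values()}}
            )
            row[columns[predicate]] = obj
    return [rows[key] for key in sorted(rows)]


graph = [
    ("ex:p1", "ex:name", "Example Product"),
    ("ex:p1", "ex:mpid", "MPID-1"),
    ("ex:p1", "ex:hasSubstance", "ex:s1"),  # not part of the flat hub schema
    ("ex:p2", "ex:name", "Other Product"),
]
HUB_COLUMNS = {"ex:name": "product_name", "ex:mpid": "mpid"}

for row in project_to_hub_rows(graph, HUB_COLUMNS):
    print(row)
```

The direction of flow is the design choice: the graph is the input and the hub row is the output, so a correction made in the graph reaches the hub on the next projection.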

Step 4: Federate ownership

As more domains stand up their own slice of the semantic model, the central team’s job shifts from “owning the master records” to “operating the platform and arbitrating cross-domain alignment.” This is when data mesh principles stop being slogans and start being how the system actually runs.

Semantic council • Domain ownership • Cross-domain invariants

Step 5: Project into external standards as views

CDISC Define-XML, IDMP submission XML, FDA SPL, EMA eAF — each becomes a generated artifact, produced by querying the semantic layer through the appropriate shapes. New regulatory standards require a new projection, not a re-platforming. EMA eAF goes mandatory for CAPs on 1 September 2026¹¹ — that is the near-term forcing function.

CDISC Define-XML • IDMP submission XML • FDA SPL • EMA eAF (Sep 2026)
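A generated artifact in this sense is just a serialization of a query result. The sketch below uses Python's standard `xml.etree.ElementTree` to render a product record as XML; the element and attribute names are invented for illustration and do not follow the actual IDMP, SPL, Define-XML, or eAF schemas.

```python
import xml.etree.ElementTree as ET


def to_submission_xml(product: dict) -> str:
    """Serialize a product record into an illustrative (non-standard) XML document."""
    root = ET.Element("MedicinalProduct", attrib={"mpid": product["mpid"]})
    ET.SubElement(root, "Name").text = product["name"]
    substances = ET.SubElement(root, "Substances")
    for substance in product["substances"]:
        ET.SubElement(substances, "Substance").text = substance
    return ET.tostring(root, encoding="unicode")


# Stand-in for the result of querying the semantic layer (values are placeholders).
product = {
    "mpid": "MPID-EXAMPLE",
    "name": "Example Product",
    "substances": ["acetylsalicylic acid"],
}

print(to_submission_xml(product))
```

Supporting another standard would mean another serializer over the same query result, which is the sense in which a new regulatory format is a new projection.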

The economic case writes itself once a single regulatory submission is materially cheaper to generate. The integration tax in pharma R&D is structurally large and increasingly avoidable — the marginal cost of integrating a new system is now genuinely lower than maintaining a custom hub mapping, particularly with LLMs handling schema-mapping tasks.²⁴

08 — Pitfalls

Implementation Pitfalls Worth Naming

A few failure modes that consume budget without producing outcomes.

Treating the ontology as an IT artifact

An enterprise ontology is regulated reference data with its own change-control regime. It needs versioning (SemVer is fine), release notes, deprecation policies, and an explicit governance body. Building it as a JIRA project that “the data team owns” produces an ontology nobody outside the team trusts. Fund the ontologists; staff a semantic council that includes regulatory, clinical, and commercial domain experts.
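As one possible convention for applying SemVer to an ontology release, the sketch below classifies a change by comparing the sets of published terms: removing a term is breaking, adding one is backward compatible, anything else is a patch. The rule itself is an assumption for illustration; a governance body may well define "breaking" more finely (for example, changed domains or ranges).

```python
def required_bump(old_terms: set, new_terms: set) -> str:
    """Classify a release under the assumed convention described above."""
    if old_terms - new_terms:
        return "major"  # removed terms can break existing consumers
    if new_terms - old_terms:
        return "minor"  # additions are backward compatible
    return "patch"      # labels, comments, documentation only


def bump(version: str, level: str) -> str:
    """Apply a major/minor/patch bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old = {"ex:Substance", "ex:MedicinalProduct"}
new = {"ex:Substance", "ex:MedicinalProduct", "ex:Site"}

level = required_bump(old, new)
print(level, bump("2.3.1", level))
```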

Conflating “semantic layer” with “BI tool semantic model”

A Power BI semantic model is a semantic layer in the BI-tool-specific sense — useful, but scoped to a single consumption tool. The enterprise semantic layer is upstream of every BI tool, every AI agent, every regulatory submission generator. The two coexist; they are not interchangeable.

Skipping SHACL

Constraints are the part of the stack that turns a semantic model from an ER diagram into something operationally trustworthy. Without SHACL (or equivalent shape language), the model is documentation; with it, the model is a contract enforceable by the platform.

Centralizing the council

Federated governance is the principle. A single central body that adjudicates every change is the anti-pattern. Each domain should be able to evolve its own slice within agreed cross-domain invariants; the central role is to maintain the invariants and arbitrate when domains disagree.

Underestimating the talent gap

Ontology engineering is a discipline. Hiring a “knowledge graph engineer” without a clear distinction between data engineering and semantic engineering produces a team that builds graphs which look right and reason wrong. The job descriptions need to differentiate; the training pipeline rarely does.

09 — Looking Ahead

Where the Next 24 Months Go

Three things are converging fast enough to make 2026–2027 the inflection year for this transition in pharma.

Regulatory pressure for machine-processable submissions

CDISC requirements are a decade old. IDMP rollout continues. EMA eAF becomes mandatory for Centrally Authorised Products on 1 September 2026.¹¹ SPL is already FDA-binding.¹⁰ Each is a forcing function for structured, semantically grounded internal data, and each is materially cheaper to serve from a semantic layer than from a hub plus a translation tier.

Foundation models reshape the mapping economy

Schema mapping, entity resolution, and ontology alignment — the historic bottlenecks in semantic-layer rollouts — are exactly the tasks where LLMs match or exceed prior state-of-the-art. The economics of building and maintaining a semantic layer at pharma scale have shifted; the marginal cost of integrating a new system is now genuinely lower than maintaining a custom hub mapping.

Data mesh moves from talking point to operating model

The “data-as-product” framing has reached the point where major pharma organizations are restructuring data teams around it. Combined with semantic-layer thinking, this gives the federated ownership model that scales — without surrendering coherence.

The architectural conclusion: The master data layer of the next pharma stack is not a hub. It is an ontology, governed as a regulated artifact, expressed in W3C-standard primitives, owned by federated domains, and serving the rest of the enterprise — including the legacy MDM hub itself — as a logical, version-controlled view. The hub does not disappear. It just stops being the master.

Closing Thought

Stop Maintaining Translation Tiers. Promote Them.

Pharma has been pretending for a long time that a row in a hub is the same thing as the meaning of a medicinal product, a clinical study, a substance, or a site. It never was. The semantic layer is not a new technology: every piece of the stack has been a stable W3C Recommendation since 2017 at the latest (most since 2012 or earlier), and the production exemplars (Bayer COLID since 2019, Roche EDIS since 2017) have been running long enough to call them proven.

What is new is the willingness — driven by regulatory math, by AI economics, and by the cumulative weight of MDM programs that didn’t deliver — to actually make the semantic layer the master. The teams that get to that architectural state first will spend the rest of the decade integrating new systems by mapping, not by migrating. The teams that don’t will keep paying the integration tax.

Stop maintaining translation tiers. Promote them.


References


1. Knight, M. Common Master Data Management (MDM) Pitfalls. Dataversity, 11 July 2025 (citing a DGIQ Conference presentation attributing the figure to Gartner research). dataversity.net
2. IBM. What is a semantic layer? IBM Think Topics. ibm.com
3. Databricks. Semantic Layer Architecture: Components, Design Patterns, and AI Integration. Databricks Blog. databricks.com
4. Gartner. Rethink Semantic Layers to Support the Future of Analytics and AI, 8 April 2025. gartner.com
5. CDISC. Foundational Standards (SDTM, ADaM, CDASH, SEND). cdisc.org
6. CDISC. Define-XML. cdisc.org
7. U.S. Food and Drug Administration. Study Data Technical Conformance Guide: Technical Specifications Document. Docket FDA-2014-D-0092. fda.gov
8. OHDSI. Data Standardization. ohdsi.org
9. Hripcsak, G., Duke, J. D., Shah, N. H., et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Studies in Health Technology and Informatics 216:574–578, 2015. doi.org
10. U.S. Food and Drug Administration. Structured Product Labeling Resources. fda.gov
11. European Medicines Agency. EU Electronic Application Forms (eAF). ema.europa.eu
12. HL7 International. BRIDG Model (Biomedical Research Integrated Domain Group). confluence.hl7.org
13. National Cancer Institute / CBIIT. BRIDG Model Documentation (notes that the model “is no longer actively maintained”). cbiit.github.io
14. W3C. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation, 11 December 2012. w3.org
15. W3C. SKOS Simple Knowledge Organization System Reference. W3C Recommendation, 18 August 2009. w3.org
16. W3C. Shapes Constraint Language (SHACL). W3C Recommendation, 20 July 2017. w3.org
17. W3C. R2RML: RDB to RDF Mapping Language. W3C Recommendation, 27 September 2012. w3.org
18. Dehghani, Z. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. martinfowler.com, 20 May 2019.
19. Dehghani, Z. Data Mesh Principles and Logical Architecture. martinfowler.com, 3 December 2020.
20. Bayer Group. COLID: Corporate Linked Data (open-source documentation). bayer-group.github.io
21. Pistoia Alliance FAIR Toolkit. FAIR Data by Design: Roche EDIS / Roche Dataset Portal Case Study. fairtoolkit.pistoiaalliance.org
22. Fernández, J. D., Lasierra, N. (Roche). Lynx: A FAIR Knowledge Graph Engine for Reference Data Integration. SEMANTiCS 2021 EU Conference. semantics.cc
23. Pistoia Alliance. FAIR Implementation Project (FAIR Toolkit, FAIR Maturity Matrix v1.1, March 2025). pistoiaalliance.org
24. Chilukuri, S., Fleming, E., Westra, A. Digital in R&D: The $100 Billion Opportunity. McKinsey & Company. mckinsey.com
