SOURCES · IDENTITY GRAPH

How Fonteum reconciles provider records across sources.

CCN is the 100% anchor identifier across all 15 CCN-keyed datasets in the snapshot manifest. NPI is partial (~10% via OIG LEIE intersection); full NPPES coverage lands Sprint 2. TIN, DEA, and state Medicaid IDs are dated commitments on the roadmap below.

We publish identifier coverage honestly, including where it is missing, because acquirer data teams will check it during diligence anyway. The same discipline applies to /trust/data-provenance and to the data-availability flags on /sanctions.

Current-state coverage matrix

Per-identifier coverage across the Fonteum corpus, with explicit Sprint roadmap commitments where coverage is partial or pending.

Identifier | Definition | Coverage | Source families | Notes
CCN | CMS Certification Number: 6-character TEXT, leading zeros preserved | 100% | POS + Care Compare 8 + NH-depth 4 + LEIE-via-name-match | Anchor identifier across our corpus. Every CCN-keyed query joins all 23 federal source families.
NPI | National Provider Identifier: 10-digit TEXT | ~10% (LEIE intersection only) | OIG LEIE | Full NPI mapping requires NPPES ingestion (Sprint 2 priority Q3 2026). LEIE provides NPI for 8,551 of 68,055 exclusion records (~10%); the rest carry name + state.
TIN | Taxpayer ID: 9-digit TEXT, masked in most public sources | Not currently ingested | (Sprint 2: PECOS Ordering & Referring) | Available via CMS PECOS Ordering & Referring file (Sprint 2 priority Q3 2026). Public PECOS exposes TIN for ordering/referring providers; full TIN coverage requires the §108B PECOS ingestion to land.
DEA | Drug Enforcement Administration registration: 9-character TEXT | Not currently ingested | (Sprint 3: DEA Active Registrants) | DEA Active Registrants Q4 2026 target. Subject to DEA data-distribution licensing review; not all DEA distributions are redistributable.
Medicaid Provider ID | Per-state Medicaid program ID: format varies by state | Not currently ingested | (Sprint 4: state-specific) | State-by-state ingestion. CA, NY, TX prioritized for Sprint 4 (Q1 2027). Each state has its own Medicaid Management Information System (MMIS) with distinct data formats and access policies.
HCP-OneKey ID | IQVIA proprietary individual-physician identifier | Not applicable | - | Proprietary identifier requiring IQVIA license. Fonteum does not ingest licensed reference data; Fonteum is anchored on public-record sources only.

Sample crosswalk — CCN 015009

One worked example showing what a single facility looks like when joined across the CCN-keyed datasets in the snapshot manifest. Verifiable via the audit-pack export endpoint at /api/v1/audit-pack/export?ccn=015009 (requires API key — see /pilot-intake).

CCN: 015009 · Anchor identifier
Facility name: BURNS NURSING HOME, INC. · As reported in CMS POS
State: AL
Facility type: skilled-nursing-facility
POS record: 1 (release 2026-05-07)
Care Compare NH: 1 (release 2026-05-07)
PBJ daily staffing: 4 reporting quarters
NH deficiencies: N citations (per snapshot)
NH penalties: M records (per snapshot)
SNF Owners (Phase-1): 1 record · ownership_pct missing per Health Affairs 2024 baseline
SFF status: flagged 85/441 active/candidate corpus
NPI: not currently mapped (LEIE intersection N/A)
TIN: not ingested (Sprint 2 PECOS)
DEA: not ingested (Sprint 3)

Per-field provenance ships as a fourteen-field contract on every audit-pack record: (source, source_url, dataset_id, snapshot, methodology, last_checked, confidence, availability, pipeline_version, doi, license, coverage_start, coverage_end, slsa_provenance_url).
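As a minimal sketch of how a consumer might check a record against this contract, the Python below lists the fourteen field names and reports which ones a provenance object lacks. The field names are copied from the list above; the helper name `missing_provenance_fields` and the example object are hypothetical. Note that the sample export on this page shows keys such as `snapshot_date` and `data_availability`, so the exact key names on the wire are an assumption here.

```python
# The fourteen field names are copied from the contract listed above.
PROVENANCE_FIELDS = (
    "source", "source_url", "dataset_id", "snapshot", "methodology",
    "last_checked", "confidence", "availability", "pipeline_version",
    "doi", "license", "coverage_start", "coverage_end",
    "slsa_provenance_url",
)


def missing_provenance_fields(provenance: dict) -> list:
    """Return contract fields absent from a provenance object, in contract order."""
    return [name for name in PROVENANCE_FIELDS if name not in provenance]


# Hypothetical, deliberately partial provenance object (illustration only).
example = {"source": "CMS Special Focus Facility list", "snapshot": "2026-05-07"}

print(len(PROVENANCE_FIELDS))                   # 14
print(missing_provenance_fields(example)[:3])   # ['source_url', 'dataset_id', 'methodology']
```

A check like this is most useful at ingestion time, where a missing field can be logged rather than silently defaulted.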

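JSON snippet: first two NDJSON records of /api/v1/audit-pack/export?ccn=015009 (header, then one audit-pack record; the second record is wrapped here for readability, whereas NDJSON carries one record per line)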
{"schema_type":"fonteum-audit-pack-export","schema_version":"1.0","ccn":"015009","format":"ndjson","methodology_version":"v2026.05.0","emitted_at":"<iso-timestamp>"}
{"line_type":"audit_pack_record","ccn":"015009","facility_type":"skilled-nursing-facility","facility_name":"BURNS NURSING HOME, INC.","state":"AL","pos_qa_status":"ok",
  "fields":{"sff_status":{"value":"<status>","provenance":{"source":"CMS Special Focus Facility list","snapshot_date":"2026-05-07","data_availability":"available"}},
            "ownership_pct":{"value":null,"provenance":{"source":"CMS SNF All Owners","snapshot_date":"<iso>","data_availability":"missing-by-design","caveat":"Per Health Affairs March 2024: 82.40% of top-10-chain ownership_pct missing in CMS data"}}}}
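The sample above can be read with any line-oriented JSON parser. The following is a rough Python sketch, assuming the first line is the export header and every later line with `line_type` equal to `audit_pack_record` is a facility record; the helper names `split_export` and `fields_by_availability` are hypothetical, and the inline records are abbreviated copies of the sample with placeholder values left as placeholders.

```python
import json

# Two records mirroring the sample above, one JSON object per line.
# Placeholder values such as "<status>" are kept as placeholders.
ndjson_text = "\n".join([
    json.dumps({"schema_type": "fonteum-audit-pack-export",
                "schema_version": "1.0", "ccn": "015009", "format": "ndjson"}),
    json.dumps({"line_type": "audit_pack_record", "ccn": "015009", "fields": {
        "sff_status": {"value": "<status>",
                       "provenance": {"data_availability": "available"}},
        "ownership_pct": {"value": None,
                          "provenance": {"data_availability": "missing-by-design"}},
    }}),
])


def split_export(text):
    """Split an NDJSON export into (header, records). Assumes line 1 is the header."""
    objs = [json.loads(line) for line in text.splitlines() if line.strip()]
    header = objs[0]
    records = [o for o in objs[1:] if o.get("line_type") == "audit_pack_record"]
    return header, records


def fields_by_availability(record, wanted):
    """Names of fields whose provenance carries the given data_availability flag."""
    return sorted(name for name, field in record["fields"].items()
                  if field["provenance"]["data_availability"] == wanted)


header, records = split_export(ndjson_text)
print(header["ccn"], len(records))                          # 015009 1
print(fields_by_availability(records[0], "available"))      # ['sff_status']
```

Filtering on the availability flag first keeps a null such as `ownership_pct` from being mistaken for a measured zero.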

Crosswalk methodology in brief

CCN-anchored joins. Every facility-keyed query in the Fonteum corpus joins on the 6-character CCN with leading zeros preserved as TEXT (not coerced to integer — alphanumeric CCNs would lose information on int-cast). The CCN is the 100% anchor across POS + Care Compare 8 + NH-depth 4.
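To illustrate why the CCN stays TEXT, here is a small Python sketch. The function names are hypothetical, upper-casing on input is an assumption of this sketch rather than a documented Fonteum rule, and `01A234` is a made-up alphanumeric value used only to show the failure mode.

```python
def normalize_ccn(raw: str) -> str:
    """Return a CCN as 6-character text, or raise ValueError."""
    ccn = raw.strip().upper()  # upper-casing is an assumption of this sketch
    if len(ccn) != 6 or not (ccn.isascii() and ccn.isalnum()):
        raise ValueError(f"not a 6-character CCN: {raw!r}")
    return ccn


def int_roundtrip(raw: str) -> str:
    """Show what an integer cast would do to the same value."""
    try:
        return str(int(raw))
    except ValueError:
        return "<cannot cast>"


print(normalize_ccn("015009"))   # 015009  (leading zero kept)
print(int_roundtrip("015009"))   # 15009   (leading zero lost)
print(int_roundtrip("01A234"))   # <cannot cast>  (hypothetical alphanumeric CCN)
```

In Python the cast of an alphanumeric value fails outright; other systems may instead truncate or null the value, which is harder to notice in a join.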

Name + state fallback. Sources without CCN (OIG LEIE individual-provider records, restricted-source name lists) join via fuzzy name + state pairing. The fuzzy match runs through the §sprint1-nh-depth-ingest-b PE/REIT entity registry pattern: substring match against a curated alias list with per-entity confidence scoring. Match results are flagged in the audit-pack with data_availability: "name-matched" so downstream consumers can opt out of fuzzy joins.
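The fallback described above could look roughly like the Python below. This is a sketch under stated assumptions, not the registry implementation: the alias list, the entity ID `ent-001`, the confidence value, and the function names are all hypothetical. Only the flag value `name-matched` comes from the text.

```python
# Hypothetical curated alias list: (entity_id, state, lowercase alias substring).
ALIASES = [
    ("ent-001", "AL", "burns nursing home"),
]

# Hypothetical per-entity confidence scores in [0, 1].
ENTITY_CONFIDENCE = {"ent-001": 0.9}


def match_name(name: str, state: str):
    """Return the best alias hit for (name, state) as a dict, or None."""
    haystack = " ".join(name.lower().split())  # lowercase, collapse whitespace
    hits = [entity for entity, alias_state, alias in ALIASES
            if alias_state == state and alias in haystack]
    if not hits:
        return None
    best = max(hits, key=lambda entity: ENTITY_CONFIDENCE[entity])
    return {"entity_id": best,
            "confidence": ENTITY_CONFIDENCE[best],
            "data_availability": "name-matched"}


def exact_only(rows):
    """Consumer-side opt-out: drop rows that came from a fuzzy name join."""
    return [r for r in rows if r.get("data_availability") != "name-matched"]


print(match_name("BURNS NURSING HOME, INC.", "AL"))
print(match_name("BURNS NURSING HOME, INC.", "GA"))  # None: state must also match
```

Requiring the state to match before the substring test keeps a common facility name in one state from matching an unrelated entity in another.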

Edge cases. CCN reuse after facility closure (a CCN can be re-issued after termination) is handled by joining on (CCN, snapshot_date) tuples. Every audit-pack record is anchored to a snapshot date, so two sequential occupants of a CCN appear as distinct records, not a merged history. Facility name variants (legal name vs DBA, casing differences, "Inc." vs "Incorporated") are normalized via the §sprint1-export hydration service before comparison.
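Both edge cases can be sketched in a few lines of Python. Everything here is illustrative: the CCN `000001`, the dates, and the facility names are invented, the suffix table is a small assumed subset, and `normalize_name` stands in for the hydration service rather than reproducing it.

```python
import re

# Assumed subset of legal-suffix variants; a real table would be larger.
SUFFIXES = {"incorporated": "inc", "corporation": "corp", "limited": "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and fold suffix variants to one form."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


# Records keyed on (CCN, snapshot_date), so a re-issued CCN never merges histories.
records = {}


def add_record(ccn: str, snapshot_date: str, facility_name: str) -> None:
    records[(ccn, snapshot_date)] = normalize_name(facility_name)


# Hypothetical CCN with two sequential occupants at different snapshots.
add_record("000001", "2020-01-01", "First Example Home, Inc.")
add_record("000001", "2026-01-01", "Second Example Center Incorporated")

print(len(records))                               # 2
print(records[("000001", "2020-01-01")])          # first example home inc
print(normalize_name("Burns Nursing Home Incorporated")
      == normalize_name("BURNS NURSING HOME, INC."))   # True
```

Keying on the tuple means a lookup by CCN alone returns a list of occupants over time, which makes the reuse visible to the caller.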

Full per-field provenance contract: /methodology. Per-source license + redistribution posture: /trust/data-provenance.

Roadmap

Dated commitments. Each entry is subject to data-source availability and licensing review; we update this page on Sprint completion (or earlier if a milestone slips).

Sprint 2 · Q3 2026: NPPES ingestion → full NPI ↔ CCN crosswalk across all individual-provider source families
Sprint 2 · Q3 2026: PECOS Ordering & Referring → TIN coverage for ordering/referring providers (where public)
Sprint 3 · Q4 2026: DEA Active Registrants (subject to data-distribution licensing review)
Sprint 4 · Q1 2027: State Medicaid Provider IDs, with California, New York, and Texas first; remaining states phased

How acquirers and integration partners consume this

  • REST API today. GET /api/v1/audit-pack/export?ccn=015009 returns the full record with all available identifiers + per-field provenance. Authenticate via the API access flow at /pilot-intake.
  • Delta Sharing (Sprint 2 Q3 2026). Parquet-native crosswalk table with same provenance contract.
  • Snowflake Secure Data Share (Sprint 2 Q3 2026). Same data, exposed natively via Snowflake-native distribution.
  • SFTP delivery. Available on request — see /integrations.
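For the REST option, a request might be assembled along the following lines. This is a hedged sketch using only the Python standard library: the path and query parameter come from this page, while the host `api.example.invalid`, the `Authorization: Bearer` scheme, and the `Accept` value are assumptions, since the actual authentication flow is described at /pilot-intake. The code builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.example.invalid"    # hypothetical host, not the real one
EXPORT_PATH = "/api/v1/audit-pack/export"   # path as given on this page


def build_export_request(ccn: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for one facility's audit pack."""
    url = f"{BASE_URL}{EXPORT_PATH}?{urlencode({'ccn': ccn})}"
    headers = {
        "Authorization": f"Bearer {api_key}",   # assumed auth scheme
        "Accept": "application/x-ndjson",       # assumed media type
    }
    return Request(url, headers=headers, method="GET")


req = build_export_request("015009", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the CCN as a string through `urlencode` keeps the leading zero intact in the query string.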

Related

  • /sources: Source registry index
  • /sources/cadence: Per-source refresh cadence
  • /trust/data-provenance: Per-source license + redistribution posture
  • /audit-pack: Compliance-grade export deliverable
  • /methodology: Per-field provenance contract
  • /integrations: Delta Sharing, Snowflake, S3 roadmap

Reviewed by Jennifer Montecillo, MD, non-practicing medical reviewer.

© 2026 Fonteum, Inc. All rights reserved.
