Managing Master & Reference Data

Overview

Master Data Management (MDM) is essential for maintaining data quality, consistency, and governance across your organization. Foundation provides built-in capabilities to categorize and manage your most critical data assets through Master and Reference data product categories, enabling you to establish clear data ownership, lineage, and quality standards.

This guide explains how to implement effective master data management practices using Foundation's data product categorization features.


Understanding Master vs Reference Data

Master Data

Master data represents the core business entities that are shared across your organization and require consistent management. These are your critical business objects that multiple systems and processes depend on.

Common examples include:

  • Customer records (accounts, contacts, demographics)

  • Product information (SKUs, specifications, pricing)

  • Employee data (personnel records, organizational structure)

  • Asset information (equipment, facilities, inventory)

  • Supplier and vendor details

Characteristics:

  • High business value and impact

  • Shared across multiple domains and use cases

  • Requires strict governance and quality controls

  • Changes infrequently but needs careful management

  • Single source of truth for the business entity

Reference Data

Reference data consists of standardized lookup values, codes, and classifications that provide context and consistency across your data ecosystem. This data categorizes and validates other data.

Common examples include:

  • Country and currency codes

  • Status codes (order status, account status)

  • Product categories and hierarchies

  • Industry classifications

  • Units of measurement

  • Time zones and calendar data

Characteristics:

  • Relatively static and standardized

  • Used for validation and categorization

  • Often industry-standard or regulatory-defined

  • Lower volume but high reuse across systems

  • Provides consistent terminology
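As an illustration of how reference data validates other data, here is a minimal Python sketch that checks order records against a small country-code lookup and an order-status lookup. The code sets, record fields, and function name are hypothetical examples chosen for illustration. They are not part of Foundation.

```python
# Hypothetical reference data: a small subset of ISO 3166-1 alpha-2 country codes
# and an illustrative set of order status codes.
COUNTRY_CODES = {"US", "GB", "DE", "FR", "JP"}
ORDER_STATUSES = {"PENDING", "SHIPPED", "DELIVERED", "CANCELLED"}


def validate_order(order: dict) -> list[str]:
    """Return the validation errors for one order record (empty list if valid)."""
    errors = []
    if order.get("country") not in COUNTRY_CODES:
        errors.append(f"unknown country code: {order.get('country')!r}")
    if order.get("status") not in ORDER_STATUSES:
        errors.append(f"unknown order status: {order.get('status')!r}")
    return errors


orders = [
    {"id": 1, "country": "US", "status": "SHIPPED"},
    {"id": 2, "country": "UK", "status": "SHIPPED"},  # the assigned ISO code is "GB"
    {"id": 3, "country": "DE", "status": "LOST"},     # not a known status
]

for order in orders:
    print(order["id"], validate_order(order))
```

Because every system checks against the same lookup, an invalid value such as "UK" is caught consistently instead of being accepted by one system and rejected by another.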


Why Categorize Data Products

Properly categorizing data products as Master or Reference types provides several governance and operational benefits:

Data Governance

  • Establishes clear ownership and accountability for critical data

  • Enables appropriate access controls and security policies

  • Supports compliance with data protection regulations

  • Creates audit trails for sensitive business entities

Data Quality

  • Focuses quality improvement efforts on high-value data products

  • Sets the expectation that master data is held to stricter validation rules

  • Reduces duplication and inconsistency through stronger quality rules

  • Facilitates data stewardship activities

Operational Efficiency

  • Helps users quickly identify authoritative data sources

  • Improves data discovery through meaningful categorization

  • Guides integration and architecture decisions

  • Supports impact analysis when changes are needed

Decision Intelligence

  • Helps ensure AI models use trusted, high-quality data

  • Provides context for analytics and reporting

  • Enables consistent business metrics across the organization


Creating Master and Reference Data Products in Foundation

Prerequisites

Before starting, ensure you have:

  • Sufficient permissions to create and edit data products in Foundation

  • Existing data products created in Foundation

  • Understanding of your organization's critical business entities

  • Defined data governance policies and ownership structure

Step 1: Identify Critical Entities

Review your existing data products and identify which ones represent master or reference data:

For Master Data, ask:

  • Is this a core business entity used across multiple domains?

  • Does this data require strict governance and quality controls?

  • Would inconsistencies in this data significantly impact business operations?

  • Is this the authoritative source for this entity?

For Reference Data, ask:

  • Does this provide standardized codes or classifications?

  • Is this used primarily for validation or categorization?

  • Is this data relatively static and standardized?

  • Does this support data consistency across systems?

If you find that several data products must be joined to give the full picture of an entity, create a new data product and continue with Step 2. If instead an existing data product already meets the criteria for master or reference data and only needs stronger quality controls or a change of ownership, go straight to Step 3 or Step 4.

Step 2: Build the Master or Reference Data Product with the Right Transformations

To learn how to create data products, see Creating and Managing Data Products.

To learn how to use the transformations, please visit Configuring Data Transformations through the UI or Using the API to implement Data Transformations.

Considerations for Master Data Products

When combining data from multiple sources, establish rules for which source takes precedence:

  • Source Priority: Specify which source is authoritative for each field

    • Example: Salesforce for customer contact info, SAP for billing address

  • Most Recent Wins: Use the latest updated value across sources

  • Most Complete Wins: Choose the record with the fewest null values

  • Custom Business Rules: Apply specific logic (e.g., "Use ERP price unless promotional price exists in e-commerce")
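The precedence rules above can be sketched in plain Python. This is a minimal illustration, assuming each source record is a dict with an `updated_at` timestamp; the source names, field names, and the `SOURCE_PRIORITY` map are hypothetical, and Foundation's own transformations may implement survivorship differently.

```python
from datetime import datetime

# Hypothetical candidate records for one customer, one per source system.
candidates = [
    {"source": "salesforce", "email": "ana@example.com", "phone": "+1-555-0100",
     "billing_address": None, "updated_at": datetime(2024, 3, 1)},
    {"source": "sap", "email": "ana.old@example.com", "phone": "555 0199",
     "billing_address": "1 Main St, Springfield", "updated_at": datetime(2024, 1, 15)},
]

# Source priority: which source is authoritative for each field (hypothetical).
SOURCE_PRIORITY = {
    "email": ["salesforce", "sap"],
    "phone": ["salesforce", "sap"],
    "billing_address": ["sap", "salesforce"],
}


def by_source_priority(records, field):
    """Take the field from the highest-priority source that has a non-null value."""
    for source in SOURCE_PRIORITY[field]:
        for rec in records:
            if rec["source"] == source and rec.get(field) is not None:
                return rec[field]
    return None


def most_recent_wins(records, field):
    """Take the non-null value from the most recently updated record."""
    with_value = [r for r in records if r.get(field) is not None]
    if not with_value:
        return None
    return max(with_value, key=lambda r: r["updated_at"])[field]


def most_complete_wins(records, fields):
    """Pick the whole record with the fewest null values across the given fields."""
    return min(records, key=lambda r: sum(r.get(f) is None for f in fields))


def custom_price_rule(erp_price, ecommerce_promo_price):
    """The custom rule quoted above: use the ERP price unless a promotional price exists."""
    return ecommerce_promo_price if ecommerce_promo_price is not None else erp_price


# Golden record built field by field using source priority.
golden = {field: by_source_priority(candidates, field) for field in SOURCE_PRIORITY}
print(golden)
```

Note that the strategies can disagree: source priority takes the phone number from Salesforce, while "most complete wins" selects the SAP record as a whole because it has no nulls. Choosing one strategy per field keeps the outcome predictable.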

Best Practice: Start with a minimal viable transformation pipeline and add complexity iteratively. Test thoroughly at each stage before adding more transformations.

Performance Tip: For large datasets, consider using incremental processing transformations that only process changed records rather than the full dataset on each refresh.
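One common way to implement incremental processing is a high watermark: remember the latest `updated_at` value already processed and, on the next refresh, handle only rows changed after it. The sketch below assumes source rows carry a reliable `updated_at` column and that the target is keyed by `id`. It is a generic illustration, not how Foundation's incremental transformations are configured.

```python
from datetime import datetime


def incremental_refresh(rows, target, last_watermark):
    """Upsert only rows changed since last_watermark into target (keyed by id).

    Returns the new watermark to persist for the next run.
    """
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    for row in changed:
        target[row["id"]] = row  # upsert: insert new ids, overwrite changed ones
    if not changed:
        return last_watermark
    return max(r["updated_at"] for r in changed)


source_rows = [
    {"id": 1, "name": "Ana", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Ben", "updated_at": datetime(2024, 2, 1)},
]
target = {}
watermark = incremental_refresh(source_rows, target, datetime.min)  # initial full load

# Later: one record changes and one is added; only those two rows are processed.
source_rows[0] = {"id": 1, "name": "Ana M.", "updated_at": datetime(2024, 3, 1)}
source_rows.append({"id": 3, "name": "Cy", "updated_at": datetime(2024, 3, 2)})
watermark = incremental_refresh(source_rows, target, watermark)
print(watermark, target[1]["name"], len(target))
```

This simple version does not handle deleted source rows, and a strict `>` comparison can miss late-arriving rows that share the watermark timestamp, so a periodic full refresh is still a sensible safety net.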

Step 3: Access Data Product Configuration

See Managing Data Product Metadata to learn how to configure or edit a data product.

Step 4: Set the Data Product Category and Other Metadata

  1. In the Data Product Category field, select one of the following:

    • Master - For core business entities

    • Reference - For lookup values and classifications

  2. Add supporting metadata.

  3. Configure governance settings specific to master/reference data:

    • Data Quality Rules: Define completeness, accuracy, and validity checks

    • Update Frequency: Specify expected refresh schedules

    • Access Controls: Implement stricter permissions if needed

  4. Click Save to apply the categorization
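To illustrate the completeness and validity checks listed under Data Quality Rules, the sketch below computes a pass rate per rule over a batch of records. The rule names, fields, regular expression, and the 0.95 threshold are assumptions for illustration; the actual rules are defined in the data product's governance settings.

```python
import re

# Deliberately simple email pattern for illustration; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rule set: each rule is a predicate over one record.
RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "email_valid": lambda r: bool(EMAIL_RE.match(r.get("email") or "")),
    "address_complete": lambda r: all(r.get(f) for f in ("street", "city", "postal_code")),
}


def evaluate(records, rules, threshold=0.95):
    """Return {rule_name: (pass_rate, meets_threshold)} for a batch of records."""
    results = {}
    for name, check in rules.items():
        passed = sum(1 for r in records if check(r))
        rate = passed / len(records) if records else 1.0
        results[name] = (rate, rate >= threshold)
    return results


records = [
    {"customer_id": "C1", "email": "ana@example.com",
     "street": "1 Main St", "city": "Springfield", "postal_code": "12345"},
    {"customer_id": "C2", "email": "not-an-email",
     "street": "2 Oak Ave", "city": "Shelbyville", "postal_code": None},
]

for rule, (rate, ok) in evaluate(records, RULES).items():
    print(f"{rule}: {rate:.0%} {'PASS' if ok else 'FAIL'}")
```

Reporting a pass rate per rule, rather than a single pass/fail flag, makes it easier to track quality trends over time and to set stricter thresholds for master data than for other products.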

Step 5: Understand Data Lineage

For master and reference data products, thorough lineage documentation is critical:

  1. Navigate to the Lineage tab of your data product

  2. Review the automatically generated lineage graph showing:

    • All connected data sources

    • Source-aligned data products

    • Each transformation step applied

    • The final master data product

  3. Review downstream dependencies:

    • Use the Lineage UI to identify which data products and applications consume this master data

    • Understand the impact radius for potential changes

  4. Document the transformation pipeline:

    • List each transformation applied and why

    • Explain how data quality improves through the pipeline

    • Note any data loss or filtering that occurs

Read Exploring Data Lineage to learn more.
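The impact radius of a change is the set of everything reachable downstream of a data product in the lineage graph. The following sketch computes it with a breadth-first search over a hypothetical adjacency map and also records how many hops away each consumer is. In practice you would read these dependencies from the Lineage tab rather than hard-code them.

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes that consume it directly.
LINEAGE = {
    "salesforce_source": ["crm_accounts"],
    "sap_source": ["erp_customers"],
    "crm_accounts": ["customer_master"],
    "erp_customers": ["customer_master"],
    "customer_master": ["churn_model", "revenue_dashboard"],
    "churn_model": ["retention_campaigns"],
}


def impact_radius(graph, start):
    """Return {node: hops} for every node downstream of start (breadth-first)."""
    distances = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for consumer in graph.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                distances[consumer] = hops + 1
                queue.append((consumer, hops + 1))
    return distances


print(impact_radius(LINEAGE, "customer_master"))
```

Direct consumers (one hop) are the ones to notify first about a schema or logic change; consumers further away are affected indirectly through them.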


Best Practices

Start with High-Impact Entities

Focus your initial MDM efforts on the master data that has the highest business impact:

  • Begin with customer or product data if you're in retail

  • Start with asset or equipment data if you're in logistics or manufacturing

  • Prioritize employee data if you're implementing HR analytics

Establish Clear Ownership

For each master data product:

  • Assign an owner from the domain that knows the data best

  • Designate a data steward responsible for day-to-day quality

  • Document escalation paths for data issues

  • Create a RACI matrix for data governance activities

Design Transformation Pipelines for Maintainability

When building transformation pipelines for master data:

  • Use descriptive names for each transformation step

  • Document the business rationale for each transformation

  • Keep transformations modular and reusable

  • Test transformations independently before chaining

  • Version your transformation logic alongside the data product

  • Favor library transformations over custom code for maintainability

Implement Progressive Governance

Don't try to enforce perfect governance from day one:

  1. Phase 1: Categorize and document existing master data

  2. Phase 2: Implement basic quality rules and monitoring

  3. Phase 3: Add approval workflows and access controls

  4. Phase 4: Establish formal data contracts and SLAs

  5. Phase 5: Continuously improve based on usage patterns

Monitor and Maintain

Master data management is an ongoing process:

  • Review quality metrics weekly or monthly

  • Conduct quarterly reviews of categorizations

  • Update documentation as business needs evolve

  • Gather feedback from data consumers

  • Track and resolve quality issues promptly

  • Refine transformation logic based on discovered data issues


Common Use Cases

Customer Master Data

Scenario: Creating a single customer view across CRM, ERP, and support systems

Implementation:

  1. Connect data sources: Salesforce (CRM), SAP (ERP), Zendesk (Support)

  2. Create source-aligned data products for each system

  3. Build a "Customer Master" consumer-aligned data product

  4. Apply transformations from the library:

    • Standardize customer names and addresses

    • Normalize phone numbers and email addresses

    • Deduplicate records using fuzzy matching on name + address

    • Merge records with golden record logic (Salesforce for contact info, SAP for billing)

    • Enrich with geography data (state, country from postal code)

    • Validate email addresses and flag invalid entries

  5. Categorize the Data Product as Master Data

  6. Configure data quality rules for:

    • Unique customer IDs

    • Valid email formats and phone numbers

    • Complete address information

  7. Review the lineage graph showing the transformation pipeline

  8. Implement access controls for PII compliance

  9. Monitor usage and quality metrics
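To make the normalization and deduplication steps more concrete, here is a minimal Python sketch that normalizes emails and phone numbers and flags likely duplicates using fuzzy matching on name plus address. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure with an arbitrary 0.85 threshold. Both are assumptions, and the library transformations in Foundation may use different matching algorithms.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_email(email):
    """Trim and lowercase an email address."""
    return email.strip().lower() if email else None


def normalize_phone(phone):
    """Keep digits only, so '(555) 010-0100' and '555.010.0100' compare equal."""
    return re.sub(r"\D", "", phone) if phone else None


def match_key(rec):
    """Name + address string used for fuzzy matching, with case and spacing normalized."""
    text = f"{rec['name']} {rec['address']}".lower()
    return " ".join(text.split())


def likely_duplicates(records, threshold=0.85):
    """Return id pairs whose name + address similarity meets the threshold."""
    pairs = []
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, match_key(a), match_key(b)).ratio()
        if score >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs


customers = [
    {"id": "SF-1", "name": "Ana Lopez", "address": "1 Main Street, Springfield"},
    {"id": "SAP-9", "name": "Ana  López", "address": "1 Main St., Springfield"},
    {"id": "ZD-4", "name": "Ben Ng", "address": "9 Elm Road, Shelbyville"},
]

print(normalize_email("  Ana@Example.COM "), normalize_phone("(555) 010-0100"))
print(likely_duplicates(customers))
```

Comparing every pair is quadratic, so on large datasets matching is usually restricted to candidate groups (for example, records sharing a postal code) before scoring. Flagged pairs then feed the golden record merge step.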

Product Reference Data

Scenario: Maintaining standardized product categories and hierarchies

Implementation:

  1. Connect to the PIM system and the e-commerce platform

  2. Create source-aligned data products

  3. Build a "Product Categories" consumer-aligned data product

  4. Apply transformations:

    • Standardize category names (title case, trim whitespace)

    • Build hierarchical relationships (parent-child)

    • Validate hierarchy completeness (no orphaned categories)

    • Add category descriptions from multiple sources

  5. Categorize as Reference Data

  6. Configure quality checks for:

    • Complete category hierarchies

    • Standardized naming conventions

    • No orphaned categories

  7. Review the lineage graph showing the transformation pipeline

  8. Implement appropriate access controls

  9. Monitor usage and quality metrics
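The category transformations and checks above (title-casing and trimming names, building parent-child relationships, and detecting orphaned categories) can be sketched as follows. The input shape, where each row carries an `id`, a `name`, and an optional `parent_id`, is an assumption for illustration.

```python
def standardize_name(name):
    """Trim surrounding whitespace, collapse inner spaces, and apply title case."""
    return " ".join(name.split()).title()


def build_children(categories):
    """Map each parent_id to the list of its child category ids."""
    children = {}
    for cat in categories:
        if cat["parent_id"] is not None:
            children.setdefault(cat["parent_id"], []).append(cat["id"])
    return children


def find_orphans(categories):
    """Return ids of categories whose parent_id does not exist in the set."""
    ids = {cat["id"] for cat in categories}
    return [cat["id"] for cat in categories
            if cat["parent_id"] is not None and cat["parent_id"] not in ids]


categories = [
    {"id": 1, "name": "  home & garden ", "parent_id": None},
    {"id": 2, "name": "outdoor   furniture", "parent_id": 1},
    {"id": 3, "name": "GARDEN TOOLS", "parent_id": 1},
    {"id": 4, "name": "patio heaters", "parent_id": 99},  # parent 99 does not exist
]

for cat in categories:
    cat["name"] = standardize_name(cat["name"])

print([c["name"] for c in categories])
print(build_children(categories))
print(find_orphans(categories))
```

Python's `str.title()` also capitalizes the letter after an apostrophe ("men's" becomes "Men'S"), so a production naming rule may need its own capitalization logic.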

Employee Master Data

Scenario: Centralizing HR data for analytics and operations

Implementation:

  1. Connect to the HRIS (Workday), Active Directory, and the payroll system

  2. Create source-aligned data products for each

  3. Build an "Employee Master" consumer-aligned data product

  4. Apply transformations:

    • Standardize employee names

    • Deduplicate based on employee ID

    • Build organizational hierarchy from Active Directory

    • Join compensation data from payroll (with strict access controls)

    • Calculate tenure and other derived fields

    • Validate required fields (manager, department, hire date)

  5. Categorize as Master Data with high sensitivity

  6. Configure strict access controls and data masking for compensation fields

  7. Implement quality rules for required fields
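The derived-field and required-field steps, together with the effect of masking compensation fields, might look like the sketch below. The field names, the `can_view_compensation` flag, and the masking approach are illustrative assumptions. Masking is listed above as a data product configuration step, so this code only shows the intended effect rather than how Foundation applies it.

```python
from datetime import date

REQUIRED_FIELDS = ("manager", "department", "hire_date")
COMPENSATION_FIELDS = ("salary", "bonus")  # hypothetical sensitive fields


def tenure_years(hire_date, as_of):
    """Whole years of service as of a given date."""
    years = as_of.year - hire_date.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (as_of.month, as_of.day) < (hire_date.month, hire_date.day):
        years -= 1
    return years


def missing_required(employee):
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not employee.get(f)]


def masked_view(employee, can_view_compensation):
    """Return a copy with compensation fields masked unless the viewer is authorized."""
    view = dict(employee)
    if not can_view_compensation:
        for f in COMPENSATION_FIELDS:
            if f in view:
                view[f] = "***"
    return view


emp = {"employee_id": "E100", "name": "Dana Kim", "manager": "E007",
       "department": "Finance", "hire_date": date(2019, 9, 15), "salary": 98000}

emp["tenure_years"] = tenure_years(emp["hire_date"], as_of=date(2024, 6, 1))
print(emp["tenure_years"], missing_required(emp), masked_view(emp, False)["salary"])
```

`masked_view` returns a copy, so the underlying record keeps the real values for authorized consumers.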
