Managing Master & Reference Data
Overview
Master Data Management (MDM) is essential for maintaining data quality, consistency, and governance across your organization. Foundation provides built-in capabilities to categorize and manage your most critical data assets through Master and Reference data product categories, enabling you to establish clear data ownership, lineage, and quality standards.
This guide explains how to implement effective master data management practices using Foundation's data product categorization features.
Understanding Master vs Reference Data
Master Data
Master data represents the core business entities that are shared across your organization and require consistent management. These are your critical business objects that multiple systems and processes depend on.
Common examples include:
Customer records (accounts, contacts, demographics)
Product information (SKUs, specifications, pricing)
Employee data (personnel records, organizational structure)
Asset information (equipment, facilities, inventory)
Supplier and vendor details
Characteristics:
High business value and impact
Shared across multiple domains and use cases
Requires strict governance and quality controls
Changes infrequently but needs careful management
Single source of truth for the business entity
Reference Data
Reference data consists of standardized lookup values, codes, and classifications that provide context and consistency across your data ecosystem. This data categorizes and validates other data.
Common examples include:
Country and currency codes
Status codes (order status, account status)
Product categories and hierarchies
Industry classifications
Units of measurement
Time zones and calendar data
Characteristics:
Relatively static and standardized
Used for validation and categorization
Often industry-standard or regulatory-defined
Lower volume but high reuse across systems
Provides consistent terminology
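To make the validation role of reference data concrete, here is a minimal sketch in Python. The country codes and record fields are illustrative assumptions, not part of Foundation:

```python
# Reference data in action: a standardized lookup set (ISO 3166-1
# alpha-2 country codes, shown here as a small illustrative sample)
# used to validate records in other datasets.
VALID_COUNTRY_CODES = {"US", "GB", "DE", "FR", "JP"}

def validate_country(record: dict) -> bool:
    """Return True if the record's country code exists in the reference set."""
    return record.get("country_code") in VALID_COUNTRY_CODES

orders = [
    {"order_id": 1, "country_code": "US"},
    {"order_id": 2, "country_code": "XX"},  # not in the reference set
]
invalid = [o for o in orders if not validate_country(o)]
print([o["order_id"] for o in invalid])  # [2]
```

Because every system validates against the same reference set, the codes stay consistent everywhere they are used.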
Why Categorize Data Products
Properly categorizing data products as Master or Reference types provides several governance and operational benefits:
Data Governance
Establishes clear ownership and accountability for critical data
Enables appropriate access controls and security policies
Supports compliance with data protection regulations
Creates audit trails for sensitive business entities
Data Quality
Focuses quality improvement efforts on high-value data products
Sets the expectation of stricter validation rules for master data
Reduces duplication and inconsistency through stronger quality rules
Facilitates data stewardship activities
Operational Efficiency
Helps users quickly identify authoritative data sources
Improves data discovery through meaningful categorization
Guides integration and architecture decisions
Supports impact analysis when changes are needed
Decision Intelligence
Ensures AI models use trusted, high-quality data
Provides context for analytics and reporting
Enables consistent business metrics across the organization
Creating Master and Reference Data Products in Foundation
Prerequisites
Before starting, ensure you have:
The necessary permissions to create and manage data products in Foundation
Existing data products created in Foundation
Understanding of your organization's critical business entities
Defined data governance policies and ownership structure
Step 1: Identify Critical Entities
Review your existing data products and identify which ones represent master or reference data:
For Master Data, ask:
Is this a core business entity used across multiple domains?
Does this data require strict governance and quality controls?
Would inconsistencies in this data significantly impact business operations?
Is this the authoritative source for this entity?
For Reference Data, ask:
Does this provide standardized codes or classifications?
Is this used primarily for validation or categorization?
Is this data relatively static and standardized?
Does this support data consistency across systems?
Step 2: Build the Master or Reference Data Product with the Right Transformations
To learn how to create data products, please visit Creating and Managing Data Products.
To learn how to use transformations, please visit Configuring Data Transformations through the UI or Using the API to implement Data Transformations.
Considerations for Master Data Products
When combining data from multiple sources, establish rules for which source takes precedence:
Source Priority: Specify which source is authoritative for each field
Example: Salesforce for customer contact info, SAP for billing address
Most Recent Wins: Use the latest updated value across sources
Most Complete Wins: Choose the record with fewest null values
Custom Business Rules: Apply specific logic (e.g., "Use ERP price unless promotional price exists in e-commerce")
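The precedence rules above can be sketched as field-level survivorship logic. This is a hedged illustration, not Foundation's API: the source names, field names, and record shape are assumptions.

```python
from datetime import datetime

# Field-level source priority, with "most recent wins" as the fallback
# for fields that have no explicit priority. Source and field names are
# illustrative (Salesforce for contact info, SAP for billing).
SOURCE_PRIORITY = {
    "email": ["salesforce", "sap"],
    "billing_address": ["sap", "salesforce"],
}

def merge_golden_record(records: list[dict]) -> dict:
    """Merge per-source records into a single golden record."""
    golden = {}
    fields = {f for r in records for f in r if f not in ("source", "updated_at")}
    for field in fields:
        candidates = [r for r in records if r.get(field) is not None]
        priority = SOURCE_PRIORITY.get(field)
        if priority:
            # Source priority: the first listed source with a value wins.
            candidates.sort(key=lambda r: priority.index(r["source"])
                            if r["source"] in priority else len(priority))
        else:
            # Most recent wins when no source priority is defined.
            candidates.sort(key=lambda r: r["updated_at"], reverse=True)
        if candidates:
            golden[field] = candidates[0][field]
    return golden

records = [
    {"source": "salesforce", "updated_at": datetime(2024, 5, 1),
     "email": "a@example.com", "billing_address": None},
    {"source": "sap", "updated_at": datetime(2024, 6, 1),
     "email": "old@example.com", "billing_address": "1 Main St"},
]
golden = merge_golden_record(records)
# golden["email"] comes from Salesforce, golden["billing_address"] from SAP
```

Keeping the priority map as data (rather than hard-coded branches) makes it easy to review with data stewards and to extend as new sources are added.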
Best Practice: Start with a minimal viable transformation pipeline and add complexity iteratively. Test thoroughly at each stage before adding more transformations.
Performance Tip: For large datasets, consider using incremental processing transformations that only process changed records rather than the full dataset on each refresh.
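A common way to implement incremental processing is a watermark: remember the newest update timestamp processed so far and skip everything older on the next refresh. The sketch below assumes records carry an updated_at field; the transform step is a placeholder.

```python
from datetime import datetime

def transform(record: dict) -> None:
    # Placeholder for the real transformation pipeline.
    record["processed"] = True

def process_increment(records: list[dict], watermark: datetime) -> datetime:
    """Apply the pipeline only to records changed since the last run."""
    changed = [r for r in records if r["updated_at"] > watermark]
    for record in changed:
        transform(record)
    # Advance the watermark to the newest record seen, or keep it unchanged.
    return max((r["updated_at"] for r in changed), default=watermark)
```

On each refresh, persist the returned watermark and pass it back in next time, so the cost of a run scales with the number of changed records rather than the full dataset.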
Step 3: Access Data Product Configuration
Please visit Managing Data Product Metadata to understand how to configure or edit a data product.
Step 4: Set the Data Product Category and other Metadata
In the Data Product Category field, select one of the following:
Master - For core business entities
Reference - For lookup values and classifications
Add supporting metadata.
Configure governance settings specific to master/reference data:
Data Quality Rules: Define completeness, accuracy, and validity checks
Update Frequency: Specify expected refresh schedules
Access Controls: Implement stricter permissions if needed
Click Save to apply the categorization
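The completeness and validity checks mentioned above might look like the following sketch. The field names and the simple regex are assumptions for illustration; Foundation's own quality-rule configuration may differ.

```python
import re

# A permissive email pattern: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_quality(record: dict) -> list[str]:
    """Return the names of any failed rules for a customer record."""
    failures = []
    # Completeness: the key identifier must be present.
    if not record.get("customer_id"):
        failures.append("completeness:customer_id")
    # Validity: if an email is present, it must be well-formed.
    if record.get("email") and not EMAIL_RE.match(record["email"]):
        failures.append("validity:email")
    return failures
```

Returning rule names rather than a single pass/fail flag makes it straightforward to aggregate failures into the quality metrics you monitor later.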
Step 5: Understand Data Lineage
For master and reference data products, thorough lineage documentation is critical:
Navigate to the Lineage tab of your data product
Review the automatically generated lineage graph showing:
All connected data sources
Source-aligned data products
Each transformation step applied
The final master data product
Review downstream dependencies:
Use the Lineage UI to identify which data products and applications consume this master data
Understand the impact radius for potential changes
Document the transformation pipeline:
List each transformation applied and why
Explain how data quality improves through the pipeline
Note any data loss or filtering that occurs
Read Exploring Data Lineage to learn more.
Best Practices
Start with High-Impact Entities
Focus your initial MDM efforts on the master data that has the highest business impact:
Begin with customer or product data if you're in retail
Start with asset or equipment data if you're in logistics or manufacturing
Prioritize employee data if you're implementing HR analytics
Establish Clear Ownership
For each master data product:
Assign an owner from the domain that knows the data best
Designate a data steward responsible for day-to-day quality
Document escalation paths for data issues
Create a RACI matrix for data governance activities
Design Transformation Pipelines for Maintainability
When building transformation pipelines for master data:
Use descriptive names for each transformation step
Document the business rationale for each transformation
Keep transformations modular and reusable
Test transformations independently before chaining
Version your transformation logic alongside the data product
Favor library transformations over custom code for maintainability
Implement Progressive Governance
Don't try to enforce perfect governance from day one:
Phase 1: Categorize and document existing master data
Phase 2: Implement basic quality rules and monitoring
Phase 3: Add approval workflows and access controls
Phase 4: Establish formal data contracts and SLAs
Phase 5: Continuously improve based on usage patterns
Monitor and Maintain
Master data management is an ongoing process:
Review quality metrics weekly or monthly
Conduct quarterly reviews of categorizations
Update documentation as business needs evolve
Gather feedback from data consumers
Track and resolve quality issues promptly
Refine transformation logic based on discovered data issues
Common Use Cases
Customer Master Data
Scenario: Creating a single customer view across CRM, ERP, and support systems
Implementation:
Connect data sources: Salesforce (CRM), SAP (ERP), Zendesk (Support)
Create source-aligned data products for each system
Build a "Customer Master" consumer-aligned data product
Apply transformations from the library:
Standardize customer names and addresses
Normalize phone numbers and email addresses
Deduplicate records using fuzzy matching on name + address
Merge records with golden record logic (Salesforce for contact info, SAP for billing)
Enrich with geography data (state, country from postal code)
Validate email addresses and flag invalid entries
Categorize the Data Product as Master Data
Configure data quality rules for:
Unique customer IDs
Valid email formats and phone numbers
Complete address information
Review the lineage graph showing the transformation pipeline
Implement access controls for PII compliance
Monitor usage and quality metrics
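The fuzzy-matching deduplication in step 4 can be sketched with the standard library. The 0.9 threshold and record fields are assumptions; production pipelines typically use dedicated matching libraries plus blocking keys to avoid comparing every pair.

```python
from difflib import SequenceMatcher

def similarity(a: dict, b: dict) -> float:
    """Compare two records on a combined name + address key."""
    key_a = f"{a['name']} {a['address']}".lower()
    key_b = f"{b['name']} {b['address']}".lower()
    return SequenceMatcher(None, key_a, key_b).ratio()

def deduplicate(records: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep the first record of each fuzzy-matched cluster."""
    unique: list[dict] = []
    for record in records:
        if not any(similarity(record, kept) >= threshold for kept in unique):
            unique.append(record)
    return unique

records = [
    {"name": "Acme Corp", "address": "123 Main St"},
    {"name": "ACME Corp.", "address": "123 Main St."},  # near-duplicate
    {"name": "Beta LLC", "address": "9 Oak Ave"},
]
deduped = deduplicate(records)  # the near-duplicate is dropped
```

Which record survives a cluster is itself a survivorship decision; here the first one wins, but you could instead keep the most complete record, as described earlier.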
Product Reference Data
Scenario: Maintaining standardized product categories and hierarchies
Implementation:
Connect to PIM system and e-commerce platform
Create source-aligned data products
Build "Product Categories" consumer-aligned data product
Apply transformations:
Standardize category names (title case, trim whitespace)
Build hierarchical relationships (parent-child)
Validate hierarchy completeness (no orphaned categories)
Add category descriptions from multiple sources
Categorize as Reference Data
Configure quality checks for:
Complete category hierarchies
Standardized naming conventions
No orphaned categories
Review the lineage graph showing the transformation pipeline
Grant broad read access, since reference data is meant to be reused widely across systems
Monitor usage and quality metrics
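The orphaned-category check used above is simple to express: every category's parent must exist in the dataset, and root categories have no parent. Field names here are assumptions.

```python
def find_orphans(categories: list[dict]) -> list[str]:
    """Return IDs of categories whose parent is missing from the dataset."""
    ids = {c["id"] for c in categories}
    return [c["id"] for c in categories
            if c["parent_id"] is not None and c["parent_id"] not in ids]

categories = [
    {"id": "electronics", "parent_id": None},          # root category
    {"id": "laptops", "parent_id": "electronics"},
    {"id": "gaming-laptops", "parent_id": "laptps"},   # typo: orphaned
]
print(find_orphans(categories))  # ['gaming-laptops']
```

Running this as a quality check on every refresh catches hierarchy breaks introduced upstream (renamed or deleted parent categories) before consumers see them.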
Employee Master Data
Scenario: Centralizing HR data for analytics and operations
Implementation:
Connect to HRIS (Workday), Active Directory, and Payroll system
Create source-aligned data products for each
Build "Employee Master" consumer-aligned data product
Apply transformations:
Standardize employee names
Deduplicate based on employee ID
Build organizational hierarchy from Active Directory
Join compensation data from payroll (with strict access controls)
Calculate tenure and other derived fields
Validate required fields (manager, department, hire date)
Categorize as Master Data with high sensitivity
Configure strict access controls and data masking for compensation fields
Implement quality rules for required fields
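The "tenure and other derived fields" step above can be sketched as a small derivation from the hire date. Field names are assumptions; the subtraction accounts for whether the anniversary has passed this year.

```python
from datetime import date

def add_tenure(employee: dict, as_of: date) -> dict:
    """Return a copy of the employee record with tenure in whole years."""
    hire = employee["hire_date"]
    # Subtract one year if the hire anniversary hasn't occurred yet.
    years = as_of.year - hire.year - ((as_of.month, as_of.day) < (hire.month, hire.day))
    return {**employee, "tenure_years": years}

emp = add_tenure({"employee_id": "E1", "hire_date": date(2019, 6, 15)},
                 as_of=date(2024, 6, 14))
# One day before the five-year anniversary: tenure_years is 4
```

Computing derived fields in the pipeline, rather than in each downstream report, is what keeps metrics like average tenure consistent across consumers.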