Best Practices for Modeling Data Sources
In Foundation, data sources are logical representations of data retrieved through connectors—they define what data is available after a connector executes its query or extraction. While connectors handle the physical connection and credentials, data sources represent the actual datasets that become available for transformation into data products.
Data Sources as Governance Boundaries
Each data source acts as a governance and security checkpoint that controls:
Access scope: Which specific tables, APIs, or datasets are exposed
Refresh patterns: Real-time streams vs. batch extracts
Audit points: Where data lineage begins
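The three checkpoint concerns above can be sketched as a simple data structure. This is an illustrative model only, not Foundation's actual API; the `DataSource` class, its field names, and the example values are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum

class RefreshPattern(Enum):
    """How the data source receives new data (illustrative)."""
    REAL_TIME = "real_time"  # streaming updates
    BATCH = "batch"          # scheduled extracts

@dataclass
class DataSource:
    """A logical dataset exposed by a connector, acting as a governance checkpoint."""
    name: str
    access_scope: list[str]          # which specific tables/APIs/datasets are exposed
    refresh_pattern: RefreshPattern  # real-time stream vs. batch extract
    lineage_origin: str              # audit point: where data lineage begins

# Example: a finance data source exposing only three GL-related tables
gl_source = DataSource(
    name="Finance-Data-Source",
    access_scope=["GL", "AP", "AR"],
    refresh_pattern=RefreshPattern.BATCH,
    lineage_origin="SAP-Connector",
)
```

Keeping all three concerns on the data source itself means access, freshness, and lineage can be reviewed in one place per dataset.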
Balancing Risk and Maintainability
The Consolidation Paradox
Over-consolidation (everything through one connector/data source):
Single point of failure blocks all dependent data flows
Maintenance affects all downstream data products
Security breaches have wider blast radius
Over-distribution (too many connectors/data sources):
Increased maintenance overhead
Complex credential management
Difficult to enforce consistent policies
Recommended Modeling Strategies
1. Separate Connectors by System and Domain
Within a single System, a connector's data sources can be mapped to the different Domains that the operational system holds data for.
For example:
SAP-Connector
├── Finance-Data-Source (GL, AP, AR tables)
├── Inventory-Data-Source (stock levels, movements)
└── Orders-Data-Source (sales orders, purchase orders)
CRM-Connector
├── Customers-Data-Source (account data)
├── Opportunities-Data-Source (pipeline data)
└── Activities-Data-Source (interactions, tasks)
2. Align Data Sources with Organizational Ownership
Model data sources to match team responsibilities:
For example, if the Finance team owns the financial systems (such as SAP HANA), then the data sources should be modeled based on how the Finance team is organized.
Each team should control access to their data sources
The owner for each data source should have clear accountability for data quality and freshness
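A minimal sketch of team-aligned ownership might look like the following. The `OwnedDataSource` class, `can_access` helper, and team names are hypothetical, not part of Foundation; the point is that each data source carries one accountable owning team that controls who else may read it.

```python
from dataclasses import dataclass

@dataclass
class OwnedDataSource:
    name: str
    owning_team: str        # team accountable for quality and freshness
    allowed_teams: set[str] # teams the owner has granted read access

def can_access(source: OwnedDataSource, team: str) -> bool:
    """Owners always have access; other teams only if explicitly granted."""
    return team == source.owning_team or team in source.allowed_teams

finance_gl = OwnedDataSource(
    name="Finance-Data-Source",
    owning_team="Finance",
    allowed_teams={"Analytics"},
)

assert can_access(finance_gl, "Finance")    # owner
assert can_access(finance_gl, "Analytics")  # explicitly granted
assert not can_access(finance_gl, "Sales")  # not granted
```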
3. Separate by Criticality and Processing Patterns
Real-time data sources: Isolate critical streams with dedicated connectors. For example:
Payment transactions
Operational sensors
System monitoring data
Batch data sources: Group related batch extracts. For example:
Nightly customer updates
Weekly inventory snapshots
Monthly financial reports
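The isolation rule above can be sketched as a small assignment function: every real-time stream gets its own dedicated connector, while related batch extracts share one. All names here (`assign_connectors`, the connector naming scheme, the source identifiers) are assumptions for illustration.

```python
REAL_TIME = {"payment-transactions", "operational-sensors", "system-monitoring"}
BATCH = {"nightly-customer-updates", "weekly-inventory-snapshots",
         "monthly-financial-reports"}

def assign_connectors(real_time: set[str], batch: set[str]) -> dict[str, str]:
    """Map each data source to a connector following the isolation rule."""
    mapping = {}
    for source in real_time:
        # one dedicated connector per critical stream
        mapping[source] = f"{source}-connector"
    for source in batch:
        # related batch extracts share a single connector
        mapping[source] = "batch-connector"
    return mapping
```

With this scheme, a failure in the shared batch connector delays nightly extracts but cannot interrupt any of the critical real-time streams.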
Practical Guidelines
The "Rule of 5-7": Aim for 5-7 data sources per connector for batch processes, fewer for real-time streams.
Define Clear Boundaries: Each data source should represent a cohesive dataset that:
Shares the same refresh frequency
Has consistent security requirements
Belongs to one business domain
Maintains similar quality standards
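These cohesion criteria lend themselves to an automated check before datasets are grouped into one data source. The function below is a sketch under assumed field names (`refresh`, `security`, `domain`); quality standards are harder to encode and are omitted here.

```python
def is_cohesive(datasets: list[dict]) -> bool:
    """True when every dataset in a candidate data source shares the same
    refresh frequency, security tier, and business domain."""
    if not datasets:
        return True
    first = datasets[0]
    keys = ("refresh", "security", "domain")
    return all(d[k] == first[k] for d in datasets for k in keys)

# Cohesive: both tables are nightly, internal, and finance-owned
assert is_cohesive([
    {"refresh": "nightly", "security": "internal", "domain": "finance"},
    {"refresh": "nightly", "security": "internal", "domain": "finance"},
])

# Not cohesive: a real-time feed mixed into a nightly batch source
assert not is_cohesive([
    {"refresh": "nightly", "security": "internal", "domain": "finance"},
    {"refresh": "real-time", "security": "internal", "domain": "finance"},
])
```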
Key Design Principles
Start with logical business groupings rather than technical system boundaries
One connector failure shouldn't cascade across unrelated business processes
Enforce clear ownership by leaning on the platform's operating model, which expects an owner for each Data Source.
Monitor usage patterns and rebalance quarterly based on actual needs
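The quarterly rebalancing step can be supported by a small check that combines monitoring with the "Rule of 5-7" above: count data sources per connector and flag any connector that has grown past the batch limit. The function name and the sample mapping are illustrative assumptions.

```python
from collections import Counter

def flag_overloaded_connectors(source_to_connector: dict[str, str],
                               max_batch_sources: int = 7) -> list[str]:
    """Return connectors carrying more data sources than the 'Rule of 5-7'
    allows, as candidates for splitting at the next quarterly review."""
    counts = Counter(source_to_connector.values())
    return sorted(c for c, n in counts.items() if n > max_batch_sources)

# Sample usage snapshot: nine sources on one connector, one on another
usage = {f"source-{i}": "sap-connector" for i in range(9)}
usage["customers"] = "crm-connector"
assert flag_overloaded_connectors(usage) == ["sap-connector"]
```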