Best Practices for Modeling Data Sources

In Foundation, data sources are logical representations of data retrieved through connectors—they define what data is available after a connector executes its query or extraction. While connectors handle the physical connection and credentials, data sources represent the actual datasets that become available for transformation into data products.

Data Sources as Governance Boundaries

Each data source acts as a governance and security checkpoint that controls the following (a descriptor sketch follows this list):

  • Access scope: Which specific tables, APIs, or datasets are exposed

  • Refresh patterns: Real-time streams vs. batch extracts

  • Audit points: Where data lineage begins
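
To make the checkpoint idea concrete, the sketch below models a data source descriptor carrying these three governance fields. The DataSource dataclass, its field names, and the RefreshPattern enum are hypothetical illustrations, not Foundation's actual object model.

from dataclasses import dataclass
from enum import Enum

class RefreshPattern(Enum):
    REAL_TIME = "real-time"   # continuous stream
    BATCH = "batch"           # scheduled extract

@dataclass
class DataSource:
    """Hypothetical descriptor for one governed dataset behind a connector."""
    name: str
    connector: str               # the physical connection that feeds this source
    exposed_objects: list[str]   # access scope: the tables/APIs this source exposes
    refresh: RefreshPattern      # refresh pattern: stream vs. batch
    lineage_root: str            # audit point: where lineage tracking begins

finance = DataSource(
    name="Finance-Data-Source",
    connector="SAP-Connector",
    exposed_objects=["GL", "AP", "AR"],
    refresh=RefreshPattern.BATCH,
    lineage_root="sap.finance",
)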

Balancing Risk and Maintainability

The Consolidation Paradox

Over-consolidation (everything through one connector/data source):

  • Single point of failure blocks all dependent data flows

  • Maintenance affects all downstream data products

  • Security breaches have a wider blast radius

Over-distribution (too many connectors/data sources):

  • Increased maintenance overhead

  • Complex credential management

  • Difficult to enforce consistent policies

1. Separate Connectors by System and Domain

Within a single System, create one connector per operational system, then model a separate data source for each Domain that the system holds data for.

For example:

SAP-Connector
├── Finance-Data-Source (GL, AP, AR tables)
├── Inventory-Data-Source (stock levels, movements)
└── Orders-Data-Source (sales orders, purchase orders)

CRM-Connector  
├── Customers-Data-Source (account data)
├── Opportunities-Data-Source (pipeline data)
└── Activities-Data-Source (interactions, tasks)
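
The same layout can also be written down as plain data, which makes it easy to review or lint. This is a minimal sketch with assumed object names; it is not a Foundation registration API.

# One connector per System; one data source per Domain within it.
connector_layout = {
    "SAP-Connector": {
        "Finance-Data-Source": ["GL", "AP", "AR"],
        "Inventory-Data-Source": ["stock_levels", "movements"],
        "Orders-Data-Source": ["sales_orders", "purchase_orders"],
    },
    "CRM-Connector": {
        "Customers-Data-Source": ["accounts"],
        "Opportunities-Data-Source": ["pipeline"],
        "Activities-Data-Source": ["interactions", "tasks"],
    },
}

for connector, sources in connector_layout.items():
    for source, objects in sources.items():
        print(f"{connector} -> {source}: {', '.join(objects)}")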

2. Align Data Sources with Organizational Ownership

Model data sources to match team responsibilities (a minimal ownership registry is sketched after this list):

  • For example, if the Finance team owns the financial systems (such as SAP HANA), model the data sources around how the Finance team is organized.

  • Each team should control access to its own data sources

  • The owner of each data source should be clearly accountable for data quality and freshness
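
A minimal ownership registry might look like the sketch below; the team names and the accountable_team helper are hypothetical, and the point is simply that a missing owner should fail loudly rather than default silently.

# Hypothetical registry: each data source has exactly one accountable team.
owners = {
    "Finance-Data-Source": "finance-data-team",
    "Inventory-Data-Source": "supply-chain-team",
    "Orders-Data-Source": "order-management-team",
}

def accountable_team(data_source: str) -> str:
    """Return the owning team, refusing to guess when no owner is registered."""
    if data_source not in owners:
        raise LookupError(f"No owner registered for {data_source}; assign one first")
    return owners[data_source]

print(accountable_team("Finance-Data-Source"))  # finance-data-team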

3. Separate by Criticality and Processing Patterns

Real-time data sources: Isolate critical streams with dedicated connectors (a configuration sketch follows these lists). For example:

  • Payment transactions

  • Operational sensors

  • System monitoring data

Batch data sources: Group related batch extracts. For example:

  • Nightly customer updates

  • Weekly inventory snapshots

  • Monthly financial reports
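
In configuration terms, the split might look like the sketch below: the critical streams get a dedicated connector, while the related batch extracts share one, each with an explicit schedule. The connector names and cron expressions are illustrative assumptions.

# Dedicated connector for critical real-time streams (hypothetical names).
realtime_connector = {
    "name": "Critical-Streams-Connector",
    "pattern": "real-time",
    "data_sources": ["Payment-Transactions", "Operational-Sensors", "System-Monitoring"],
}

# Related batch extracts grouped under one connector, each with a cron schedule.
batch_connector = {
    "name": "Batch-Extracts-Connector",
    "pattern": "batch",
    "data_sources": {
        "Customer-Updates": "0 2 * * *",     # nightly at 02:00
        "Inventory-Snapshots": "0 3 * * 0",  # weekly, Sunday 03:00
        "Financial-Reports": "0 4 1 * *",    # monthly, 1st at 04:00
    },
}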

Practical Guidelines

The "Rule of 5-7": Aim for 5-7 data sources per connector for batch processes, fewer for real-time streams.

Define Clear Boundaries: Each data source should represent a cohesive dataset that:

  • Shares the same refresh frequency

  • Has consistent security requirements

  • Belongs to one business domain

  • Maintains similar quality standards
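
These boundary rules are mechanical enough to lint. The sketch below flags connectors that break the 5-7 rule and data sources that span multiple domains or mix refresh frequencies; the metadata shape and thresholds are illustrative assumptions, not a Foundation API.

def lint_connector(name: str, sources: dict) -> list[str]:
    """Flag layouts that violate the practical guidelines above."""
    issues = []
    # Rule of 5-7: too many sources on one connector hints at over-consolidation.
    if len(sources) > 7:
        issues.append(f"{name}: {len(sources)} data sources; consider splitting")
    for source, meta in sources.items():
        if len(set(meta["domains"])) > 1:    # one business domain per source
            issues.append(f"{source}: spans domains {meta['domains']}")
        if len(set(meta["refresh"])) > 1:    # one refresh frequency per source
            issues.append(f"{source}: mixes refresh frequencies {meta['refresh']}")
    return issues

example = {
    "Finance-Data-Source": {"domains": ["finance"], "refresh": ["nightly"]},
    "Mixed-Data-Source": {"domains": ["finance", "sales"], "refresh": ["nightly", "real-time"]},
}

# Prints two findings, both about Mixed-Data-Source.
for issue in lint_connector("SAP-Connector", example):
    print(issue)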

Key Design Principles

  1. Start with logical business groupings rather than technical system boundaries

  2. One connector failure shouldn't cascade across unrelated business processes

  3. Enforce clear ownership through the platform's operating model, which requires each Data Source to have a designated owner

  4. Monitor usage patterns and rebalance quarterly based on actual needs (a usage-review sketch follows)
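
For the final principle, even a crude usage count is enough to drive the quarterly review. The sketch below assumes a hypothetical log with one entry per downstream read and surfaces rarely used data sources as consolidation candidates.

from collections import Counter

# Hypothetical quarterly log: one entry per downstream read of a data source.
usage_log = [
    "Finance-Data-Source", "Orders-Data-Source", "Finance-Data-Source",
    "Orders-Data-Source", "Finance-Data-Source", "Inventory-Data-Source",
]

def rebalance_candidates(log: list[str], cold_threshold: int = 2) -> list[str]:
    """Surface rarely used data sources for the quarterly rebalancing review."""
    counts = Counter(log)
    return [source for source, reads in counts.items() if reads < cold_threshold]

print(rebalance_candidates(usage_log))  # ['Inventory-Data-Source']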
