Glossary
Core Concepts
Data Mesh
A data mesh serves as the top-level container for your entire data pipeline. It represents a logical boundary that could correspond to a project, department, or business domain within your organization. For example, a retail company might have separate meshes for "Sales Analytics", "Inventory Management", and "Customer Experience", each containing the relevant data infrastructure for that domain.
Data System
Data systems act as organizational containers that group multiple data sources together. They provide a logical structure for managing related data ingestion endpoints. For instance, a "Customer Data System" might group together data sources from your CRM database, customer support tickets, and website analytics, maintaining organizational clarity and simplifying access control management.
Data Source
Data sources are the actual connection points that ingest data into Foundation. These can be databases (PostgreSQL, MySQL), cloud storage (S3), or any other data origin. Each data source maintains the configuration and credentials needed to connect to and retrieve data from external systems.
Data Object
Data objects are containers that store raw, unprocessed data exactly as it arrives from data sources. When a CSV file is ingested from an S3 bucket or records are pulled from a database, they are stored in data objects without any transformation. This preservation of original data is crucial for auditing, compliance, and enabling different transformation strategies.
Note: Currently, data objects support only structured data, but this may evolve in the future to support additional data types.
Data Product
Data products represent the refined, transformed data assets ready for consumption. They are the end result of transformations applied to data objects or other products, making the data suitable for specific business purposes or analysis. Data products come in two types:
Source-Aligned Data Products (SADPs): Direct transformations of raw data from data objects
Consumer-Aligned Data Products (CADPs): Transformations built on top of other data products, creating layers of refined data
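The distinction between the two product types comes down to what a product reads from. The sketch below is illustrative only: `ProductInput` and `classify_product` are hypothetical names, not part of Foundation's API, and the handling of a product that mixes data objects and data products is an assumption, since this glossary does not define that case.

```python
from dataclasses import dataclass
from typing import Literal

InputKind = Literal["data_object", "data_product"]


@dataclass(frozen=True)
class ProductInput:
    """Hypothetical description of one upstream input of a data product."""
    name: str
    kind: InputKind


def classify_product(inputs: list[ProductInput]) -> str:
    """Classify a data product by the kind of inputs it transforms."""
    if not inputs:
        raise ValueError("a data product needs at least one input")
    kinds = {i.kind for i in inputs}
    if kinds == {"data_object"}:
        return "SADP"  # built directly on raw data objects
    if kinds == {"data_product"}:
        return "CADP"  # built on top of other data products
    # Assumption: mixed inputs are not covered by the glossary definitions.
    return "unclassified"


orders_clean = classify_product([ProductInput("orders_raw", "data_object")])
sales_summary = classify_product([ProductInput("orders_clean", "data_product")])
print(orders_clean, sales_summary)
```

Here `orders_clean` reads a raw data object and is therefore source-aligned, while `sales_summary` reads another product and is consumer-aligned.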
Application
Applications represent external systems that consume the processed data from Foundation. These could be business intelligence tools like PowerBI, custom dashboards, machine learning platforms, or any system that needs access to your transformed data.
Data Architecture and Workflow
The true power of Foundation emerges when these entities are connected to create data pipelines. The typical data workflow follows this pattern:
Data systems contain multiple data sources, with each source ingesting data in various formats (JSON, CSV, binary). The raw data flows into data objects for storage, then undergoes transformation in data products, and finally feeds into external applications for consumption and analysis.
Relationship Constraints
It's important to understand the relationship constraints within this architecture:
System relationships: A data system can contain multiple data sources, and a data source can connect to multiple data objects
Object relationships: Each data object ingests from only one data source
Product relationships: Data products can consume from multiple data objects or other data products and output to multiple downstream products or applications
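The constraints above can be sketched as a small in-memory registry. This is a minimal illustration, not Foundation's implementation: the `Topology` class, its method names, and the entity names are all hypothetical, and entities are identified by plain strings for brevity.

```python
class Topology:
    """Hypothetical registry that enforces the relationship constraints."""

    def __init__(self) -> None:
        self.sources_of_system: dict[str, set[str]] = {}
        self.source_of_object: dict[str, str] = {}
        self.inputs_of_product: dict[str, set[str]] = {}

    def add_source(self, system: str, source: str) -> None:
        # A data system can contain multiple data sources.
        self.sources_of_system.setdefault(system, set()).add(source)

    def add_object(self, obj: str, source: str) -> None:
        known_sources = set().union(*self.sources_of_system.values())
        if source not in known_sources:
            raise KeyError(f"unknown data source: {source}")
        # Each data object ingests from only one data source.
        if obj in self.source_of_object:
            raise ValueError(f"{obj} already ingests from {self.source_of_object[obj]}")
        self.source_of_object[obj] = source

    def add_product(self, product: str, inputs: list[str]) -> None:
        # Products may consume several data objects and/or other products.
        known = set(self.source_of_object) | set(self.inputs_of_product)
        missing = set(inputs) - known
        if missing:
            raise KeyError(f"unknown inputs: {sorted(missing)}")
        self.inputs_of_product[product] = set(inputs)

    def objects_of_source(self, source: str) -> list[str]:
        # A data source can feed multiple data objects.
        return sorted(o for o, s in self.source_of_object.items() if s == source)


topo = Topology()
topo.add_source("customer_data_system", "crm_db")
topo.add_source("customer_data_system", "support_tickets")
topo.add_object("customers_raw", "crm_db")
topo.add_object("accounts_raw", "crm_db")  # one source, many objects
topo.add_product("customers_clean", ["customers_raw", "accounts_raw"])
topo.add_product("customer_360", ["customers_clean"])
```

Attaching `customers_raw` to a second source, for example `support_tickets`, raises `ValueError` in this sketch, which mirrors the one-source-per-object rule.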
Cascading Effects
This interconnected architecture means that failures at any point create cascading effects downstream. If a data source connection fails, all dependent data objects, products, and applications will be affected. Every operation except initial element creation triggers a Spark job that executes on the cluster, whether it's establishing connections, loading data, or performing transformations.
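To see which entities a failure reaches, the pipeline can be treated as a directed graph and walked breadth-first from the failed node. The following is a sketch under stated assumptions: the `edges` mapping, the entity names, and the `downstream_of` helper are hypothetical and do not reflect how Foundation tracks dependencies internally.

```python
from collections import deque


def downstream_of(edges: dict[str, list[str]], failed: str) -> list[str]:
    """Return every entity reachable from `failed`, in breadth-first order."""
    seen = {failed}
    affected: list[str] = []
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Hypothetical pipeline: source -> object -> products -> application.
edges = {
    "crm_db": ["customers_raw"],
    "customers_raw": ["customers_clean"],
    "customers_clean": ["customer_360", "powerbi"],
    "customer_360": ["powerbi"],
}

affected = downstream_of(edges, "crm_db")
print(affected)
# ['customers_raw', 'customers_clean', 'customer_360', 'powerbi']
```

A failure further down the chain, such as in `customer_360`, affects only `powerbi`, while a failure at the source reaches everything that depends on it.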