Data sources are the connection points through which Foundation ingests data from external systems such as databases, file systems, APIs, and streaming sources. Each data source stores the configuration and credentials needed to connect to an external system and retrieve its data.
When to Use Data Sources
Data sources are essential in the following scenarios:
External data ingestion: When you need to ingest data from external systems into your data platform
Reusable connections: When you want to define a connection configuration once and reuse it for a specific external system
Secure access: When you need to maintain secure credentials for accessing external systems
Pipeline automation: When you want to set up regular data ingestion pipelines from specific sources
Multiple data types: When working with data from different kinds of systems (databases, cloud storage, APIs, streaming data)
Creating a Data Source
Creating a data source takes four steps: creating the data source entity, linking it to a data system, configuring its connection, and setting its secrets.
Step 1: Initial Data Source Creation
Endpoint: POST /api/data/data_source
Request Body:
{
  "entity": {
    "name": "Data Source example",
    "entity_type": "origin",
    "label": "DSE",
    "description": "This is an example for data source"
  },
  "entity_info": {
    "owner": "[email protected]",
    "contact_ids": ["Data Source contact"],
    "links": ["example.com"]
  }
}
Required Headers
Key Fields Explanation
name: Descriptive name for the data source
entity_type: Always "origin" for data source endpoints
label: Short identification code (typically 3 letters)
description: Detailed description of what this data source provides
entity_info: Ownership and contact details (owner, contact_ids, links) for the person or team responsible for this data source
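As a minimal sketch, the Step 1 request could be assembled in Python as shown here. The host `https://foundation.example.com`, the helper name `build_create_request`, the placeholder owner address, and the JSON content-type header are assumptions made for illustration; only the endpoint path and the field names come from the request body above. The request is constructed but not sent, and any authentication headers your deployment requires still need to be added.

```python
import json
import urllib.request

BASE_URL = "https://foundation.example.com"  # hypothetical host, replace with your deployment


def build_create_request(name, label, description, owner, contact_ids, links):
    """Build (but do not send) the POST request that creates a data source."""
    payload = {
        "entity": {
            "name": name,
            "entity_type": "origin",  # always "origin" for data sources
            "label": label,
            "description": description,
        },
        "entity_info": {
            "owner": owner,
            "contact_ids": list(contact_ids),
            "links": list(links),
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/data_source",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_create_request(
    name="Data Source example",
    label="DSE",
    description="This is an example for data source",
    owner="owner@example.com",  # placeholder address
    contact_ids=["Data Source contact"],
    links=["example.com"],
)
print(request.get_method(), request.full_url)
```

Hard-coding `entity_type` inside the helper keeps callers from sending any value other than "origin".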
Response
The API returns a response with the data source details:
Important: The initial state is "001" with reason "Requires configuration", and the healthy status is "false". This is expected, because the connection details are not set until the following steps.
Step 2: Link Data Source to Data System
Endpoint: POST /api/data/link/data_system/data_source
Parameters:
identifier: The data system identifier
child_identifier: The data source identifier
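The link call can be sketched as follows. This assumes the two parameters are passed as URL query parameters, in the same way Step 3 passes `identifier`; the host, the helper name `build_link_url`, and the sample identifiers are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://foundation.example.com"  # hypothetical host


def build_link_url(data_system_id, data_source_id):
    """Return the URL that links a data source (child) to a data system (parent)."""
    query = urlencode({
        "identifier": data_system_id,        # the data system identifier
        "child_identifier": data_source_id,  # the data source identifier
    })
    return f"{BASE_URL}/api/data/link/data_system/data_source?{query}"


url = build_link_url("system-123", "source-456")  # sample identifiers
print(url)
```

Using `urlencode` rather than string concatenation ensures identifiers containing reserved characters are escaped correctly.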
Step 3: Configure Connection Details
Endpoint: PUT /api/data/data_source/connection?identifier={data_source_id}
Request Body (S3 example):
Step 4: Set Connection Secrets
Endpoint: POST /api/data/data_source/secret?identifier={data_source_id}
Request Body:
Supported Connection Types
Foundation currently supports the following connection types:
"database": For database connections (PostgreSQL, MySQL, etc.)
"s3": For S3-compatible storage systems
"synthetic": For generating synthetic data for testing purposes
Complete Python Example
Here's a comprehensive Python example that demonstrates the entire data source creation and configuration process:
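The following is a minimal sketch rather than an official listing. It shows how the four steps chain together, using the endpoints from Steps 1 to 4. Several parts are assumptions: `send` is a caller-supplied function that performs the HTTP call and returns the decoded JSON response, `id_field` is an assumed name for the response field that carries the new identifier, the link parameters are assumed to be query parameters, and the connection and secrets bodies are left for the caller to supply.

```python
from urllib.parse import urlencode


def create_and_configure(send, data_system_id, create_body, connection, secrets,
                         id_field="identifier"):
    """Run the four creation steps in order and return the new data source id.

    `send(method, path, body)` performs the HTTP call (with whatever headers
    your deployment requires). `id_field` is an ASSUMED response field name;
    adjust it to match the real response.
    """
    # Step 1: create the data source.
    created = send("POST", "/api/data/data_source", create_body)
    data_source_id = created[id_field]

    # Step 2: link it to its parent data system.
    link_query = urlencode({"identifier": data_system_id,
                            "child_identifier": data_source_id})
    send("POST", f"/api/data/link/data_system/data_source?{link_query}", None)

    # Steps 3 and 4 address the data source by its identifier.
    source_query = urlencode({"identifier": data_source_id})
    send("PUT", f"/api/data/data_source/connection?{source_query}", connection)
    send("POST", f"/api/data/data_source/secret?{source_query}", secrets)
    return data_source_id


# Dry run with a fake transport that records calls instead of using the network.
calls = []


def fake_send(method, path, body):
    calls.append((method, path))
    return {"identifier": "source-456"}  # sample response for the dry run


new_id = create_and_configure(
    fake_send, "system-123",
    create_body={"entity": {}, "entity_info": {}},
    connection={}, secrets={},
)
for method, path in calls:
    print(method, path)
```

Injecting `send` keeps the ordering logic separate from the HTTP client, so the flow can be exercised without network access and reused with any client library.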
Managing Existing Data Sources
Once you have created and configured data sources, you can perform various management operations:
List All Data Sources
Get Specific Data Source
Update Data Source
Delete Data Source
Important Notes
Secret Management: The keys in the secrets JSON object must match the env_key values you specified in the connection configuration
Connection State: After proper configuration, the data source state should change from "001" (Requires configuration) to a healthy state
Cascading Effects: Remember that data sources feed into data objects, so any connection issues will affect downstream data processing
Security: Always use environment variables or secure secret management for sensitive connection details
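The Secret Management note can be checked before the secrets are submitted. The sketch below collects every `env_key` value from a connection configuration and compares it with the keys of the secrets object. The configuration shape, the key names, and the helper names are hypothetical; only the `env_key` field name comes from the note above.

```python
def collect_env_keys(config):
    """Recursively gather every string value stored under an "env_key" key."""
    found = set()
    if isinstance(config, dict):
        for key, value in config.items():
            if key == "env_key" and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_env_keys(value)
    elif isinstance(config, list):
        for item in config:
            found |= collect_env_keys(item)
    return found


def check_secrets(connection_config, secrets):
    """Return (missing, unused) secret names for a connection configuration."""
    expected = collect_env_keys(connection_config)
    provided = set(secrets)
    return sorted(expected - provided), sorted(provided - expected)


# Hypothetical configuration shape, used only to exercise the check.
connection_config = {
    "connection_type": "s3",
    "access_key": {"env_key": "S3_ACCESS_KEY"},
    "secret_key": {"env_key": "S3_SECRET_KEY"},
}
secrets = {"S3_ACCESS_KEY": "placeholder", "S3_SECRET": "placeholder"}

missing, unused = check_secrets(connection_config, secrets)
print("missing:", missing)
print("unused:", unused)
```

Here the misspelled `S3_SECRET` key is reported as unused and `S3_SECRET_KEY` as missing, which is the kind of mismatch that would otherwise leave the data source unconfigured.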