Adding and Managing Data Sources
Data sources are the connection points through which Foundation ingests data from external systems such as databases, file systems, APIs, and streaming sources. Each data source stores the configuration and credentials needed to connect to its external system and retrieve data from it.
When to Use Data Sources
Data sources are essential in the following scenarios:
External data ingestion: When you need to ingest data from external systems into your data platform
Reusable connections: When you want to establish a reusable connection configuration for a specific data source
Secure access: When you need to maintain secure credentials for accessing external systems
Pipeline automation: When you want to set up regular data ingestion pipelines from specific sources
Multiple data types: When working with various data formats (databases, cloud storage, APIs, streaming data)
Creating a Data Source
Creating a data source is a multi-step process that involves initial creation, linking to data systems, configuration, and setting up secrets.
Step 1: Initial Data Source Creation
Endpoint: POST /api/data/data_source
Request Body:
{
  "entity": {
    "name": "Data Source example",
    "entity_type": "origin",
    "label": "DSE",
    "description": "This is an example for data source"
  },
  "entity_info": {
    "owner": "[email protected]",
    "contact_ids": [
      "Data Source contact"
    ],
    "links": [
      "example.com"
    ]
  }
}
Required Headers
Authorization: Bearer {your_access_token}
x-org: {your_organization_name}
Key Fields Explanation
name: Descriptive name for the data source
entity_type: Always "origin" for data source endpoints
label: Short identification code (typically 3 letters)
description: Detailed description of what this data source provides
entity_info: Contact information for the person/team responsible for this data source
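As a rough illustration, the headers and request body described above could be assembled in Python along these lines. The helper names `build_headers` and `build_create_payload` are hypothetical and not part of Foundation; only the header names and JSON field names are taken from this page, and the owner address is a placeholder.

```python
import json


def build_headers(access_token, org_name):
    """Return the two headers this page lists as required."""
    return {
        "Authorization": f"Bearer {access_token}",
        "x-org": org_name,
    }


def build_create_payload(name, label, description, owner, contact_ids, links):
    """Assemble the body for POST /api/data/data_source."""
    return {
        "entity": {
            "name": name,
            "entity_type": "origin",  # always "origin" for data sources
            "label": label,
            "description": description,
        },
        "entity_info": {
            "owner": owner,
            "contact_ids": list(contact_ids),
            "links": list(links),
        },
    }


payload = build_create_payload(
    name="Data Source example",
    label="DSE",
    description="This is an example for data source",
    owner="owner@example.com",  # placeholder address
    contact_ids=["Data Source contact"],
    links=["example.com"],
)
print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect or log the body before sending it.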
Response
The API returns a response with the data source details:
{
  "entity": {
    "identifier": "1a083d33-46a1-45e5-a709-fd3d5ac9823f",
    "urn": "urn:meshx:backend:data:root:origin:1a083d33-46a1-45e5-a709-fd3d5ac9823f",
    "name": "Data Source example",
    "is_system": false,
    "description": "This is an example for data source",
    "label": "DSE",
    "created_at": "2025-04-10T13:21:18.078779Z",
    "state": {
      "code": "001",
      "reason": "Requires configuration.",
      "healthy": false
    },
    "owner": null
  },
  "entity_info": {
    "owner": "[email protected]",
    "contact_ids": [
      "Data Source contact"
    ],
    "links": [
      "example.com"
    ]
  },
  "links": {
    "parents": [],
    "children": []
  },
  "compute_identifier": null,
  "secrets": [],
  "connection": null
}
Important: Note that the initial state is "001" with reason "Requires configuration" and healthy status is "false". This is expected, as you'll need to set up the connection details next.
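A minimal sketch of how a client might read the identifier and the state out of this response is shown below. The function names are hypothetical, and the sample dictionary is a trimmed copy of the response above rather than live output.

```python
def extract_identifier(response):
    """Return the identifier needed by the later configuration steps."""
    return response["entity"]["identifier"]


def needs_configuration(response):
    """Return True while the data source is not yet reported as healthy."""
    return not response["entity"]["state"]["healthy"]


# Trimmed copy of the creation response shown above.
sample = {
    "entity": {
        "identifier": "1a083d33-46a1-45e5-a709-fd3d5ac9823f",
        "state": {
            "code": "001",
            "reason": "Requires configuration.",
            "healthy": False,
        },
    },
    "secrets": [],
    "connection": None,
}

if needs_configuration(sample):
    state = sample["entity"]["state"]
    print(f"{extract_identifier(sample)}: {state['code']} {state['reason']}")
```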
Step 2: Link Data Source to Data System
Endpoint: POST /api/data/link/data_system/data_source
Parameters:
identifier: The data system identifier
child_identifier: The data source identifier
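The sketch below shows one way the link request URL could be built, assuming the two parameters are sent as query-string parameters (which is how the Python example later on this page passes them). The helper name, host, and identifiers are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_link_url(api_url, data_system_id, data_source_id):
    """Return the URL for POST /api/data/link/data_system/data_source."""
    query = urlencode({
        "identifier": data_system_id,        # parent: the data system
        "child_identifier": data_source_id,  # child: the data source
    })
    return f"{api_url}/data/link/data_system/data_source?{query}"


# Hypothetical host and identifiers, for illustration only.
url = build_link_url("https://foundation.example.com/api", "system-123", "source-456")
print(url)
```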
Step 3: Configure Connection Details
Endpoint: PUT /api/data/data_source/connection?identifier={data_source_id}
Request Body (S3 example):
{
  "connection": {
    "connection_type": "s3",
    "url": "s3-endpoint-url",
    "access_key": {
      "env_key": "S3_ACCESS_KEY"
    },
    "access_secret": {
      "env_key": "S3_SECRET_KEY"
    }
  }
}
Step 4: Set Connection Secrets
Endpoint: POST /api/data/data_source/secret?identifier={data_source_id}
Request Body:
{
  "S3_ACCESS_KEY": "your_access_key_value",
  "S3_SECRET_KEY": "your_secret_key_value"
}
Supported Connection Types
Foundation currently supports the following connection types:
"database": For database connections (PostgreSQL, MySQL, etc.)
"s3": For S3-compatible storage systems
"synthetic": For generating synthetic data for testing purposes
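A client could guard against typos in `connection_type` before calling the API. The following is only a sketch: the set of types is copied from the list above, the S3 body mirrors the Step 3 example, and the helper names are hypothetical. The fields required for "database" and "synthetic" connections are not described on this page, so they are not modelled here.

```python
# Connection types listed on this page.
SUPPORTED_CONNECTION_TYPES = {"database", "s3", "synthetic"}


def build_s3_connection(url, access_env_key="S3_ACCESS_KEY",
                        secret_env_key="S3_SECRET_KEY"):
    """Build the Step 3 request body for an S3 connection."""
    return {
        "connection": {
            "connection_type": "s3",
            "url": url,
            "access_key": {"env_key": access_env_key},
            "access_secret": {"env_key": secret_env_key},
        }
    }


def check_connection_type(body):
    """Raise ValueError if the body uses a type not listed on this page."""
    connection_type = body["connection"]["connection_type"]
    if connection_type not in SUPPORTED_CONNECTION_TYPES:
        raise ValueError(f"Unsupported connection_type: {connection_type!r}")
    return connection_type


body = build_s3_connection("s3-endpoint-url")
print(check_connection_type(body))
```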
Complete Python Example
Here's a comprehensive Python example that demonstrates the entire data source creation and configuration process:
import requests

# Placeholders: replace these with the values for your own deployment.
API_URL = "https://<your-foundation-host>/api"
ACCESS_TOKEN = "your_access_token"
ORG_NAME = "your_organization_name"


def get_headers():
    """Build the headers required by every request"""
    return {
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "x-org": ORG_NAME,
    }


def create_data_source(name, description, owner_email="[email protected]"):
    """Create a new data source and return its identifier (None on failure)"""
    data_source_resp = requests.post(
        f"{API_URL}/data/data_source",
        headers=get_headers(),
        json={
            "entity": {
                "name": name,
                "entity_type": "origin",
                "label": name[:3].upper(),
                "description": description
            },
            "entity_info": {
                "owner": owner_email,
                "contact_ids": [f"{name} contact"],
                "links": ["example.com"]
            }
        }
    )
    if data_source_resp.status_code == 200:
        return data_source_resp.json()["entity"]["identifier"]
    else:
        print(f"Error creating data source: {data_source_resp.text}")
        return None


def link_data_source_to_data_system(data_system_id, data_source_id):
    """Link a data source to a data system"""
    link_resp = requests.post(
        f"{API_URL}/data/link/data_system/data_source",
        headers=get_headers(),
        params={
            "identifier": data_system_id,
            "child_identifier": data_source_id
        }
    )
    return link_resp.status_code == 200


def configure_s3_data_source(data_source_id, s3_url):
    """Configure S3 connection for a data source"""
    connection_resp = requests.put(
        f"{API_URL}/data/data_source/connection?identifier={data_source_id}",
        headers=get_headers(),
        json={
            "connection": {
                "connection_type": "s3",
                "url": s3_url,
                # env_key values name the secrets set in set_s3_secrets()
                "access_key": {"env_key": "S3_ACCESS_KEY"},
                "access_secret": {"env_key": "S3_SECRET_KEY"}
            }
        }
    )
    return connection_resp.status_code == 200


def set_s3_secrets(data_source_id, access_key, secret_key):
    """Set S3 access credentials for a data source"""
    secrets_resp = requests.post(
        f"{API_URL}/data/data_source/secret?identifier={data_source_id}",
        headers=get_headers(),
        json={
            "S3_ACCESS_KEY": access_key,
            "S3_SECRET_KEY": secret_key
        }
    )
    return secrets_resp.status_code == 200

Managing Existing Data Sources
Once you have created and configured data sources, you can perform various management operations:
List All Data Sources
GET /api/data/data_source/list
Get Specific Data Source
GET /api/data/data_source?identifier={data_source_id}
Update Data Source
PUT /api/data/data_source?identifier={data_source_id}
Delete Data Source
DELETE /api/data/data_source?identifier={data_source_id}
Important Notes
Secret Management: The keys in the secrets JSON object must match the env_key values you specified in the connection configuration
Connection State: After proper configuration, the data source state should change from "001" (Requires configuration) to a healthy state
Cascading Effects: Remember that data sources feed into data objects, so any connection issues will affect downstream data processing
Security: Always use environment variables or secure secret management for sensitive connection details
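The Secret Management note can be checked on the client side before the secrets are posted. The sketch below is one possible approach under the assumption that every secret is referenced through an `env_key` field somewhere in the connection body, as in the S3 example; `referenced_env_keys` and `missing_secret_keys` are hypothetical helper names.

```python
def referenced_env_keys(connection_body):
    """Collect every env_key value referenced in a connection configuration."""
    keys = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "env_key" and isinstance(value, str):
                    keys.add(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(connection_body)
    return keys


def missing_secret_keys(connection_body, secrets):
    """Return the env_key names that have no matching entry in secrets."""
    return referenced_env_keys(connection_body) - set(secrets)


connection_body = {
    "connection": {
        "connection_type": "s3",
        "url": "s3-endpoint-url",
        "access_key": {"env_key": "S3_ACCESS_KEY"},
        "access_secret": {"env_key": "S3_SECRET_KEY"},
    }
}
secrets = {"S3_ACCESS_KEY": "your_access_key_value"}

# S3_SECRET_KEY is referenced by the connection but absent from secrets.
print(missing_secret_keys(connection_body, secrets))
```

An empty result means every referenced key has a secret; a non-empty one lists the names still to be supplied.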