Importing Files into Foundation
This guide provides comprehensive instructions for importing files into the Foundation platform for data processing. It covers both UI-based and programmatic approaches for file upload, along with the subsequent steps to transform uploaded files into data objects ready for processing.
What This Guide Covers
This guide specifically addresses scenarios where you need to:
Upload files directly to Foundation's managed storage
Create data objects from uploaded files
Set up the complete pipeline from file upload to data ingestion
Prerequisites
Before importing files, ensure you have:
A configured Data Source that can connect to Foundation's storage
Appropriate permissions to upload files and create data objects
Access to your Foundation environment's storage interface
Method 1: Importing Files via Storage UI
The Storage UI provides a visual interface for uploading files directly to Foundation's managed storage buckets.
Accessing the Storage UI
Navigate to your Foundation's storage interface:
https://storage.<env_name>.meshx.foundation/ui
Replace <env_name> with your Foundation environment name (e.g., stg, prod, dev).
Understanding Storage Buckets
When you access the storage interface, you will see several pre-configured buckets, each serving a specific purpose:
connectors: Configuration files for data connectors
iceberg: Iceberg table storage
models: Machine learning model storage
samples: Sample data and test files (recommended for user uploads)
warehouse: PostgreSQL table exports
For general file uploads, we recommend using a dedicated bucket (like samples) or creating your own bucket for better organization.
Uploading Files
Select Your Target Bucket
Click on the bucket where you want to store your files
For testing and general use, samples is typically appropriate
Initiate Upload
Click the "Upload" button in the top-right corner of the interface
Select Files or Folders
Choose Files: Select individual files from your local machine
Choose Folder: Upload entire folder structures
Maximum file size: 5GB per file
Complete Upload
Files will be uploaded to the current directory in the selected bucket
You can create subdirectories by navigating into them before uploading
File Organization Best Practices
samples/
├── project-alpha/
│   ├── raw-data/
│   │   ├── transactions-2024.csv
│   │   └── customers-2024.csv
│   └── processed/
└── project-beta/
    └── daily-exports/
Method 2: Importing Files via AWS CLI
Foundation's Storage Engine provides an S3-compatible proxy that allows you to use standard AWS S3 commands regardless of the underlying cloud provider. This enables programmatic and batch file uploads.
Configuration
First, configure your AWS CLI with the appropriate endpoint and credentials:
# Set the endpoint URL for your environment
export S3_ENDPOINT_URL="https://storage.<env_name>.meshx.foundation"
# Example for staging environment
export S3_ENDPOINT_URL="https://storage.stg.meshx.foundation"
# Configure AWS credentials for Foundation storage
aws configure set aws_access_key_id <your_storage_username>
aws configure set aws_secret_access_key <your_storage_password>
aws configure set region us-east-1 # Default region, adjust if needed
# Alternative: Set credentials as environment variables
export AWS_ACCESS_KEY_ID=<your_storage_username>
export AWS_SECRET_ACCESS_KEY=<your_storage_password>
Note: The storage credentials (username/password) should be provided by your Foundation administrator. These are specific to the Foundation storage system and are different from your Foundation UI login credentials. For persistent configuration, you can also create a profile in your AWS credentials file:
# Edit ~/.aws/credentials
[foundation-storage]
aws_access_key_id = <your_storage_username>
aws_secret_access_key = <your_storage_password>
# Then use the profile in commands
aws s3 ls --endpoint-url $S3_ENDPOINT_URL --profile foundation-storage
Common S3 Operations
List Available Buckets
aws s3 ls --endpoint-url $S3_ENDPOINT_URL
Create a New Bucket
aws s3 mb s3://<bucket_name> --endpoint-url $S3_ENDPOINT_URL
# Example
aws s3 mb s3://my-data-bucket --endpoint-url $S3_ENDPOINT_URL
Upload a Single File
aws s3 cp <local_file_path> s3://<bucket_name>/<destination_path> --endpoint-url $S3_ENDPOINT_URL
# Example
aws s3 cp /home/user/data/sales_2024.csv s3://samples/sales/sales_2024.csv --endpoint-url $S3_ENDPOINT_URL
Upload Multiple Files (Sync Directory)
aws s3 sync <local_directory> s3://<bucket_name>/<destination_path> --endpoint-url $S3_ENDPOINT_URL
# Example
aws s3 sync "/mnt/c/Users/user/data/cargo-output" s3://samples/cargo-synthetic/ --endpoint-url $S3_ENDPOINT_URL
List Files in a Bucket
aws s3 ls s3://<bucket_name>/<path> --endpoint-url $S3_ENDPOINT_URL --recursive
# Example
aws s3 ls s3://samples/cargo-synthetic/ --endpoint-url $S3_ENDPOINT_URL --recursive
Download Files
aws s3 cp s3://<bucket_name>/<file_path> <local_destination> --endpoint-url $S3_ENDPOINT_URL
# Example
aws s3 cp s3://samples/sales/report.csv ./local_reports/report.csv --endpoint-url $S3_ENDPOINT_URL
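Because the proxy speaks the standard S3 API, uploads can also be scripted directly from Python. The sketch below uses boto3 against the same endpoint and credentials configured above; the bucket, local path, and key are illustrative placeholders, so adjust them to your layout.
import os
import boto3

# Minimal sketch: upload a file through Foundation's S3-compatible proxy.
# Reuses the endpoint and credential environment variables configured above;
# the bucket, local path, and key below are illustrative placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT_URL"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

s3.upload_file(
    Filename="/home/user/data/sales_2024.csv",  # local file to upload
    Bucket="samples",                           # target bucket
    Key="sales/sales_2024.csv",                 # destination path in the bucket
)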
Creating Data Objects from Uploaded Files
After uploading files to Foundation storage, you need to create Data Objects to make them available for processing.
Step 1: Ensure Data Source Configuration
Your data source must be configured to connect to Foundation's S3 storage:
# Configure S3 data source (if not already done)
PUT /api/data/data_source/connection?identifier={data_source_id}
{
  "connection": {
    "connection_type": "s3",
    "url": "https://storage.<env_name>.meshx.foundation",
    "access_key": {
      "env_key": "S3_ACCESS_KEY"
    },
    "access_secret": {
      "env_key": "S3_SECRET_KEY"
    }
  }
}
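If you prefer to apply this configuration programmatically, a minimal sketch with the requests library is shown below. It assumes the API_URL and get_headers() helpers used in the monitoring example later in this guide, and a placeholder data source identifier.
import requests

# Sketch only: apply the S3 connection configuration to an existing data source.
# API_URL and get_headers() are assumed to come from your Foundation API client
# setup (as in the monitoring example below); the identifier is a placeholder.
data_source_id = "<your_s3_data_source_id>"

connection_payload = {
    "connection": {
        "connection_type": "s3",
        "url": "https://storage.<env_name>.meshx.foundation",
        "access_key": {"env_key": "S3_ACCESS_KEY"},
        "access_secret": {"env_key": "S3_SECRET_KEY"},
    }
}

resp = requests.put(
    f"{API_URL}/data/data_source/connection",
    params={"identifier": data_source_id},
    headers=get_headers(),
    json=connection_payload,
)
resp.raise_for_status()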
Step 2: Create Data Object
Create a data object pointing to your uploaded file:
POST /api/data/data_object
{
  "entity": {
    "name": "Cargo Transactions Upload",
    "entity_type": "data_object",
    "label": "CTU",
    "description": "Uploaded cargo transaction data from local system",
    "owner": "[email protected]"
  },
  "entity_info": {
    "owner": "[email protected]",
    "contact_ids": ["Data Engineering Team"],
    "links": []
  }
}
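Programmatically, the same request can be sent as sketched below; note that the assumption that the response returns the new object's identifier is ours, so confirm the response shape in your environment before relying on it.
import requests

# Sketch only: create the data object and keep its identifier for the linking
# step. API_URL and get_headers() are assumed helpers (see the monitoring
# example below); the "identifier" response field is an assumption.
payload = {
    "entity": {
        "name": "Cargo Transactions Upload",
        "entity_type": "data_object",
        "label": "CTU",
        "description": "Uploaded cargo transaction data from local system",
        "owner": "<owner_email>",  # placeholder
    },
    "entity_info": {
        "owner": "<owner_email>",  # placeholder
        "contact_ids": ["Data Engineering Team"],
        "links": [],
    },
}

resp = requests.post(
    f"{API_URL}/data/data_object",
    headers=get_headers(),
    json=payload,
)
resp.raise_for_status()
data_object_id = resp.json()["identifier"]  # assumed response field
print(f"Created data object: {data_object_id}")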
Step 3: Link to Data Source
Connect the data object to your S3 data source:
POST /api/data/link/data_source/data_object
Parameters:
- identifier: {s3_data_source_id}
- child_identifier: {data_object_id}
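The call above lists the parameters without showing how they are transported; the sketch below assumes they are passed as query parameters, mirroring the identifier-based calls elsewhere in this guide, so adjust if your API expects a request body instead.
import requests

# Sketch only: link the data object to the S3 data source. Passing both
# identifiers as query parameters is an assumption; API_URL and get_headers()
# are the assumed helpers used in the monitoring example below.
s3_data_source_id = "<your_s3_data_source_id>"
data_object_id = "<your_data_object_id>"

resp = requests.post(
    f"{API_URL}/data/link/data_source/data_object",
    params={
        "identifier": s3_data_source_id,
        "child_identifier": data_object_id,
    },
    headers=get_headers(),
)
resp.raise_for_status()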
Step 4: Configure File Path
Point the data object to your uploaded file:
PUT /api/data/data_object/config?identifier={data_object_id}
{
  "configuration": {
    "data_object_type": "csv",
    "path": "/samples/cargo-synthetic/transactions_2024.csv",
    "has_header": true,
    "delimiter": ",",
    "quote_char": "\"",
    "escape_char": null,
    "multi_line": false
  }
}
For different file types, adjust the configuration:
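The exact set of supported data_object_type values and their options is not listed here, so treat the following as illustrative sketches rather than a reference: for example, a Parquet file typically needs only the type and path, while a JSON file might toggle multi_line.
{
  "configuration": {
    "data_object_type": "parquet",
    "path": "/samples/cargo-synthetic/transactions_2024.parquet"
  }
}
{
  "configuration": {
    "data_object_type": "json",
    "path": "/samples/cargo-synthetic/transactions_2024.json",
    "multi_line": true
  }
}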
Monitoring Import Status
After creating a data object, monitor its ingestion status:
import requests

# Assumes API_URL (Foundation API base URL) and get_headers() (auth headers)
# are defined as in your Foundation API client setup.

def check_import_status(data_object_id):
    """Check if file import completed successfully"""
    # Get data object details
    resp = requests.get(
        f"{API_URL}/data/data_object?identifier={data_object_id}",
        headers=get_headers()
    )
    if resp.status_code != 200:
        return "Unable to fetch status"
    data = resp.json()

    # Check state
    state = data["entity"]["state"]
    print(f"Status: {state['code']} - {state['reason']}")
    print(f"Healthy: {state['healthy']}")

    # Check compute job if it exists
    compute_id = data.get("compute_identifier")
    if compute_id:
        compute_resp = requests.get(
            f"{API_URL}/data/compute?identifier={compute_id}",
            headers=get_headers()
        )
        if compute_resp.status_code == 200:
            job_status = compute_resp.json()["status"]["status"]
            print(f"Ingestion job: {job_status}")

    return state["healthy"]
Troubleshooting Common Issues
File Not Found After Upload
Verify the bucket and path in your data object configuration
Ensure the path starts with / and includes the bucket name
Check file permissions in the S3 storage
Data Object Stuck in Unhealthy State
Review compute job logs using /api/data/compute/log?identifier={compute_id} (see the sketch below)
Verify data source credentials are correctly configured
Check the file format matches the configured data_object_type
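As referenced in the list above, a quick way to pull those compute logs is sketched below; the response body is printed raw because its exact format may vary, and API_URL and get_headers() are the same assumed helpers as in the monitoring example.
import requests

# Sketch only: fetch ingestion job logs for troubleshooting. The log endpoint
# is the one referenced above; the response format (printed raw here) and the
# API_URL / get_headers() helpers are assumptions based on this guide.
def print_compute_logs(compute_id):
    resp = requests.get(
        f"{API_URL}/data/compute/log?identifier={compute_id}",
        headers=get_headers(),
    )
    if resp.status_code == 200:
        print(resp.text)
    else:
        print(f"Could not fetch logs (HTTP {resp.status_code})")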
Authentication Issues
Verify your S3 credentials are correctly set in the data source
Ensure your user has necessary permissions for the bucket
Check endpoint URL matches your environment
Best Practices
Organize Files Logically: Create a clear folder structure in your storage buckets
Use Descriptive Names: Include dates and data types in file names
Batch Similar Files: Group related files in the same upload session
Validate Before Upload: Check file formats and data quality locally first
Monitor Ingestion: Always verify data objects reach healthy state after configuration
Document Data Sources: Maintain clear documentation of what each uploaded file contains
Clean Up Old Files: Regularly remove obsolete files from storage to manage costs
Next Steps
After successfully importing files and creating data objects:
Create Source-Aligned Data Products (SADPs) to transform the raw data
Set up regular ingestion schedules if files are updated periodically
Configure data quality checks on the ingested data
Build Consumer-Aligned Data Products (CADPs) for specific use cases