Importing Files into Foundation

This guide provides comprehensive instructions for importing files into the Foundation platform for data processing. It covers both UI-based and programmatic approaches for file upload, along with the subsequent steps to transform uploaded files into data objects ready for processing.

What This Guide Covers

This guide specifically addresses scenarios where you need to:

  • Upload files directly to Foundation's managed storage

  • Create data objects from uploaded files

  • Set up the complete pipeline from file upload to data ingestion

Prerequisites

Before importing files, ensure you have:

  • A configured Data Source that can connect to Foundation's storage

  • Appropriate permissions to upload files and create data objects

  • Access to your Foundation environment's storage interface

Method 1: Importing Files via Storage UI

The Storage UI provides a visual interface for uploading files directly to Foundation's managed storage buckets.

Accessing the Storage UI

Navigate to your Foundation's storage interface:
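The exact address depends on your deployment; a typical pattern (illustrative placeholder only, not a confirmed URL) looks like:

```
https://storage.<env_name>.example.com   # placeholder -- your actual domain will differ
```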

Replace <env_name> with your Foundation environment name (e.g., stg, prod, dev).

Understanding Storage Buckets

When you access the storage interface, you will see several pre-configured buckets, each serving a specific purpose:

  • connectors: Configuration files for data connectors

  • iceberg: Iceberg table storage

  • models: Machine learning model storage

  • samples: Sample data and test files (recommended for user uploads)

  • warehouse: PostgreSQL table exports

For general file uploads, we recommend using a dedicated bucket (like samples) or creating your own bucket for better organization.

Uploading Files

  1. Select Your Target Bucket

    1. Click on the bucket where you want to store your files

    2. For testing and general use, samples is typically appropriate

  2. Initiate Upload

    1. Click the "Upload" button in the top-right corner of the interface

  3. Select Files or Folders

    1. Choose Files: Select individual files from your local machine

    2. Choose Folder: Upload entire folder structures

    3. Maximum file size: 5GB per file

  4. Complete Upload

    1. Files will be uploaded to the current directory in the selected bucket

    2. You can create subdirectories by navigating into them before uploading

File Organization Best Practices

Keep uploads organized within your chosen bucket: group related files under per-dataset or per-date subdirectories, use descriptive file names that include the date and data type, and periodically remove obsolete files so the bucket stays manageable. These points are expanded in the Best Practices section at the end of this guide.

Method 2: Importing Files via AWS CLI

Foundation's Storage Engine provides an S3-compatible proxy that allows you to use standard AWS S3 commands regardless of the underlying cloud provider. This enables programmatic and batch file uploads.

Configuration

First, configure your AWS CLI with the appropriate endpoint and credentials:
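A minimal sketch, assuming the storage credentials provided by your administrator and a placeholder endpoint URL:

```bash
# Store the storage credentials under the default profile
aws configure set aws_access_key_id <storage_username>
aws configure set aws_secret_access_key <storage_password>
aws configure set region us-east-1   # S3-compatible proxies typically ignore the region, but the CLI expects one

# The S3-compatible endpoint is passed per command; the URL below is a placeholder
export FOUNDATION_S3_ENDPOINT="https://storage.<env_name>.example.com"
aws s3 ls --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```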

Note: The storage credentials (username/password) should be provided by your Foundation administrator. These are specific to the Foundation storage system and are different from your Foundation UI login credentials.

For persistent configuration, you can also create a profile in your AWS credentials file:
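For example (the profile name is arbitrary; the keys are the storage credentials from your administrator):

```bash
# Append a dedicated profile to ~/.aws/credentials
cat >> ~/.aws/credentials <<'EOF'
[foundation]
aws_access_key_id = <storage_username>
aws_secret_access_key = <storage_password>
EOF

# Then select the profile per command (the endpoint URL is still required)
aws s3 ls --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```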

Common S3 Operations

List Available Buckets
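To verify connectivity, list the buckets visible through the proxy (the profile name and endpoint variable come from the configuration step above):

```bash
aws s3 ls --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```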

Create a New Bucket
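A sketch using an example bucket name:

```bash
# "my-imports" is an example name -- choose one that fits your organization's conventions
aws s3 mb s3://my-imports --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```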

Upload a Single File
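For example, copying a local CSV into the samples bucket (file names and prefixes are examples):

```bash
aws s3 cp ./customers.csv s3://samples/customers/customers.csv \
  --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```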

Upload Multiple Files (Sync Directory)
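The sync command mirrors a local directory into a bucket prefix and only transfers new or changed files:

```bash
aws s3 sync ./exports/ s3://samples/exports/ \
  --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```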

List Files in a Bucket
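For example, listing everything under the samples bucket:

```bash
aws s3 ls s3://samples/ --recursive \
  --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```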

Download Files
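Either copy a single object back or sync an entire prefix to a local directory:

```bash
aws s3 cp s3://samples/customers/customers.csv ./customers.csv \
  --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"

aws s3 sync s3://samples/exports/ ./exports/ \
  --profile foundation --endpoint-url "$FOUNDATION_S3_ENDPOINT"
```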

Creating Data Objects from Uploaded Files

After uploading files to Foundation storage, you need to create Data Objects to make them available for processing.

Step 1: Ensure Data Source Configuration

Your data source must be configured to connect to Foundation's S3 storage:
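The exact mechanism depends on your Foundation version; the request below is only a sketch, with a hypothetical endpoint (/api/data/source) and illustrative field names. Consult your Foundation API reference or administrator for the real schema:

```bash
# HYPOTHETICAL endpoint and field names, shown only to illustrate the shape of the configuration
curl -X POST "https://<env_name>.example.com/api/data/source" \
  -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "foundation-s3-storage",
        "type": "s3",
        "endpoint": "https://storage.<env_name>.example.com",
        "access_key": "<storage_username>",
        "secret_key": "<storage_password>"
      }'
```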

Step 2: Create Data Object

Create a data object pointing to your uploaded file:
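A sketch, again assuming a hypothetical /api/data/object endpoint; data_object_type is the field referenced later in the Troubleshooting section, and the value shown is an example:

```bash
# HYPOTHETICAL endpoint; payload fields are illustrative
curl -X POST "https://<env_name>.example.com/api/data/object" \
  -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "customers_csv",
        "data_object_type": "csv"
      }'
```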

Step 3: Connect Data Object to Data Source

Connect the data object to your S3 data source:
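A sketch of the association step, using the same hypothetical endpoint; the actual call in your environment may differ:

```bash
# HYPOTHETICAL call -- associates the data object with the S3 data source from Step 1
curl -X PATCH "https://<env_name>.example.com/api/data/object/customers_csv" \
  -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "data_source": "foundation-s3-storage" }'
```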

Step 4: Configure File Path

Point the data object to your uploaded file:
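Whatever the exact API looks like, the important detail (see Troubleshooting) is the path format: it starts with / and includes the bucket name. A sketch with the same hypothetical endpoint:

```bash
# HYPOTHETICAL call -- note that the path starts with "/" and includes the bucket name
curl -X PATCH "https://<env_name>.example.com/api/data/object/customers_csv" \
  -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "path": "/samples/customers/customers.csv" }'
```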

For different file types, adjust the configuration:
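For example, a Parquet upload might use a different data_object_type and path (the values are illustrative assumptions; check which types your Foundation version supports):

```bash
# HYPOTHETICAL call -- adjust data_object_type and path to match the uploaded file
curl -X PATCH "https://<env_name>.example.com/api/data/object/events_parquet" \
  -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "data_object_type": "parquet", "path": "/samples/events/events.parquet" }'
```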

Monitoring Import Status

After creating a data object, monitor its ingestion status:
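A sketch assuming a hypothetical status endpoint; the goal is to confirm the object reaches a healthy state:

```bash
# HYPOTHETICAL endpoint -- poll until the data object reports a healthy state
curl -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  "https://<env_name>.example.com/api/data/object/customers_csv"
```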

Troubleshooting Common Issues

File Not Found After Upload

  • Verify the bucket and path in your data object configuration

  • Ensure the path starts with / and includes the bucket name

  • Check file permissions in the S3 storage

Data Object Stuck in Unhealthy State

  • Review compute job logs using /api/data/compute/log?identifier={compute_id} (see the example after this list)

  • Verify data source credentials are correctly configured

  • Check file format matches the configured data_object_type
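For example, the compute log referenced above can be fetched with a simple request; the host and authentication header are placeholders for your environment:

```bash
# Endpoint path is taken from the bullet above; host and auth header are placeholders
curl -H "Authorization: Bearer $FOUNDATION_TOKEN" \
  "https://<env_name>.example.com/api/data/compute/log?identifier=<compute_id>"
```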

Authentication Issues

  • Verify your S3 credentials are correctly set in the data source

  • Ensure your user has necessary permissions for the bucket

  • Check endpoint URL matches your environment

Best Practices

  1. Organize Files Logically: Create a clear folder structure in your storage buckets

  2. Use Descriptive Names: Include dates and data types in file names

  3. Batch Similar Files: Group related files in the same upload session

  4. Validate Before Upload: Check file formats and data quality locally first

  5. Monitor Ingestion: Always verify data objects reach healthy state after configuration

  6. Document Data Sources: Maintain clear documentation of what each uploaded file contains

  7. Clean Up Old Files: Regularly remove obsolete files from storage to manage costs

Next Steps

After successfully importing files and creating data objects:

  1. Create Source-Aligned Data Products (SADPs) to transform the raw data

  2. Set up regular ingestion schedules if files are updated periodically

  3. Configure data quality checks on the ingested data

  4. Build Consumer-Aligned Data Products (CADPs) for specific use cases
