Adding and Managing Data Objects

Data Objects

Data objects store raw data ingested into Foundation without any transformation. Each data object functions as a SQL table that holds the data exactly as it arrives from the source system, before any transformation is applied.

When to Use Data Objects

  • Raw data preservation: Store original data for audit, compliance, or historical purposes

  • Multiple product feeds: Make raw data available to multiple data products

  • Decoupled processing: Separate data ingestion from transformation processes

  • Source integrity: Maintain data exactly as received from external systems

Creating a Data Object

Step 1: Create Data Object

Endpoint: POST /api/data/data_object

{
  "entity": {
    "name": "Customer Transactions",
    "entity_type": "data_object",
    "label": "CTX",
    "description": "Raw customer transaction data from payment system"
  },
  "entity_info": {
    "owner": "[email protected]",
    "contact_ids": ["Data Object contact"],
    "links": ["example.com"]
  }
}
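
The request above can be assembled from Python as follows. This is a minimal sketch using only the standard library: BASE_URL, the helper name build_create_request, and the owner address are hypothetical placeholders, and any authentication headers your deployment requires are not shown.

```python
import json
import urllib.request

# Hypothetical host; replace with the address of your Foundation deployment.
BASE_URL = "https://foundation.example.com"


def build_create_request(name, label, description, owner, contact_ids, links):
    """Build (but do not send) the POST request that creates a data object."""
    payload = {
        "entity": {
            "name": name,
            "entity_type": "data_object",
            "label": label,
            "description": description,
        },
        "entity_info": {
            "owner": owner,
            "contact_ids": contact_ids,
            "links": links,
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/data_object",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    name="Customer Transactions",
    label="CTX",
    description="Raw customer transaction data from payment system",
    owner="owner@example.com",  # placeholder address
    contact_ids=["Data Object contact"],
    links=["example.com"],
)
# Sending requires a reachable server and, most likely, credentials:
# response = urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the server.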

Step 2: Link Data Source to Data Object

Endpoint: POST /api/data/link/data_source/data_object

Parameters:

  • identifier: Data source identifier

  • child_identifier: Data object identifier
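
A minimal Python sketch of the link call, assuming the two parameters are passed as URL query parameters (the source does not say whether they belong in the query string or the body). BASE_URL and build_link_request are hypothetical names.

```python
import urllib.parse
import urllib.request

# Hypothetical host; replace with the address of your Foundation deployment.
BASE_URL = "https://foundation.example.com"


def build_link_request(data_source_id, data_object_id):
    """Build (but do not send) the POST request linking a source to an object."""
    # Assumption: identifier and child_identifier travel in the query string.
    query = urllib.parse.urlencode(
        {"identifier": data_source_id, "child_identifier": data_object_id}
    )
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/link/data_source/data_object?{query}",
        method="POST",
    )


link_request = build_link_request("my-data-source-id", "my-data-object-id")
# To send: urllib.request.urlopen(link_request)
```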

Step 3: Configure Data Object

Endpoint: PUT /api/data/data_object/config?identifier={data_object_id}
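
The configuration call could be built like this. It is a sketch only: the shape of the request body is not specified here, so the config dictionary is passed through unchanged, and BASE_URL and build_config_request are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host; replace with the address of your Foundation deployment.
BASE_URL = "https://foundation.example.com"


def build_config_request(data_object_id, config):
    """Build (but do not send) the PUT request that configures a data object."""
    query = urllib.parse.urlencode({"identifier": data_object_id})
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/data_object/config?{query}",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumption: the body layout below is illustrative, not the documented schema.
config_request = build_config_request(
    "my-data-object-id",
    {"path": "transactions/2024.csv", "has_header": True},
)
# To send: urllib.request.urlopen(config_request)
```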

Python Functions

Configuration Options

Supported Resource Types

  • "csv" - Comma-separated values

  • "json" - JSON format

  • "parquet" - Parquet format

  • "avro" - Avro format

  • "jdbc" - Database tables

CSV Configuration Fields

  • path: File location in the data source

  • has_header: First row contains column names

  • delimiter: Character that separates values, for example a comma (,), semicolon (;), or pipe (|)

  • quote_char: Character for quoting values

  • escape_char: Character for escaping special characters

  • multi_line: Records can span multiple lines
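
The CSV fields above can be collected in a small Python structure before they are sent. This is a sketch: the class name CsvConfig, the default values, and the single-character check are assumptions made for illustration, not documented behavior.

```python
from dataclasses import asdict, dataclass


@dataclass
class CsvConfig:
    """CSV settings keyed by the field names listed above."""

    path: str
    has_header: bool = True      # assumed default
    delimiter: str = ","         # assumed default
    quote_char: str = '"'        # assumed default
    escape_char: str = "\\"      # assumed default
    multi_line: bool = False     # assumed default

    def __post_init__(self):
        # Assumption: each of these options is exactly one character long.
        for field_name in ("delimiter", "quote_char", "escape_char"):
            if len(getattr(self, field_name)) != 1:
                raise ValueError(f"{field_name} must be a single character")

    def to_dict(self):
        """Return the settings as a plain dictionary ready for JSON encoding."""
        return asdict(self)


csv_config = CsvConfig(path="transactions/2024.csv", delimiter=";")
settings = csv_config.to_dict()
```

Validating locally catches a mistyped delimiter before an ingestion run is attempted.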

Monitoring Data Object Status

Status Values

  • COMPLETED: Ingestion finished successfully

  • FAILED: Ingestion encountered an error

  • RUNNING: Ingestion currently executing

  • STARTING_UP: Ingestion is starting

  • SCHEDULED: Ingestion scheduled for execution

  • UNSCHEDULED: Job could not be scheduled

Example Usage
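
One way to use the status values is a polling loop that waits until ingestion reaches a final state. The sketch below assumes that COMPLETED, FAILED, and UNSCHEDULED are final and that the other three values mean work is still pending. The endpoint for reading the status is not shown in this section, so the caller supplies a hypothetical fetch_status function.

```python
import time

# Assumption: these states are final; the rest mean ingestion is still pending.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "UNSCHEDULED"}
IN_PROGRESS_STATUSES = {"SCHEDULED", "STARTING_UP", "RUNNING"}


def wait_for_terminal_status(fetch_status, poll_seconds=5.0, max_polls=120,
                             sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status not in IN_PROGRESS_STATUSES:
            raise ValueError(f"Unexpected status: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"No terminal status after {max_polls} polls")


# Demonstration with a canned sequence instead of a real API call.
fake_statuses = iter(["SCHEDULED", "STARTING_UP", "RUNNING", "COMPLETED"])
final_status = wait_for_terminal_status(
    lambda: next(fake_statuses),
    sleep=lambda seconds: None,  # skip real waiting in the demonstration
)
```

In real use, fetch_status would wrap whichever call returns the data object's current status, and the default time.sleep would pace the polling.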
