Creating and Managing Data Products

Data Products

Data products are refined data assets with transformations applied, ready for consumption. They function as SQL tables containing processed data suitable for specific business purposes or analysis.

Data Product Types

  • Source-Aligned Data Products (SADPs): Direct transformations of raw data from data objects

  • Consumer-Aligned Data Products (CADPs): Transformations built on top of other data products, creating layers of refined data
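The distinction between the two types comes down to what a product reads from. The following is a minimal sketch of that rule, not part of the platform's API: the `Node` class, the `kind` strings, and the `product_type` helper are hypothetical names introduced for illustration. Treating a product with mixed inputs as a CADP is also an assumption, since the definitions above do not cover that case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a hypothetical lineage graph: a data object or a data product."""
    name: str
    kind: str  # "data_object" or "data_product" (illustrative labels)
    inputs: List["Node"] = field(default_factory=list)


def product_type(product: Node) -> str:
    """Return "SADP" when every input is a data object, "CADP" when at least
    one input is another data product (assumption for mixed inputs)."""
    if product.kind != "data_product":
        raise ValueError(f"{product.name} is not a data product")
    if not product.inputs:
        raise ValueError(f"{product.name} has no inputs to classify")
    if any(source.kind == "data_product" for source in product.inputs):
        return "CADP"
    return "SADP"


# A raw data object, a product built directly on it, and a product layered on top.
raw_orders = Node("raw_orders", "data_object")
clean_orders = Node("clean_orders", "data_product", [raw_orders])
revenue_by_region = Node("revenue_by_region", "data_product", [clean_orders])

print(product_type(clean_orders))       # SADP
print(product_type(revenue_by_region))  # CADP
```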

When to Use Data Products

  • Business-ready data: Transform raw data into formats suitable for specific business purposes

  • Consumer needs: Create data tailored to specific analytical or operational requirements

  • Data refinement: Apply business logic, calculations, and aggregations to raw data

  • Consumption layer: Provide clean, structured data for applications and analytics tools

Creating a Data Product

Step 1: Create Data Product

Endpoint: POST /api/data/data_product

{
  "entity": {
    "name": "Customer Analytics Product",
    "entity_type": "data_product",
    "label": "CAP",
    "description": "Processed customer data for analytics dashboard"
  },
  "entity_info": {
    "owner": "[email protected]",
    "contact_ids": ["Data Product contact"],
    "links": ["example.com"]
  },
  "host_mesh_identifier": "mesh-id-here"
}
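The request above could be assembled in Python roughly as follows. This is a sketch using only the standard library: `BASE_URL` is a placeholder host, `build_create_request` is a hypothetical helper name, and the owner address is a stand-in value. Authentication is not described in this section, so it is left out. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://foundation.example.com"  # assumption: placeholder host


def build_create_request(name, label, description, owner, mesh_id):
    """Build (but do not send) the POST request that creates a data product."""
    payload = {
        "entity": {
            "name": name,
            "entity_type": "data_product",
            "label": label,
            "description": description,
        },
        "entity_info": {
            "owner": owner,
            "contact_ids": ["Data Product contact"],
            "links": ["example.com"],
        },
        "host_mesh_identifier": mesh_id,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/data_product",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    name="Customer Analytics Product",
    label="CAP",
    description="Processed customer data for analytics dashboard",
    owner="owner@example.com",  # placeholder address
    mesh_id="mesh-id-here",
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```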

Step 2: Define Schema

Endpoint: PUT /api/data/data_product/schema?identifier={product_id}

Step 3: Assign to Mesh (if not done during creation)

Endpoint: PATCH /api/data/data_product?identifier={product_id}
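Steps 2 and 3 both address the product through the `identifier` query parameter. The sketch below shows one way to build those two requests with the standard library. Several things here are assumptions: `BASE_URL` is a placeholder host, the helper names are hypothetical, the schema body is passed through unchanged because its structure is not shown in this section, and the PATCH body reuses the `host_mesh_identifier` key from the creation payload without confirmation that the endpoint expects it.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://foundation.example.com"  # assumption: placeholder host


def _json_request(method: str, path: str, product_id: str, body: dict):
    """Build a JSON request addressed to one product via ?identifier=..."""
    query = urlencode({"identifier": product_id})  # percent-encodes the id
    return urllib.request.Request(
        url=f"{BASE_URL}{path}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def build_schema_request(product_id: str, schema_body: dict):
    """Step 2: PUT the schema. The body structure is supplied by the caller."""
    return _json_request("PUT", "/api/data/data_product/schema", product_id, schema_body)


def build_assign_mesh_request(product_id: str, mesh_id: str):
    """Step 3: PATCH the product. The body key is an assumption."""
    body = {"host_mesh_identifier": mesh_id}
    return _json_request("PATCH", "/api/data/data_product", product_id, body)


# Placeholder schema body, not the documented structure.
schema_req = build_schema_request("prod-123", {"fields": []})
mesh_req = build_assign_mesh_request("prod-123", "mesh-id-here")
print(schema_req.get_method(), schema_req.full_url)
print(mesh_req.get_method(), mesh_req.full_url)
```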

Python Functions

Schema Field Structure

Supported Column Types

  • VARCHAR - Variable character strings

  • INTEGER - 32-bit integers

  • BIGINT - 64-bit integers

  • DECIMAL - Decimal numbers

  • DOUBLE - Double precision floating point

  • BOOLEAN - True/false values

  • DATE - Date values

  • TIMESTAMP - Timestamp values

  • TIMESTAMPTZ - Timestamp with timezone

  • JSON - JSON data

  • ARRAY - Array data

  • UUID - UUID values
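A client can check column types against the list above before submitting a schema. The following is a small sketch of such a check. The helper name `normalize_column_type` is hypothetical, and upper-casing the input is a client-side convenience, not documented API behavior. Parameterized forms such as `DECIMAL(10,2)` are rejected here because this section does not say whether they are accepted.

```python
# Type names copied from the supported column types listed above.
SUPPORTED_COLUMN_TYPES = frozenset({
    "VARCHAR", "INTEGER", "BIGINT", "DECIMAL", "DOUBLE", "BOOLEAN",
    "DATE", "TIMESTAMP", "TIMESTAMPTZ", "JSON", "ARRAY", "UUID",
})


def normalize_column_type(raw: str) -> str:
    """Return the upper-cased type name, or raise if it is not in the list."""
    candidate = raw.strip().upper()
    if candidate not in SUPPORTED_COLUMN_TYPES:
        allowed = ", ".join(sorted(SUPPORTED_COLUMN_TYPES))
        raise ValueError(f"Unsupported column type {raw!r}; expected one of: {allowed}")
    return candidate


print(normalize_column_type(" bigint "))  # BIGINT
```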

Classification and Sensitivity Levels

  • Classification: Categorizes the data product according to the data classification policies set at the Foundation level (for example, Confidential, Public, or Secret). Only one value is allowed.

  • Sensitivity: A set of tags defined at the Foundation level that classify the data in the data product according to different sensitivities or frameworks (for example, PII or Biometric). More than one value can be selected.
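The cardinality rule (exactly one classification, any number of sensitivity tags) can be enforced before a request is sent. The sketch below is one possible client-side check. The function name and the keys of the returned dictionary are hypothetical, and the allowed values are passed in by the caller because they are defined at the Foundation level; the values used in the example are the ones mentioned above.

```python
from typing import Iterable


def validate_labels(classification, sensitivities: Iterable[str],
                    allowed_classifications: Iterable[str],
                    allowed_sensitivities: Iterable[str]) -> dict:
    """Check one classification value and zero or more sensitivity tags."""
    if not isinstance(classification, str):
        raise TypeError("classification must be a single string value")
    if classification not in set(allowed_classifications):
        raise ValueError(f"Unknown classification: {classification!r}")
    if isinstance(sensitivities, str):
        # Guard against a bare string being split into characters.
        raise TypeError("sensitivities must be a collection of tags")
    tags = set(sensitivities)
    unknown = tags - set(allowed_sensitivities)
    if unknown:
        raise ValueError(f"Unknown sensitivity tags: {sorted(unknown)}")
    # Key names below are illustrative, not the documented field names.
    return {"classification": classification, "sensitivity": sorted(tags)}


labels = validate_labels(
    "Confidential",
    ["PII", "Biometric"],
    allowed_classifications=["Confidential", "Public", "Secret"],
    allowed_sensitivities=["PII", "Biometric"],
)
print(labels)  # {'classification': 'Confidential', 'sensitivity': ['Biometric', 'PII']}
```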

Management Operations

Example Usage

SADP vs CADP Examples

Efficient Data Product Design

  • Start with the end in mind – understand what consumers need

  • Reuse existing data products where possible

  • Design transformations for performance and maintainability

  • Implement appropriate data quality checks
