Using the API to Implement Data Transformations
Transformations
Purpose
Creating Transformations
Transformation Builder Structure
{
  "config": {
    "docker_tag": "0.0.23",
    "executor_core_request": "800m",
    "executor_core_limit": "1500m",
    "executor_instances": 1,
    "executor_memory": "5120m",
    "driver_core_request": "0.3",
    "driver_core_limit": "800m",
    "driver_memory": "2048m"
  },
  "inputs": {
    "input_data_object_id": {
      "input_type": "data_object",
      "identifier": "data-object-id-here",
      "preview_limit": 10
    }
  },
  "transformations": [
    {
      "transform": "cast",
      "input": "input_data_object_id",
      "output": "casted_data",
      "changes": [
        {
          "column": "customer_id",
          "data_type": "integer",
          "kwargs": {}
        }
      ]
    }
  ],
  "finalisers": {
    "input": "casted_data",
    "enable_quality": true,
    "write_config": {"mode": "overwrite"},
    "enable_profiling": true,
    "enable_classification": false
  },
  "preview": false
}
Python Functions
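The builder structure shown above can be assembled in Python rather than written by hand. The following is a minimal sketch, not the platform's own client code: the helper names `cast_step` and `build_payload` are hypothetical, the resource values are copied from the example structure, and the assumption that the finaliser reads from the output of the last step is inferred from that example only. Sending the payload to the API is left out because the endpoint is not described here.

```python
from __future__ import annotations

import json
from typing import Any


def cast_step(input_name: str, output_name: str, columns: dict[str, str]) -> dict[str, Any]:
    """Build one "cast" entry for the "transformations" list (hypothetical helper)."""
    return {
        "transform": "cast",
        "input": input_name,
        "output": output_name,
        "changes": [
            {"column": column, "data_type": data_type, "kwargs": {}}
            for column, data_type in columns.items()
        ],
    }


def build_payload(data_object_id: str, steps: list[dict[str, Any]], preview: bool = False) -> dict[str, Any]:
    """Assemble the full builder structure around a list of steps (hypothetical helper)."""
    if not steps:
        raise ValueError("at least one transformation step is required")
    return {
        # Resource settings copied from the example structure.
        "config": {
            "docker_tag": "0.0.23",
            "executor_core_request": "800m",
            "executor_core_limit": "1500m",
            "executor_instances": 1,
            "executor_memory": "5120m",
            "driver_core_request": "0.3",
            "driver_core_limit": "800m",
            "driver_memory": "2048m",
        },
        "inputs": {
            "input_data_object_id": {
                "input_type": "data_object",
                "identifier": data_object_id,
                "preview_limit": 10,
            }
        },
        "transformations": steps,
        # Assumption: the finaliser consumes the output of the last step.
        "finalisers": {
            "input": steps[-1]["output"],
            "enable_quality": True,
            "write_config": {"mode": "overwrite"},
            "enable_profiling": True,
            "enable_classification": False,
        },
        "preview": preview,
    }


payload = build_payload(
    "data-object-id-here",
    [cast_step("input_data_object_id", "casted_data", {"customer_id": "integer"})],
)
print(json.dumps(payload, indent=2))
```

With the arguments shown, the printed JSON has the same keys and values as the example structure, so it can serve as a starting point for the other transformation types once their fields are known.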
Common Transformations
Cast - Convert Data Types
Select Columns - Choose Specific Columns
Filter - Apply Conditions
Group By - Aggregate Data
Join - Combine Data Sources
Expression - Apply SQL-like Expressions
Complete Example
Monitoring and Validation
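One check that can be run locally before a transformation is submitted is whether every step reads from a name that actually exists. The sketch below is an assumption-laden illustration, not a documented validation rule: `find_wiring_errors` is a hypothetical helper, and it assumes each step has a single "input" and "output" key as in the cast example (steps that take several inputs, such as a join, may be shaped differently).

```python
from __future__ import annotations

from typing import Any


def find_wiring_errors(payload: dict[str, Any]) -> list[str]:
    """Return messages for steps whose "input" does not name a known dataset."""
    errors: list[str] = []
    # Names available to read from: declared inputs first, then each step's output.
    known = set(payload.get("inputs", {}))
    for index, step in enumerate(payload.get("transformations", [])):
        source = step.get("input")
        if source not in known:
            errors.append(f"transformations[{index}]: unknown input {source!r}")
        output = step.get("output")
        if output in known:
            errors.append(f"transformations[{index}]: output {output!r} is already defined")
        elif output is not None:
            known.add(output)
    final_input = payload.get("finalisers", {}).get("input")
    if final_input not in known:
        errors.append(f"finalisers: unknown input {final_input!r}")
    return errors


example = {
    "inputs": {"input_data_object_id": {}},
    "transformations": [
        {"transform": "cast", "input": "input_data_object_id", "output": "casted_data"}
    ],
    "finalisers": {"input": "casted_data"},
}
print(find_wiring_errors(example))  # → []
```

An empty list means the names line up; it says nothing about whether the platform will accept the column types or the data itself.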
Status Values
Troubleshooting
Failed Transformations
Schema Validation Issues
Best Practices