Setting up Custom Data Quality Checks for a Data Product
Data Quality System Overview
The Foundation backend provides a data quality system that lets users configure both automatic and custom data quality rules for their data products. The system supports:
Automatic Checks: Generated from schema definitions (column types, constraints, etc.)
Custom Checks: User-defined rules using Great Expectations syntax
Quality Scoring: Weighted scoring system with configurable thresholds
Validation Execution: Automated quality checks and reporting
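To make the idea of weighted scoring with thresholds concrete, here is a minimal Python sketch. The formula is an assumption: this page does not document how Foundation combines dimension scores, so the weighted average, the function names (`weighted_score`, `meets_threshold`), and the sample scores below are all hypothetical illustrations rather than the backend's implementation.

```python
# Hypothetical sketch: the real scoring formula is not documented on this page.
DIMENSIONS = ("accuracy", "completeness", "consistency", "uniqueness", "validity")


def weighted_score(dimension_scores, weights):
    """Weighted average of per-dimension scores (assumed formula)."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(dimension_scores[d] * weights[d] for d in DIMENSIONS)
    return weighted_sum / total_weight


def meets_threshold(score, threshold):
    """A score passes when it is at least the configured threshold."""
    return score >= threshold


# Illustrative weights and made-up per-dimension scores in the range 0..1.
weights = {"accuracy": 0.2, "completeness": 0.3, "consistency": 0.1,
           "uniqueness": 0.1, "validity": 0.3}
scores = {"accuracy": 1.0, "completeness": 0.9, "consistency": 1.0,
          "uniqueness": 1.0, "validity": 0.5}

score = weighted_score(scores, weights)
print(round(score, 2))              # 0.82
print(meets_threshold(score, 0.8))  # True
```

Dividing by the total weight keeps the result in the 0..1 range even if the supplied weights do not sum exactly to 1.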
Key API Endpoints
1. Custom Expectation Management
Base URL: /api/data/data_product/quality/expectation/custom
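The custom expectation endpoints in this section share this path and differ only in HTTP method and query parameters (`identifier`, plus `custom_identifier` for update and delete). A small helper can build the URLs consistently. This is a sketch: the host `https://foundation.example.com` and the function name `custom_expectation_url` are placeholders, not part of the Foundation API.

```python
from urllib.parse import urlencode

CUSTOM_EXPECTATION_PATH = "/api/data/data_product/quality/expectation/custom"


def custom_expectation_url(base_url, data_product_id, expectation_id=None):
    """Build the URL for add (no expectation_id) or update/delete (with one)."""
    params = {"identifier": data_product_id}
    if expectation_id is not None:
        # Update and delete also need the ID of the custom expectation.
        params["custom_identifier"] = expectation_id
    return f"{base_url.rstrip('/')}{CUSTOM_EXPECTATION_PATH}?{urlencode(params)}"


# Placeholder host; replace with the address of your deployment.
BASE = "https://foundation.example.com"
add_url = custom_expectation_url(BASE, "123e4567-e89b-12d3-a456-426614174000")
update_url = custom_expectation_url(BASE, "123e4567-e89b-12d3-a456-426614174000", "exp-1")
print(add_url)
print(update_url)
```

Using `urlencode` rather than string concatenation ensures identifiers with reserved characters are escaped correctly.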
Add Custom Expectation
POST /api/data/data_product/quality/expectation/custom?identifier={data_product_id}
Request Body (ExpectationItem):
{
  "type": "expect_column_values_to_be_between",
  "kwargs": {
    "column": "year",
    "min_value": 1980,
    "max_value": 2020
  },
  "meta": {
    "description": "Year should be between 1980 and 2020"
  }
}
Update Custom Expectation
PUT /api/data/data_product/quality/expectation/custom?identifier={data_product_id}&custom_identifier={expectation_id}
Delete Custom Expectation
DELETE /api/data/data_product/quality/expectation/custom?identifier={data_product_id}&custom_identifier={expectation_id}
2. Quality Configuration
Get Current Expectations
GET /api/data/data_product/quality/expectation?identifier={data_product_id}
Update Quality Weights
PUT /api/data/data_product/quality/expectation/weights?identifier={data_product_id}
Request Body:
{
  "accuracy": 0.2,
  "completeness": 0.3,
  "consistency": 0.1,
  "uniqueness": 0.1,
  "validity": 0.3
}
Update Quality Thresholds
PUT /api/data/data_product/quality/expectation/thresholds?identifier={data_product_id}
Request Body:
{
  "table": 0.8,
  "columns": {
    "column_name": {
      "accuracy": 0.9,
      "completeness": 0.8,
      "consistency": 1.0,
      "uniqueness": 0.0,
      "validity": 0.5
    }
  }
}
3. Quality Execution and Results
Run Quality Checks
POST /api/data/data_product/compute/builder/run/quality?identifier={data_product_id}
Get Validation Results
GET /api/data/data_product/quality/validations?identifier={data_product_id}
Get Quality Overview
GET /api/data/data_product/quality/overview
Custom Expectation Types
The system supports all Great Expectations expectation types. Here are common examples:
Column Value Expectations
{
  "type": "expect_column_values_to_be_between",
  "kwargs": {
    "column": "age",
    "min_value": 0,
    "max_value": 120
  },
  "meta": {"description": "Age should be between 0 and 120"}
}
Column Type Expectations
{
  "type": "expect_column_values_to_be_of_type",
  "kwargs": {
    "column": "email",
    "type_": "StringType"
  },
  "meta": {"description": "Email should be a string"}
}
Uniqueness Expectations
{
  "type": "expect_column_values_to_be_unique",
  "kwargs": {
    "column": "user_id"
  },
  "meta": {"description": "User IDs should be unique"}
}
Null Value Expectations
{
  "type": "expect_column_values_to_not_be_null",
  "kwargs": {
    "column": "required_field"
  },
  "meta": {"description": "Required field cannot be null"}
}
Regex Pattern Expectations
{
  "type": "expect_column_values_to_match_regex",
  "kwargs": {
    "column": "phone_number",
    "regex": "^\\+?[1-9]\\d{1,14}$"
  },
  "meta": {"description": "Phone number should match international format"}
}
Complete Workflow
1. Configure Custom Expectations
# BASE_URL is a placeholder for your Foundation host,
# e.g. export BASE_URL="https://foundation.example.com"
# Add a custom expectation
curl -X POST "$BASE_URL/api/data/data_product/quality/expectation/custom?identifier=123e4567-e89b-12d3-a456-426614174000" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "expect_column_values_to_be_between",
    "kwargs": {
      "column": "revenue",
      "min_value": 0,
      "max_value": 1000000
    },
    "meta": {
      "description": "Revenue should be between 0 and 1M"
    }
  }'
2. Set Quality Weights (Optional)
curl -X PUT "$BASE_URL/api/data/data_product/quality/expectation/weights?identifier=123e4567-e89b-12d3-a456-426614174000" \
  -H "Content-Type: application/json" \
  -d '{
    "accuracy": 0.3,
    "completeness": 0.2,
    "consistency": 0.2,
    "uniqueness": 0.1,
    "validity": 0.2
  }'
3. Run Quality Checks
curl -X POST "$BASE_URL/api/data/data_product/compute/builder/run/quality?identifier=123e4567-e89b-12d3-a456-426614174000" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "spark_config": {
        "spark.sql.adaptive.enabled": "true"
      }
    }
  }'
4. Review Results
# Get validation results
curl "$BASE_URL/api/data/data_product/quality/validations?identifier=123e4567-e89b-12d3-a456-426614174000"
# Get current expectations
curl "$BASE_URL/api/data/data_product/quality/expectation?identifier=123e4567-e89b-12d3-a456-426614174000"
Authentication & Permissions
All quality management endpoints require:
Manage permissions for creating/updating/deleting expectations
Read permissions for viewing results
Browse permissions for quality overview
The system uses the IAM framework to control access to data products and their quality configurations.
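The permission list above can be read as a lookup from endpoint to required permission, which is handy when a client wants to check access before issuing a request. The table below is an illustrative reading of that list, not an IAM API: mapping "viewing results" to the validations endpoint is an interpretation, the function name `required_permission` is hypothetical, and endpoints whose permission this page does not state (weights, thresholds, running checks) are deliberately left out.

```python
# Illustrative lookup built from the permission list on this page.
# It is not part of the Foundation IAM framework.
CUSTOM = "/api/data/data_product/quality/expectation/custom"

REQUIRED_PERMISSION = {
    ("POST", CUSTOM): "Manage",    # create a custom expectation
    ("PUT", CUSTOM): "Manage",     # update a custom expectation
    ("DELETE", CUSTOM): "Manage",  # delete a custom expectation
    ("GET", "/api/data/data_product/quality/validations"): "Read",
    ("GET", "/api/data/data_product/quality/overview"): "Browse",
}


def required_permission(method, path):
    """Return the documented permission, or None if this page does not state one."""
    return REQUIRED_PERMISSION.get((method.upper(), path))


print(required_permission("delete", CUSTOM))
print(required_permission("GET", "/api/data/data_product/quality/overview"))
```

Returning `None` for unlisted endpoints avoids guessing at permissions that are not documented here.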