Foundation's ML Approach
Foundation provides an integrated machine learning library that runs inside the platform's transformation framework. ML models are implemented as transformations and follow the same logic and patterns as any other data transformation in Foundation. As a result, ML workflows behave consistently with the rest of the platform and reuse Foundation's existing infrastructure for data processing, storage, and lineage tracking.
Supported Models
Foundation currently supports the following ML models:
LightGBM: Gradient boosting framework for regression and time-series forecasting
LSTM: Long Short-Term Memory networks for multivariate time-series prediction
K-Means: Clustering algorithm with outlier detection capabilities
Our ML library is continuously expanding, and we expect to add more models based on user needs and use cases.
ML Ops Approaches
Foundation supports two distinct approaches for machine learning workflows:
Training/Inference Approach (Model Persistence)
This approach separates model training and inference into distinct steps:
Training Phase:
Train a model on historical data
Store the trained model in Foundation's storage under the /models directory
Generate a metadata data product containing model performance metrics and configuration
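The training phase can be sketched as a single function that fits a model, writes it under a models directory, and returns one row for the metadata data product. This is a minimal, stdlib-only illustration under stated assumptions: a one-feature least-squares line stands in for LightGBM, the model is stored as JSON at `<models_dir>/<model_name>.json`, and the names `train_and_persist`, `fit_linear`, and the metadata fields are hypothetical rather than Foundation's API.

```python
import json
import math
import tempfile
from pathlib import Path


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def train_and_persist(rows, model_name, models_dir):
    """Hypothetical training step: fit, store the model, return a metadata record."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    params = fit_linear(xs, ys)

    # Assumed storage layout: one JSON file per model under the models directory.
    path = Path(models_dir) / f"{model_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params))

    # In-sample RMSE as a simple performance metric for the metadata product.
    preds = [params["slope"] * x + params["intercept"] for x in xs]
    rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))

    # One row for the metadata data product: metrics plus training configuration.
    return {
        "model_name": model_name,
        "model_path": str(path),
        "n_training_rows": len(rows),
        "rmse": rmse,
        "config": {"algorithm": "ols_linear", "feature": "x", "target": "y"},
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        history = [{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": 3, "y": 7}]
        print(train_and_persist(history, "demo", Path(tmp) / "models"))
```

Keeping the metadata as a plain record makes it easy to write as a row of its own data product, separate from the stored model file.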
Inference Phase:
Load a previously trained model from storage
Apply the model to new data for predictions
Generate a predictions data product with results
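The inference phase can be sketched the same way: load a stored model and map new feature rows to prediction rows. This assumes the hypothetical JSON layout from a linear stand-in model (`slope`, `intercept`) stored at `<models_dir>/<model_name>.json`; `load_model` and `run_inference` are illustrative names, not Foundation functions.

```python
import json
import tempfile
from pathlib import Path


def load_model(model_name, models_dir):
    """Load a previously trained model's parameters from (assumed) JSON storage."""
    path = Path(models_dir) / f"{model_name}.json"
    return json.loads(path.read_text())


def run_inference(rows, model_name, models_dir):
    """Hypothetical inference step: apply a stored model, return prediction rows."""
    params = load_model(model_name, models_dir)
    # Each output row keeps its input columns and adds a "prediction" column.
    return [
        {**row, "prediction": params["slope"] * row["x"] + params["intercept"]}
        for row in rows
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        models_dir = Path(tmp) / "models"
        models_dir.mkdir()
        # Stand-in for a model written earlier by a separate training run.
        (models_dir / "demo.json").write_text(json.dumps({"slope": 2.0, "intercept": 1.0}))
        print(run_inference([{"x": 10}, {"x": 0.5}], "demo", models_dir))
        # prints [{'x': 10, 'prediction': 21.0}, {'x': 0.5, 'prediction': 2.0}]
```

Because inference only reads the stored model, it can be rerun on fresh data without retraining.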
This approach requires creating two data products:
Metadata Data Product: Stores model training information, metrics, and configuration
Predictions Data Product: Stores the inference results
Each data product requires:
A defined schema matching the expected output structure
A builder configuration (training builder for metadata, inference builder for predictions)
Linkage to the source data product containing the features
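The three requirements per data product (schema, builder, source linkage) can be modeled as a small record, plus a check that output rows match the schema. This is a hypothetical sketch: `DataProductSpec`, its field names, the product names, and `validate_rows` are invented for illustration and do not reflect Foundation's actual configuration format.

```python
from dataclasses import dataclass


@dataclass
class DataProductSpec:
    """Hypothetical description of one data product in the training/inference approach."""
    name: str
    schema: dict          # column name -> Python type expected in each output row
    builder: str          # "training" for metadata, "inference" for predictions
    source_product: str   # data product that holds the feature columns


def validate_rows(spec, rows):
    """Return schema problems for rows (an empty list means every row matches)."""
    problems = []
    for i, row in enumerate(rows):
        missing = set(spec.schema) - set(row)
        extra = set(row) - set(spec.schema)
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected {sorted(extra)}")
        for col, typ in spec.schema.items():
            if col in row and not isinstance(row[col], typ):
                problems.append(f"row {i}: {col} is not {typ.__name__}")
    return problems


# Both hypothetical products link back to the same feature data product.
metadata_spec = DataProductSpec(
    name="demand_model_metadata",
    schema={"model_name": str, "rmse": float, "n_training_rows": int},
    builder="training",
    source_product="demand_features",
)
predictions_spec = DataProductSpec(
    name="demand_predictions",
    schema={"x": float, "prediction": float},
    builder="inference",
    source_product="demand_features",
)

if __name__ == "__main__":
    print(validate_rows(metadata_spec, [{"model_name": "demo", "rmse": 0.0}]))
```

Validating against the declared schema before writing catches a mismatch between what a builder emits and what its data product expects.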
Transient Inference Approach (Single Execution)
This approach combines training and inference in a single transformation:
Train a model on the input data
Immediately apply it to generate predictions
The trained model is not persisted for future use
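The transient approach can be sketched as one function that fits on the labeled input rows and immediately predicts for every row, without writing the model anywhere. As in the other sketches, a one-feature least-squares line stands in for the real models, and `train_and_predict` and its parameters are hypothetical names.

```python
def train_and_predict(rows, feature="x", target="y"):
    """Fit on rows that have a target value, predict for all rows, keep no model."""
    labeled = [r for r in rows if r.get(target) is not None]
    xs = [r[feature] for r in labeled]
    ys = [r[target] for r in labeled]

    # Ordinary least squares for a single feature (stand-in for LightGBM etc.).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The fitted parameters exist only inside this call; nothing is stored.
    return [{**r, "prediction": slope * r[feature] + intercept} for r in rows]


if __name__ == "__main__":
    data = [{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": 3, "y": 7}, {"x": 4, "y": None}]
    for row in train_and_predict(data):
        print(row)
```

The trade-off is that every run retrains from scratch, which keeps the pipeline to one data product but means there is no stored model to version or reuse.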
This approach requires one data product:
A single data product with schema and builder configuration
Direct transformation from input features to predictions
No model storage or versioning