Foundation's ML Approach

Foundation includes a machine learning library that runs inside the platform's transformation framework. ML models are implemented as transformations and follow the same logic and patterns as any other data transformation in Foundation. As a result, models reuse Foundation's existing infrastructure for data processing, storage, and lineage tracking instead of requiring a separate toolchain.

Supported Models

Foundation currently supports the following ML models:

  • LightGBM: Gradient boosting framework for regression and time-series forecasting

  • LSTM: Long Short-Term Memory networks for multi-variate time-series prediction

  • K-Means: Clustering algorithm with outlier detection capabilities

The ML library is still expanding, and we expect to add more models as user needs and use cases arise.
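To make "clustering with outlier detection" concrete, here is a minimal, library-free sketch. It is not Foundation's implementation: the source does not say how outliers are scored, so the distance-to-centroid threshold, the function names kmeans and flag_outliers, and the sample points are all assumptions chosen for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Seeding with the first k points is a simplification."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

def flag_outliers(points, centroids, labels, threshold):
    """Assumed rule: a point is an outlier if it lies farther than
    `threshold` from the centroid of its own cluster."""
    return [
        math.dist(p, centroids[label]) > threshold
        for p, label in zip(points, labels)
    ]

# Two tight groups plus one point, (5.0, 1.0), that sits away from both.
points = [
    (1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.9, 1.1),
    (9.1, 8.9), (8.8, 9.2), (5.0, 1.0),
]
centroids, labels = kmeans(points, k=2)
outliers = flag_outliers(points, centroids, labels, threshold=2.0)
```

With these sample points, only the last point is flagged: it is assigned to the first cluster but lies well outside the threshold distance from that cluster's centroid.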

ML Ops Approaches

Foundation supports two distinct approaches for machine learning workflows:

Training/Inference Approach (Model Persistence)

This approach separates model training and inference into distinct steps:

Training Phase:

  • Train a model on historical data

  • Store the trained model in Foundation's storage under the /models directory

  • Generate a metadata data product containing model performance metrics and configuration

Inference Phase:

  • Load a previously trained model from storage

  • Apply the model to new data for predictions

  • Generate a predictions data product with results
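The two phases above can be sketched in plain Python. This is a sketch under stated assumptions, not Foundation's API: a temporary directory stands in for platform storage, pickle is used as the serialization format, the "model" is a trivial mean predictor, and every function and field name (train_mean_model, save_model, metric_mae, and so on) is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

def train_mean_model(rows):
    """Stand-in model: always predicts the mean target seen in training."""
    targets = [row["target"] for row in rows]
    mean = sum(targets) / len(targets)
    mae = sum(abs(t - mean) for t in targets) / len(targets)
    model = {"kind": "mean_baseline", "mean": mean}
    # Hypothetical metadata record: performance metrics plus configuration.
    metadata = {
        "model_name": "demo_model",
        "metric_mae": mae,
        "n_training_rows": len(rows),
    }
    return model, metadata

def save_model(model, models_dir, name):
    """Persist the trained model under a models directory."""
    models_dir.mkdir(parents=True, exist_ok=True)
    path = models_dir / f"{name}.pkl"
    path.write_bytes(pickle.dumps(model))
    return path

def load_model(path):
    """Load a previously trained model from storage."""
    return pickle.loads(path.read_bytes())

def predict(model, rows):
    """Apply the model to new rows and return prediction records."""
    return [{"id": row["id"], "prediction": model["mean"]} for row in rows]

# Training phase: fit on historical data, store the model, emit metadata.
storage_root = Path(tempfile.mkdtemp())  # assumption: stands in for platform storage
history = [
    {"id": 1, "target": 10.0},
    {"id": 2, "target": 14.0},
    {"id": 3, "target": 12.0},
]
model, metadata = train_mean_model(history)
model_path = save_model(model, storage_root / "models", metadata["model_name"])

# Inference phase: can run later and separately, using only the stored file.
loaded = load_model(model_path)
predictions = predict(loaded, [{"id": 4}, {"id": 5}])
```

The point of the split is that the inference step depends only on the stored artifact, so it can be rerun on new data without retraining.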

This approach requires creating two data products:

  1. Metadata Data Product: Stores model training information, metrics, and configuration

  2. Predictions Data Product: Stores the inference results

Each data product requires:

  • A defined schema matching the expected output structure

  • A builder configuration (training builder for metadata, inference builder for predictions)

  • Linkage to the source data product containing the features
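The three requirements listed above can be pictured as a small record per data product. The sketch below is purely illustrative and does not reflect Foundation's actual configuration format: the DataProductSpec class, its field names, the product names, and the column types are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataProductSpec:
    """Hypothetical description of one ML data product."""
    name: str
    schema: dict   # column name -> type name, matching the expected output
    builder: str   # "training" or "inference" in this sketch
    source: str    # name of the data product that holds the features

def matches_schema(spec, output_row):
    """Check that an output row has exactly the columns the schema declares."""
    return set(output_row) == set(spec.schema)

# Metadata data product: produced by the training builder.
metadata_product = DataProductSpec(
    name="demand_model_metadata",
    schema={"model_name": "string", "metric_mae": "double", "n_training_rows": "long"},
    builder="training",
    source="demand_features",
)

# Predictions data product: produced by the inference builder.
predictions_product = DataProductSpec(
    name="demand_predictions",
    schema={"id": "long", "prediction": "double"},
    builder="inference",
    source="demand_features",
)
```

In this sketch both products link to the same source data product, and a row only passes matches_schema when its columns line up exactly with the declared schema.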

Transient Inference Approach (Single Execution)

This approach combines training and inference in a single transformation:

  • Train a model on the input data

  • Immediately apply it to generate predictions

  • Discard the model afterward; it is not persisted for future use

This approach requires one data product:

  • A single data product with schema and builder configuration

  • Direct transformation from input features to predictions

  • No model storage or versioning
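A single-execution transformation of this kind can be sketched as one function that fits a model and returns predictions without writing anything to storage. This is a minimal illustration, not Foundation's builder: the single-feature least-squares model, the function names, and the convention that rows with a missing target are the ones to predict are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def transient_inference(rows):
    """Train and predict in one pass; the fitted model never leaves this function."""
    # Assumption: rows with a known target are the training data.
    labelled = [r for r in rows if r["target"] is not None]
    slope, intercept = fit_line(
        [r["x"] for r in labelled],
        [r["target"] for r in labelled],
    )
    # Assumption: rows without a target are the ones to predict.
    return [
        {"x": r["x"], "prediction": slope * r["x"] + intercept}
        for r in rows
        if r["target"] is None
    ]

rows = [
    {"x": 1.0, "target": 3.0},
    {"x": 2.0, "target": 5.0},
    {"x": 3.0, "target": 7.0},
    {"x": 4.0, "target": None},
]
predictions = transient_inference(rows)
```

Because the fitted parameters exist only inside the function call, every run retrains from scratch, which is the trade-off this approach makes in exchange for needing only one data product.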
