Using an LSTM Model
Model Overview
LSTM (Long Short-Term Memory) is a sophisticated deep learning architecture specifically designed for sequential data and multivariate time-series forecasting. Within Foundation, LSTM networks excel at capturing long-term dependencies and complex temporal patterns across multiple features simultaneously. The model is particularly effective for scenarios requiring multi-step-ahead predictions, handling irregular patterns in time-series data, and forecasting multiple correlated metrics that influence each other over time.
The LSTM implementation in Foundation introduces a multi-model ensemble approach that enables sophisticated feature group management. While it appears as a single model in the user interface, the system actually trains multiple specialized LSTM models, each focused on predicting a specific feature subset while considering the full context of all available features. This architecture allows for better specialization and performance when dealing with heterogeneous features from different domains that may have varying temporal dynamics.
Training and Inference Approach
LSTM in Foundation operates exclusively through a training/inference approach, requiring a two-phase workflow. During the training phase, the system processes your historical time-series data, trains one or more LSTM models based on your feature group configuration, and stores both the individual models and an ensemble metadata structure. During inference, the system loads these trained models and generates multi-step predictions for all specified features, maintaining temporal consistency across the forecast horizon.
Feature Groups and Multi-Model Architecture
The distinguishing characteristic of Foundation's LSTM implementation is its feature group mechanism. When you have a data product combining features from different domains or with different characteristics, you can define feature groups that partition your features into logical subsets. Each group receives its own specialized LSTM model that, while trained on all available features for context, focuses on predicting only its assigned subset. This approach leverages the full information available in your data while allowing each model to specialize in its specific prediction task.
If no feature groups are specified, the system automatically creates a single "default" group containing all numeric features. However, when you define multiple groups, the system trains separate models for each group and coordinates their predictions through an ensemble mechanism. This architecture is particularly powerful when you have features with different temporal patterns, scales, or business meanings that benefit from specialized treatment.
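As a concrete illustration of this grouping rule, the sketch below shows how a set of numeric columns might be partitioned; the resolve_feature_groups helper is hypothetical and stands in for logic Foundation performs internally:

# Hypothetical sketch of the grouping rule described above; Foundation's
# internal implementation is not exposed.
def resolve_feature_groups(numeric_cols, feature_groups=None):
    if not feature_groups:
        # No groups defined: one "default" group holds every numeric feature.
        return {"default": list(numeric_cols)}
    # Explicit groups partition the columns; each group gets its own model.
    return {name: [c for c in cols if c in numeric_cols]
            for name, cols in feature_groups.items()}

print(resolve_feature_groups(["production_volume", "defect_rate"]))
# {'default': ['production_volume', 'defect_rate']}
print(resolve_feature_groups(
    ["production_volume", "defect_rate"],
    {"operations_metrics": ["production_volume"],
     "quality_metrics": ["defect_rate"]}))
# {'operations_metrics': ['production_volume'], 'quality_metrics': ['defect_rate']}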
Model Metadata Data Product
The metadata data product handles the training of LSTM models and stores comprehensive information about each trained model as well as the ensemble configuration. Unlike simpler models, LSTM metadata includes records for each individual model plus an ensemble record that coordinates them.
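For orientation, the model_path values in the output examples further down follow a consistent layout. The helper below is a hypothetical illustration of that pattern, derived from the example records on this page; the real paths are produced by Foundation:

# Hypothetical helper reflecting the path pattern visible in the example
# metadata records below.
def model_path(model_bucket, project_name, group, version):
    return f"{model_bucket}/{project_name}/models/{group}/{version}/model.keras"

model_path("models", "manufacturing_forecast", "operations_metrics", "v2")
# 'models/manufacturing_forecast/models/operations_metrics/v2/model.keras'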
Schema Configuration
The metadata schema captures information about individual models and the ensemble:
{
"details": {
"data_product_type": "stored",
"fields": [
{
"name": "metadata",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
},
{
"name": "model_path",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
},
{
"name": "metadata_path",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
},
{
"name": "version",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
},
{
"name": "model_name",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
},
{
"name": "model_type",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
},
{
"name": "created_at",
"primary": false,
"optional": true,
"data_type": {
"column_type": "TIMESTAMPTZ"
},
"classification": "internal"
}
]
}
}
Builder Configuration with Feature Groups
The builder configuration demonstrates how to define multiple feature groups for specialized model training:
{
"config": {
"docker_tag": "0.0.58",
"executor_core_request": "800m",
"executor_core_limit": "4000m",
"executor_instances": 1,
"executor_memory": "7168m",
"driver_memory": "3072m"
},
"inputs": {
"input_features": {
"input_type": "data_product",
"identifier": "feature-dataset-id",
"preview_limit": 10
}
},
"transformations": [
{
"transform": "filter_with_condition",
"input": "input_features",
"output": "cleaned_data",
"condition": "date >= '2023-01-01' AND date <= current_date()"
},
{
"transform": "regression_training_lstm",
"input": "cleaned_data",
"output": "model_metadata",
"timestamp_col": "date",
"features_to_predict": null,
"drop_cols": ["non_numeric_field"],
"feature_groups": {
"operations_metrics": [
"production_volume",
"machine_utilization",
"downtime_hours"
],
"quality_metrics": [
"defect_rate",
"inspection_passes",
"rework_percentage"
],
"financial_metrics": [
"daily_revenue",
"operational_cost",
"margin_percentage"
]
},
"random_seed": 42,
"train_ratio": 0.8,
"scale_features": true,
"validation_split": 0.2,
"sequence_length": 90,
"epochs": 20,
"patience": 10,
"batch_size": 32,
"models_config": {
"operations_metrics": {
"units": 64,
"num_layers": 3,
"dropout": 0.3,
"bidirectional": true
},
"quality_metrics": {
"units": 32,
"num_layers": 2,
"dropout": 0.2,
"bidirectional": false
},
"financial_metrics": {
"units": 48,
"num_layers": 2,
"dropout": 0.25,
"attention_heads": 2
}
},
"model_bucket": "models",
"project_name": "manufacturing_forecast"
}
],
"finalisers": {
"input": "model_metadata",
"write_config": {"mode": "overwrite"}
}
}
The models_config parameter allows you to specify a different architecture for each feature group. Each configuration can include the following tunable parameters:
units (default: 32): The number of LSTM cells in each layer, controlling the model's capacity to learn complex patterns. Higher values increase model expressiveness but also computational requirements.
num_layers (default: 2): The depth of the LSTM network, determining how many stacked LSTM layers to use. Deeper networks can capture more complex hierarchical patterns but require more data to train effectively.
dropout (default: 0.2): The dropout rate applied between layers for regularization, helping prevent overfitting by randomly dropping connections during training. Values range from 0 to 1.
batch_norm (default: true): Whether to apply batch normalization after LSTM layers, which can accelerate training and improve stability by normalizing layer inputs.
activation (default: "tanh"): The activation function used in LSTM cells. Options include "tanh" for standard LSTM behavior or "relu" for potentially faster training in some scenarios.
learning_rate (default: 0.0002): The step size for gradient descent optimization, controlling how quickly the model adapts during training. Lower values provide more stable but slower convergence.
bidirectional (default: true): Whether to process sequences in both forward and backward directions, allowing the model to capture dependencies from both past and future context within the training window.
optimizer (default: "adam"): The optimization algorithm used for training. Options include "adam" (adaptive moment estimation), "sgd" (stochastic gradient descent), "rmsprop", "adagrad", "adadelta", "adamax", and "nadam", each with different convergence characteristics.
l2_reg (default: 0.0001): The L2 regularization coefficient applied to model weights, helping prevent overfitting by penalizing large weight values. Higher values increase regularization strength.
attention_heads (default: 1): The number of attention heads for multi-head attention mechanisms, allowing the model to focus on different aspects of the input sequence simultaneously.
attention_type (default: "self"): The type of attention mechanism to employ. Options include "self" for self-attention where the model attends to different positions within its own sequence, or "luong" for Luong-style attention which can be more effective for certain sequence-to-sequence tasks.
If no specific configuration is provided for a feature group, the system applies these default parameters which have been optimized for general time-series forecasting tasks. You can override any subset of these parameters for each feature group based on the specific characteristics of the features being predicted.
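To make the override behaviour concrete, the sketch below merges the documented defaults with one group's models_config entry; the effective_config helper itself is illustrative, not Foundation's API:

# Documented defaults for each feature group's model.
DEFAULTS = {
    "units": 32, "num_layers": 2, "dropout": 0.2, "batch_norm": True,
    "activation": "tanh", "learning_rate": 0.0002, "bidirectional": True,
    "optimizer": "adam", "l2_reg": 0.0001,
    "attention_heads": 1, "attention_type": "self",
}

def effective_config(models_config, group):
    # Per-group entries override only the keys they name (hypothetical merge).
    return {**DEFAULTS, **models_config.get(group, {})}

cfg = effective_config(
    {"financial_metrics": {"units": 48, "dropout": 0.25, "attention_heads": 2}},
    "financial_metrics")
# cfg["units"] == 48, cfg["attention_heads"] == 2, cfg["num_layers"] == 2 (default)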
Metadata Output Format
The metadata data product generates multiple records - one for each individual model plus an ensemble record:
Individual Model Record:
{
"metadata": {
"model_name": "operations_metrics",
"version": "v2",
"training_date": "2025-01-20 10:15:00+00:00",
"feature_group": "operations_metrics",
"timestamp_column": "date",
"predicted_features": [
"production_volume",
"machine_utilization",
"downtime_hours"
],
"feature_columns": [
"production_volume", "machine_utilization", "downtime_hours",
"defect_rate", "inspection_passes", "rework_percentage",
"daily_revenue", "operational_cost", "margin_percentage",
"hour", "dayofweek", "month", "quarter",
"hour_sin", "hour_cos", "dayofweek_sin", "dayofweek_cos"
],
"hyperparameters": {
"units": 64,
"num_layers": 3,
"dropout": 0.3,
"bidirectional": true,
"sequence_length": 90,
"epochs": 20
}
},
"model_path": "models/manufacturing_forecast/models/operations_metrics/v2/model.keras",
"model_type": "lstm"
}
Ensemble Record:
{
"metadata": {
"model_name": "manufacturing_forecast_lstm_model_ensemble",
"version": "v2",
"feature_groups": [
"operations_metrics",
"quality_metrics",
"financial_metrics"
],
"metrics": {
"rmse": 2.34,
"mae": 1.82,
"smape": 12.5
},
"individual_models": [
{
"model_name": "operations_metrics",
"version": "v2",
"model_path": "models/manufacturing_forecast/models/operations_metrics/v2/model.keras"
},
{
"model_name": "quality_metrics",
"version": "v2",
"model_path": "models/manufacturing_forecast/models/quality_metrics/v2/model.keras"
},
{
"model_name": "financial_metrics",
"version": "v2",
"model_path": "models/manufacturing_forecast/models/financial_metrics/v2/model.keras"
}
]
},
"model_type": "lstm_ensemble"
}
Predictions Data Product
The predictions data product loads the ensemble of trained models and generates forecasts for all features across the specified horizon. The system automatically coordinates predictions from all individual models to produce a unified output.
Schema Configuration
The predictions schema includes all features being predicted plus metadata:
{
"details": {
"data_product_type": "stored",
"fields": [
{
"name": "production_volume",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "machine_utilization",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "downtime_hours",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "defect_rate",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "inspection_passes",
"primary": false,
"optional": true,
"data_type": {
"column_type": "INTEGER"
},
"classification": "internal"
},
{
"name": "rework_percentage",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "daily_revenue",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "operational_cost",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "margin_percentage",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DOUBLE"
},
"classification": "internal"
},
{
"name": "date",
"primary": false,
"optional": true,
"data_type": {
"column_type": "DATE"
},
"classification": "internal"
},
{
"name": "_predicted_at",
"primary": false,
"optional": true,
"data_type": {
"column_type": "TIMESTAMPTZ"
},
"classification": "internal"
},
{
"name": "model_version",
"primary": false,
"optional": true,
"data_type": {
"column_type": "VARCHAR"
},
"classification": "internal"
}
]
}
}
Builder Configuration
The predictions builder coordinates the ensemble inference:
{
"config": {
"docker_tag": "0.0.58",
"executor_core_request": "800m",
"executor_core_limit": "4000m",
"executor_instances": 1,
"executor_memory": "7168m",
"driver_memory": "3072m"
},
"inputs": {
"input_features": {
"input_type": "data_product",
"identifier": "feature-dataset-id",
"preview_limit": 10
}
},
"transformations": [
{
"transform": "filter_with_condition",
"input": "input_features",
"output": "recent_data",
"condition": "date >= current_date() - interval '180' day"
},
{
"transform": "regression_prediction_lstm",
"input": "recent_data",
"output": "predictions",
"timestamp_col": "date",
"model_bucket": "models",
"project_name": "manufacturing_forecast",
"version": null,
"forecast_horizon": 30,
"sequence_length": 90
},
{
"transform": "select_expression",
"input": "predictions",
"output": "formatted_predictions",
"expressions": [
"cast(production_volume as double) as production_volume",
"cast(machine_utilization as double) as machine_utilization",
"cast(downtime_hours as double) as downtime_hours",
"cast(defect_rate as double) as defect_rate",
"cast(inspection_passes as integer) as inspection_passes",
"cast(rework_percentage as double) as rework_percentage",
"cast(daily_revenue as double) as daily_revenue",
"cast(operational_cost as double) as operational_cost",
"cast(margin_percentage as double) as margin_percentage",
"to_date(date) as date",
"_predicted_at",
"model_version"
]
}
],
"finalisers": {
"input": "formatted_predictions",
"write_config": {"mode": "overwrite"}
}
}
The sequence_length parameter must match the value used during training, as it defines the historical context window the model expects. The forecast_horizon determines how many future time steps to predict, with the model generating day-by-day predictions that maintain temporal dependencies across all features.
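Conceptually, the rolling forecast works as sketched below. This is a simplified, hypothetical rendering of the mechanism: it assumes already-scaled inputs and ignores the recomputation of calendar features that the real pipeline performs for each predicted date.

import numpy as np

def rolling_forecast(models, history, horizon):
    """Hypothetical sketch: models maps group name to (keras_model,
    target_column_indices); history has shape (sequence_length, n_features)."""
    window = history.copy()
    steps = []
    for _ in range(horizon):
        step = window[-1].copy()                  # start from the latest known row
        for model, target_idx in models.values():
            pred = model.predict(window[np.newaxis, ...], verbose=0)[0]
            step[target_idx] = pred               # each model fills its own features
        steps.append(step)
        window = np.vstack([window[1:], step])    # slide the context window forward
    return np.array(steps)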
Prediction Output Format
The predictions data product generates forecasts for every feature across the horizon. The rows below show a representative subset of the output columns:

date         production_volume  machine_utilization  downtime_hours  defect_rate  daily_revenue  model_version  _predicted_at
2025-01-21   1,250.5            0.88                 2.3             0.02         45,678.90      v2             2025-01-20T14:30:00.000Z
2025-01-22   1,185.3            0.83                 3.1             0.03         43,256.78      v2             2025-01-20T14:30:00.000Z
2025-01-23   1,302.7            0.89                 1.8             0.02         47,523.45      v2             2025-01-20T14:30:00.000Z
Model Architecture and Training Process
The LSTM training process involves sophisticated coordination between multiple models. Each feature group's model receives the complete feature set as input, enabling it to learn from cross-domain relationships and dependencies. However, the loss function for each model is computed only on its assigned prediction features, allowing specialization while maintaining awareness of the broader context.
The system automatically adds temporal features to enrich the input space, including cyclical encodings of time components that help the models capture seasonal patterns. Feature scaling is typically enabled for LSTM models to ensure numerical stability during training, with the scaler parameters stored alongside the models for consistent application during inference.
During training, each model uses early stopping with patience to prevent overfitting, monitoring validation loss to determine the optimal training duration. The models can employ various architectural enhancements, including bidirectional processing for capturing both forward and backward temporal dependencies, attention mechanisms for focusing on relevant time steps, and dropout layers for regularization.
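Foundation's training code is not exposed, but a minimal Keras sketch of the idea just described, using the operations_metrics settings from the example (64 units, 3 bidirectional layers, 0.3 dropout, early stopping with patience 10), might look like this. The feature counts match the example metadata: 17 input columns, 3 predicted columns.

from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, N_FEATURES, N_TARGETS = 90, 17, 3       # window, all inputs, group outputs

inputs = keras.Input(shape=(SEQ_LEN, N_FEATURES))
x = inputs
for i in range(3):                                # num_layers = 3
    x = layers.Bidirectional(                     # bidirectional = true
        layers.LSTM(64, return_sequences=(i < 2)))(x)   # units = 64
    x = layers.Dropout(0.3)(x)                    # dropout = 0.3
outputs = layers.Dense(N_TARGETS)(x)              # predict only this group's features,
                                                  # so the loss covers them alone
model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=2e-4), loss="mse")

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)
# model.fit(X, y, validation_split=0.2, epochs=20, batch_size=32,
#           callbacks=[early_stop])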
Ensemble Coordination and Prediction
During inference, the ensemble coordinator loads all individual models and orchestrates their predictions. The system processes the input sequence through each model, with each model generating predictions for its assigned features. These partial predictions are then combined to form the complete feature vector for each future time step.
The day-by-day prediction approach ensures temporal consistency, where each predicted time step becomes part of the input for predicting the next step. This rolling forecast mechanism maintains the temporal relationships learned during training and enables the model to generate coherent multi-step predictions that respect the interdependencies between features.
The ensemble approach offers several advantages over training a single large model. It enables better handling of features with different scales and patterns, allows for model-specific hyperparameter tuning, provides more interpretable results by showing which models contribute to which predictions, and offers flexibility to retrain individual models without affecting others.
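As a small illustration of the coordination step, each specialist contributes only its assigned features, and the coordinator assembles one complete record per forecast step (the values below are hypothetical):

# Hypothetical per-group outputs for a single forecast step.
group_predictions = {
    "operations_metrics": {"production_volume": 1250.5,
                           "machine_utilization": 0.88,
                           "downtime_hours": 2.3},
    "quality_metrics":    {"defect_rate": 0.02,
                           "inspection_passes": 412,
                           "rework_percentage": 1.4},
    "financial_metrics":  {"daily_revenue": 45678.9,
                           "operational_cost": 31200.0,
                           "margin_percentage": 31.7},
}

# The coordinator merges the disjoint subsets into one complete row.
row = {}
for preds in group_predictions.values():
    row.update(preds)

Because the feature subsets are disjoint, retraining one group's model changes only its slice of the combined output.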