Logging Runs
A run in Calabi ML captures everything about one model training execution: the hyperparameters configured, the metrics measured, the files produced, and the tags assigned. This page covers the full Python SDK API for logging, autologging integrations with popular frameworks, and patterns for logging from Calabi Notebooks.
Setting Up the Tracking URI
Before logging anything, point the Calabi ML client at your Calabi ML server:
```python
import mlflow

mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("my-experiment")
```
Set the tracking URI once per session (or per script). It is also accepted via the environment variable MLFLOW_TRACKING_URI:
```bash
export MLFLOW_TRACKING_URI="https://calabi.<your-domain>/mlflow"
```
Starting a Run
All logging calls must happen inside an active run context. Use mlflow.start_run() as a context manager:
```python
with mlflow.start_run(run_name="xgboost_tuned_v1"):
    # All logging calls here are associated with this run
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_auc", 0.88)
```
When the with block exits, the run is automatically ended: its state is set to FINISHED, or FAILED if the block raised an exception.
Run Parameters
| Parameter | Type | Description |
|---|---|---|
| run_name | str | Human-readable name shown in the UI |
| run_id | str | Resume a specific existing run |
| experiment_id | str | Override the active experiment |
| tags | dict | Key-value tags set on run creation |
| description | str | Free-text description of the run |
Logging Parameters
Parameters are configuration values set before training begins. They are immutable once logged for a given run — use them for hyperparameters, dataset versions, and model architecture choices.
```python
with mlflow.start_run():
    # Log individual parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("n_estimators", 300)
    mlflow.log_param("random_seed", 42)
    mlflow.log_param("dataset_version", "v3.2")

    # Log all parameters at once from a dict
    params = {
        "learning_rate": 0.01,
        "max_depth": 6,
        "n_estimators": 300,
        "colsample_bytree": 0.8,
        "subsample": 0.9,
    }
    mlflow.log_params(params)
```
Log all hyperparameters, even those you are not actively tuning. When you compare runs weeks later, you will want to confirm that the "fixed" parameters really were identical across them.
Logging Metrics
Metrics are measured outcomes logged during or after training. Unlike parameters, metrics can be logged multiple times to capture how they evolve over training steps or epochs.
```python
with mlflow.start_run():
    for epoch in range(100):
        train_loss, val_loss = train_one_epoch(model, epoch)
        train_acc, val_acc = evaluate(model)

        # Log with step to track training curves
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("train_accuracy", train_acc, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)

    # Log final summary metrics
    mlflow.log_metric("test_accuracy", test_accuracy)
    mlflow.log_metric("test_f1", test_f1)
    mlflow.log_metric("test_auc_roc", test_auc)
```
Logging Multiple Metrics at Once
```python
final_metrics = {
    "test_accuracy": 0.923,
    "test_precision": 0.901,
    "test_recall": 0.914,
    "test_f1": 0.907,
    "test_auc_roc": 0.971,
}
mlflow.log_metrics(final_metrics)
```
Logging Artifacts
Artifacts are files associated with a run: model weights, serialised pipelines, plots, confusion matrices, feature importance tables, and any other output file.
Logging a Single File
```python
import matplotlib.pyplot as plt
import pandas as pd

with mlflow.start_run():
    # Log a plot
    fig, ax = plt.subplots()
    ax.plot(train_losses, label="Train")
    ax.plot(val_losses, label="Validation")
    ax.legend()
    fig.savefig("/tmp/loss_curve.png")
    mlflow.log_artifact("/tmp/loss_curve.png", artifact_path="plots")

    # Log a CSV
    feature_importance_df.to_csv("/tmp/feature_importance.csv", index=False)
    mlflow.log_artifact("/tmp/feature_importance.csv", artifact_path="reports")
```
Logging a Directory
```python
# Log all files in a directory
mlflow.log_artifacts("/tmp/eval_outputs/", artifact_path="evaluation")
```
Logging a Dictionary as JSON
```python
import json

config = {"preprocessing": "standard_scaler", "feature_selection": "rfe", "n_features": 42}

with open("/tmp/config.json", "w") as f:
    json.dump(config, f)

mlflow.log_artifact("/tmp/config.json")
```
Logging Models
For frameworks Calabi ML supports natively, use the dedicated mlflow.<framework>.log_model() function. This stores the model with a standard signature and loading interface.
scikit-learn
```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import mlflow.sklearn

with mlflow.start_run(run_name="sklearn_gb_v1"):
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", GradientBoostingClassifier(n_estimators=200, max_depth=4)),
    ])
    pipeline.fit(X_train, y_train)

    mlflow.log_params(pipeline.get_params())
    mlflow.log_metric("val_accuracy", pipeline.score(X_val, y_val))

    # Log the model with input/output signature
    from mlflow.models import infer_signature
    signature = infer_signature(X_train, pipeline.predict(X_train))
    mlflow.sklearn.log_model(pipeline, "model", signature=signature)
```
XGBoost
```python
import xgboost as xgb
import mlflow.xgboost
from sklearn.metrics import roc_auc_score

with mlflow.start_run():
    params = {"n_estimators": 300, "max_depth": 6, "learning_rate": 0.05}
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    mlflow.log_params(params)
    mlflow.log_metric("val_auc", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
    mlflow.xgboost.log_model(model, "model")
```
PyTorch
```python
import torch
import mlflow.pytorch

with mlflow.start_run():
    model = MyNeuralNetwork()
    # ... train model ...
    mlflow.pytorch.log_model(model, "model")
```
TensorFlow / Keras
```python
import tensorflow as tf
import mlflow.tensorflow

with mlflow.start_run():
    model = tf.keras.Sequential([...])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
    mlflow.tensorflow.log_model(model, "model")
```
Autologging
Calabi ML supports autologging for the most popular ML frameworks. When autologging is enabled, Calabi ML automatically captures parameters, metrics, and the model artifact without any explicit logging calls in your training code.
Enabling Autologging
```python
import mlflow

# Enable for all supported frameworks at once
mlflow.autolog()

# Enable for a specific framework only
mlflow.sklearn.autolog()
mlflow.xgboost.autolog()
mlflow.pytorch.autolog()
mlflow.tensorflow.autolog()
mlflow.lightgbm.autolog()
```
scikit-learn Autolog Example
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("churn-prediction-v2")
mlflow.sklearn.autolog()  # Enable autologging

with mlflow.start_run(run_name="random_forest_auto"):
    model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
    model.fit(X_train, y_train)
    # Parameters, metrics, and the model are logged automatically
```
Autologging captures:

- All `__init__` parameters of the estimator
- Cross-validation metrics (if `cross_val_score` is called)
- Training accuracy/loss
- The fitted model as an artifact
- Feature importance (for tree-based models)
Disabling Autologging Selectively
```python
mlflow.autolog(disable=True)  # Disable globally
mlflow.sklearn.autolog(log_models=False)  # Autolog params/metrics but not the model file
```
Logging from Calabi Notebooks
Calabi Notebooks are the most common environment for exploratory ML work. Logging from notebooks follows the same API as scripts, with a few practical considerations.
Recommended Notebook Pattern
```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, roc_auc_score

# ── Cell 1: Setup ───────────────────────────────────────────────────────────
mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("churn-prediction-v2")

# ── Cell 2: Data loading ─────────────────────────────────────────────────────
df = pd.read_parquet("s3://my-bucket/data/churn_features_v3.parquet")
X = df.drop(columns=["churned"])
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ── Cell 3: Training + Logging ───────────────────────────────────────────────
with mlflow.start_run(run_name="gb_depth6_lr005"):
    params = {
        "n_estimators": 400,
        "max_depth": 6,
        "learning_rate": 0.05,
        "subsample": 0.8,
    }
    mlflow.log_params(params)
    mlflow.set_tag("notebook", "churn_exploration_v3.ipynb")
    mlflow.set_tag("dataset_version", "v3")

    model = GradientBoostingClassifier(**params)
    model.fit(X_train, y_train)

    y_pred_proba = model.predict_proba(X_test)[:, 1]
    test_auc = roc_auc_score(y_test, y_pred_proba)
    mlflow.log_metric("test_auc", test_auc)
    mlflow.sklearn.log_model(model, "model")

    print(f"Test AUC: {test_auc:.4f}")
    print(classification_report(y_test, model.predict(X_test)))
```
Logging the Notebook Itself as an Artifact
```python
# At the end of your notebook, save the notebook file as an artifact
mlflow.log_artifact("churn_exploration_v3.ipynb", artifact_path="notebooks")
```
This links the exact notebook version to the run, providing full reproducibility context.
Tags
Tags are key-value labels that can be set on a run at any time. Use tags for metadata that doesn't fit neatly into the parameters/metrics model.
```python
with mlflow.start_run():
    mlflow.set_tag("env", "development")
    mlflow.set_tag("data_version", "v3.2")
    mlflow.set_tag("engineer", "alice@example.com")
    mlflow.set_tag("jira_ticket", "DS-142")
    mlflow.set_tag("gpu", "A100-40GB")

    # Or set multiple tags at once
    mlflow.set_tags({
        "env": "development",
        "framework": "sklearn",
        "feature_set": "behavioral_v2",
    })
```
Tags are searchable in the Calabi ML UI and via the API, making it easy to filter runs by engineer, environment, or dataset version.
Nested Runs
For hyperparameter search or cross-validation, use nested runs to organise individual trials under a parent run:
```python
with mlflow.start_run(run_name="hyperparameter_search") as parent_run:
    mlflow.log_param("search_method", "grid_search")
    best_auc = 0
    for params in param_grid:
        with mlflow.start_run(run_name=f"trial_{params}", nested=True):
            model = GradientBoostingClassifier(**params)
            model.fit(X_train, y_train)
            auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
            mlflow.log_params(params)
            mlflow.log_metric("val_auc", auc)
            if auc > best_auc:
                best_auc = auc
                best_params = params
    mlflow.log_metric("best_val_auc", best_auc)
    mlflow.log_params({f"best_{k}": v for k, v in best_params.items()})
```
Nested runs appear as collapsible children under the parent run in the Calabi ML UI.
Next Steps
- Comparing Runs — Find the best run across your experiment
- Model Registry — Promote a run's model to staging or production
- Experiments — Organise runs into projects