Logging Runs
A run in Calabi ML captures everything about one model training execution: the hyperparameters configured, the metrics measured, the files produced, and the tags assigned. This page covers the full Python SDK API for logging, autologging integrations with popular frameworks, and patterns for logging from Calabi Notebooks.
Setting Up the Tracking URI
Before logging anything, point the Calabi ML client at your Calabi ML server:
```python
import mlflow

mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("my-experiment")
```
Set the tracking URI once per session (or per script). It is also accepted via the environment variable MLFLOW_TRACKING_URI:
```bash
export MLFLOW_TRACKING_URI="https://calabi.<your-domain>/mlflow"
```
Starting a Run
All logging calls must happen inside an active run context. Use mlflow.start_run() as a context manager:
```python
with mlflow.start_run(run_name="xgboost_tuned_v1"):
    # All logging calls here are associated with this run
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_auc", 0.88)
```
When the with block exits, the run is automatically ended: its state is set to FINISHED, or FAILED if the block raised an exception.
Run Parameters
| Parameter | Type | Description |
|---|---|---|
| run_name | str | Human-readable name shown in the UI |
| run_id | str | Resume a specific existing run |
| experiment_id | str | Override the active experiment |
| tags | dict | Key-value tags set on run creation |
| description | str | Free-text description of the run |
Logging Parameters
Parameters are configuration values set before training begins. They are immutable once logged for a given run — use them for hyperparameters, dataset versions, and model architecture choices.
```python
with mlflow.start_run():
    # Log individual parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("n_estimators", 300)
    mlflow.log_param("random_seed", 42)
    mlflow.log_param("dataset_version", "v3.2")

    # Log all parameters at once from a dict
    params = {
        "learning_rate": 0.01,
        "max_depth": 6,
        "n_estimators": 300,
        "colsample_bytree": 0.8,
        "subsample": 0.9,
    }
    mlflow.log_params(params)
```
Log all hyperparameters, even those you are not actively tuning. When you compare runs weeks later, you will want to confirm that the "fixed" parameters really were identical across them.
Logging Metrics
Metrics are measured outcomes logged during or after training. Unlike parameters, metrics can be logged multiple times to capture how they evolve over training steps or epochs.
```python
with mlflow.start_run():
    for epoch in range(100):
        train_loss, val_loss = train_one_epoch(model, epoch)
        train_acc, val_acc = evaluate(model)

        # Log with step to track training curves
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("train_accuracy", train_acc, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)

    # Log final summary metrics
    mlflow.log_metric("test_accuracy", test_accuracy)
    mlflow.log_metric("test_f1", test_f1)
    mlflow.log_metric("test_auc_roc", test_auc)
```
Logging Multiple Metrics at Once
```python
final_metrics = {
    "test_accuracy": 0.923,
    "test_precision": 0.901,
    "test_recall": 0.914,
    "test_f1": 0.907,
    "test_auc_roc": 0.971,
}
mlflow.log_metrics(final_metrics)
```
Logging Artifacts
Artifacts are files associated with a run: model weights, serialised pipelines, plots, confusion matrices, feature importance tables, and any other output file.
Logging a Single File
```python
import matplotlib.pyplot as plt
import pandas as pd

with mlflow.start_run():
    # Log a plot
    fig, ax = plt.subplots()
    ax.plot(train_losses, label="Train")
    ax.plot(val_losses, label="Validation")
    ax.legend()
    fig.savefig("/tmp/loss_curve.png")
    mlflow.log_artifact("/tmp/loss_curve.png", artifact_path="plots")

    # Log a CSV
    feature_importance_df.to_csv("/tmp/feature_importance.csv", index=False)
    mlflow.log_artifact("/tmp/feature_importance.csv", artifact_path="reports")
```
Logging a Directory
```python
# Log all files in a directory
mlflow.log_artifacts("/tmp/eval_outputs/", artifact_path="evaluation")
```
Logging a Dictionary as JSON
```python
import json

config = {"preprocessing": "standard_scaler", "feature_selection": "rfe", "n_features": 42}

with open("/tmp/config.json", "w") as f:
    json.dump(config, f)

mlflow.log_artifact("/tmp/config.json")
```
Logging Models
For frameworks Calabi ML supports natively, use the dedicated mlflow.<framework>.log_model() function. This stores the model with a standard signature and loading interface.
scikit-learn
```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import mlflow.sklearn

with mlflow.start_run(run_name="sklearn_gb_v1"):
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", GradientBoostingClassifier(n_estimators=200, max_depth=4)),
    ])
    pipeline.fit(X_train, y_train)

    mlflow.log_params(pipeline.get_params())
    mlflow.log_metric("val_accuracy", pipeline.score(X_val, y_val))

    # Log the model with input/output signature
    from mlflow.models import infer_signature
    signature = infer_signature(X_train, pipeline.predict(X_train))
    mlflow.sklearn.log_model(pipeline, "model", signature=signature)
```
XGBoost
```python
import xgboost as xgb
import mlflow.xgboost
from sklearn.metrics import roc_auc_score

with mlflow.start_run():
    params = {"n_estimators": 300, "max_depth": 6, "learning_rate": 0.05}
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    mlflow.log_params(params)
    mlflow.log_metric("val_auc", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
    mlflow.xgboost.log_model(model, "model")
```
PyTorch
```python
import torch
import mlflow.pytorch

with mlflow.start_run():
    model = MyNeuralNetwork()
    # ... train model ...
    mlflow.pytorch.log_model(model, "model")
```
TensorFlow / Keras
```python
import tensorflow as tf
import mlflow.tensorflow

with mlflow.start_run():
    model = tf.keras.Sequential([...])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
    mlflow.tensorflow.log_model(model, "model")
```
Autologging
Calabi ML supports autologging for the most popular ML frameworks. When autologging is enabled, Calabi ML automatically captures parameters, metrics, and the model artifact without any explicit logging calls in your training code.
Enabling Autologging
```python
import mlflow

# Enable for all supported frameworks at once
mlflow.autolog()

# Enable for a specific framework only
mlflow.sklearn.autolog()
mlflow.xgboost.autolog()
mlflow.pytorch.autolog()
mlflow.tensorflow.autolog()
mlflow.lightgbm.autolog()
```
scikit-learn Autolog Example
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("churn-prediction-v2")
mlflow.sklearn.autolog()  # Enable autologging

with mlflow.start_run(run_name="random_forest_auto"):
    model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
    model.fit(X_train, y_train)
    # Parameters, metrics, and the model are logged automatically
```
Autologging captures:

- All `__init__` parameters of the estimator
- Cross-validation metrics (if `cross_val_score` is called)
- Training accuracy/loss
- The fitted model as an artifact
- Feature importance (for tree-based models)
Disabling Autologging Selectively
```python
mlflow.autolog(disable=True)  # Disable globally
mlflow.sklearn.autolog(log_models=False)  # Autolog params/metrics but not the model file
```
Logging from Calabi Notebooks
Calabi Notebooks are the most common environment for exploratory ML work. Logging from notebooks follows the same API as scripts, with a few practical considerations.
Recommended Notebook Pattern
```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, roc_auc_score

# ── Cell 1: Setup ───────────────────────────────────────────────────────────
mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("churn-prediction-v2")

# ── Cell 2: Data loading ─────────────────────────────────────────────────────
df = pd.read_parquet("s3://my-bucket/data/churn_features_v3.parquet")
X = df.drop(columns=["churned"])
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ── Cell 3: Training + Logging ───────────────────────────────────────────────
with mlflow.start_run(run_name="gb_depth6_lr005"):
    params = {
        "n_estimators": 400,
        "max_depth": 6,
        "learning_rate": 0.05,
        "subsample": 0.8,
    }
    mlflow.log_params(params)
    mlflow.set_tag("notebook", "churn_exploration_v3.ipynb")
    mlflow.set_tag("dataset_version", "v3")

    model = GradientBoostingClassifier(**params)
    model.fit(X_train, y_train)

    y_pred_proba = model.predict_proba(X_test)[:, 1]
    test_auc = roc_auc_score(y_test, y_pred_proba)
    mlflow.log_metric("test_auc", test_auc)
    mlflow.sklearn.log_model(model, "model")

    print(f"Test AUC: {test_auc:.4f}")
    print(classification_report(y_test, model.predict(X_test)))
```
Logging the Notebook Itself as an Artifact
```python
# At the end of your notebook, save the notebook file as an artifact
mlflow.log_artifact("churn_exploration_v3.ipynb", artifact_path="notebooks")
```
This links the exact notebook version to the run, providing full reproducibility context.
Tags
Tags are key-value labels that can be set on a run at any time. Use tags for metadata that doesn't fit neatly into the parameters/metrics model.
```python
with mlflow.start_run():
    mlflow.set_tag("env", "development")
    mlflow.set_tag("data_version", "v3.2")
    mlflow.set_tag("engineer", "alice@example.com")
    mlflow.set_tag("jira_ticket", "DS-142")
    mlflow.set_tag("gpu", "A100-40GB")

    # Or set multiple tags at once
    mlflow.set_tags({
        "env": "development",
        "framework": "sklearn",
        "feature_set": "behavioral_v2",
    })
```
Tags are searchable in the Calabi ML UI and via the API, making it easy to filter runs by engineer, environment, or dataset version.
Nested Runs
For hyperparameter search or cross-validation, use nested runs to organise individual trials under a parent run:
```python
with mlflow.start_run(run_name="hyperparameter_search") as parent_run:
    mlflow.log_param("search_method", "grid_search")
    best_auc = 0
    for params in param_grid:
        with mlflow.start_run(run_name=f"trial_{params}", nested=True):
            model = GradientBoostingClassifier(**params)
            model.fit(X_train, y_train)
            auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
            mlflow.log_params(params)
            mlflow.log_metric("val_auc", auc)
            if auc > best_auc:
                best_auc = auc
                best_params = params
    mlflow.log_metric("best_val_auc", best_auc)
    mlflow.log_params({f"best_{k}": v for k, v in best_params.items()})
```
Nested runs appear as collapsible children under the parent run in the Calabi ML UI.
Next Steps
- Comparing Runs — Find the best run across your experiment
- Model Registry — Promote a run's model to staging or production
- Experiments — Organise runs into projects