
Logging Runs

Professional+

A run in Calabi ML captures everything about one model training execution: the hyperparameters configured, the metrics measured, the files produced, and the tags assigned. This page covers the full Python SDK API for logging, autologging integrations with popular frameworks, and patterns for logging from Calabi Notebooks.


Setting Up the Tracking URI

Before logging anything, point the Calabi ML client at your Calabi ML server:

import mlflow

mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("my-experiment")

Set the tracking URI once per session (or per script). It can also be supplied through the MLFLOW_TRACKING_URI environment variable:

export MLFLOW_TRACKING_URI="https://calabi.<your-domain>/mlflow"
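
To confirm which server the client is pointed at, you can read the active URI back; mlflow.get_tracking_uri() is the standard MLflow accessor for this:

import mlflow

print(mlflow.get_tracking_uri())  # https://calabi.<your-domain>/mlflow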

Starting a Run

All logging calls must happen inside an active run context. Use mlflow.start_run() as a context manager:

with mlflow.start_run(run_name="xgboost_tuned_v1"):
# All logging calls here are associated with this run
mlflow.log_param("n_estimators", 200)
mlflow.log_metric("val_auc", 0.88)

When the with block exits, the run is automatically ended and its state is set to FINISHED.
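
The context manager also sets the state to FAILED if an exception escapes the block. When a run needs to span several cells or functions, a minimal sketch of the explicit equivalent uses mlflow.end_run(), which accepts a terminal status string:

import mlflow

run = mlflow.start_run(run_name="xgboost_tuned_v1")
try:
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_auc", 0.88)
    mlflow.end_run(status="FINISHED")
except Exception:
    # Close the run as FAILED rather than leaving it dangling
    mlflow.end_run(status="FAILED")
    raise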

Run Parameters

mlflow.start_run() accepts the following optional parameters:

Parameter      Type  Description
run_name       str   Human-readable name shown in the UI
run_id         str   Resume a specific existing run
experiment_id  str   Override the active experiment
tags           dict  Key-value tags set on run creation
description    str   Free-text description of the run
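
Several of these combine naturally in a single call; a short sketch (the tag keys and description are illustrative):

with mlflow.start_run(
    run_name="xgboost_tuned_v2",
    tags={"env": "development", "dataset_version": "v3.2"},  # illustrative keys
    description="Tuned XGBoost with early stopping",
):
    mlflow.log_param("n_estimators", 200)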

Logging Parameters

Parameters are configuration values set before training begins. They are immutable once logged for a given run — use them for hyperparameters, dataset versions, and model architecture choices.

with mlflow.start_run():
    # Log individual parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("n_estimators", 300)
    mlflow.log_param("random_seed", 42)
    mlflow.log_param("dataset_version", "v3.2")

    # Log all parameters at once from a dict
    params = {
        "learning_rate": 0.01,
        "max_depth": 6,
        "n_estimators": 300,
        "colsample_bytree": 0.8,
        "subsample": 0.9,
    }
    mlflow.log_params(params)

Tip: Log everything you might need later

Log all hyperparameters, even those you're not actively tuning. When comparing runs weeks later, you will want to verify that "fixed" parameters were identical across the runs you're comparing.
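
Immutability is enforced by the server: re-logging a parameter key with the same value is accepted, but a different value for the same key in the same run raises an error (a sketch of the behaviour; standard MLflow raises MlflowException here):

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("learning_rate", 0.01)  # same value: accepted as a no-op
    mlflow.log_param("learning_rate", 0.05)  # different value: raises MlflowException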


Logging Metrics

Metrics are measured outcomes logged during or after training. Unlike parameters, metrics can be logged multiple times to capture how they evolve over training steps or epochs.

with mlflow.start_run():
    for epoch in range(100):
        train_loss, val_loss = train_one_epoch(model, epoch)
        train_acc, val_acc = evaluate(model)

        # Log with step to track training curves
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("train_accuracy", train_acc, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)

    # Log final summary metrics
    mlflow.log_metric("test_accuracy", test_accuracy)
    mlflow.log_metric("test_f1", test_f1)
    mlflow.log_metric("test_auc_roc", test_auc)

Logging Multiple Metrics at Once

final_metrics = {
    "test_accuracy": 0.923,
    "test_precision": 0.901,
    "test_recall": 0.914,
    "test_f1": 0.907,
    "test_auc_roc": 0.971,
}
mlflow.log_metrics(final_metrics)

Logging Artifacts

Artifacts are files associated with a run: model weights, serialised pipelines, plots, confusion matrices, feature importance tables, and any other output file.

Logging a Single File

import matplotlib.pyplot as plt
import pandas as pd

with mlflow.start_run():
    # Log a plot
    fig, ax = plt.subplots()
    ax.plot(train_losses, label="Train")
    ax.plot(val_losses, label="Validation")
    ax.legend()
    fig.savefig("/tmp/loss_curve.png")
    mlflow.log_artifact("/tmp/loss_curve.png", artifact_path="plots")

    # Log a CSV
    feature_importance_df.to_csv("/tmp/feature_importance.csv", index=False)
    mlflow.log_artifact("/tmp/feature_importance.csv", artifact_path="reports")

Logging a Directory

# Log all files in a directory
mlflow.log_artifacts("/tmp/eval_outputs/", artifact_path="evaluation")

Logging a Dictionary as JSON

import json

config = {"preprocessing": "standard_scaler", "feature_selection": "rfe", "n_features": 42}
with open("/tmp/config.json", "w") as f:
    json.dump(config, f)
mlflow.log_artifact("/tmp/config.json")

Logging Models

For frameworks Calabi ML supports natively, use the dedicated mlflow.<framework>.log_model() function. This stores the model with a standard signature and loading interface.

scikit-learn

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import mlflow.sklearn

with mlflow.start_run(run_name="sklearn_gb_v1"):
pipeline = Pipeline([
("scaler", StandardScaler()),
("classifier", GradientBoostingClassifier(n_estimators=200, max_depth=4)),
])
pipeline.fit(X_train, y_train)

mlflow.log_params(pipeline.get_params())
mlflow.log_metric("val_accuracy", pipeline.score(X_val, y_val))

# Log the model with input/output signature
from mlflow.models import infer_signature
signature = infer_signature(X_train, pipeline.predict(X_train))
mlflow.sklearn.log_model(pipeline, "model", signature=signature)
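
A logged model can later be loaded back by run ID for scoring; a sketch, where run_id is the value shown in the UI or returned by mlflow.active_run():

# runs:/<run_id>/model resolves to the artifact logged above
loaded = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
predictions = loaded.predict(X_test)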

XGBoost

import xgboost as xgb
import mlflow.xgboost
from sklearn.metrics import roc_auc_score

with mlflow.start_run():
    params = {"n_estimators": 300, "max_depth": 6, "learning_rate": 0.05}
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    mlflow.log_params(params)
    mlflow.log_metric("val_auc", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
    mlflow.xgboost.log_model(model, "model")

PyTorch

import torch
import mlflow.pytorch

with mlflow.start_run():
    model = MyNeuralNetwork()
    # ... train model ...
    mlflow.pytorch.log_model(model, "model")

TensorFlow / Keras

import tensorflow as tf
import mlflow.tensorflow

with mlflow.start_run():
    model = tf.keras.Sequential([...])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
    mlflow.tensorflow.log_model(model, "model")

Autologging

Calabi ML supports autologging for the most popular ML frameworks. When autologging is enabled, Calabi ML automatically captures parameters, metrics, and the model artifact without any explicit logging calls in your training code.

Enabling Autologging

import mlflow

# Enable for all supported frameworks at once
mlflow.autolog()

# Enable for a specific framework only
mlflow.sklearn.autolog()
mlflow.xgboost.autolog()
mlflow.pytorch.autolog()
mlflow.tensorflow.autolog()
mlflow.lightgbm.autolog()

scikit-learn Autolog Example

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("churn-prediction-v2")
mlflow.sklearn.autolog() # Enable autologging

with mlflow.start_run(run_name="random_forest_auto"):
model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
model.fit(X_train, y_train)
# Parameters, metrics, and the model are logged automatically

Autologging captures:

  • All __init__ parameters of the estimator
  • Cross-validation metrics (if cross_val_score is called)
  • Training accuracy/loss
  • The fitted model as an artifact
  • Feature importance (for tree-based models)

Disabling Autologging Selectively

mlflow.autolog(disable=True)              # Disable globally
mlflow.sklearn.autolog(log_models=False)  # Autolog params/metrics but not the model file

Logging from Calabi Notebooks

Calabi Notebooks are the most common environment for exploratory ML work. Logging from notebooks uses the same API as scripts; the example below shows a typical cell layout, and the following subsection shows how to capture the notebook file itself for reproducibility.

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, roc_auc_score

# ── Cell 1: Setup ───────────────────────────────────────────────────────────
mlflow.set_tracking_uri("https://calabi.<your-domain>/mlflow")
mlflow.set_experiment("churn-prediction-v2")

# ── Cell 2: Data loading ─────────────────────────────────────────────────────
df = pd.read_parquet("s3://my-bucket/data/churn_features_v3.parquet")
X = df.drop(columns=["churned"])
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ── Cell 3: Training + Logging ───────────────────────────────────────────────
with mlflow.start_run(run_name="gb_depth6_lr005"):
params = {
"n_estimators": 400,
"max_depth": 6,
"learning_rate": 0.05,
"subsample": 0.8,
}
mlflow.log_params(params)
mlflow.set_tag("notebook", "churn_exploration_v3.ipynb")
mlflow.set_tag("dataset_version", "v3")

model = GradientBoostingClassifier(**params)
model.fit(X_train, y_train)

y_pred_proba = model.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, y_pred_proba)

mlflow.log_metric("test_auc", test_auc)
mlflow.sklearn.log_model(model, "model")

print(f"Test AUC: {test_auc:.4f}")
print(classification_report(y_test, model.predict(X_test)))

Logging the Notebook Itself as an Artifact

# At the end of your notebook, save the notebook file as an artifact
mlflow.log_artifact("churn_exploration_v3.ipynb", artifact_path="notebooks")

This links the exact notebook version to the run, providing full reproducibility context.


Tags

Tags are key-value labels that can be set on a run at any time. Use tags for metadata that doesn't fit neatly into the parameters/metrics model.

with mlflow.start_run():
    mlflow.set_tag("env", "development")
    mlflow.set_tag("data_version", "v3.2")
    mlflow.set_tag("engineer", "alice@example.com")
    mlflow.set_tag("jira_ticket", "DS-142")
    mlflow.set_tag("gpu", "A100-40GB")

    # Or set multiple tags at once
    mlflow.set_tags({
        "env": "development",
        "framework": "sklearn",
        "feature_set": "behavioral_v2",
    })

Tags are searchable in the Calabi ML UI and via the API, making it easy to filter runs by engineer, environment, or dataset version.
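
On the API side, tag filters use the tags. prefix in a search expression. A minimal sketch with mlflow.search_runs(), reusing the tag keys and experiment name from the examples above:

runs = mlflow.search_runs(
    experiment_names=["churn-prediction-v2"],
    filter_string="tags.env = 'development' AND tags.framework = 'sklearn'",
)
# search_runs returns a pandas DataFrame; tag columns appear as tags.<key>
print(runs[["run_id", "tags.engineer"]])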


Nested Runs

For hyperparameter search or cross-validation, use nested runs to organise individual trials under a parent run:

with mlflow.start_run(run_name="hyperparameter_search") as parent_run:
mlflow.log_param("search_method", "grid_search")

best_auc = 0
for params in param_grid:
with mlflow.start_run(run_name=f"trial_{params}", nested=True):
model = GradientBoostingClassifier(**params)
model.fit(X_train, y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:,1])

mlflow.log_params(params)
mlflow.log_metric("val_auc", auc)

if auc > best_auc:
best_auc = auc
best_params = params

mlflow.log_metric("best_val_auc", best_auc)
mlflow.log_params({f"best_{k}": v for k, v in best_params.items()})

Nested runs appear as collapsible children under the parent run in the Calabi ML UI.
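
Child runs are also retrievable programmatically: MLflow records the parent under the system tag mlflow.parentRunId, so a search keyed on the parent_run handle from the example above returns every trial:

# Fetch all trials under the parent, best first
children = mlflow.search_runs(
    filter_string=f"tags.mlflow.parentRunId = '{parent_run.info.run_id}'",
    order_by=["metrics.val_auc DESC"],
)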


Next Steps