Experiment Tracking with MLflow: What to Log and What to Ignore
MLflow is powerful, but logging everything wastes storage and obscures signal. Learn what metrics, parameters, and artifacts actually matter for your ML experiments.
You've trained a model. Now you need to know: did it work? More importantly, can you reproduce it next month? This is where experiment tracking becomes critical—and where teams often go wrong by logging indiscriminately.
MLflow makes it easy to track experiments, but easy doesn't mean thoughtless. Logging every variable, every intermediate output, and every debug statement creates noise that makes it harder to compare runs, slower to retrieve results, and more expensive to store. The real skill is knowing what belongs in your experiment record and what should stay in your code.
What You Should Always Log
Model Performance Metrics
This is non-negotiable. Log metrics that directly answer: "Does this model work?" For classification, that's accuracy, precision, recall, F1-score, and AUC. For regression, MAE, RMSE, and R² are standard. Log both training and validation metrics—the gap between them tells you about overfitting.
```python
from mlflow import log_metric

log_metric("train_loss", train_loss, step=epoch)
log_metric("val_accuracy", val_accuracy, step=epoch)
log_metric("val_loss", val_loss, step=epoch)
```
Be specific with naming. Use prefixes like `train_`, `val_`, and `test_` so related metrics group together when you compare runs.
Hyperparameters That Matter
Log every hyperparameter you tuned or changed. Learning rate, batch size, regularization strength, model architecture choices—these are your experiment's DNA. When you find a good result three months later, you need to know exactly what settings created it.
```python
from mlflow import log_param

log_param("learning_rate", 0.001)
log_param("batch_size", 32)
log_param("dropout_rate", 0.2)
log_param("optimizer", "adam")
```
Skip logging constants that never change. If you always use the same random seed across all experiments, it doesn't need to be logged—document it in your code instead.
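One way to keep this discipline visible in code is to separate tuned settings from fixed constants. The sketch below is illustrative (the values are made up); `mlflow.log_params`, which accepts a whole dict in one call, is a real MLflow API:

```python
# Constants that never vary across experiments live in code, not in MLflow.
RANDOM_SEED = 42  # documented here; identical in every run, so not logged

# Tuned settings are the experiment's DNA -- these get logged.
tuned_params = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "dropout_rate": 0.2,
    "optimizer": "adam",
}

# Inside an active run you would log the whole dict at once:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.log_params(tuned_params)
```

Logging the dict in one `log_params` call also keeps a run's settings atomic: either all of them land in the record or none do.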
What You Should Selectively Log
Feature Importance and Model Artifacts
Model weights and feature importance plots add insight, but they multiply storage costs. Log them if they inform your next steps. If you're exploring 50 random forest variants to find a baseline, you probably don't need the feature importance for every single run. If you're narrowing down to your final three candidates, absolutely log them—you'll need to explain model behavior to stakeholders.
```python
from mlflow import log_artifact
import matplotlib.pyplot as plt

# Only log if it changes your decision
if is_final_candidate:
    plt.figure()
    plt.barh(feature_names, importances)
    plt.savefig("/tmp/feature_importance.png")
    log_artifact("/tmp/feature_importance.png")
```
Data Summaries
Log statistics about your dataset—record count, class distribution, missing value rates—but only once per dataset. You don't need to repeat it across 100 runs using the same training data. This becomes essential when multiple team members run experiments; data validation up front saves debugging later.
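A minimal sketch of such a summary, computed once per dataset; `summarize_dataset` and its inputs are hypothetical stand-ins for your own data pipeline, while `mlflow.log_dict` (shown in the comment) is a real MLflow API for logging a dict as a JSON artifact:

```python
from collections import Counter

def summarize_dataset(labels, n_missing, n_total):
    """Build the once-per-dataset stats worth recording: size,
    class balance, and missing-value rate."""
    counts = Counter(labels)
    return {
        "record_count": n_total,
        "class_distribution": {k: v / len(labels) for k, v in counts.items()},
        "missing_value_rate": n_missing / n_total,
    }

summary = summarize_dataset(["a", "a", "b", "b"], n_missing=5, n_total=100)

# Log it once for the dataset, not once per run:
#   import mlflow
#   mlflow.log_dict(summary, "data_summary.json")
```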
What You Should Ignore
Training-Step Diagnostics
Logging loss or accuracy at every step is tempting but usually wasteful. Log at epoch intervals instead. If you need fine-grained diagnostics, save them locally during development, then remove them before running your final experiments.
```python
# Too much: one logged metric per training step
for step in range(10000):
    log_metric("loss", current_loss, step=step)

# Better: sample at a coarser interval
for step in range(10000):
    if step % 100 == 0:
        log_metric("loss", current_loss, step=step)
```
Implementation Details
Don't log library versions, Python path information, or system configuration unless they're known sources of variance in your results. The goal is reproducibility of the model, not the exact computational environment—that's what Docker and conda files are for.
The Practical Balance
When we work with clients at LavaPi on production ML systems, the pattern is consistent: teams waste months trying to parse sprawling experiment logs, then overcorrect by logging almost nothing. The solution is discipline. Log what affects model behavior and interpretation. Ignore operational details.
Start lean. Add logging only when you find yourself thinking "I wish I'd tracked that." Your future self, reviewing experiments six months from now, will appreciate the clarity.
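Putting the guidelines together, a lean run might look like the sketch below. `train_one_epoch` and `evaluate` are hypothetical callables you supply from your own training code; the MLflow calls (`start_run`, `log_params`, `log_metric`, `log_artifact`) are real APIs:

```python
def run_experiment(params, n_epochs, train_one_epoch, evaluate,
                   final_candidate=False):
    """Lean tracking: tuned params once, metrics per epoch,
    heavy artifacts only for final candidates."""
    import mlflow

    with mlflow.start_run():
        mlflow.log_params(params)  # tuned hyperparameters only
        for epoch in range(n_epochs):
            train_loss = train_one_epoch()
            val_loss, val_accuracy = evaluate()
            # Epoch-level metrics, not per-step diagnostics
            mlflow.log_metric("train_loss", train_loss, step=epoch)
            mlflow.log_metric("val_loss", val_loss, step=epoch)
            mlflow.log_metric("val_accuracy", val_accuracy, step=epoch)
        if final_candidate:
            # Artifacts only for runs you will explain to stakeholders
            mlflow.log_artifact("feature_importance.png")
```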
LavaPi Team
Digital Engineering Company