2024-05-17 4 min read

Model Versioning in Production: Lessons from Breaking Things in the Dark

Deploy with confidence. We break down how to version ML models in production, why it matters, and the exact mistakes that taught us the hard way.

At 3 AM on a Tuesday, a recommendation model silently degraded. No errors. No alerts. Just predictions that drifted slowly wrong over the course of six hours. By the time anyone noticed, the damage was done—and we had no way to roll back because nobody had documented which model version was running.

That was our wake-up call. Model versioning isn't a nice-to-have in production. It's the difference between a graceful recovery and a complete system failure. Here's what we learned.

The Cost of Not Versioning

Without versioning, production breaks silently. A new model trains faster than expected and deploys automatically. Training data shifts slightly. Dependencies change. A month later, performance has degraded 8%, but you can't trace why because you don't know which version is running, when it was deployed, or what changed.

The real cost isn't the technical debt—it's the blind spot. You can't debug what you can't see.

What to Version

Version three things: the model artifact, the training data snapshot, and the preprocessing logic.

```python
# Model metadata structure
model_manifest = {
    "version": "2024.01.15.prod",
    "model_hash": "sha256:a3f2d1...",
    "training_date": "2024-01-15T09:22:00Z",
    "data_snapshot_id": "snapshot_v847",
    "preprocessing_version": "1.2.3",
    "framework": "tensorflow:2.14",
    "metrics": {
        "accuracy": 0.9847,
        "f1_score": 0.9156,
        "auc_roc": 0.9923
    },
    "deployed_by": "ml-pipeline",
    "rollback_target": "2024.01.10.prod"
}
```

Store this alongside your model. When something breaks, you have context.
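As a sketch of how that manifest might be produced (the artifact name `model.h5` and the helper itself are illustrative assumptions, not a prescribed API), hash the artifact and write the manifest into the same directory:

```python
import hashlib
import json
from pathlib import Path

def write_manifest(model_dir: str, manifest: dict) -> str:
    """Hash the model artifact and store the manifest beside it."""
    directory = Path(model_dir)
    model_file = directory / "model.h5"  # assumed artifact name
    digest = hashlib.sha256(model_file.read_bytes()).hexdigest()
    manifest["model_hash"] = f"sha256:{digest}"
    (directory / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest["model_hash"]
```

Because the hash is computed from the bytes on disk, a manifest that disagrees with its artifact is detectable later.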

Implementation: The Boring Way That Works

Use Date-Based Versioning for Models

Adopt a date-based schema: YYYY.MM.DD.stage. It's sortable, human-readable, and immediately tells you when a model was built. (This is calendar versioning rather than semantic versioning: for a model, when it was trained usually matters more than API compatibility.)
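A small helper can generate conforming version strings; this is a sketch, and the stage names are assumptions:

```python
from datetime import datetime, timezone
from typing import Optional

VALID_STAGES = {"dev", "staging", "prod"}  # assumed stage names

def make_version(stage: str, build_date: Optional[datetime] = None) -> str:
    """Return a sortable YYYY.MM.DD.stage version string."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage: {stage}")
    build_date = build_date or datetime.now(timezone.utc)
    return f"{build_date:%Y.%m.%d}.{stage}"
```

Zero-padded months and days keep lexicographic order identical to chronological order, which is what makes the scheme sortable.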

```bash
# Model storage structure
ml-models/
  ├── recommendation-v1/
  │   ├── 2024.01.15.prod/
  │   │   ├── model.h5
  │   │   ├── manifest.json
  │   │   └── scaler.pkl
  │   ├── 2024.01.10.prod/
  │   └── 2024.01.12.staging/
```

Track Deployments Explicitly

Don't rely on implicit versioning. Record every deployment with context.

```python
import os
from datetime import datetime, timezone

class ModelRegistry:
    def deploy(self, model_version, environment, metadata):
        deployment_record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "environment": environment,
            "deployed_by": os.environ.get("CI_USER"),
            "commit_hash": os.environ.get("GIT_COMMIT"),
            "metadata": metadata,
            "status": "active"
        }
        self.log_deployment(deployment_record)
        return deployment_record
```

Enable Fast Rollbacks

Your deployment must support instant rollback without ceremony. Maintain a pointer to the active version, not hardcoded model paths.

```typescript
import * as tf from "@tensorflow/tfjs-node";

// Load model configuration at runtime: resolve the active
// version through the registry, never a hardcoded path
const loadModel = async (environment: string) => {
  const config = await fetch(
    `/api/model-registry/${environment}/active`
  ).then(r => r.json());

  return tf.loadLayersModel(
    `file://${config.model_path}/model.json`
  );
};
```

When you need to roll back, change a single pointer. Done in seconds.
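One minimal way to implement that pointer (a sketch using a JSON file per environment; a real registry might use a database row or a config service) is an atomic file swap:

```python
import json
import os
from pathlib import Path

def set_active_version(registry_dir: str, environment: str, version: str) -> None:
    """Atomically repoint an environment at a model version."""
    pointer = Path(registry_dir) / f"{environment}.json"
    tmp = pointer.with_suffix(".json.tmp")
    tmp.write_text(json.dumps({"active_version": version}))
    os.replace(tmp, pointer)  # atomic rename: readers see old or new, never partial

def get_active_version(registry_dir: str, environment: str) -> str:
    pointer = Path(registry_dir) / f"{environment}.json"
    return json.loads(pointer.read_text())["active_version"]
```

Writing to a temp file and renaming it over the pointer means a rollback is one `os.replace` call, and a crash mid-write can never leave a half-written pointer behind.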

Monitoring: Know When Things Go Wrong

Versioning solves half the problem. Monitoring solves the other half. Track model performance metrics continuously.

```python
import time

# `metrics` stands in for your metrics client (StatsD, Prometheus, etc.)

# Log predictions for monitoring
def log_prediction(version, prediction, confidence, ground_truth=None):
    metrics.record({
        "model_version": version,
        "prediction": prediction,
        "confidence": confidence,
        "ground_truth": ground_truth,
        "timestamp": time.time()
    })
```

Compare current version performance against your baseline hourly. Drift detection catches silent failures before they spread.
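The comparison itself can stay simple. This sketch computes windowed accuracy over the predictions that have ground truth and flags drift against a stored baseline; the 3-point threshold is an assumed default you would tune per model:

```python
def check_drift(records, baseline_accuracy, threshold=0.03):
    """Compare windowed accuracy of labeled predictions to a baseline.

    records: iterable of (prediction, ground_truth) pairs from the last window.
    Returns (accuracy, drifted); drifted is True when accuracy falls more
    than `threshold` below the baseline.
    """
    labeled = [(p, t) for p, t in records if t is not None]
    if not labeled:
        return None, False  # no ground truth yet; nothing to compare
    accuracy = sum(p == t for p, t in labeled) / len(labeled)
    return accuracy, (baseline_accuracy - accuracy) > threshold
```

Run it on each hourly window, keyed by model version, and alert on the `drifted` flag; unlabeled predictions are skipped rather than counted as wrong.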

The Takeaway

Model versioning is not about perfection or elaborate systems. At LavaPi, we've found that simple, consistent versioning practices—combined with explicit deployment tracking and continuous monitoring—eliminate the worst class of production failures: the ones you don't notice until they've been wrong for hours.

Version your models. Track your deployments. Monitor continuously. That's the formula. Everything else is implementation detail.


LavaPi Team

Digital Engineering Company
