Jesse Williams · September 4, 2024

How to Extend Your DevOps Pipeline to MLOps with KitOps

TL;DR: You don't need a separate MLOps team or toolchain. KitOps is a CNCF open-source CLI that packages code, data, and ML models into OCI-compatible artifacts called ModelKits. They work with your existing container registry and CI/CD pipeline. This guide walks through a working setup from local dev to production deployment.


The problem: two pipelines, double the overhead

If your org has started shipping ML models, you've probably watched this happen in real time. Data scientists build models in notebooks. Somebody hand-rolls a deployment script. And before long you're maintaining parallel infrastructure for "regular" software and ML workloads.

The numbers back this up: by one widely cited estimate, roughly 85% of ML models never make it to production. The reason isn't bad data science. It's operational complexity. Separate version control for data. Separate registries for models. Separate deployment workflows. Separate teams who don't talk to each other.

What if you could treat ML artifacts the same way you treat container images? Package them, version them, push them to a registry, pull them in CI/CD.

That's what KitOps does.


What is KitOps?

KitOps is a CNCF open-source CLI tool that packages code, datasets, ML models, prompts, and configuration into a single versioned unit called a ModelKit. ModelKits are stored as OCI-compatible artifacts, so they work with whatever container registry you're already running: Docker Hub, GitHub Container Registry, AWS ECR, Azure ACR, whatever.

KitOps is also the reference implementation of the CNCF's ModelPack specification, a vendor-neutral interchange format for AI/ML projects. The CNCF is the same organization that governs Kubernetes, OpenTelemetry, and Prometheus.

Here's what matters if you're coming from a DevOps background:

  • OCI-native. ModelKits live alongside your container images in the same registry. No new infrastructure to spin up.
  • Tamper-proof and signable. Every artifact gets a checksum. You can verify integrity, track changes between versions, and link a model to the exact dataset it was trained on. You can also sign ModelKits for trust and verification.
  • Selective unpacking. Need just the model for inference? Pull only the model. Need just the code for testing? Pull only the code. You're not downloading 10 GB of training data when you only need a 200 MB model file.
  • Kitfile manifest. A declarative YAML file (think Dockerfile, but for your whole project) that defines what goes into the package.
  • Runnable containers. Kit CLI can generate runnable containers for Kubernetes or Docker directly from a ModelKit.

Prerequisites

Before you start, you'll need:

  • Kit CLI installed: see the getting-started guide at kitops.org/docs/get-started
  • A container registry: Any OCI-compatible registry (Docker Hub, GHCR, ECR, Jozu Hub, etc.)
  • An existing CI/CD pipeline: The examples below use GitHub Actions, but the commands work anywhere
  • Basic familiarity with Docker: If you understand docker build / docker push, KitOps will feel familiar

Step 1: Define your Kitfile

A Kitfile declares what artifacts go into your ModelKit. It's a YAML manifest, similar to a Dockerfile but scoped to your whole project: code, model weights, datasets, prompts, config.

Here's a Kitfile for a Python ML project:

manifestVersion: v1.0.0
package:
  name: fraud-detection-model
  description: "XGBoost fraud detection model with preprocessing pipeline"
  license: Apache-2.0
  authors: [your-team]

code:
  - path: ./src
    description: Application source code and inference server

model:
  name: fraud-detector-v2
  path: ./models/xgboost_fraud.pkl
  framework: XGBoost
  description: Trained on Q4 2023 transaction data

datasets:
  - name: training-data
    path: ./data/transactions_q4.parquet
    description: Q4 2023 labeled transaction dataset

What each section does:

  • package: Metadata about the project. The name becomes the artifact identifier in your registry.
  • code: Points to your application source. This gets versioned alongside the model.
  • model: The trained model file. KitOps tracks it by checksum, so you always know exactly which model version is deployed.
  • datasets: Training data linked to this model version. This is important for EU AI Act compliance, since the regulation requires traceability between models and training data for up to 10 years after training completes.

You can also include prompts, docs, and parts (arbitrary files) in the Kitfile. If you're already working in Jupyter, the PyKitOps Python SDK lets you pack and push ModelKits directly from your notebook.


Step 2: Pack and push

Once your Kitfile is ready, package everything into a ModelKit and push it to your registry:

# Package all artifacts into a ModelKit
kit pack . -t ghcr.io/your-org/fraud-detection:v2.1.0

# Push to your container registry
kit push ghcr.io/your-org/fraud-detection:v2.1.0

That's it. You now have a single, immutable, versioned artifact that contains your code, model, and data. All content-addressed and tamper-proof.

Under the hood, Kit creates OCI-compatible layers for each artifact type. Your registry doesn't need any special configuration. It handles ModelKits the same way it handles container images.
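That tamper-proof guarantee comes from OCI content addressing, the same mechanism behind Docker image digests. As a rough sketch of the idea (illustrative only, not KitOps's actual code):

```python
import hashlib

def oci_digest(data: bytes) -> str:
    """An OCI-style content digest: algorithm prefix plus sha256 of the raw bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

model_bytes = b"pretend these are the serialized model weights"
print(oci_digest(model_bytes))

# Flipping even a single byte produces a completely different digest, which is
# what lets a registry detect tampering and tell two versions apart.
assert oci_digest(model_bytes) != oci_digest(model_bytes + b"\x00")
```

Because every layer is pinned by its digest, "which dataset was this model trained on?" becomes a question with a cryptographic answer rather than a Slack archaeology exercise.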

If you want to use the CNCF ModelPack format instead, just add the flag:

kit pack . --use-model-pack -t ghcr.io/your-org/fraud-detection:v2.1.0

All your Kit commands (pull, push, unpack, inspect, list) work the same way with either format.


Step 3: Pull and test in CI/CD

This is where KitOps fits into your existing pipeline. Here's a GitHub Actions workflow that pulls the ModelKit, runs tests against the code and model, then builds a container image:

name: ML Pipeline - Test and Deploy

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Install Kit CLI
        run: |
          curl -sSL https://kitops.org/install.sh | bash

      - name: Pull and unpack code + model
        run: |
          kit login ghcr.io -u ${{ secrets.REGISTRY_USER }} -p ${{ secrets.REGISTRY_TOKEN }}
          kit unpack ghcr.io/your-org/fraud-detection:v2.1.0 -d ./workspace

      - name: Run unit tests
        run: |
          cd ./workspace
          pip install -r requirements.txt
          pytest tests/ -v

      - name: Run model validation
        run: |
          cd ./workspace
          python scripts/validate_model.py --model ./models/xgboost_fraud.pkl

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Install Kit CLI
        run: |
          curl -sSL https://kitops.org/install.sh | bash

      - name: Unpack code for deployment
        run: |
          kit login ghcr.io -u ${{ secrets.REGISTRY_USER }} -p ${{ secrets.REGISTRY_TOKEN }}
          kit unpack ghcr.io/your-org/fraud-detection:v2.1.0 --code -d ./deploy

      - name: Build and push container image
        run: |
          echo ${{ secrets.REGISTRY_TOKEN }} | docker login ghcr.io -u ${{ secrets.REGISTRY_USER }} --password-stdin
          docker build -t ghcr.io/your-org/fraud-detection-app:v2.1.0 ./deploy
          docker push ghcr.io/your-org/fraud-detection-app:v2.1.0

Notice the --code flag in the deploy step. KitOps lets you selectively unpack only what you need. In testing, you pull everything. In deployment, you might pull only the code (or the code plus the model), skipping the training dataset entirely. That saves real bandwidth when your datasets are large.


Step 4: Deploy and monitor

With your container image built and pushed, deployment works exactly like any other service in your stack:

# Pull and run on your production server
docker pull ghcr.io/your-org/fraud-detection-app:v2.1.0
docker run -d -p 8080:8080 ghcr.io/your-org/fraud-detection-app:v2.1.0

For monitoring, use whatever you already have running: Grafana, Datadog, Prometheus. The ML-specific addition worth thinking about is model drift detection, tracking whether your model's predictions are degrading over time as real-world data shifts. Tools like Evidently AI or Superwise can plug into your existing observability stack.
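If you want a feel for what drift detection computes before adopting a tool, one common metric is the Population Stability Index: compare the distribution of live model scores against the training-time distribution. A self-contained sketch (our own illustration, not the Evidently or Superwise API):

```python
# Population Stability Index: how far the live score distribution has drifted
# from the training-time distribution. A value above ~0.2 is commonly read
# as significant drift worth investigating.
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against all-identical values

    def hist(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Small epsilon keeps the log and division finite for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run it periodically over a sliding window of production scores; when PSI crosses your alert threshold, that's the signal to retrain and pack a new ModelKit version.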

And if something goes wrong? Pull an earlier ModelKit version and unpack it. Since everything is tracked (code, data, config), you return to a known working state without patching things manually. No guesswork, no rebuilds from scratch, no digging through Slack threads to figure out what changed.

# Roll back to a previous version
kit pull ghcr.io/your-org/fraud-detection:v2.0.0
kit unpack ghcr.io/your-org/fraud-detection:v2.0.0 -d ./rollback

Adding governance with Jozu Hub

The open-source KitOps CLI handles packaging and versioning. If your team also needs governance, audit trails, and policy enforcement, Jozu Hub sits on top and adds:

  • Validation and integrity checks when you push a ModelKit (signature verification, metadata completeness, compliance rules)
  • Audit trail and lineage logging for every push, tag, or promotion
  • Auto-inference container generation so consumers don't have to manually build inference Docker images
  • Curated model library with control over which models your developers can use

Jozu Hub installs behind your firewall and uses your existing OCI registry. It works in private cloud, datacenter, or air-gapped environments.

In short: KitOps handles the packaging. Jozu keeps everything honest.


How KitOps compares to the alternatives

If you've looked into MLOps tooling, you've probably come across DVC, MLflow, and full platforms like SageMaker. Here's where KitOps fits:

|                         | KitOps                                  | DVC + MLflow                                      | AWS SageMaker             |
|-------------------------|-----------------------------------------|---------------------------------------------------|---------------------------|
| Learning curve          | Low, feels like Docker                  | Medium, two tools to configure                    | High, AWS-specific concepts |
| Registry                | Your existing OCI registry              | Separate (S3/GCS for data, MLflow server for models) | AWS-only               |
| CI/CD integration       | Drop-in (any pipeline)                  | Requires custom scripts                           | Tight AWS integration     |
| Artifact storage        | OCI standard (non-proprietary)          | Proprietary formats                               | Proprietary               |
| Compliance/traceability | Built-in checksums, signatures, lineage | Manual setup                                      | Partial                   |
| Standards               | CNCF ModelPack reference implementation | None                                              | None                      |
| Cost                    | Free, open source                       | Free (but infrastructure costs for MLflow server) | Pay-per-use               |
| Best for                | Teams extending DevOps to ML            | Data science-heavy teams                          | All-in-AWS shops          |

If you're a DevOps team adding ML capabilities, KitOps has the lowest friction because it works with infrastructure you already have. If your data scientists are already using MLflow for experiment tracking, KitOps can complement it rather than replace it. Use MLflow for experiments and KitOps for the packaging and deployment layer. We have a full walkthrough of that setup on the blog.


Common questions from DevOps engineers

Does this replace our container registry?
No. KitOps stores artifacts in your existing registry. ModelKits are OCI artifacts, so they live alongside your Docker images.

How big can artifacts be?
The OCI spec supports multi-gigabyte layers, and KitOps handles them well. Selective unpacking means you don't have to pull 50 GB of training data when deploying a model.

What about secrets and credentials?
Don't put them in the Kitfile. Use your existing secrets management (Vault, AWS Secrets Manager, etc.) the same way you handle Docker container secrets.
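In practice that means your inference code reads credentials from the runtime environment, injected by your orchestrator or secrets manager, so nothing sensitive ever lands in the ModelKit or the image layers. A minimal sketch (variable names are illustrative):

```python
import os

def get_db_credentials() -> tuple[str, str]:
    """Read credentials injected at runtime (e.g. by Kubernetes secrets or
    a Vault agent). Nothing here is baked into the packaged artifact."""
    user = os.environ["FRAUD_DB_USER"]
    password = os.environ["FRAUD_DB_PASSWORD"]
    return user, password
```

Raising a KeyError at startup when a variable is missing is deliberate: fail fast at deploy time rather than mid-request.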

Can I use this with GitOps workflows?
Yes. Since ModelKits are versioned artifacts in your registry, you can reference specific versions in your ArgoCD or Flux manifests the same way you reference container image tags. We wrote a detailed guide on the Argo + KitOps setup.

Can I import models from HuggingFace?
Yes. KitOps supports HuggingFace import so you can pull a model from HuggingFace and package it into a ModelKit directly.


Get started in 5 minutes

  1. Install the Kit CLI: kitops.org/docs/get-started
  2. Try a Quick Start: Pre-built ModelKits for LLMs, computer vision, and more
  3. Star the repo: github.com/kitops-ml/kitops

Have questions? Join the KitOps Discord or open an issue on GitHub.

