Audit Logging for ML Workflows with KitOps and MLflow
As machine learning systems are increasingly used in high-stakes decisions across industries, the ability to track, reproduce, and understand model behavior has become as important as model performance itself. Regulatory pressure, internal governance requirements, and engineering reliability all depend on having a clear record of how models are trained, packaged, and deployed. Without that record, it becomes difficult to reproduce results or investigate failures. Audit logging addresses this gap by providing a structured record of events across the ML lifecycle, from data ingestion to deployment.
In this tutorial, we walk through what audit logging means in the context of ML systems and build an end-to-end audit trail using MLflow for experiment tracking, KitOps for model packaging, and GHCR for artifact distribution.
Prerequisites
To follow along with this tutorial, you need the following:
- A Jozu account. To create an account, register here
- KitOps installed
- Docker Desktop
- A GitHub account (for GHCR)
- A Python virtual environment
What is Audit Logging for ML Workflows?
Audit logging in machine learning workflows is the process of capturing a complete, tamper-proof record of all events across the ML lifecycle. It gives you visibility into how data, experiments, and models change from training to deployment and inference.
Traditional logs focus on system-level information such as uptime, API calls, and runtime errors. Audit logs go further by capturing things like:
- Dataset versions
- Experiment configurations
- Hyperparameters
- Model approvals
- Rollbacks
- Deployment events
Collectively, these records provide developers with a clear trail of every action that affects a model's state or output. Audit logging rests on three key principles: integrity, so records cannot be silently altered; traceability, so every stage of the ML process is linked together; and compliance, so the workflow can be shown to meet internal and regulatory requirements.
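To make the integrity and traceability principles concrete, here is a minimal sketch of an append-only, hash-chained audit log in Python. It is not part of the tooling used later in this tutorial, and the file name and event fields are illustrative; the point is that each entry embeds the hash of the previous one, so any after-the-fact edit breaks the chain and becomes detectable.

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("audit_log.jsonl")  # hypothetical append-only audit log file

def append_event(event: dict) -> dict:
    """Append an audit event, chaining it to the hash of the previous entry."""
    prev_hash = "0" * 64
    if LOG.exists():
        lines = [line for line in LOG.read_text().splitlines() if line.strip()]
        if lines:
            prev_hash = json.loads(lines[-1])["entry_hash"]

    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
        **event,
    }
    # The hash covers prev_hash as well, so editing any earlier entry breaks the chain
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()

    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Illustrative events covering two lifecycle stages
append_event({"stage": "training", "actor": "ml-engineer",
              "dataset_sha256": "<dataset-hash>", "accuracy": 0.97})
append_event({"stage": "packaging", "actor": "ci-pipeline",
              "artifact": "rf-model-audit:0.1.0"})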
Components to Audit
An efficient audit log must document the entire lifecycle of a model. Information generated by each stage aids in tracking what was constructed, examined, and deployed. Here are some of the items you should consider auditing in your ML workflows:
- Data ingestion: Document the dataset version, checksum, and all preprocessing or transformation steps taken.
- Training: Document the experiment configurations, hyperparameters, metrics, and the code commit or version used.
- Packaging: Collect artifact digests, digital signatures, and provenance metadata to confirm model origin.
- Deployment: Monitor approvals, verify signatures, and document specifics of the target environment.
- Inference or runtime: Log request IDs, model versions, latency, and drift detection alerts.
- Retirement: Document deprecation events and archived models to maintain a complete audit trail.
Tools Commonly Used for ML Audit Logging
Different stages of the ML lifecycle depend on specialized tools to create and uphold trustworthy audit records. Below is a compilation of various tools frequently utilized for ML audit logging, categorized according to the roles they fulfill:
- Experiment tracking: Tools like CometML and Weights & Biases help record datasets, metrics, model artifacts, and experiment metadata.
- Data lineage and feature stores: Platforms such as Feast and Tecton manage dataset versions and data transformations to guarantee reproducibility.
- Model packaging and provenance: Tools such as KitOps, BentoML, and MLflow Models manage signed, portable packages while documenting provenance information.
- Model registries and deployment: Services such as Jozu, SageMaker Model Registry, and MLflow Model Registry keep records of approvals, rollbacks, and deployment actions, while KServe handles the inference and serving of deployed models.
- Monitoring and SIEM integration: Systems such as Graylog, OpenSearch, ELK, Splunk, and Datadog centralize log collection and make audit analysis easier.
- Cryptographic integrity tools: Utilities like Cosign and Sigstore handle artifact signing and verification to maintain data integrity.
End-to-End ML Audit Trail with KitOps, MLflow, GHCR
In this demo, we'll build an end-to-end ML audit trail using MLflow for experiment tracking, KitOps for packaging models and metadata, and GHCR for artifact distribution. Afterwards, we'll show how this workflow can be integrated into Jozu to provide centralized visibility, governance, and records.
Set Up the Project
Create a folder named ml_audit_demo/ for the project, then create a src folder inside it:
mkdir ml_audit_demo
cd ml_audit_demo
mkdir src
Next, set up a virtual environment inside the project and activate it:
python3 -m venv .venv
source .venv/bin/activate
The next thing is to install the required dependencies for this project. Create a requirements.txt in the root folder with the following:
mlflow
scikit-learn
pandas
joblib
python-dotenv
Afterwards, run:
pip install -r requirements.txt
Also start the MLflow UI in a separate terminal by running:
mlflow ui
Open your browser at:
http://127.0.0.1:5000
MLflow is responsible for experiment tracking in our project.
Authenticate with GitHub Container Registry (GHCR)
Before training and packaging the model, you need to authenticate with GitHub Container Registry so KitOps can push the packaged ModelKit later in the workflow.
Start by creating a GitHub personal access token:
- Go to GitHub → Settings → Developer settings → Personal access tokens.
- Generate a new token with the write:packages permission.
- Copy the token and store it somewhere safe.
Next, export the token and your GitHub username as environment variables:
export GITHUB_TOKEN=<your-github-token>
export GH_OWNER=<your-github-username-or-org>
Then log in to GHCR. Using --password-stdin avoids exposing the token in your shell history:
echo $GITHUB_TOKEN | docker login ghcr.io -u $GH_OWNER --password-stdin
If authentication is successful, Docker will confirm that the login succeeded. This login allows KitOps to push ModelKits to GHCR using standard OCI registry authentication.
Once authenticated, you can proceed with training the model and capturing experiment metadata.
Train and Log
Create a file called src/train.py, then add the following code:
import hashlib
import json
import joblib
import pandas as pd
from pathlib import Path
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import mlflow
import mlflow.sklearn

ROOT = Path(__file__).resolve().parents[1]
DATA = ROOT / "data" / "iris.csv"
MODELS = ROOT / "models"
MODELS.mkdir(exist_ok=True)

mlflow.set_experiment("rf-iris-audit-demo")

# Load or generate dataset
if not DATA.exists():
    X, y = load_iris(return_X_y=True, as_frame=True)
    df = pd.concat([X, y.rename("target")], axis=1)
    DATA.parent.mkdir(exist_ok=True)
    df.to_csv(DATA, index=False)
else:
    df = pd.read_csv(DATA)

dataset_hash = hashlib.sha256(
    pd.util.hash_pandas_object(df, index=True).values
).hexdigest()

with mlflow.start_run() as run:
    Xtr, Xte, ytr, yte = train_test_split(
        df.iloc[:, :-1],
        df.iloc[:, -1],
        test_size=0.2,
        random_state=42
    )
    model = RandomForestClassifier(n_estimators=120, random_state=42)
    model.fit(Xtr, ytr)
    accuracy = model.score(Xte, yte)

    model_path = MODELS / "model.pkl"
    joblib.dump(model, model_path)

    mlflow.log_param("n_estimators", 120)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_param("dataset_sha256", dataset_hash)
    mlflow.log_artifact(str(model_path))

    run_id = run.info.run_id

# Write experiment metadata for packaging
with open(ROOT / "mlflow_meta.json", "w") as f:
    json.dump({
        "mlflow_run_id": run_id,
        "dataset_sha256": dataset_hash,
        "accuracy": accuracy
    }, f, indent=2)

print("MLflow run_id:", run_id)
print("Dataset sha256:", dataset_hash)
print("Accuracy:", accuracy)
This script first checks whether the Iris dataset exists and generates it if it is missing. It then starts an MLflow run and records the training parameters, evaluation metric, and trained model artifact. It also computes a SHA-256 hash of the dataset to preserve data lineage. Finally, it saves the key experiment metadata, including the MLflow run ID, dataset hash, and accuracy, to an mlflow_meta.json file, which is later linked to the Kitfile for full traceability across the workflow.
Afterwards, run this command:
python src/train.py
After running the training script, you'll have a few key outputs:
- data/iris.csv, which contains the dataset used for training
- models/model.pkl, which contains the trained model
- mlruns/, the directory created by MLflow that stores experiment runs, parameters, metrics, and artifacts
- mlflow_meta.json, which contains experiment metadata including the MLflow run ID, dataset hash, and model accuracy
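For reference, mlflow_meta.json will look roughly like the following; the values shown here are illustrative placeholders, and yours will differ:

{
  "mlflow_run_id": "<32-character MLflow run ID>",
  "dataset_sha256": "<64-character SHA-256 hash>",
  "accuracy": 0.97
}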
Return to the MLflow UI to confirm that the audit record lives inside the experiment tracker.
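If you prefer to verify the record programmatically rather than through the UI, a small sketch like the one below reads the run back with the MLflow client, using the run ID that src/train.py wrote to mlflow_meta.json:

import json
from mlflow.tracking import MlflowClient

# Load the run ID recorded by src/train.py
with open("mlflow_meta.json") as f:
    meta = json.load(f)

client = MlflowClient()
run = client.get_run(meta["mlflow_run_id"])

# Confirm that the audit-relevant fields were logged
print("Params:", run.data.params)    # n_estimators, dataset_sha256
print("Metrics:", run.data.metrics)  # accuracy
print("Status:", run.info.status)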

However, MLflow alone does not solve distribution or downstream verification. That's where packaging comes in.
Package the Model and Lineage to GHCR with KitOps
You need a Kitfile for this project. To generate one, run:
kit init .
Once initialized, edit the generated Kitfile to embed the MLflow metadata:
manifestVersion: v1.0.0
package:
  name: rf-model-audit
  version: 0.1.0
  description: "RandomForest Iris model packaged with MLflow lineage"
model:
  name: rf_model
  path: ./models/model.pkl
  framework: scikit-learn
  parameters:
    mlflow_run_id: "<RUN_ID>"
    dataset_sha256: "<DATASET_HASH>"
    metrics:
      accuracy: <ACCURACY>
datasets:
  - name: iris
    path: ./data/iris.csv
Replace the placeholders using values from mlflow_meta.json.
This file defines the structure and metadata of your packaged model, including where to find the model file, which framework it uses, and which dataset or experiment it's linked to.
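If you would rather not edit the Kitfile by hand, a small helper script along the following lines can substitute the placeholders from mlflow_meta.json. This is just a convenience sketch, not part of KitOps itself, and it assumes the script is run from the project root:

import json
from pathlib import Path

# Read the metadata written by src/train.py
meta = json.loads(Path("mlflow_meta.json").read_text())

# Fill the placeholders in the generated Kitfile
kitfile = Path("Kitfile").read_text()
kitfile = kitfile.replace("<RUN_ID>", meta["mlflow_run_id"])
kitfile = kitfile.replace("<DATASET_HASH>", meta["dataset_sha256"])
kitfile = kitfile.replace("<ACCURACY>", str(meta["accuracy"]))
Path("Kitfile").write_text(kitfile)

print("Kitfile updated with MLflow lineage metadata")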
Now package the ModelKit by running:
export GH_OWNER=<github-username-or-org>
export TAG=ghcr.io/$GH_OWNER/rf-model-audit:0.1.0
kit pack . -t $TAG

Then push the packaged ModelKit to GHCR using this command:
kit push $TAG

To view the ModelKit configuration from the registry, run this command:
kit info --remote $TAG

This shows the configuration derived from the Kitfile, including embedded MLflow metadata.
To inspect the remote manifest and digests, run this command:
kit inspect --remote $TAG

This reveals the immutable manifest, including cryptographic digests for each component. These digests are the foundation of a tamper-evident audit trail.
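To fold these digests into your own audit trail, you can keep a copy of the manifest alongside the experiment metadata. The snippet below is a minimal sketch that shells out to the KitOps CLI and stores whatever kit inspect prints under an audit/ directory; the directory name and file layout are just one possible convention:

import os
import subprocess
from pathlib import Path

tag = os.environ["TAG"]  # e.g. ghcr.io/<owner>/rf-model-audit:0.1.0

# Ask KitOps for the remote manifest and keep a copy as an audit record
manifest = subprocess.run(
    ["kit", "inspect", "--remote", tag],
    capture_output=True, text=True, check=True
).stdout

audit_dir = Path("audit")
audit_dir.mkdir(exist_ok=True)
(audit_dir / "rf-model-audit-0.1.0-manifest.json").write_text(manifest)

print("Stored manifest for", tag)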
So far, we have built an end-to-end audit trail using MLflow for experiment tracking and KitOps for packaging and distributing the model through GHCR. This approach works well for individual developers and open source workflows, where models can be shared and inspected using standard OCI tooling.
However, GHCR is only a registry. It does not provide centralized visibility, governance workflows, approval controls, or security insights across models and teams.
This is what Jozu was built for. It extends KitOps with management, governance, and security capabilities designed for teams operating at scale.
Using Jozu for Centralized Audit Logging and Governance
At this point, the ModelKit has been packaged with KitOps and pushed to GHCR, carrying training metadata from MLflow such as dataset hashes, metrics, and run identifiers. While this is sufficient for distributing artifacts, it still requires teams to rely on CLI access and manual inspection to understand what was built, by whom, and whether it is safe to deploy.
Jozu addresses this gap by acting as a managed control plane for ModelKits. Instead of treating models as opaque registry artifacts, Jozu ingests ModelKits and exposes their metadata, lineage, and security posture through a centralized interface designed for teams.
Pushing the ModelKit to Jozu
Because Jozu is OCI-compatible, the same ModelKit produced earlier can be pushed directly to Jozu without repackaging. Once authenticated (for example, with kit login jozu.ml), tag the existing ModelKit for the Jozu registry and push it, just as you would with any other registry:
kit tag $TAG jozu.ml/<org>/rf-model-audit:0.1.0
kit push jozu.ml/<org>/rf-model-audit:0.1.0
This operation uploads the existing ModelKit along with all embedded metadata, including references to the MLflow run, dataset hash, and model artifacts.
After pushing the ModelKit to Jozu Hub, the model becomes visible through a single, versioned interface that aggregates packaging, security, and change information.

Repository Overview
The repository view shows the available ModelKit versions and provides a pull command that allows authorized users to retrieve the exact ModelKit using KitOps. This ensures the same model, dataset, and configuration are consistently pulled across environments.

ModelKit Contents
The ModelKit Contents tab lists the components included in the package:
- The model artifact
- Associated datasets
- Configuration generated from the Kitfile

Security Report
The Security Report tab displays the results of automated scans performed on the ModelKit. Findings are associated with a specific model version, allowing teams to review security status before deployment or promotion.

ModelKit Diff
In case you have multiple versions, the ModelKit Diff view compares two versions of a ModelKit. It highlights changes in configuration, model layers, and metadata using digests and size differences. This makes it possible to see exactly what changed between releases.

Model Cards
Jozu supports Model Cards as part of the ModelKit workflow. If a Model Card is not present, Jozu explicitly indicates that documentation is missing, making gaps visible rather than implicit.
GHCR vs Jozu in an ML Audit Workflow
Both GHCR and Jozu are OCI-compatible registries and can store ModelKits produced with KitOps. The difference lies in what happens after the artifact is pushed. GHCR focuses on storage and distribution, while Jozu is designed to surface model context, lineage, and review-oriented information needed as teams scale.
Below is a table comparing the roles of GHCR and Jozu in an ML audit and governance workflow.
| Capability | GHCR | Jozu |
|---|---|---|
| OCI-compatible registry | Yes | Yes |
| Store and distribute ModelKits | Yes | Yes |
| Push and pull via KitOps CLI | Yes | Yes |
| Centralized UI for ModelKit contents | No | Yes |
| Visibility into model, dataset, and config layers | No | Yes |
| Dataset hash and experiment reference visibility | No | Yes |
| Automated security scan results per version | No | Yes |
| Version-to-version ModelKit diff | No | Yes |
| Model Card support | No | Yes |
| Centralized audit and review interface | No | Yes |
Conclusion
Audit logging brings structure and accountability to every stage of the ML lifecycle. By combining MLflow for experiment tracking, KitOps for model packaging, and GHCR for artifact distribution, teams can create a traceable workflow that connects data, training runs, and packaged models using open tooling.
As usage scales beyond individual developers, additional needs emerge around centralized visibility, security insights, and governance. Jozu builds on top of this existing KitOps workflow to surface model lineage, version changes, and risk signals through a centralized interface, without replacing the underlying open-source components.
Start using Jozu Hub to make your ML deployments traceable, secure, and easy to audit.