KGW (Maryland) Watermarking

The KGW watermark by Kirchenbauer et al. (2023), also known as the Maryland watermark, partitions the vocabulary into “green” and “red” tokens at each generation step, using a pseudorandom function seeded by the preceding tokens.

Author: Kirchenbauer et al.

Theory

The KGW watermark partitions the vocabulary into “green” and “red” tokens using a pseudorandom function (PRF) that maps the previous \(h\) tokens to a seed, which in turn determines the green/red partition at that position.

At each generation step \(t\), the logit scores for tokens in the green set \(G_t\) are increased by a fixed bias \(\delta\), promoting their selection:

\[\text{logit}_{wm}(x_t) = \text{logit}(x_t) + \delta \cdot \mathbf{1}_{x_t \in G_t}\]

where \(\mathbf{1}\) is the indicator function.
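
To make the update concrete, the following minimal sketch builds a green set from the previous \(h\) tokens and applies the \(\delta\) bias. It is illustrative only: the helper names (green_set, bias_logits) are hypothetical, and seeding Python's random module with a tuple hash stands in for the library's actual PRF.

import random

def green_set(prev_tokens, vocab_size, gamma=0.5, key=42, h=2):
    # Seed a PRF with the secret key and the previous h token ids
    rng = random.Random(hash((key, *prev_tokens[-h:])))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])  # the green set G_t

def bias_logits(logits, prev_tokens, delta=2.0, gamma=0.5, key=42):
    # Add delta to every green-token logit; red-token logits are unchanged
    greens = green_set(prev_tokens, len(logits), gamma=gamma, key=key)
    return [l + delta if i in greens else l for i, l in enumerate(logits)]

Because the seed depends only on the last \(h\) tokens and the secret key, a detector can later rebuild exactly the same green sets without querying the model.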

Detection is performed by re-computing the green token sets and counting the number of generated tokens, \(|s|\), that fall in them. Under the null hypothesis (unwatermarked text), \(|s|\) approximately follows a binomial distribution with parameters \(T\) and \(\gamma\).

The presence of a watermark is tested by computing the z-score:

\[z = \frac{|s| - \gamma T}{\sqrt{\gamma(1-\gamma)T}}\]

where \(T\) is the total token count and \(\gamma\) is the expected fraction of green tokens. A large z-score indicates the presence of the watermark.
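
As a hypothetical worked example: with \(T = 200\) generated tokens, \(\gamma = 0.5\), and \(|s| = 130\) green tokens, the z-score is \((130 - 100)/\sqrt{0.25 \cdot 200} \approx 4.24\). The sketch below computes the z-score together with a one-sided p-value from the normal approximation to the binomial:

from math import erf, sqrt

def z_score(num_green, total, gamma=0.5):
    return (num_green - gamma * total) / sqrt(gamma * (1 - gamma) * total)

def p_value(z):
    # One-sided tail of the standard normal distribution
    return 0.5 * (1 - erf(z / sqrt(2)))

z = z_score(130, 200)
print(f"z = {z:.2f}, p = {p_value(z):.2e}")  # z = 4.24, p = 1.10e-05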

Note

The watermark can be detected without access to the model's logits: the detector only needs the tokenizer and the secret PRF key, making it practical for deployment scenarios.

Paper reference

Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. arXiv preprint arXiv:2301.10226. https://arxiv.org/pdf/2301.10226

Example code

import os
import sys

# IMPORTANT: For watermarking to work, set these environment variables
os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams
from vllm_watermark.core import (
    DetectionAlgorithm,
    WatermarkedLLMs,
    WatermarkingAlgorithm,
)
from vllm_watermark.watermark_detectors import WatermarkDetectors

# Load the vLLM model
llm = LLM(model="meta-llama/Llama-3.2-1B")

# Create a KGW/Maryland watermarked LLM
wm_llm = WatermarkedLLMs.create(
    llm,
    algo=WatermarkingAlgorithm.MARYLAND,
    seed=42,
    ngram=2,
    gamma=0.5,  # Expected fraction of green tokens
    delta=2.0   # Logit bias for green tokens
)

# Create KGW detector with matching parameters
detector = WatermarkDetectors.create(
    algo=DetectionAlgorithm.MARYLAND_Z,
    model=llm,
    ngram=2,
    seed=42,
    gamma=0.5,
    delta=2.0,
    threshold=0.05,  # detection threshold (presumably a p-value cutoff)
)

# Generate watermarked text
prompts = ["Write a short story about a robot learning to paint"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = wm_llm.generate(prompts, sampling_params)

# Detect watermark
for output in outputs:
    generated_text = output.outputs[0].text
    detection_result = detector.detect(generated_text)

    print(f"Generated: {generated_text}")
    print(f"Watermarked: {detection_result['is_watermarked']}")
    print(f"P-value: {detection_result['pvalue']:.6f}")
    print(f"Z-score: {detection_result['score']:.4f}")

Notes

  • Uses a pseudorandom function to partition the vocabulary into green/red sets

  • Parameter \(\delta\) controls the strength of the watermark bias

  • Parameter \(\gamma\) sets the expected fraction of green tokens (typically 0.5)

  • Detection is model-agnostic and based on statistical hypothesis testing

  • Higher \(\delta\) values create stronger watermarks but may affect text quality (see the toy calculation after this list)
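
To illustrate the trade-off governed by \(\delta\), the toy calculation below assumes a flat base distribution over the vocabulary, in which case a green token is sampled with probability \(\gamma e^{\delta} / (\gamma e^{\delta} + 1 - \gamma)\). Real model distributions are far from flat, so treat the numbers as qualitative only:

from math import exp, sqrt

def expected_z(delta, gamma=0.5, T=200):
    # Green-token probability under a flat base distribution with bias delta
    p_green = gamma * exp(delta) / (gamma * exp(delta) + 1 - gamma)
    return (p_green - gamma) * T / sqrt(gamma * (1 - gamma) * T)

for delta in (0.5, 1.0, 2.0, 4.0):
    print(f"delta = {delta}: expected z = {expected_z(delta):.1f}")
    # delta = 0.5: 3.5, 1.0: 6.5, 2.0: 10.8, 4.0: 13.6

The expected z-score rises quickly for small \(\delta\) but saturates, while the distortion of the original distribution keeps growing, which is why moderate values such as \(\delta = 2\) are a common default.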