KGW (Maryland) Watermarking

The KGW watermark by Kirchenbauer et al. (2023), also known as the Maryland watermark, partitions the vocabulary into “green” and “red” tokens at each generation step, using a pseudorandom function seeded by the preceding tokens.

Author: Kirchenbauer et al.

Theory

The KGW watermark partitions the vocabulary into “green” and “red” tokens using a pseudorandom function (PRF) that maps the previous \(h\) tokens to a seed, which in turn determines the green/red partition at that position.

At each generation step \(t\), the logit scores for tokens in the green set \(G_t\) are increased by a fixed bias \(\delta\), promoting their selection:

\[\text{logit}_{wm}(x_t) = \text{logit}(x_t) + \delta \cdot \mathbf{1}_{x_t \in G_t}\]

where \(\mathbf{1}\) is the indicator function.
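
To make the update concrete, the following minimal sketch builds a green set from the previous \(h\) tokens and applies the \(\delta\) bias. It is illustrative only: the helper names (green_set, bias_logits) are hypothetical, and seeding Python's random module with a tuple hash stands in for the library's actual PRF.

import random

def green_set(prev_tokens, vocab_size, gamma=0.5, key=42, h=2):
    # Seed a PRF with the secret key and the previous h token ids
    rng = random.Random(hash((key, *prev_tokens[-h:])))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])  # the green set G_t

def bias_logits(logits, prev_tokens, delta=2.0, gamma=0.5, key=42):
    # Add delta to every green-token logit; red-token logits are unchanged
    greens = green_set(prev_tokens, len(logits), gamma=gamma, key=key)
    return [l + delta if i in greens else l for i, l in enumerate(logits)]

Because the seed depends only on the last \(h\) tokens and the secret key, a detector can later rebuild exactly the same green sets without querying the model.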

Detection is performed by re-computing the green token sets and counting the number of generated tokens, \(|s|\), that fall in them. Under the null hypothesis (unwatermarked text), \(|s|\) approximately follows a binomial distribution with parameters \(T\) and \(\gamma\).

The presence of a watermark is tested by computing the z-score:

\[z = \frac{|s| - \gamma T}{\sqrt{\gamma(1-\gamma)T}}\]

where \(T\) is the total token count and \(\gamma\) is the expected fraction of green tokens. A large z-score indicates the presence of the watermark.
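
As a hypothetical worked example: with \(T = 200\) generated tokens, \(\gamma = 0.5\), and \(|s| = 130\) green tokens, the z-score is \((130 - 100)/\sqrt{0.25 \cdot 200} \approx 4.24\). The sketch below computes the z-score together with a one-sided p-value from the normal approximation to the binomial:

from math import erf, sqrt

def z_score(num_green, total, gamma=0.5):
    return (num_green - gamma * total) / sqrt(gamma * (1 - gamma) * total)

def p_value(z):
    # One-sided tail of the standard normal distribution
    return 0.5 * (1 - erf(z / sqrt(2)))

z = z_score(130, 200)
print(f"z = {z:.2f}, p = {p_value(z):.2e}")  # z = 4.24, p = 1.10e-05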

Note

The watermark can be detected without access to the model's logits: the detector only needs the tokenizer and the secret PRF key, making it practical for deployment scenarios.

Paper reference

Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. arXiv preprint arXiv:2301.10226. https://arxiv.org/pdf/2301.10226

Example code

import os
import sys

# IMPORTANT: For watermarking to work, set these environment variables
os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams
from vllm_watermark.core import (
    DetectionAlgorithm,
    WatermarkedLLMs,
    WatermarkingAlgorithm,
)
from vllm_watermark.watermark_detectors import WatermarkDetectors

# Load the vLLM model
llm = LLM(model="meta-llama/Llama-3.2-1B")

# Create a KGW/Maryland watermarked LLM
wm_llm = WatermarkedLLMs.create(
    llm,
    algo=WatermarkingAlgorithm.MARYLAND,
    seed=42,
    ngram=2,
    gamma=0.5,  # Expected fraction of green tokens
    delta=2.0   # Logit bias for green tokens
)

# Create KGW detector with matching parameters
detector = WatermarkDetectors.create(
    algo=DetectionAlgorithm.MARYLAND_Z,
    model=llm,
    ngram=2,
    seed=42,
    gamma=0.5,
    delta=2.0,
    threshold=0.05,  # detection threshold (presumably a p-value cutoff)
)

# Generate watermarked text
prompts = ["Write a short story about a robot learning to paint"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = wm_llm.generate(prompts, sampling_params)

# Detect watermark
for output in outputs:
    generated_text = output.outputs[0].text
    detection_result = detector.detect(generated_text)

    print(f"Generated: {generated_text}")
    print(f"Watermarked: {detection_result['is_watermarked']}")
    print(f"P-value: {detection_result['pvalue']:.6f}")
    print(f"Z-score: {detection_result['score']:.4f}")

Notes

  • Uses a pseudorandom function to partition the vocabulary into green/red sets

  • Parameter \(\delta\) controls the strength of the watermark bias

  • Parameter \(\gamma\) sets the expected fraction of green tokens (typically 0.5)

  • Detection is model-agnostic and based on statistical hypothesis testing

  • Higher \(\delta\) values create stronger watermarks but may affect text quality (see the toy calculation after this list)
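
To illustrate the trade-off governed by \(\delta\), the toy calculation below assumes a flat base distribution over the vocabulary, in which case a green token is sampled with probability \(\gamma e^{\delta} / (\gamma e^{\delta} + 1 - \gamma)\). Real model distributions are far from flat, so treat the numbers as qualitative only:

from math import exp, sqrt

def expected_z(delta, gamma=0.5, T=200):
    # Green-token probability under a flat base distribution with bias delta
    p_green = gamma * exp(delta) / (gamma * exp(delta) + 1 - gamma)
    return (p_green - gamma) * T / sqrt(gamma * (1 - gamma) * T)

for delta in (0.5, 1.0, 2.0, 4.0):
    print(f"delta = {delta}: expected z = {expected_z(delta):.1f}")
    # delta = 0.5: 3.5, 1.0: 6.5, 2.0: 10.8, 4.0: 13.6

The expected z-score rises quickly for small \(\delta\) but saturates, while the distortion of the original distribution keeps growing, which is why moderate values such as \(\delta = 2\) are a common default.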