OpenAI/Gumbel Watermarking

The Gumbel watermarking scheme by Aaronson (2023) leverages the Gumbel-Max trick for deterministic token selection while preserving the original sampling distribution.

Author: Scott Aaronson

Theory

The Gumbel watermark uses the Gumbel-Max trick for deterministic token selection: at each timestep \(t\), the preceding \(h\) tokens are hashed together with a secret key \(k\) to generate a pseudo-random value \(u_v\) for each token \(v\) in the vocabulary.

The Classical Gumbel-Max Formulation

In its classical form, the Gumbel-Max trick samples tokens according to:

\[x_t^* = \text{arg max}_{v \in V} \left[ \log p_v + G_v \right]\]

where \(p_v\) is the probability of token \(v\) from the language model, and \(G_v \sim \text{Gumbel}(0, 1)\) are i.i.d. Gumbel-distributed random variables.

For watermarking, we replace the random Gumbel noise with deterministic pseudo-random values generated from a hash of the context. Using uniform random variables \(u_v \sim \text{Uniform}([0, 1])\), we can generate Gumbel noise via the transformation \(G_v = -\log(-\log(u_v))\). This gives us:

\[x_t^* = \text{arg max}_{v \in V} \left[ \log p_v - \log(-\log(u_v)) \right]\]
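
As a sanity check, here is a minimal NumPy sketch (toy values, not library code) that draws the Gumbel noise explicitly and confirms that the arg max reproduces the original distribution:

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # toy next-token distribution
n = 200_000

# Gumbel-Max trick: arg max of log-probabilities plus Gumbel(0, 1) noise
g = rng.gumbel(size=(n, p.size))
samples = np.argmax(np.log(p) + g, axis=1)

# Empirical frequencies match p up to Monte Carlo error
print(np.bincount(samples, minlength=p.size) / n)  # ~ [0.5, 0.3, 0.2]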

Implementation via an Equivalent Form

The expression above can be simplified. Note that it is mathematically equivalent to:

\[x_t^* = \text{arg max}_{v \in V} \left[ u_v^{1/p_v} \right]\]

This is what the implementation uses. We prove the equivalence below.

Mathematical Proof of Equivalence

Starting with the Gumbel-Max formulation:

\[\text{arg max}_v \left[ \log(p_v) - \log(-\log(u_v)) \right]\]

We can manipulate this step by step. Since \(u_v \in (0, 1)\), we have \(-\log(u_v) > 0\), so every quantity below is well defined:

\[\begin{split}&\phantom{=}\ \text{arg max}_v \left[ \log(p_v) - \log(-\log(u_v)) \right] \\ &= \text{arg max}_v \left[ \log \left( \frac{p_v}{-\log(u_v)} \right) \right] \quad \text{(difference of logs)} \\ &= \text{arg max}_v \left[ \frac{p_v}{-\log(u_v)} \right] \quad \text{(exp preserves arg max)} \\ &= \text{arg min}_v \left[ \frac{-\log(u_v)}{p_v} \right] \quad \text{(reciprocal of a positive quantity)} \\ &= \text{arg max}_v \left[ \frac{\log(u_v)}{p_v} \right] \quad \text{(negate; min becomes max)} \\ &= \text{arg max}_v \left[ \log \left( u_v^{1/p_v} \right) \right] \quad \text{(power rule for logs)} \\ &= \text{arg max}_v \left[ u_v^{1/p_v} \right] \quad \text{(log is monotonic)}\end{split}\]

Therefore, computing \(\text{arg max}_v [u_v^{1/p_v}]\) is equivalent to the Gumbel-Max trick. This simplification has several advantages:

  1. Computational Efficiency: Avoids nested logarithms

  2. Numerical Stability: The form \(u^{1/p}\) is more stable than \(-\log(-\log(u))\)

  3. Implementation Simplicity: Fewer operations and clearer code
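
The equivalence is easy to check numerically; a minimal sketch with toy values:

import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.6, 0.25, 0.1, 0.05])  # toy distribution over four tokens
u = rng.uniform(size=p.size)          # pseudo-random values u_v

gumbel_form = np.log(p) - np.log(-np.log(u))  # log p_v - log(-log u_v)
power_form = u ** (1.0 / p)                   # u_v^{1/p_v}

# The two forms select the same token for any draw of u
assert np.argmax(gumbel_form) == np.argmax(power_form)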

Implementation Details

The algorithm proceeds as follows:

  1. Generate deterministic random values: Hash the previous \(h\) tokens to obtain a seed, then generate \(|V|\) uniform random values \(\{u_v\}\)

  2. Apply nucleus (top-p) sampling: Filter to the top-p probability mass and renormalize the retained probabilities

  3. Compute the transformed scores: For each token \(v\) in the filtered vocabulary, compute \(u_v^{1/p_v}\)

  4. Token selection: Choose \(\text{arg max}_v [u_v^{1/p_v}]\)

This preserves the original sampling distribution while ensuring deterministic outputs.
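
The following is a minimal, self-contained sketch of one decoding step. The function name, the SHA-256-based seeding, and the exact top-p bookkeeping are illustrative assumptions, not the library's actual implementation:

import hashlib

import numpy as np


def gumbel_watermark_step(logits, context_tokens, key, top_p=0.95):
    """One decoding step of the Gumbel watermark (illustrative sketch)."""
    # 1. Hash the previous h tokens together with the secret key to get
    #    a seed, then draw one pseudo-random u_v per vocabulary entry.
    digest = hashlib.sha256(f"{key}:{context_tokens}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
    u = rng.uniform(size=logits.shape[0])

    # 2. Nucleus (top-p) filtering: keep the smallest prefix of tokens
    #    (in descending probability order) covering top_p mass, then
    #    renormalize the kept probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    exclusive_mass = np.cumsum(probs[order]) - probs[order]
    keep = order[exclusive_mass < top_p]
    p = probs[keep] / probs[keep].sum()

    # 3-4. Score each kept token as u_v^(1/p_v) and take the arg max.
    return int(keep[np.argmax(u[keep] ** (1.0 / p))])

Because the seed depends only on the context, the key, and nothing else, re-running this step on the same context always selects the same token.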

Interpretation

The transformation \(u_v^{1/p_v}\) balances probability and randomness:

  • For high-probability tokens (\(p_v\) close to 1), the exponent \(1/p_v\) is close to 1, so \(u_v^{1/p_v}\) stays close to \(u_v\)

  • For low-probability tokens (\(p_v\) small), the exponent \(1/p_v\) is large, so \(u_v^{1/p_v}\) is pushed toward 0 unless \(u_v\) is very close to 1

High-probability tokens therefore tend to win without needing large \(u_v\) values, while a low-probability token prevails only when it draws an exceptionally favorable \(u_v\). Over the randomness of the \(u_v\), each token wins with probability exactly \(p_v\), reproducing sampling from the original distribution.
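
To make this concrete, a toy two-token comparison (illustrative values only):

# Two tokens: A with p = 0.9, B with p = 0.1, both drawing u = 0.5
0.5 ** (1 / 0.9)   # A: ~0.46 -- barely changed from u
0.5 ** (1 / 0.1)   # B: ~0.001 -- heavily suppressed
# B can still win, but only on an exceptionally lucky draw:
0.99 ** (1 / 0.1)  # B: ~0.90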

The Gumbel-Max trick adds controlled randomness that, when combined with the true probabilities, produces samples from the correct distribution. The implementation achieves this without explicitly computing Gumbel random variables.

Detection

The detection score is computed as:

\[\text{Score}(x) = \sum_{t=1}^{n} \log\left(\frac{1}{1 - u_{x_t}}\right)\]

where \(u_{x_t}\) is the pseudo-random value assigned to the token actually emitted at step \(t\). Under the null hypothesis of unwatermarked text, the \(u_{x_t}\) are i.i.d. uniform and the score follows a gamma distribution \(\Gamma(n, 1)\); watermarked generation biases selected tokens toward large \(u_{x_t}\), inflating the score and enabling statistical detection.
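
A minimal sketch of the scoring step, assuming the detector has already recovered the \(u_{x_t}\) values by re-running the context hashing with the secret key (the function name is an assumption, not the library's API):

import numpy as np
from scipy.stats import gamma


def detection_pvalue(u_values):
    """p-value for the Gumbel watermark score (illustrative sketch)."""
    u = np.asarray(u_values)
    score = np.sum(-np.log(1.0 - u))  # Score(x) = sum_t log(1 / (1 - u_{x_t}))
    # Under the null hypothesis the score is Gamma(n, 1), so the p-value
    # is the probability of a score at least this large by chance.
    return gamma.sf(score, a=u.size)


print(detection_pvalue([0.40, 0.60, 0.50, 0.30]))  # uniform-looking: large p-value
print(detection_pvalue([0.97, 0.99, 0.95, 0.98]))  # skewed toward 1: tiny p-value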

Note

This approach preserves the distribution while ensuring consistent output for fixed seeds, though it does reduce response diversity across different generations with the same context.

Paper reference

Aaronson, S. (2023). Gumbel Watermarking. Scott Aaronson’s Blog. https://scottaaronson.blog/?p=6823

Example code

import os

# Configure the vLLM V1 engine; set these before vllm is imported
os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams
from vllm_watermark.core import (
    DetectionAlgorithm,
    WatermarkedLLMs,
    WatermarkingAlgorithm,
)
from vllm_watermark.watermark_detectors import WatermarkDetectors

# Load the vLLM model
llm = LLM(
    model="meta-llama/Llama-3.2-1B",
    enforce_eager=True,
    max_model_len=1024,
)

# Create a Gumbel watermarked LLM
wm_llm = WatermarkedLLMs.create(
    model=llm,
    algo=WatermarkingAlgorithm.OPENAI,  # Uses Gumbel watermarking
    seed=42,
    ngram=2,
)

# Create Gumbel detector with matching parameters
detector = WatermarkDetectors.create(
    algo=DetectionAlgorithm.OPENAI_Z,
    model=llm,
    ngram=2,
    seed=42,
    payload=0,
    threshold=0.05,
)

# Generate watermarked text
prompts = ["Write about machine learning applications"]
sampling_params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=64)
outputs = wm_llm.generate(prompts, sampling_params)

# Detect watermark
for output in outputs:
    generated_text = output.outputs[0].text
    detection_result = detector.detect(generated_text)

    print(f"Generated: {generated_text}")
    print(f"Watermarked: {detection_result['is_watermarked']}")
    print(f"P-value: {detection_result['pvalue']:.6f}")

Notes

  • Uses deterministic token selection via Gumbel-Max trick

  • Preserves the original sampling distribution, provided the context window \(h\) is large enough that repeated contexts (and hence repeated pseudo-random values) are rare

  • Detection based on gamma distribution of accumulated scores

  • Trade-off between consistency and response diversity

See Also

For applications requiring enhanced output diversity, see Randomized Gumbel Watermarking (Randomized Gumbel), which introduces controlled randomness at the cost of slight distortion.