Earlier this year, I worked on watermarking research that revealed how watermarking inadvertently affects alignment. During that work, I kept running into the same problem: I could not find a package that implements common watermarking algorithms with proper integration into modern inference engines like vLLM or SGLang. Existing implementations in pure PyTorch are painfully slow. I have talked to numerous researchers working in this space who share the same complaint. MarkLLM comes close, and the project has matured considerably since I first looked at it, but when I was running my experiments, sampling-based algorithms like Gumbel watermarking remained too slow for large-scale evaluation. My goal here is simple: provide tiny, hackable watermarking implementations in a single repository so that comparing different methods does not come at the cost of speed or accuracy.
The broader problem is that watermarking implementations for language models are scattered across repositories. Each paper includes its own implementation, tightly coupled to a specific experimental setup. Comparing methods means navigating different APIs, tokenization schemes, and optimization levels. Integration with serving frameworks like vLLM typically happens as an afterthought. Watermarking gets added as post-processing or patched into older generation code. The result? Implementations that are much slower than necessary, making it hard to run the kinds of large-scale experiments that actually matter.
Design
The high-level idea is quite simple: the code is organized around generators and detectors, an abstraction I borrowed from existing implementations. Plugging this into vLLM proved more challenging than I expected. Many things in vLLM are opaque, and I had no prior experience working with it. But the core insight is straightforward: watermarking modifies the sampling distribution. If we intercept sampling at the right point, we can build something both minimal and fast.
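To make the abstraction concrete, here is a minimal sketch of the two roles. The class and method names are illustrative assumptions, not the actual vLLM-Watermark API.

```python
# Illustrative interfaces only; names and signatures are assumptions, not the
# repository's actual API.
from abc import ABC, abstractmethod

import torch


class WatermarkGenerator(ABC):
    """Modifies the sampling distribution for one decoding step."""

    @abstractmethod
    def apply(self, logits: torch.Tensor) -> torch.Tensor:
        """Return modified logits (logit-based schemes), or encode the token
        choice directly (sampling-based schemes such as Gumbel). Real
        implementations also receive the preceding tokens to seed their
        pseudorandom function."""


class WatermarkDetector(ABC):
    """Scores a finished token sequence for the presence of the watermark."""

    @abstractmethod
    def score(self, token_ids: list[int]) -> float:
        """Return a detection statistic, e.g. a z-score or p-value."""
```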
In vLLM-Watermark, I use monkey patching to inject watermarking logic directly into vLLM’s sampling pipeline. Each watermark is a generator, typically 50-100 lines of code. Because the logic is injected inside vLLM’s engine, the implementation inherits everything vLLM provides for fast inference: PagedAttention, continuous batching, and more. The framework supports both sampling-based watermarks (like Gumbel) and logit-based watermarks (like KGW), and each implementation includes both generation and detection code.
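The patching itself follows the usual wrap-and-replace pattern. The sketch below is not the repository's code, and the hook point is an assumption: the sampler's module path and forward() signature differ across vLLM versions, so treat the import and the call shown here as placeholders for whatever your installed version exposes.

```python
# Wrap-and-replace monkey patch; the import path and forward() signature are
# version-dependent assumptions, not guaranteed vLLM API.
from vllm.model_executor.layers.sampler import Sampler  # assumed, older-style path


def patch_sampler(generator):
    """Route logits through a watermark generator before vLLM samples from them."""
    original_forward = Sampler.forward

    def forward_with_watermark(self, logits, sampling_metadata):
        # Logit-based schemes (e.g. KGW) only need to edit the logits here;
        # sampling-based schemes (e.g. Gumbel) replace the sampling step itself.
        # Real code would also extract each sequence's previous tokens to seed
        # the scheme; omitted in this sketch.
        logits = generator.apply(logits)
        return original_forward(self, logits, sampling_metadata)

    Sampler.forward = forward_with_watermark
```

Because the patch rewrites the sampler class in place, it has to run in the same Python process that executes the model, which is why multiprocessing has to be disabled (see the limitations below).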
This is where vLLM-Watermark really shines. Previously, when I was using native PyTorch code with for loops to generate watermarked text, even with batching, generation would take hours. With vLLM, I brought that down to minutes for datasets with several thousand examples. For algorithms like Gumbel watermark, the performance difference is dramatic.
There are some limitations you should know about. I could only get this working on a single GPU because the runtime patching requires disabling vLLM’s multiprocessing: a patch applied at runtime in one Python process does not propagate to other processes. The patching approach is admittedly hackier than a clean API integration, and it could break with vLLM version upgrades. I have tried to make it as future-proof as possible by isolating the patching logic. For my research needs, the speed gains far outweigh these limitations. If you need multi-GPU support, this approach may not work for you, but for single-GPU experiments, it delivers exactly what I needed: production-level speed with research-level flexibility.
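If the process-boundary point is unfamiliar, here is a standalone illustration with nothing but the standard library (no vLLM involved): a runtime patch made in the parent is invisible to a spawned child, because the child re-imports its modules from scratch.

```python
# Standalone illustration: with the "spawn" start method, a child process
# re-imports its modules and never sees patches made in the parent.
import json
import multiprocessing as mp


def child_report(queue):
    # The spawned child re-imports json, so this calls the original dumps.
    queue.put(json.dumps({"k": 1}))


if __name__ == "__main__":
    json.dumps = lambda obj: "patched"  # runtime patch, visible only in this process

    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=child_report, args=(q,))
    p.start()
    p.join()

    print("parent:", json.dumps({"k": 1}))  # "patched"
    print("child: ", q.get())               # '{"k": 1}' -- the unpatched original
```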
Contributing
I am actively looking for collaborators on this project. Adding a new algorithm is straightforward: you need a generator class (50-100 lines), a detection function, and a usage example. Your implementation automatically runs at vLLM speed and can be compared directly with other methods in the repository.
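As a rough picture of what a contribution amounts to, here is a self-contained KGW-style skeleton. The names, signatures, and constants are hypothetical, not the repository's interfaces: a generator that boosts a pseudorandom "green list" seeded on the previous token, and a detector that turns the green-token count into a z-score.

```python
# Hypothetical skeleton; interfaces and helper names are illustrative, not the
# repository's actual contribution API.
import torch


class MyWatermarkGenerator:
    """KGW-style: boost a pseudorandom 'green list' seeded on the previous token."""

    def __init__(self, vocab_size: int, gamma: float = 0.5, delta: float = 2.0, seed: int = 42):
        self.vocab_size = vocab_size
        self.gamma = gamma  # fraction of the vocabulary in the green list
        self.delta = delta  # logit bias added to green tokens
        self.seed = seed

    def green_mask(self, prev_token: int) -> torch.Tensor:
        g = torch.Generator().manual_seed(self.seed * 1_000_003 + prev_token)
        perm = torch.randperm(self.vocab_size, generator=g)
        mask = torch.zeros(self.vocab_size, dtype=torch.bool)
        mask[perm[: int(self.gamma * self.vocab_size)]] = True
        return mask

    def apply(self, logits: torch.Tensor, prev_token: int) -> torch.Tensor:
        return logits + self.delta * self.green_mask(prev_token).to(logits.dtype)


def detect(token_ids: list[int], gen: MyWatermarkGenerator) -> float:
    """z-score of the observed green-token count against the unwatermarked null."""
    hits = sum(
        int(gen.green_mask(prev)[tok])
        for prev, tok in zip(token_ids[:-1], token_ids[1:])
    )
    n = max(len(token_ids) - 1, 1)
    return (hits - gen.gamma * n) / (gen.gamma * (1 - gen.gamma) * n) ** 0.5
```

A real contribution would cache the masks and vectorize across the batch, but this sketch is roughly the conceptual surface area a new scheme needs to cover.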
The goal is to develop this into a toolkit paper for JMLR MLOSS, and contributors of watermarking algorithms will be invited as co-authors. Having multiple implementations in one place reveals which design choices actually matter and which are artifacts of specific implementations. This kind of clarity is hard to achieve when implementations are scattered across repositories.
Documentation is available in the vLLM-Watermark repository.
Citation
If you use this framework in your research, please consider citing:
@software{vllm_watermark,
  title  = {vLLM-Watermark: A tiny, hackable research framework for LLM watermarking experiments},
  author = {Verma, Apurv},
  year   = {2025},
  url    = {https://github.com/dapurv5/vLLM-Watermark}
}