Apurv Verma

Notes from NeurIPS 2025

Posted on January 3, 2026

(Views expressed here are my own and do not reflect those of my employer.) [Read More]
Tags:
- NeurIPS
- Conference
- RL
- Alignment
- LLMs

vLLM-Watermark

Tiny, Hackable, Lightning-fast Watermarking for Researchers

Posted on October 4, 2025

Most text-watermarking papers ship code that nobody can actually use. Each codebase is welded to a specific experimental setup, and none of them are fast. MarkLLM consolidates several watermarking methods, but Gumbel watermarking still took hours on my benchmarks. [Read More]
Tags:
- Watermarking
- Library
- Toolkit
- OSS
- LLMs

Watermarking Degrades Alignment in Language Models

Analysis and Mitigation

Posted on April 24, 2025

Watermarking is now used to authenticate LLM outputs, but we know surprisingly little about how it affects model behavior. Our paper, “Watermarking Degrades Alignment in Language Models: Analysis and Mitigation,” has been accepted to TMLR (an earlier version was presented at the 1st GenAI Watermarking Workshop at ICLR 2025).... [Read More]
Tags:
- AISafety
- Watermarking
- Alignment
- LLMs

Red-Teaming Large Language Models (LLMs)

Operationalizing a Threat Model (SoK)

Posted on November 17, 2024

Imagine asking ChatGPT for homework help and receiving the response “Please die.” Or picture discovering that an AI chatbot has leaked someone’s private medical information. These are real incidents that made headlines in recent months, and they underscore the fragility of current AI safeguards. [Read More]
Tags:
- AISafety
- AISecurity
- RedTeaming
- LLMSecurity
- LLMs
