Watermarking is now used to authenticate LLM outputs. But we know surprisingly little about how it affects model behavior. In our new paper, “Watermarking Degrades Alignment in Language Models: Analysis and Mitigation,” presented at the 1st GenAI Watermarking Workshop at ICLR 2025, we investigate how watermarking impacts key alignment properties...