Watermarking is now used to authenticate LLM outputs. But we know surprisingly little about how it affects model behavior. Our paper, “Watermarking Degrades Alignment in Language Models: Analysis and Mitigation,” has now been accepted to TMLR (an earlier version was presented at the 1st GenAI Watermarking Workshop at ICLR 2025)....