Watermarking Degrades Alignment in Language Models
Analysis and Mitigation
Watermarking has emerged as a critical tool for ensuring the authenticity of LLM outputs. However, its broader effects on model behavior remain underexplored. In our paper, “Watermarking Degrades Alignment in Language Models: Analysis and Mitigation,” presented at the 1st GenAI Watermarking Workshop at ICLR 2025, we investigate how watermarking impacts...