
    Apurv Verma

    • Watermarking Degrades Alignment in Language Models

      Analysis and Mitigation

      Posted on April 24, 2025

      Watermarking has emerged as a critical tool for verifying the authenticity of LLM outputs, but its broader effects on model behavior remain underexplored. In our paper, “Watermarking Degrades Alignment in Language Models: Analysis and Mitigation,” presented at the 1st GenAI Watermarking Workshop at ICLR 2025, we investigate how watermarking impacts... [Read More]
      Tags:
      • AISafety
      • Watermarking
      • Alignment
      • LLMs
    • Red-Teaming Large Language Models (LLMs)

      Operationalizing a Threat Model (SoK)

      Posted on November 17, 2024

      Imagine asking ChatGPT for homework help and receiving the response “Please die.” Or picture discovering that an AI chatbot has leaked someone’s private medical information. These are real incidents that made headlines in recent months. [Read More]
      Tags:
      • AISafety
      • AISecurity
      • RedTeaming
      • LLMSecurity
      • LLMs

    Apurv Verma  •  2025  •  vermaapurv.com
