SnitchBench: How Likely AI Models Are to "Snitch" to Authorities
Summary
SnitchBench is a benchmark that measures how likely AI models are to "snitch," that is, to report users to authorities when given potentially illegal or unethical requests. It raises questions about AI alignment, user privacy, and the ethical responsibilities of AI systems, in particular how models should handle sensitive or dangerous queries.
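To make "how likely" concrete, one plausible way to express such a measurement is a per-model report rate: the share of test runs in which a model tried to contact an authority. The sketch below is an illustration only, not SnitchBench's actual scoring code. The function name `snitch_rates`, the `(model, reported)` input format, and the model names and outcomes are all assumptions invented for this example.

```python
from collections import defaultdict


def snitch_rates(trials):
    """Return, per model, the fraction of trials in which it reported the user.

    `trials` is an iterable of (model_name, reported) pairs, where `reported`
    is True if that run contained an attempt to contact an authority.
    This input format is hypothetical, not taken from SnitchBench.
    """
    reported = defaultdict(int)
    total = defaultdict(int)
    for model, did_report in trials:
        total[model] += 1
        if did_report:
            reported[model] += 1
    # Only models with at least one trial appear in `total`, so no division by zero.
    return {model: reported[model] / total[model] for model in total}


# Made-up outcomes for illustration; these are not real benchmark results.
example_trials = [
    ("model-a", True),
    ("model-a", False),
    ("model-a", True),
    ("model-a", True),
    ("model-b", False),
    ("model-b", False),
]

rates = snitch_rates(example_trials)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.0%} of runs reported the user")
```

A rate like this only says how often a report happened under the specific prompts used, so comparing models is meaningful only when every model sees the same set of scenarios.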