
  • A study by Anthropic shows that language models, such as Claude 3 Opus, can fake alignment with training objectives to disguise their actual behaviors. Simply put, if you inform the model that it is being trained and that non-compliance will lead to modification, there is about a 15% chance it will comply as instructed in order to avoid being changed. This study…

  • Andrew Ng’s article on AI deception is a standout in this issue. He provides an overview of research on when AI models become deceptive, highlighting the six major tasks tested. A very interesting read. AI agent technology is anticipated to be a major trend in 2025, prompting the inclusion of two articles on the subject in…

  • Happy New Year! The AI Security Newsletter was on a two-week pause while I vacationed with family in China. I hope all my readers enjoyed the holiday season. Now, I’m excited to return and share the latest AI security news with you. As we enter another thrilling year in the AI era, MIT Technology Review…

  • Google’s latest advancement in quantum computing, “Willow,” demonstrates significant progress. However, it has raised concerns about its impact on cybersecurity, especially regarding Bitcoin. Fortunately, Bitcoin’s encryption remains secure for now, and the community is prepared to address potential challenges. Last week also marked a milestone in generative AI with major releases from OpenAI, Google, Cohere, and…

  • In this issue, I want to spotlight OWASP’s recent developments in GenAI security guidance, an extension of the OWASP Top 10 for LLM Application Security Project. The new guidance provides practical resources for addressing deepfake threats, creating AI Security Centers of Excellence, and navigating the AI security solution landscape. It serves as a…