January 2025

  • One of the most talked-about topics in AI recently is DeepSeek and its newly launched R-1 model. Its innovative methodology, low operational cost, and high performance have created a substantial impact on the AI community and even affected the U.S. economy. Notably, major AI companies, including Nvidia, experienced significant stock price declines after the announcement.…

  • A study by Anthropic shows that language models, such as Claude 3 Opus, can fake alignment with training objectives to disguise their actual behaviors. Simply put, if you inform the model that it’s being trained and that non-compliance will lead to modification, there’s about a 15% chance it will comply strategically to avoid being changed. This study…

  • Andrew Ng’s article on AI deception is a standout in this issue. He provides an overview of research on when AI models become deceptive, highlighting six major tasks tested. A very interesting read. AI agent technology is anticipated to be a major trend in 2025, prompting the inclusion of two articles on the subject in…

  • I took a two-week vacation in China with my family during the Christmas and New Year holidays in December. It was a wonderful experience for my kids, as they had not been back to China since 2019. Although I worked in China during the COVID-19 pandemic, the strict travel restrictions made it impossible for them…

  • Happy New Year! The AI Security Newsletter was on a two-week pause while I vacationed with family in China. I hope all my readers enjoyed the holiday season. Now, I’m excited to return and share the latest AI security news with you. As we enter another thrilling year in the AI era, MIT Technology Review…