AI Security Newsletter (10-21-2024)

Mistral introduces two new edge models, featuring improved performancde and a long context window of 128K tokens. Meta’s FAIR lab has designed a novel training method that enhances LLMs’ reasoning capabilities. These developments highlight two major AI trends: small models optimized for resource-limited devices and improved reasoning in LLMs.

Meanwhile, a new threat has emerged involving the infiltration of AI chatbots with malicious instructions via invisible Unicode characters. This type of prompt injection attack poses a severe risk to AI systems, particularly those employing Agentic AI. It’s crucial to establish effective defenses against such vulnerabilities.

Technology & Tools

Mistral AI Unveils New Edge Models: Ministral 3B and 8B

Mistral AI celebrates the first anniversary of Mistral 7B by launching two advanced models, Ministral 3B and 8B, designed for on-device computing and edge use cases. These models, known as les Ministraux, excel in knowledge, commonsense reasoning, and efficiency, supporting up to 128k context length and featuring innovative attention patterns for improved performance. Aimed at providing privacy-first, low-latency solutions for a range of applications from on-device translation to autonomous robotics, les Ministraux set new benchmarks in the sub-10B category. Available now, they offer competitive pricing and are poised to enhance both hobbyist projects and global manufacturing processes.

https://mistral.ai/news/ministraux/

Apple Study Reveals LLMs Lack Formal Reasoning Skills

A groundbreaking study by Apple’s AI researchers highlights a critical flaw in language models (LLMs): a complete absence of formal reasoning, relying instead on sophisticated pattern matching. This vulnerability, demonstrated through a novel task called GSM-NoOp, shows that even minor changes can significantly affect outcomes. The findings echo earlier research and underscore the challenges in building reliable AI agents based on current LLM architectures, advocating for a neurosymbolic approach to advance AI development.

https://garymarcus.substack.com/p/llms-dont-do-formal-reasoning-and

Arch: Revolutionizing AI Application Gateways

Arch introduces a groundbreaking Layer 7 gateway tailored for generative AI applications, offering a suite of features designed to streamline prompt handling and processing. With its foundation on Envoy, Arch provides robust capabilities including jailbreak prevention, API integration for personalized responses, disaster recovery, and comprehensive observability of AI interactions. Engineered for speed with specialized LLMs, Arch enhances application safety, efficiency, and flexibility across various programming languages without the complexities of direct library upgrades.

https://github.com/katanemo/arch

Introducing Thinking LLMs for Enhanced General Instruction Following

Researchers at Meta FAIR, UC Berkeley, and NYU have developed a training method to equip Large Language Models (LLMs) with the ability to “think” before responding to instructions, enhancing their performance across a variety of tasks beyond traditional reasoning and problem-solving. This Thought Preference Optimization (TPO) approach iteratively trains LLMs to generate internal thoughts that lead to better responses, showing significant improvements in benchmarks like AlpacaEval and Arena-Hard. This method opens new avenues for applying LLMs in fields such as marketing, health, and general knowledge, where explicit thinking was not previously considered beneficial.

https://mail.yahoo.com/d/folders/1/messages/270001

Risks & Vulnerabilities

Invisible Text Exploits in AI Chatbots Exposed
Researchers have unveiled a method for sneaking malicious instructions into AI chatbots like Claude and Copilot using invisible Unicode characters, exploiting a quirk in the text encoding standard. This steganographic technique allows attackers to conceal payloads and extract sensitive data without detection by users. Demonstrated through proof-of-concept attacks, this vulnerability highlights the challenges of securing AI against sophisticated prompt injection and ASCII smuggling tactics. Mitigations have been introduced by Microsoft, while other platforms like OpenAI and Google Gemini are addressing the issue with varying degrees of success.

https://arstechnica.com/security/2024/10/ai-chatbots-can-read-and-write-invisible-text-creating-an-ideal-covert-channel/

AI Detection Tools Mislabel Student Work as AI-Generated

AI detection tools, used by about two-thirds of teachers to identify AI-generated content, are causing controversy by falsely flagging student work as AI-produced, with significant consequences. Moira Olmsted’s experience highlights the issue: her assignment was wrongly identified as AI-generated due to her formulaic writing style, a result of her autism spectrum disorder. Despite high accuracy claims from detection tool companies, instances of false positives raise concerns about their impact on students, especially those who are neurodivergent or non-native English speakers. The reliance on these tools in educational settings is creating an environment of anxiety and mistrust, pushing students to go to great lengths to prove the authenticity of their work.

https://www.bloomberg.com/news/features/2024-10-18/do-ai-detectors-work-students-face-false-cheating-accusations

Hong Kong Deepfake Romance Scam Nets $46M
In a sophisticated AI scam, Hong Kong police have arrested 27 individuals for defrauding victims of $46 million through fake cryptocurrency investments using deepfake technology. The scammers created alluring online personas with AI, engaging victims in video calls with fabricated appearances and voices, before convincing them to invest in non-existent cryptocurrency platforms. The operation, linked to the organized crime group Sun Yee On, highlights the growing challenge of real-time deepfakes in cybercrime.

https://arstechnica.com/ai/2024/10/deepfake-lovers-swindle-victims-out-of-46m-in-hong-kong-ai-scam/

Business & Products

Dane Stuckey Appointed as OpenAI’s New CISO

Dane Stuckey, previously the Chief Information Security Officer (CISO) at Palantir, has transitioned to OpenAI to take on the role of CISO, aiming to bolster the organization’s security framework. Stuckey, with a rich background in digital forensics and security across various sectors, emphasizes the mportance of stringent security measures to safeguard OpenAI’s technologies and users. His experience is expected to be instrumental in advancing OpenAI’s collaborations, including its growing relationship with the U.S. Department of Defense, especially after OpenAI’s recent initiatives to enhance its security capabilities and infrastructure.

https://techcrunch.com/2024/10/15/former-palantir-ciso-dane-stuckey-joins-openai-to-lead-security/

NotebookLM Launches Business Edition and Enhances Audio Overviews

NotebookLM has enhanced its Audio Overview feature, allowing users to customize the AI hosts’ focus and expertise level, and introduced NotebookLM Business for organizations. The updates include the ability to guide AI-generated audio discussions and listen in the background, aiming to improve user engagement with complex information. NotebookLM Business, integrated with Google Workspace, promises advanced features, data privacy, and security, marking the end of its “Experimental” phase and expanding its user base beyond the current 80,000 organizations.

https://blog.google/technology/ai/notebooklm-update-october-2024/

World Rebrands and Unveils New Iris-Scanning Technology

Sam Altman’s “World,” formerly known as Worldcoin, aims to redefine human verification online with its latest iris-scanning Orb technology. The rebranding reflects its broader mission beyond cryptocurrency, focusing on securing human identity in the digital age. Despite skepticism and regulatory hurdles, World continues to innovate, introducing faster, more accessible Orbs and partnerships to expand its verification network. Amidst concerns over privacy and trust, World’s integration into everyday life remains a significant challenge.

https://techcrunch.com/2024/10/17/sam-altmans-worldcoin-becomes-world-and-shows-new-iris-scanning-orb-to-prove-your-humanity/