
Mafia Arena
A benchmarking platform where LLMs play the classic social deduction game Mafia against each other. We evaluate AI capabilities in deception, deduction, and strategic reasoning—skills that are difficult …
llm-mafia: AutoGen Werewolf Arena - GitHub
Dec 16, 2025 · Neural Pit is a multi-agent simulation of the classic social deduction game Werewolf (Mafia), powered by Microsoft AutoGen. Unlike standard chatbots, agents in this arena possess a …
Werewolf Arena — LLMs Play Social Deduction
2 days ago · Werewolf Arena pits LLMs against each other in Werewolf/Mafia, testing theory of mind, deception, and social reasoning. Watch GPT, Claude, Gemini, and more compete.
README.md · niveck/LLMafia at main - Hugging Face
A virtual game of Mafia, played by human players and an LLM agent player. The agent integrates in the asynchronous group conversation by constantly simulating the decision to send a message.
What Happens When You Let LLM Agents Play Social ... - Medium
Jun 16, 2025 · Why Mafia as an LLM Testbed? Mafia (also known as Werewolf) is an ideal environment for testing language models because it demands deception, reasoning, dialogue, memory, and …
The radical Blog - LLMs Are Very Good at Playing Mafia
Mar 5, 2025 · During “day” phases, all surviving players debate and vote to execute someone they suspect is mafia. The game continues in these day/night cycles until either all mafia members are …
To enable targeted and systematic benchmarking of LLMs’ interactive capabilities, we introduce Mini-Mafia: a simplified four-player variant with one mafioso, one detective, and two villagers.
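
As a rough illustration of the Mini-Mafia setup described in the snippet above (four players: one mafioso, one detective, two villagers), the role assignment could be sketched as follows. This is a minimal sketch, not the benchmark's actual code: the function name `assign_roles`, the placeholder player names, and the seeded shuffle are all assumptions made for the example.

```python
import random
from collections import Counter

# Role counts taken from the Mini-Mafia description:
# one mafioso, one detective, two villagers.
MINI_MAFIA_ROLES = ["mafioso", "detective", "villager", "villager"]


def assign_roles(players, seed=None):
    """Randomly map four player names to roles (hypothetical helper)."""
    if len(players) != len(MINI_MAFIA_ROLES):
        raise ValueError("Mini-Mafia needs exactly four players")
    roles = MINI_MAFIA_ROLES.copy()
    # A dedicated Random instance keeps the shuffle reproducible per seed
    # without touching the global random state.
    random.Random(seed).shuffle(roles)
    return dict(zip(players, roles))


# Placeholder player names, assumed for illustration only.
assignment = assign_roles(["model_a", "model_b", "model_c", "model_d"], seed=0)
role_counts = Counter(assignment.values())
```

Whatever the seed, `role_counts` should always contain exactly one mafioso, one detective, and two villagers; only the mapping from players to roles changes.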