Andrej Karpathy's LLM Council: A Weekend Hack That Reveals Enterprise AI's Future
A Weekend Project, a Seismic Shift: Karpathy's 'Vibe Code' Unveiled
Imagine wanting to read a book, but not alone. Instead, you're surrounded by a panel of brilliant minds, each offering their perspective, debating the nuances, and ultimately synthesizing a final, definitive answer. That's the premise behind Andrej Karpathy's recent weekend project, the 'LLM Council,' a piece of software that matters less for what it does than for the orchestration pattern it demonstrates. Though it was built in a weekend, largely by AI assistants, this isn't just a quirky experiment; it's a quiet sketch of the future of enterprise AI, and it's time we paid attention.
The LLM Council: How It Works
At its core, the LLM Council is surprisingly simple. A user inputs a query, much like they would with ChatGPT. But behind the scenes, a three-stage process unfolds, mimicking the dynamics of a human council (a minimal sketch of the flow follows this list):
- The Dispatch: The user's query is sent to a panel of AI models. Karpathy's setup includes powerhouses like OpenAI's GPT-5.1, Google's Gemini 3.0 Pro, Anthropic's Claude Sonnet 4.5, and xAI's Grok 4, each generating an initial response in parallel.
- The Peer Review: Each AI model then receives the anonymized responses of its peers and is tasked with evaluating them. This injects a crucial layer of quality control, transforming the AI from a mere generator into a critic.
- The Synthesis: Finally, a designated 'Chairman LLM' (currently Gemini 3.0 Pro) receives the original query, all the individual responses, and the peer rankings. It synthesizes this information into a single, authoritative answer for the user.
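To make the flow concrete, here's a minimal sketch of the three stages in Python against OpenRouter's OpenAI-compatible API. The helper names and model slugs are illustrative (OpenRouter publishes the exact IDs), and where the real app fans out Stage 1 in parallel, this sketch keeps the calls sequential for brevity; treat it as the shape of the pattern, not Karpathy's actual code.

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so a single client
# reaches every provider. The model slugs below are illustrative.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]
CHAIRMAN_MODEL = "google/gemini-3-pro-preview"

def ask(model: str, prompt: str) -> str:
    """One round trip to a single council member."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def council(query: str) -> str:
    # Stage 1, the dispatch: every member answers the same query.
    answers = [ask(m, query) for m in COUNCIL_MODELS]

    # Stage 2, the peer review: each member ranks the anonymized set.
    anonymized = "\n\n".join(
        f"Response {i + 1}:\n{a}" for i, a in enumerate(answers)
    )
    rankings = [
        ask(m, "Rank these anonymized responses from best to worst, "
               f"with brief reasons.\n\nQuery: {query}\n\n{anonymized}")
        for m in COUNCIL_MODELS
    ]

    # Stage 3, the synthesis: the chairman sees everything and decides.
    dossier = (f"Query: {query}\n\n{anonymized}\n\n"
               "Peer rankings:\n" + "\n\n".join(rankings))
    return ask(CHAIRMAN_MODEL, f"Synthesize one final answer.\n\n{dossier}")
```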
Karpathy himself noted a surprising outcome: the models frequently ranked each other's responses above their own. It's a telling result, hinting at shared biases across models and at the need for careful evaluation.
The Architecture: A Lean, Mean, AI-Orchestrating Machine
For enterprise architects, the true value of the LLM Council lies not in its literary analysis but in its construction. It's a blueprint for a modern, minimal AI stack. Here's a quick breakdown:
- Backend: Built with FastAPI, a modern Python framework.
- Frontend: A standard React application using Vite.
- Data Storage: Simple JSON files.
- The Linchpin: OpenRouter, an API aggregator that normalizes the differences between various model providers.
The use of OpenRouter is particularly insightful. It allows Karpathy to treat frontier models as interchangeable components. By simply editing a configuration file (the `COUNCIL_MODELS` list), he can swap out models without rewriting the core application. This 'commoditization' of the model layer protects the application from vendor lock-in, crucial for any enterprise strategy.
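In practice, that swap looks like this: comment one slug out, add another, and the rest of the application never changes. (The slugs here are illustrative; OpenRouter publishes the exact IDs.)

```python
# Swapping a council member is a config change, not a code change,
# because every model is reached through the same OpenRouter call path.
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    # "anthropic/claude-sonnet-4.5",        # benched this round
    "meta-llama/llama-3.1-405b-instruct",   # promoted: same API, zero rewrites
]
```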
From Prototype to Production: The Missing Pieces
While elegant in its simplicity, the LLM Council also reveals the chasm between a weekend project and a production-ready system. Here's what's missing, and what commercial AI infrastructure providers offer:
- Authentication: The system lacks user authentication, meaning anyone with access can query the models.
- Governance: There's no mechanism to redact Personally Identifiable Information (PII) or track user queries for compliance, and sending data to multiple external AI providers triggers immediate compliance concerns (see the redaction sketch after this list).
- Reliability: The system assumes the OpenRouter API is always available and models will respond promptly, lacking circuit breakers, fallback strategies, and retry logic.
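To give the governance gap some shape, here's a minimal, hypothetical redaction pass that scrubs obvious identifiers before a query leaves the building. Real compliance tooling goes far beyond regexes, but the placement is the point: the hook sits between the user and the dispatch stage.

```python
import re

# Illustrative patterns only; this is nowhere near a complete
# compliance solution, just the shape of the missing layer.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Scrub obvious identifiers before dispatching to external models."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com or 555-867-5309."))
# -> Reach me at [EMAIL REDACTED] or [PHONE REDACTED].
```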
Companies like LangChain, AWS Bedrock, and various AI gateway startups are essentially selling the 'hardening' around the core logic that Karpathy demonstrated. They provide the security, observability, and compliance wrappers that turn a raw orchestration script into a viable enterprise platform.
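The reliability slice of that hardening can be sketched just as briefly. Assuming the single-call `ask` helper from the earlier sketch, a hypothetical wrapper adds bounded retries with exponential backoff and degrades to a cheaper fallback model (the fallback slug is illustrative); a production gateway would layer circuit breakers and per-model health tracking on top.

```python
import time

def ask_with_retries(model: str, prompt: str,
                     fallback: str = "openai/gpt-4o-mini",  # illustrative slug
                     max_attempts: int = 3) -> str:
    """Retry a flaky model call, then degrade to a fallback model."""
    for attempt in range(max_attempts):
        try:
            return ask(model, prompt)  # `ask` from the earlier sketch
        except Exception:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    # Primary model exhausted its retries; fall back rather than fail.
    return ask(fallback, prompt)
```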
'Vibe Code' and the Future of Software Engineering
Perhaps the most radical aspect of the LLM Council is its development philosophy. Karpathy describes it as '99% vibe-coded,' relying heavily on AI assistants. This hints at a future where code is 'promptable scaffolding': disposable and easily rewritten by AI. It raises a critical question for enterprises: should they build custom, disposable tools using AI, or buy expensive, rigid software suites?
The AI-as-Judge Problem: When Machines and Humans Disagree
The LLM Council also highlights a potential pitfall: the divergence between human and machine preferences. Karpathy's observation that the models preferred GPT-5.1, while he preferred Gemini, suggests that AI models may have shared biases. They might favor verbosity or specific formatting that doesn't align with human needs for brevity and accuracy. As enterprises rely on 'LLM-as-a-Judge' systems, this discrepancy matters. If the automated evaluator rewards 'wordy' answers while customers want concise solutions, metrics can be misleading.
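One cheap sanity check is to test whether your judge's scores simply track response length. Here's a hypothetical sketch with made-up data; in practice you'd run it over a real evaluation set.

```python
from statistics import correlation  # Python 3.10+

# Made-up example: four answers to the same question, plus the scores
# a judge model assigned them (parsed from the judge's output).
responses = [
    "Paris.",
    "The capital of France is Paris.",
    "France's capital is Paris, a city of roughly two million people.",
    "To answer thoroughly, one must first consider the long history of "
    "French governance, after which Paris emerges as the capital.",
]
judge_scores = [6.0, 7.5, 8.5, 9.5]

lengths = [float(len(r.split())) for r in responses]
print(f"length/score correlation: {correlation(lengths, judge_scores):+.2f}")
# A value near +1.0 suggests the judge rewards verbosity, not accuracy;
# here the shortest answer is just as correct as the longest.
```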
Actionable Takeaways for Enterprise Platform Teams
Karpathy's LLM Council is more than a weekend project; it's a reference architecture and a harbinger of the future. Here's what enterprise technology leaders should consider:
- Embrace the Multi-Model Approach: Karpathy's code proves that a multi-model strategy is technically feasible.
- Focus on Governance, Not Just Routing: The real challenge isn't routing prompts but governing the data, ensuring compliance, security, and reliability.
- Evaluate the 'Build vs. Buy' Decision Carefully: While the core orchestration logic is simple, the enterprise-grade wrappers are complex. Decide whether to build or buy based on your team's capabilities and strategic priorities.
- Be Aware of AI Biases: Recognize that AI models may have different preferences than human users and build systems that account for these differences.
- Consider 'Vibe Code' for Internal Tools: Explore the potential of AI-assisted development for creating custom, disposable tools that meet specific needs.
As platform teams gear up for 2026, Karpathy's code provides a valuable starting point. The question is: will companies build the enterprise-grade armor around the 'vibe code' themselves, or will they pay someone else to do it? The answer will shape the future of enterprise AI.