Janus Competition - The Rodeo
Compete to Build the Best Intelligence Engine
The decentralized arena on Bittensor where intelligence engines compete. Build an OpenAI-compatible Janus implementation, get scored on quality, speed, cost, streaming continuity, and modality handling, and ride to the top. Permissionless entry. Real rewards for champions. Any stack. Any approach. One arena.
Live Snapshot
Today's Benchmark Pulse
Competition overview
What is the Janus Competition?
The Janus Competition is an open arena where developers compete to build the best intelligence engine - a system that handles any request a user might throw at a comprehensive AI assistant.
How It Works
You submit an OpenAI-compatible API endpoint. Behind that endpoint, your implementation can use any technology: CLI agents, workflow engines, model routers, multi-agent orchestrations, or entirely novel approaches. As long as it speaks the OpenAI Chat Completions API and streams responses, you're in.
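To make the contract concrete, here is a minimal sketch of the streaming chunk format an OpenAI-compatible endpoint emits. The field names follow the standard Chat Completions streaming schema; the `id` value and token sequence are illustrative.

```python
import json
import time

def sse_chunk(delta: dict, finish_reason=None, model="janus"):
    """Build one OpenAI-compatible streaming chunk as an SSE data line."""
    chunk = {
        "id": "chatcmpl-demo",  # illustrative id
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(chunk)}\n\n"

def stream_response(tokens):
    """Yield SSE lines for a token sequence, ending with the [DONE] sentinel."""
    yield sse_chunk({"role": "assistant"})       # role announcement
    for tok in tokens:
        yield sse_chunk({"content": tok})        # one content delta per token
    yield sse_chunk({}, finish_reason="stop")    # final chunk carries finish_reason
    yield "data: [DONE]\n\n"

lines = list(stream_response(["Hello", ", ", "world"]))
```

Any server framework works behind this format, as long as each line reaches the client as it is produced rather than buffered into one batch.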
What Gets Evaluated
Your implementation is scored across all the use cases of a modern AI assistant and the production metrics that define a great experience.
Use case coverage
- Simple chat: Conversational responses, Q&A, summarization.
- Complex reasoning: Multi-step problems, logical deduction, planning.
- Deep research: Web search, information synthesis, citation.
- Software creation: Code generation, debugging, full project scaffolding.
- Multimodal input: Understanding images, documents, audio.
- Multimodal output: Generating images, files, structured data.
- Tool use: Calling APIs, executing code, managing files.
Non-functional metrics
- Quality: Accuracy, helpfulness, safety, instruction following.
- Speed: Time to first token, total completion time.
- Cost: Resource efficiency, inference cost per request.
- Streaming continuity: Consistent token flow, reasoning transparency.
- Modality handling: Graceful handling of images, files, multi-turn context.
The composite score reflects how well your implementation performs as a complete AI solution, not just on narrow benchmarks.
The Arena
Rodeo Rankings
Composite scores blend task performance with production metrics like speed, cost, streaming continuity, and modality handling. Sort any column to explore trade-offs across implementations.
| Rank | Implementation | Hotkey | Composite | Quality | Speed | Cost | Streaming | Modality | Submitted | Days at #1 |
|---|---|---|---|---|---|---|---|---|---|---|
| #1 | your-janus-implementation | 5Your...Key | 82.7 | 86.3 | 78.4 | 84.2 | 80.1 | 78.5 | 2025-01-20 | 4 days |
| #2 | quantum-rider | 5G9a...C21 | 79.4 | 82.8 | 74.2 | 81.5 | 76.8 | 77.3 | 2025-01-18 | — |
| #3 | baseline-n8n | 5H2d...E9F | 76.2 | 79.5 | 71.8 | 78.9 | 73.6 | 74.2 | 2025-01-15 | — |
| #4 | baseline-cli-agent | 5J7b...A10 | 74.8 | 77.6 | 70.2 | 76.5 | 72.1 | 71.8 | 2025-01-12 | — |
| #5 | trailblazer | 5K3e...D44 | 72.5 | 75.2 | 68.4 | 74.8 | 70.3 | 69.6 | 2025-01-10 | — |
User Preferences
Arena Preference Ladder
Anonymous A/B votes fuel this live ranking. Models are randomized and revealed only after a decision, keeping comparisons honest.
How it works
Five Steps to the Janus Rodeo
Build, evaluate, submit, compete, and earn. The public dev suite is open for iteration while private benchmarks and the prize pool keep the rodeo moving.
Build
Create your intelligence engine using any technology stack. CLI agents, workflow engines, model orchestrations - as long as it exposes an OpenAI-compatible API and streams responses, you are in.
Evaluate
Test locally using the Janus bench runner. Run the same benchmarks we use for scoring. Identify weaknesses before you submit.
Submit
Package your implementation as a Docker container. Submit via the Janus portal with your Bittensor hotkey and source code link. All submissions must be open source.
Compete
Your implementation runs against the full benchmark suite. Results appear on the leaderboard within 24 hours. See how you stack up against the current champion.
Earn
If your implementation claims the #1 spot, you win the entire accumulated prize pool. The pool grows daily until someone beats you, then they claim it all and a new pool begins.
Claim the pool. Set the new bar.
The current pool stands at $47,250. Think you can take it?
View Leaderboard
Reference baselines
Reference Baselines
We provide two reference implementations to help you get started. Each demonstrates a different architectural approach to building a Janus-compatible intelligence engine.
CLI Agent Baseline
Sandbox-based approach using the Claude Code CLI agent with full tool access inside an isolated Sandy environment.
- Dual-path routing (fast vs complex)
- Secure sandbox execution
- Full filesystem and code access
- Artifact generation
LangChain Baseline
In-process approach using LangChain agents with direct tool integration and streaming support.
- LangChain agent framework
- In-process execution
- Extensible tool system
- Vision model routing
Prize pool
The Prize Pool
The Janus competition features a unique accumulating prize pool that rewards sustained excellence and keeps the competition moving. The pool grows daily while a champion holds the top spot, and the next breakthrough claims the entire balance.
This is not a one-time hackathon. It is a continuous race where the prize for beating the leader grows every day.
Current prize pool
$47,250.00
How It Works
- Daily contribution: A portion of Janus platform revenue flows into the pool every day.
- Accumulation: The pool grows as long as the same implementation holds the #1 rank.
- Claim: When a new implementation takes the top spot, the miner behind it claims the entire accumulated pool.
- Reset: After payout, the pool resets to zero and begins accumulating again.
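The accumulate-claim-reset cycle above can be sketched in a few lines. The daily contribution amount and dethronement days below are hypothetical; the real contribution varies with platform revenue.

```python
def simulate_pool(daily_contribution, dethroned_days):
    """Simulate the accumulating pool: it grows each day and pays out
    in full whenever a new implementation takes the #1 spot."""
    pool = 0.0
    payouts = []
    last_day = max(dethroned_days) if dethroned_days else 0
    for day in range(1, last_day + 1):
        pool += daily_contribution
        if day in dethroned_days:
            payouts.append((day, pool))  # challenger claims the full pool
            pool = 0.0                   # reset; accumulation restarts
    return pool, payouts

# Hypothetical: leader dethroned on day 10 and day 25, $500/day flowing in.
pool, payouts = simulate_pool(500.0, {10, 25})
```

Note the second payout is larger even though fewer days passed at a constant rate would suggest otherwise here; with 15 days of accumulation after the first reset, the day-25 claim is worth $7,500.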
Why This Model?
- Incentivizes improvement: the longer a champion holds #1, the bigger the bounty for beating them.
- Rewards sustained excellence: a clear bar is set and every challenger knows what is at stake.
- Continuous competition: there is always a reason to iterate and climb.
- Transparent economics: everyone can see the pool, the claims, and the reset history.
Pool Transparency
- The current pool balance is displayed on the leaderboard and this page.
- All contributions and payouts are recorded on-chain on Bittensor.
- Historical pool data is publicly accessible and linked from the leaderboard.
- The system moves toward fully automated, on-chain settlement over time.
Payout Process
Current (Phase 1 - Manual)
- New #1 is verified via benchmark run.
- Results are reviewed for integrity.
- Payout is initiated to the miner's Bittensor coldkey.
- The pool resets and the transaction is logged.
Future (Phase 2 - Automated)
- Benchmark results trigger on-chain verification.
- Smart contracts transfer the pool to the winner.
- The pool resets atomically.
- No manual intervention is required.
Claim Rules
- Ties are broken by the earliest verified submission timestamp. If verification is still in progress, payouts pause until the tie is resolved.
- Disqualifications for security or integrity violations void the claim. The pool remains and moves to the next highest verified submission.
- Disputes trigger an audit window. Funds are released only after the review finishes and results are published.
Questions about rulings or disputes should be raised in the competition issue tracker before payouts are finalized.
Component marketplace
Component Marketplace
Beyond competing with full implementations, you can contribute reusable components to the Janus ecosystem and earn rewards when they power the leading intelligence implementation.
What are components?
| Component type | Description | Examples |
|---|---|---|
| Research Nodes | Specialized research capabilities | Academic paper search, news aggregation |
| Tool Integrations | Connections to external services | GitHub API, database connectors |
| Memory Systems | Context management solutions | Vector stores, conversation history |
| Reasoning Modules | Thinking and planning logic | Chain-of-thought, tree-of-thought |
| Output Formatters | Response formatting | Code syntax, markdown, structured data |
How it works
- You build a component and publish it to the Marketplace.
- Implementation developers integrate your component.
- When that implementation wins, you earn a share of the prize.
- Attribution is automatic via dependency tracking.
Reward sharing
When an implementation claims the prize pool, rewards are distributed between the implementation developer and the component builders.
Miner reward
80%
Goes to the implementation developer.
Component rewards
20%
Split across component builders by usage and value.
Percentages are illustrative. Final model will be determined by governance.
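Under the illustrative 80/20 split, component rewards would be divided pro rata by usage weight. The component names and weights below are hypothetical, and the split function is a sketch, not the governance-final model.

```python
def split_rewards(pool, component_usage, miner_share=0.80):
    """Split a claimed pool: miner_share to the implementation developer,
    the remainder across component builders proportional to usage weight."""
    miner = pool * miner_share
    total_weight = sum(component_usage.values()) or 1
    components = {
        name: pool * (1 - miner_share) * weight / total_weight
        for name, weight in component_usage.items()
    }
    return miner, components

# Hypothetical usage weights for two components.
miner, comps = split_rewards(47_250.00, {"research-node": 3, "memory-store": 1})
```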
Component requirements
- Open source: MIT, Apache 2.0, or compatible license.
- Documentation: clear API docs and usage examples.
- Packaging: pip package, npm module, or Docker image.
- Versioning: semantic versioning with changelog.
- Testing: automated tests with over 80% coverage.
Coming soon
The Marketplace is currently in development. Full launch is planned for Q2 2026, with early access starting in Q1 2026.
Scoring model
Scoring Categories
Evaluation spans functional performance and production readiness. Each category captures a different slice of what makes an intelligence engine useful, fast, and safe in the real world.
| Category | What it measures | Example benchmarks |
|---|---|---|
| Chat Quality | Conversational ability, helpfulness. | MT-Bench, AlpacaEval |
| Reasoning | Logic, math, multi-step problems. | GSM8K, MATH, ARC |
| Knowledge | Factual accuracy, world knowledge. | MMLU, TruthfulQA |
| Research | Web search, synthesis, citation. | Custom research tasks |
| Coding | Code generation, debugging, explanation. | HumanEval, MBPP, SWE-Bench |
| Tool Use | API calling, function execution. | Custom tool-use evals |
| Multimodal | Image understanding, file generation. | VQA, document tasks |
| Speed | Latency, throughput. | Time-to-first-token, TPS |
| Cost | Resource efficiency. | USD per 1M tokens (effective) |
| Streaming | Continuous output, reasoning tokens. | Streaming continuity score |
Composite Score
The final leaderboard ranking is based on a composite score that combines all evaluation categories. The formula rewards implementations that excel across the board, not just in one area.
- Each category is scored on a normalized scale (0-100).
- Weights reflect real-world usage and are published before each cycle.
- Weights may be adjusted as the competition evolves.
Current weight distribution (subject to change)
| Category | Weight |
|---|---|
| Quality (aggregate) | 40% |
| Speed | 20% |
| Cost | 15% |
| Streaming | 15% |
| Modality | 10% |
Quality aggregate includes chat, reasoning, knowledge, research, coding, tool use, and multimodal task performance.
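Applying the published weights is a straight weighted sum over normalized category scores. The category scores below are taken from the leaderboard's #1 row, and under these illustrative weights they reproduce its 82.7 composite.

```python
# Weights from the table above (subject to change).
WEIGHTS = {
    "quality": 0.40,
    "speed": 0.20,
    "cost": 0.15,
    "streaming": 0.15,
    "modality": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted sum of normalized (0-100) category scores, rounded to one decimal."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 100%
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 1)

score = composite({"quality": 86.3, "speed": 78.4, "cost": 84.2,
                   "streaming": 80.1, "modality": 78.5})
```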
Benchmark Suites
Evaluations use a combination of public and proprietary benchmarks.
Public benchmarks
- MMLU (knowledge)
- TruthfulQA (accuracy)
- GSM8K, MATH (reasoning)
- HumanEval, MBPP (coding)
- MT-Bench (chat quality)
Proprietary benchmarks
- Research synthesis tasks
- Multi-step tool use scenarios
- Streaming continuity tests
- Multimodal generation tasks
All public benchmark implementations are open source. Proprietary benchmarks rotate to prevent overfitting.
Scoring runs are reproducible. Questions or disputes can be raised through the competition issue tracker, and we will rerun and publish findings.
Ready to Test Your Implementation?
Run the official benchmarks against your API or container
Architecture overview
How the Janus Architecture Fits Together
Janus connects users to competing intelligence implementations through a secure gateway, TEE execution layer, and tightly controlled platform services.
High-Level Architecture
User requests flow through the Janus Gateway, route into a TEE-backed container, and call platform services as needed. Benchmarks run against the same API and feed the leaderboard.
- Gateway validates and routes all OpenAI-compatible requests.
- Implementations run inside Chutes CPU TEE nodes.
- Platform services are available via whitelisted endpoints only.
- Bench runner and scoring engine update the leaderboard.
Request flow
Request Flow
When a user sends a message, the request traverses the gateway, runs in a TEE container, calls platform services, and streams back to the client.
1. User Request
Users send OpenAI-compatible chat completion requests from Janus Chat, the API, or third-party apps.
POST /v1/chat/completions
{
"model": "janus",
"messages": [{"role": "user", "content": "Explain quantum entanglement"}],
"stream": true
}
2. Gateway Routing
- Validates the request format.
- Selects the target implementation (current #1 or specified).
- Routes to the appropriate TEE node.
3. TEE Execution
- Runs inside a Chutes CPU TEE node (isolated, attested).
- Has access to platform services via whitelisted endpoints.
- Generates a response using whatever logic you build.
4. Platform Service Calls
- Your implementation calls whitelisted platform services as needed: web proxy, search, vector index, code sandbox, and Chutes inference.
5. Response Streaming
- Reasoning tokens via reasoning_content field.
- Content tokens via content field.
- Continuous streaming, not batched.
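From the client side, the two delta fields above can be separated with a small SSE parser. The demo payloads below are illustrative; real chunks carry the full chat.completion.chunk envelope.

```python
import json

def consume_stream(sse_lines):
    """Split an OpenAI-style SSE stream into reasoning and visible content,
    using the reasoning_content and content delta fields."""
    reasoning, content = [], []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # ignore comments / keep-alives
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)

# Illustrative stream: one reasoning delta, one content delta.
demo = [
    'data: {"choices":[{"delta":{"reasoning_content":"Plan: define, then illustrate."}}]}',
    'data: {"choices":[{"delta":{"content":"Entanglement links particle states."}}]}',
    "data: [DONE]",
]
reasoning, content = consume_stream(demo)
```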
6. User Receives Response
The gateway streams responses back to the user's client with real-time updates and final completion metadata.
Platform services
Platform Services
Your implementation can call these services from inside the container. All other outbound access is blocked.
Web Proxy
Endpoint: https://proxy.janus.rodeo
Fetch web pages for research and information gathering.
import httpx
response = httpx.get(
"https://proxy.janus.rodeo/fetch",
params={"url": "https://example.com/article"}
)
content = response.json()["content"]  # Markdown-formatted
Features
- Converts HTML to clean markdown.
- Respects robots.txt.
- Rate limited: 10 requests/minute.
- Max page size: 1MB.
Search API
Endpoint: https://search.janus.rodeo
Web search for finding relevant information.
import httpx
response = httpx.post(
"https://search.janus.rodeo/search",
json={"query": "quantum entanglement explained", "num_results": 10}
)
results = response.json()["results"]
# [{"title": "...", "url": "...", "snippet": "..."}, ...]
Features
- Powered by Brave Search API.
- Returns title, URL, snippet.
- Rate limited: 20 searches/minute.
Vector Index
Endpoint: https://vector.janus.rodeo
Semantic search over indexed knowledge bases.
import httpx
response = httpx.post(
"https://vector.janus.rodeo/query",
json={"query": "How does TCP handshake work?", "top_k": 5}
)
chunks = response.json()["chunks"]
# [{"content": "...", "source": "...", "score": 0.92}, ...]
Features
- Pre-indexed documentation (Chutes, Bittensor, common frameworks).
- Custom index upload (future feature).
- Rate limited: 50 queries/minute.
Code Sandbox
Endpoint: https://sandbox.janus.rodeo
Execute code safely in an isolated environment.
import httpx
response = httpx.post(
"https://sandbox.janus.rodeo/execute",
json={
"language": "python",
"code": "print(2 + 2)",
"timeout": 30
}
)
result = response.json()
# {"stdout": "4\n", "stderr": "", "exit_code": 0}
Features
- Supported languages: Python, JavaScript, Bash, Go, Rust.
- Timeout: max 60 seconds.
- Memory: max 512MB.
- File I/O available within sandbox.
- Network access: none (sandbox is isolated).
Chutes Inference
Endpoint: https://api.chutes.ai
Call any model available on Chutes.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["CHUTES_API_KEY"],
base_url="https://api.chutes.ai/v1"
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Summarize this..."}]
)
Features
- OpenAI: gpt-4o, gpt-4o-mini, o1, o1-mini.
- Open models: Llama, Mistral, Qwen, DeepSeek.
- Specialized: code, vision, embedding models.
Your implementation receives a CHUTES_API_KEY environment variable with credits for platform use.
Chutes Model Catalog
Security model
Security Model
Janus runs submissions inside a secure, isolated environment with strict egress controls and operational monitoring.
TEE Isolation
- Memory encryption: RAM is encrypted; host cannot read your data.
- Attestation: proof that your code runs unmodified.
- Isolation: no access to host filesystem or other containers.
Secrets Management
- CHUTES_API_KEY injected as an environment variable.
- No hardcoded secrets; use env vars in your code.
- Platform keys rotate regularly.
Monitoring
- Request/response logging (content redacted).
- Resource usage tracking.
- Anomaly detection for unusual patterns.
Network egress control
- All outbound connections are routed through a proxy.
- Only whitelisted domains are allowed.
- Connection attempts to other hosts are logged and blocked.
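Conceptually, the egress filter behaves like a host whitelist check. This is a sketch of the observable behavior from inside the container, not the platform's actual proxy implementation.

```python
from urllib.parse import urlparse

# Whitelisted egress hosts (see the Requirements section).
ALLOWED_HOSTS = {
    "api.chutes.ai",
    "proxy.janus.rodeo",
    "search.janus.rodeo",
    "sandbox.janus.rodeo",
    "vector.janus.rodeo",
}

def egress_allowed(url: str) -> bool:
    """Return True only for URLs whose host is on the whitelist."""
    return urlparse(url).hostname in ALLOWED_HOSTS
```

Requests to any other host are logged and blocked, so an implementation that depends on an unlisted API will fail at runtime even if it passes local tests.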
Requirements
Technical Requirements
Validate your container against the required API contract, streaming behavior, and resource limits before you submit.
API endpoints
| Endpoint | Method | Required |
|---|---|---|
| /v1/chat/completions | POST | Yes |
| /health | GET | Yes |
| /v1/models | GET | No (recommended) |
Streaming requirements
- Must support stream: true for SSE responses.
- Continuous output: tokens should flow continuously, not in batches.
- Reasoning tokens: use reasoning_content for thinking/planning.
- Finish reason: always include finish_reason in the final chunk.
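A quick self-check for the finish_reason requirement can run over decoded chunks before you submit. This validator is a sketch; the official bench runner applies stricter checks, including continuity timing.

```python
def validate_stream(chunks):
    """Check a decoded chunk list against the streaming requirements:
    a non-empty stream whose final chunk carries a finish_reason."""
    if not chunks:
        return False, "empty stream"
    final = chunks[-1]["choices"][0]
    if final.get("finish_reason") is None:
        return False, "final chunk missing finish_reason"
    return True, "ok"

# Illustrative decoded stream: one content delta, then a stop chunk.
good = [
    {"choices": [{"delta": {"content": "hi"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
ok, msg = validate_stream(good)
```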
Resource limits
| Resource | Limit |
|---|---|
| Memory | 16 GB |
| CPU | 4 cores |
| Disk | 50 GB |
| Network | Whitelisted egress only |
| Timeout | 5 minutes per request |
Whitelisted egress
Only these services are reachable from the container. All other outbound traffic is blocked.
- api.chutes.ai
- proxy.janus.rodeo
- search.janus.rodeo
- sandbox.janus.rodeo
- vector.janus.rodeo
Bench runner integration
Test Your Implementation
Run the public dev suite locally to validate your container before submitting. The same tooling powers the official leaderboard.
Quick start
# Install bench runner
pip install janus-bench
# Run quick suite (5 minutes)
janus-bench run --target http://localhost:8000 --suite quick
# Run full suite (2 hours)
janus-bench run --target http://localhost:8000 --suite full
# Run specific category
janus-bench run --target http://localhost:8000 --suite coding
Benchmark integration
How Scoring Works
Bench runner evaluates your implementation through the same infrastructure used for the official leaderboard.
Benchmark flow
- Load suite: bench runner loads test cases from the benchmark suite.
- Execute tests: each test sends a request to your API.
- Collect responses: responses are captured with timing data.
- Evaluate quality: LLM judges or exact match evaluate correctness.
- Calculate metrics: quality, speed, cost, streaming scores computed.
- Update leaderboard: composite score published.
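The benchmark flow above reduces to a simple loop: call, time, judge, aggregate. The toy API and judge below are hypothetical stand-ins, not the real scoring engine.

```python
import time

def run_benchmark(cases, call_api, judge):
    """Minimal sketch of the bench flow: send each case to the API,
    time it, judge correctness, and aggregate quality and latency."""
    results = []
    for case in cases:
        start = time.perf_counter()
        answer = call_api(case["prompt"])
        latency = time.perf_counter() - start
        results.append({"correct": judge(case, answer), "latency": latency})
    quality = 100 * sum(r["correct"] for r in results) / len(results)
    avg_latency = sum(r["latency"] for r in results) / len(results)
    return {"quality": quality, "avg_latency": avg_latency}

# Toy stand-ins for the real API and LLM judge (hypothetical).
cases = [{"prompt": "2+2", "expected": "4"}, {"prompt": "3+3", "expected": "6"}]
report = run_benchmark(cases,
                       call_api=lambda p: str(eval(p)),
                       judge=lambda c, a: a == c["expected"])
```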
Benchmark transparency
- Public benchmarks are open source.
- Evaluation prompts are published.
- Scoring formulas are documented.
- You can reproduce any score locally.
Submission requirements
What It Means to Submit to Janus
Your submission is an open source Docker implementation that speaks the OpenAI Chat Completions API. The competition rewards incremental progress, transparency, and reproducibility.
What You're Submitting
A Janus submission is a Docker container that implements an OpenAI-compatible Chat Completions API. Behind that API, your implementation can use any technology to generate responses.
- It is not "an agent" - it is an implementation of intelligence.
- It is not "a miner" - the miner is you; the submission is your creation.
- It is a Docker container - portable, reproducible, isolated.
- It exposes a standard API - POST /v1/chat/completions, GET /health.
What Happens to Your Submission
- Your container is pulled and deployed to a Chutes CPU TEE node.
- It connects to platform services: web proxy, search, sandbox, inference.
- Benchmarks run against it via the same API users call.
- Results are published to the leaderboard.
Build on What Exists
The competition encourages incremental improvement. Start from the current leader, make it better, and push the frontier forward.
Why incremental?
- Lower barrier: improve a slice instead of rebuilding everything.
- Faster progress: small improvements compound into big gains.
- Community learning: each submission teaches the next.
- Reduced risk: if your change fails, the delta is small.
How to start
- Fork the baseline: git clone https://github.com/chutesai/janus-baseline
- Study the leader: review the current #1 source code.
- Identify a weakness: use benchmark breakdowns to find gaps.
- Make your improvement: prompts, routing, new capabilities.
- Test locally: run janus-bench to validate.
- Submit your enhanced version.
Improvement cycle
Open Source Requirement
All Janus submissions must be open source. This is non-negotiable.
Rationale
- Community progress: everyone learns and improves faster.
- Transparency: users can inspect how requests are handled.
- Security: open code can be audited.
- Bittensor ethos: the network is built on openness.
Acceptable licenses
| License | Allowed | Notes |
|---|---|---|
| MIT | Yes | Recommended |
| Apache 2.0 | Yes | Recommended |
| GPL v3 | Yes | Derivative works must also be GPL |
| BSD 3-Clause | Yes | |
| AGPL v3 | Yes | Network use triggers copyleft |
| Proprietary | No | Not allowed |
| No license | No | Defaults to proprietary |
What must be open
- Source code that runs inside the container.
- Prompts, few-shot examples, templates.
- Configuration and routing rules.
- Dependency lists and Dockerfiles.
What can stay private
- API keys stored in environment variables.
- Training data for fine-tuned models.
- Calls to proprietary models or APIs.
How Contributors Earn
Janus runs on Bittensor, a decentralized network where contributors (called "miners") compete to provide the best AI services. Think of it like an open marketplace for intelligence - anyone can participate, and the best implementations earn rewards.
The Competition Model
- Submit your AI implementation to the Janus competition
- Your code runs benchmarks against other submissions
- Top performers earn from the prize pool
- All submissions are open source, fostering community learning
Why Decentralized?
- Open access: Anyone can compete - no gatekeepers or approval processes
- Transparent scoring: All benchmarks and results are public
- Fair rewards: Earnings distributed automatically based on performance
- Community-driven: The best ideas rise to the top through open competition
Getting Started
To participate, you'll need a Bittensor wallet address (similar to a crypto wallet). This is used for:
- Attribution on the leaderboard
- Receiving prize pool payouts
- Building your reputation across submissions
Don't have a wallet yet? Get started with Bittensor - it takes just a few minutes.
Review Process
Current Phase: manual review before a submission appears on the leaderboard.
Review checklist
- Docker image accessible from the specified registry.
- API compliance for /v1/chat/completions and /health.
- Source code repository is public and matches the image.
- License file is OSI-approved.
- No obvious malicious code or data exfiltration.
- No unapproved egress beyond whitelisted services.
- Hotkey is valid and registered on Bittensor.
- Dockerfile can reproduce the image from source.
Timeline
- Submission received - review begins within 24 hours.
- Review complete - benchmarks run within 48 hours.
- Results published - leaderboard updated within 72 hours.
Future Phase: Decentralized Review
Phase 2 introduces a decentralized judging panel with consensus-based approvals and staking-backed incentives.
- Validator set of trusted community reviewers.
- Randomized assignment of submissions to reviewers.
- Consensus approval required to pass.
- Slashing for malicious or negligent reviewers.
- Appeals handled by the broader panel.
This phase will be specified separately once governance is finalized.
Submission form
Submit Your Implementation for Review
Submissions are manually reviewed before running on private benchmarks. Use this form to share your Docker image, hotkey, source code, and license details.
Submission fields
| Field | Required | Description |
|---|---|---|
| Implementation Name | Yes | Unique identifier (e.g., "turbo-reasoner-v2") |
| Docker Image | Yes | Full image reference (e.g., ghcr.io/user/janus-impl:v2) |
| Bittensor Hotkey | Yes | SS58 address for attribution and payout |
| Source Code URL | Yes | Link to public repository (GitHub, GitLab, etc.) |
| License | Yes | OSI-approved license identifier (e.g., "MIT") |
| Description | Yes | Brief description of your approach (100-500 chars) |
| Changelog | No | What differs from baseline or previous version |
| Contact | No | Discord handle or email for review communication |
Example submission
name: "turbo-reasoner-v2"
image: "ghcr.io/alice/janus-turbo:2.0.1"
hotkey: "5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty"
source: "https://github.com/alice/janus-turbo"
license: "MIT"
description: "Enhanced reasoning via chain-of-thought decomposition and parallel tool execution. Improves on baseline-v3 with 15% better GSM8K scores."
changelog: "Added CoT decomposition, parallel tool calls, improved code generation prompts"
contact: "alice#1234"
FAQ
Frequently Asked Questions
Need a quick answer before you ship? Click a category to explore.
Helpful Links
- Submission Portal - Submit your implementation
- Leaderboard - See current rankings
- Benchmark Docs - Detailed benchmark information
- janus-bench on PyPI - Local testing tool
- Baseline Repository - Start from the reference implementation
- Discord Community - Get help and discuss strategies
- Marketplace Waitlist - Early access to components