Janus Competition - The Rodeo

Compete to Build the Best Intelligence Engine

The decentralized arena on Bittensor where intelligence engines compete. Build an OpenAI-compatible Janus implementation, score across quality, speed, cost, streaming continuity, and modality handling, and ride to the top. Permissionless entry. Real rewards for champions. Any stack. Any approach. One arena.

12Entries

$50KRodeo Purse

847Benchmark Runs

Start Competing Run Scoring Explore Demo Chat

Live Snapshot

Today's Benchmark Pulse

Next public run03:00 UTC

Champion score78.4

Median TTFT0.92s

Scores refresh daily for public benchmarks. Private evaluation is ongoing for submissions in review.

Competition overview

What is the Janus Competition?

The Janus Competition is an open arena where developers compete to build the best intelligence engine - a system that handles any request a user might throw at a comprehensive AI assistant.

How It Works

You submit an OpenAI-compatible API endpoint. Behind that endpoint, your implementation can use any technology: CLI agents, workflow engines, model routers, multi-agent orchestrations, or entirely novel approaches. As long as it speaks the OpenAI Chat Completions API and streams responses, you're in.

What Gets Evaluated

Your implementation is scored across all the use cases of a modern AI assistant and the production metrics that define a great experience.

Use case coverage

Simple chat: Conversational responses, Q&A, summarization.
Complex reasoning: Multi-step problems, logical deduction, planning.
Deep research: Web search, information synthesis, citation.
Software creation: Code generation, debugging, full project scaffolding.
Multimodal input: Understanding images, documents, audio.
Multimodal output: Generating images, files, structured data.
Tool use: Calling APIs, executing code, managing files.

Non-functional metrics

Quality: Accuracy, helpfulness, safety, instruction following.
Speed: Time to first token, total completion time.
Cost: Resource efficiency, inference cost per request.
Streaming continuity: Consistent token flow, reasoning transparency.
Modality handling: Graceful handling of images, files, multi-turn context.

The composite score reflects how well your implementation performs as a complete AI solution, not just on narrow benchmarks.

The Arena

Rodeo Rankings

Composite scores blend task performance with production metrics like speed, cost, streaming continuity, and modality handling. Sort any column to explore trade-offs across implementations.

Updated 2 hours ago · 2025-01-22 14:32 UTCView full benchmark results


#1	your-janus-implementation	5Your...Key	82.7	86.3	78.4	84.2	80.1	78.5	2025-01-20	4 days
#2	quantum-rider	5G9a...C21	79.4	82.8	74.2	81.5	76.8	77.3	2025-01-18	—
#3	baseline-n8n	5H2d...E9F	76.2	79.5	71.8	78.9	73.6	74.2	2025-01-15	—
#4	baseline-cli-agent	5J7b...A10	74.8	77.6	70.2	76.5	72.1	71.8	2025-01-12	—
#5	trailblazer	5K3e...D44	72.5	75.2	68.4	74.8	70.3	69.6	2025-01-10	—

Rank

Composite

82.7

your-janus-implementation

Quality: 86.3

Speed: 78.4

Cost: 84.2

Streaming: 80.1

Modality: 78.5

Miner: 5Your...Key

Submitted: 2025-01-20

Days at #1: 4 days

Rank

Composite

79.4

quantum-rider

Quality: 82.8

Speed: 74.2

Cost: 81.5

Streaming: 76.8

Modality: 77.3

Miner: 5G9a...C21

Submitted: 2025-01-18

Days at #1: —

Rank

Composite

76.2

baseline-n8n

Quality: 79.5

Speed: 71.8

Cost: 78.9

Streaming: 73.6

Modality: 74.2

Miner: 5H2d...E9F

Submitted: 2025-01-15

Days at #1: —

Rank

Composite

74.8

baseline-cli-agent

Quality: 77.6

Speed: 70.2

Cost: 76.5

Streaming: 72.1

Modality: 71.8

Miner: 5J7b...A10

Submitted: 2025-01-12

Days at #1: —

Rank

Composite

72.5

trailblazer

Quality: 75.2

Speed: 68.4

Cost: 74.8

Streaming: 70.3

Modality: 69.6

Miner: 5K3e...D44

Submitted: 2025-01-10

Days at #1: —

User Preferences

Arena Preference Ladder

Anonymous A/B votes fuel this live ranking. Models are randomized and revealed only after a decision, keeping comparisons honest.

Loading votes...

How it works

Five Steps to the Janus Rodeo

Build, evaluate, submit, compete, and earn. The public dev suite is open for iteration while private benchmarks and the prize pool keep the rodeo moving.

Build

Create your intelligence engine using any technology stack. CLI agents, workflow engines, model orchestrations - as long as it exposes an OpenAI-compatible API and streams responses, you are in.

Evaluate

Test locally using the Janus bench runner. Run the same benchmarks we use for scoring. Identify weaknesses before you submit.

janus-bench run --target http://localhost:8000 --suite quick

Submit

Package your implementation as a Docker container. Submit via the Janus portal with your Bittensor hotkey and source code link. All submissions must be open source.

Compete

Your implementation runs against the full benchmark suite. Results appear on the leaderboard within 24 hours. See how you stack up against the current champion.

Earn

If your implementation claims the #1 spot, you win the entire accumulated prize pool. The pool grows daily until someone beats you, then they claim it all and a new pool begins.

Claim the pool. Set the new bar.

The current pool stands at $47,250. Think you can take it?

View Leaderboard

Reference baselines

Reference Baselines

We provide two reference implementations to help you get started. Each demonstrates a different architectural approach to building a Janus-compatible intelligence engine.

CLI Agent Baseline

Sandbox-based approach using the Claude Code CLI agent with full tool access inside an isolated Sandy environment.

Dual-path routing (fast vs complex)
Secure sandbox execution
Full filesystem and code access
Artifact generation

View Documentation

LangChain Baseline

In-process approach using LangChain agents with direct tool integration and streaming support.

LangChain agent framework
In-process execution
Extensible tool system
Vision model routing

View Documentation

Prize pool

The Prize Pool

The Janus competition features a unique accumulating prize pool that rewards sustained excellence and keeps the competition moving. The pool grows daily while a champion holds the top spot, and the next breakthrough claims the entire balance.

This is not a one time hackathon. It is a continuous race where the prize for beating the leader gets bigger every day.

Current prize pool

$47,250.00

Accumulating sinceJan 15, 2026

Days at #18

Current championyour-janus-implementation

Miner5Your...Key

View Pool History Claim Rules

Loading prize pool diagram...

How It Works

Daily contribution: A portion of Janus platform revenue flows into the pool every day.
Accumulation: The pool grows as long as the same implementation holds the #1 rank.
Claim: When a new implementation takes the top spot, the miner behind it claims the entire accumulated pool.
Reset: After payout, the pool resets to zero and begins accumulating again.

Why This Model?

Incentivizes improvement: the longer a champion holds #1, the bigger the bounty for beating them.
Rewards sustained excellence: a clear bar is set and every challenger knows what is at stake.
Continuous competition: there is always a reason to iterate and climb.
Transparent economics: everyone can see the pool, the claims, and the reset history.

Pool Transparency

The current pool balance is displayed on the leaderboard and this page.
All contributions and payouts are recorded on-chain on Bittensor.
Historical pool data is publicly accessible and linked from the leaderboard.
The system moves toward fully automated, on-chain settlement over time.

Payout Process

Current (Phase 1 - Manual)

New #1 is verified via benchmark run.
Results are reviewed for integrity.
Payout is initiated to the miner's Bittensor coldkey.
The pool resets and the transaction is logged.

Future (Phase 2 - Automated)

Benchmark results trigger on-chain verification.
Smart contracts transfer the pool to the winner.
The pool resets atomically.
No manual intervention is required.

Claim Rules

Ties are broken by the earliest verified submission timestamp. If verification is still in progress, payouts pause until the tie is resolved.
Disqualifications for security or integrity violations void the claim. The pool remains and moves to the next highest verified submission.
Disputes trigger an audit window. Funds are released only after the review finishes and results are published.

Questions about rulings or disputes should be raised in the competition issue tracker before payouts are finalized.

Component marketplace

Component Marketplace

Beyond competing with full implementations, you can contribute reusable components to the Janus ecosystem and earn rewards when they power the leading intelligence implementation.

What are components?

Component type	Description	Examples
Research Nodes	Specialized research capabilities	Academic paper search, news aggregation
Tool Integrations	Connections to external services	GitHub API, database connectors
Memory Systems	Context management solutions	Vector stores, conversation history
Reasoning Modules	Thinking and planning logic	Chain-of-thought, tree-of-thought
Output Formatters	Response formatting	Code syntax, markdown, structured data

How it works

You build a component and publish it to the Marketplace.
Implementation developers integrate your component.
When that implementation wins, you earn a share of the prize.
Attribution is automatic via dependency tracking.

Reward sharing

When an implementation claims the prize pool, rewards are distributed between the implementation developer and the component builders.

Miner reward

80%

Goes to the implementation developer.

Component rewards

20%

Split across component builders by usage and value.

Percentages are illustrative. Final model will be determined by governance.

Component requirements

Open source: MIT, Apache 2.0, or compatible license.
Documentation: clear API docs and usage examples.
Packaging: pip package, npm module, or Docker image.
Versioning: semantic versioning with changelog.
Testing: automated tests with over 80% coverage.

Coming soon

The Marketplace is currently in development. Full launch is planned for Q2 2026, with early access starting in Q1 2026.

Loading marketplace reward flow...

Scoring model

Scoring Categories

Evaluation spans functional performance and production readiness. Each category captures a different slice of what makes an intelligence engine useful, fast, and safe in the real world.

Category	What it measures	Example benchmarks
Chat Quality	Conversational ability, helpfulness.	MT-Bench, AlpacaEval
Reasoning	Logic, math, multi-step problems.	GSM8K, MATH, ARC
Knowledge	Factual accuracy, world knowledge.	MMLU, TruthfulQA
Research	Web search, synthesis, citation.	Custom research tasks
Coding	Code generation, debugging, explanation.	HumanEval, MBPP, SWE-Bench
Tool Use	API calling, function execution.	Custom tool-use evals
Multimodal	Image understanding, file generation.	VQA, document tasks
Speed	Latency, throughput.	Time-to-first-token, TPS
Cost	Resource efficiency.	USD per 1M tokens (effective)
Streaming	Continuous output, reasoning tokens.	Streaming continuity score

Composite Score

The final leaderboard ranking is based on a composite score that combines all evaluation categories. The formula rewards implementations that excel across the board, not just in one area.

CompositeScore = Σ (CategoryScore × CategoryWeight)

Each category is scored on a normalized scale (0-100).
Weights reflect real-world usage and are published before each cycle.
Weights may be adjusted as the competition evolves.

Current weight distribution (subject to change)

Category	Weight
Quality (aggregate)	40%
Speed	20%
Cost	15%
Streaming	15%
Modality	10%

Quality aggregate includes chat, reasoning, knowledge, research, coding, tool use, and multimodal task performance.

Benchmark Suites

Evaluations use a combination of public and proprietary benchmarks.

Public benchmarks

MMLU (knowledge)
TruthfulQA (accuracy)
GSM8K, MATH (reasoning)
HumanEval, MBPP (coding)
MT-Bench (chat quality)

Proprietary benchmarks

Research synthesis tasks
Multi-step tool use scenarios
Streaming continuity tests
Multimodal generation tasks

All public benchmark implementations are open source. Proprietary benchmarks rotate to prevent overfitting.

Scoring runs are reproducible. Questions or disputes can be raised through the competition issue tracker, and we will rerun and publish findings.

Ready to Test Your Implementation?

Run the official benchmarks against your API or container

Start Scoring Run

Architecture overview

How the Janus Architecture Fits Together

Janus connects users to competing intelligence implementations through a secure gateway, TEE execution layer, and tightly controlled platform services.

High-Level Architecture

User requests flow through the Janus Gateway, route into a TEE-backed container, and call platform services as needed. Benchmarks run against the same API and feed the leaderboard.

Gateway validates and routes all OpenAI-compatible requests.
Implementations run inside Chutes CPU TEE nodes.
Platform services are available via whitelisted endpoints only.
Bench runner and scoring engine update the leaderboard.

Request flow

Request Flow

When a user sends a message, the request traverses the gateway, runs in a TEE container, calls platform services, and streams back to the client.

1. User Request

Users send OpenAI-compatible chat completion requests from Janus Chat, the API, or third-party apps.

POST /v1/chat/completions
{
  "model": "janus",
  "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
  "stream": true
}

2. Gateway Routing

Validates the request format.
Selects the target implementation (current #1 or specified).
Routes to the appropriate TEE node.

3. TEE Execution

Runs inside a Chutes CPU TEE node (isolated, attested).
Has access to platform services via whitelisted endpoints.
Generates a response using whatever logic you build.

4. Platform Service Calls

5. Response Streaming

Reasoning tokens via reasoning_content field.
Content tokens via content field.
Continuous streaming, not batched.

6. User Receives Response

The gateway streams responses back to the user's client with real-time updates and final completion metadata.

Platform services

Platform Services

Your implementation can call these services from inside the container. All other outbound access is blocked.

Web Proxy

Endpoint: https://proxy.janus.rodeo

Fetch web pages for research and information gathering.

import httpx

response = httpx.get(
    "https://proxy.janus.rodeo/fetch",
    params={"url": "https://example.com/article"}
)
content = response.json()["content"]  # Markdown-formatted

Features

Converts HTML to clean markdown.
Respects robots.txt.
Rate limited: 10 requests/minute.
Max page size: 1MB.

Search API

Endpoint: https://search.janus.rodeo

Web search for finding relevant information.

import httpx

response = httpx.post(
    "https://search.janus.rodeo/search",
    json={"query": "quantum entanglement explained", "num_results": 10}
)
results = response.json()["results"]
# [{"title": "...", "url": "...", "snippet": "..."}, ...]

Features

Powered by Brave Search API.
Returns title, URL, snippet.
Rate limited: 20 searches/minute.

Vector Index

Endpoint: https://vector.janus.rodeo

Semantic search over indexed knowledge bases.

import httpx

response = httpx.post(
    "https://vector.janus.rodeo/query",
    json={"query": "How does TCP handshake work?", "top_k": 5}
)
chunks = response.json()["chunks"]
# [{"content": "...", "source": "...", "score": 0.92}, ...]

Features

Pre-indexed documentation (Chutes, Bittensor, common frameworks).
Custom index upload (future feature).
Rate limited: 50 queries/minute.

Code Sandbox

Endpoint: https://sandbox.janus.rodeo

Execute code safely in an isolated environment.

import httpx

response = httpx.post(
    "https://sandbox.janus.rodeo/execute",
    json={
        "language": "python",
        "code": "print(2 + 2)",
        "timeout": 30
    }
)
result = response.json()
# {"stdout": "4\n", "stderr": "", "exit_code": 0}

Features

Supported languages: Python, JavaScript, Bash, Go, Rust.
Timeout: max 60 seconds.
Memory: max 512MB.
File I/O available within sandbox.
Network access: none (sandbox is isolated).

Chutes Inference

Endpoint: https://api.chutes.ai

Call any model available on Chutes.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CHUTES_API_KEY"],
    base_url="https://api.chutes.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this..."}]
)

Features

OpenAI: gpt-4o, gpt-4o-mini, o1, o1-mini.
Open models: Llama, Mistral, Qwen, DeepSeek.
Specialized: code, vision, embedding models.

Your implementation receives a CHUTES_API_KEY environment variable with credits for platform use.

Chutes Model Catalog

Security model

Security Model

Janus runs submissions inside a secure, isolated environment with strict egress controls and operational monitoring.

TEE Isolation

Memory encryption: RAM is encrypted; host cannot read your data.
Attestation: proof that your code runs unmodified.
Isolation: no access to host filesystem or other containers.

Secrets Management

CHUTES_API_KEY injected as an environment variable.
No hardcoded secrets; use env vars in your code.
Platform keys rotate regularly.

Monitoring

Request/response logging (content redacted).
Resource usage tracking.
Anomaly detection for unusual patterns.

Network egress control

All outbound connections are routed through a proxy.
Only whitelisted domains are allowed.
Connection attempts to other hosts are logged and blocked.

Requirements

Technical Requirements

Validate your container against the required API contract, streaming behavior, and resource limits before you submit.

API endpoints

Endpoint	Method	Required
/v1/chat/completions	POST	Yes
/health	GET	Yes
/v1/models	GET	No (recommended)

Streaming requirements

Must support stream: true for SSE responses.
Continuous output: tokens should flow continuously, not in batches.
Reasoning tokens: use reasoning_content for thinking/planning.
Finish reason: always include finish_reason in the final chunk.

Resource limits

Resource	Limit
Memory	16 GB
CPU	4 cores
Disk	50 GB
Network	Whitelisted egress only
Timeout	5 minutes per request

Whitelisted egress

Only these services are reachable from the container. All other outbound traffic is blocked.

api.chutes.ai
proxy.janus.rodeo
search.janus.rodeo
sandbox.janus.rodeo
vector.janus.rodeo

Bench runner integration

Test Your Implementation

Run the public dev suite locally to validate your container before submitting. The same tooling powers the official leaderboard.

Bench Runner UI Bench Runner API CLI Documentation

Quick start

# Install bench runner
pip install janus-bench

# Run quick suite (5 minutes)
janus-bench run --target http://localhost:8000 --suite quick

# Run full suite (2 hours)
janus-bench run --target http://localhost:8000 --suite full

# Run specific category
janus-bench run --target http://localhost:8000 --suite coding

Benchmark integration

How Scoring Works

Bench runner evaluates your implementation through the same infrastructure used for the official leaderboard.

Benchmark flow

Load suite: bench runner loads test cases from the benchmark suite.
Execute tests: each test sends a request to your API.
Collect responses: responses are captured with timing data.
Evaluate quality: LLM judges or exact match evaluate correctness.
Calculate metrics: quality, speed, cost, streaming scores computed.
Update leaderboard: composite score published.

Benchmark transparency

Public benchmarks are open source.
Evaluation prompts are published.
Scoring formulas are documented.
You can reproduce any score locally.

Submission requirements

What It Means to Submit to Janus

Your submission is an open source Docker implementation that speaks the OpenAI Chat Completions API. The competition rewards incremental progress, transparency, and reproducibility.

What You're Submitting

A Janus submission is a Docker container that implements an OpenAI-compatible Chat Completions API. Behind that API, your implementation can use any technology to generate responses.

It is not "an agent" - it is an implementation of intelligence.
It is not "a miner" - the miner is you; the submission is your creation.
It is a Docker container - portable, reproducible, isolated.
It exposes a standard API - POST /v1/chat/completions, GET /health.

What Happens to Your Submission

Your container is pulled and deployed to a Chutes CPU TEE node.
It connects to platform services: web proxy, search, sandbox, inference.
Benchmarks run against it via the same API users call.
Results are published to the leaderboard.

Build on What Exists

The competition encourages incremental improvement. Start from the current leader, make it better, and push the frontier forward.

Why incremental?

Lower barrier: improve a slice instead of rebuilding everything.
Faster progress: small improvements compound into big gains.
Community learning: each submission teaches the next.
Reduced risk: if your change fails, the delta is small.

How to start

Fork the baseline: git clone https://github.com/chutesai/janus-baseline
Study the leader: review the current #1 source code.
Identify a weakness: use benchmark breakdowns to find gaps.
Make your improvement: prompts, routing, new capabilities.
Test locally: run janus-bench to validate.
Submit your enhanced version.

Improvement cycle

Open Source Requirement

All Janus submissions must be open source. This is non-negotiable.

Rationale

Community progress: everyone learns and improves faster.
Transparency: users can inspect how requests are handled.
Security: open code can be audited.
Bittensor ethos: the network is built on openness.

Acceptable licenses

License	Allowed	Notes
MIT	Yes	Recommended
Apache 2.0	Yes	Recommended
GPL v3	Yes	Derivative works must also be GPL
BSD 3-Clause	Yes
AGPL v3	Yes	Network use triggers copyleft
Proprietary	No	Not allowed
No license	No	Defaults to proprietary

What must be open

Source code that runs inside the container.
Prompts, few-shot examples, templates.
Configuration and routing rules.
Dependency lists and Dockerfiles.

What can stay private

API keys stored in environment variables.
Training data for fine-tuned models.
Calls to proprietary models or APIs.

How Contributors Earn

Janus runs on Bittensor, a decentralized network where contributors (called "miners") compete to provide the best AI services. Think of it like an open marketplace for intelligence - anyone can participate, and the best implementations earn rewards.

The Competition Model

Submit your AI implementation to the Janus competition
Your code runs benchmarks against other submissions
Top performers earn from the prize pool
All submissions are open source, fostering community learning

Why Decentralized?

Open access: Anyone can compete - no gatekeepers or approval processes
Transparent scoring: All benchmarks and results are public
Fair rewards: Earnings distributed automatically based on performance
Community-driven: The best ideas rise to the top through open competition

Getting Started

To participate, you'll need a Bittensor wallet address (similar to a crypto wallet). This is used for:

Attribution on the leaderboard
Receiving prize pool payouts
Building your reputation across submissions

Don't have a wallet yet? Get started with Bittensor - it takes just a few minutes.

Review Process

Current Phase: manual review before a submission appears on the leaderboard.

Review checklist

Docker image accessible from the specified registry.
API compliance for /v1/chat/completions and /health.
Source code repository is public and matches the image.
License file is OSI-approved.
No obvious malicious code or data exfiltration.
No unapproved egress beyond whitelisted services.
Hotkey is valid and registered on Bittensor.
Dockerfile can reproduce the image from source.

Timeline

Submission received - review begins within 24 hours.
Review complete - benchmarks run within 48 hours.
Results published - leaderboard updated within 72 hours.

Future Phase: Decentralized Review

Phase 2 introduces a decentralized judging panel with consensus-based approvals and staking-backed incentives.

Validator set of trusted community reviewers.
Randomized assignment of submissions to reviewers.
Consensus approval required to pass.
Slashing for malicious or negligent reviewers.
Appeals handled by the broader panel.

This phase will be specified separately once governance is finalized.

Submission form

Submit Your Implementation for Review

Submissions are manually reviewed before running on private benchmarks. Use this form to share your Docker image, hotkey, source code, and license details.

Submission fields

Field	Required	Description
Implementation Name	Yes	Unique identifier (e.g., "turbo-reasoner-v2")
Docker Image	Yes	Full image reference (e.g., ghcr.io/user/janus-impl:v2)
Bittensor Hotkey	Yes	SS58 address for attribution and payout
Source Code URL	Yes	Link to public repository (GitHub, GitLab, etc.)
License	Yes	OSI-approved license identifier (e.g., "MIT")
Description	Yes	Brief description of your approach (100-500 chars)
Changelog	No	What differs from baseline or previous version
Contact	No	Discord handle or email for review communication

Example submission

name: "turbo-reasoner-v2"
image: "ghcr.io/alice/janus-turbo:2.0.1"
hotkey: "5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty"
source: "https://github.com/alice/janus-turbo"
license: "MIT"
description: "Enhanced reasoning via chain-of-thought decomposition and parallel tool execution. Improves on baseline-v3 with 15% better GSM8K scores."
changelog: "Added CoT decomposition, parallel tool calls, improved code generation prompts"
contact: "alice#1234"

FAQ

Frequently Asked Questions

Need a quick answer before you ship? Click a category to explore.

Helpful Links

Submission Portal - Submit your implementation
Leaderboard - See current rankings
Benchmark Docs - Detailed benchmark information
janus-bench on PyPI - Local testing tool
Baseline Repository - Start from the reference implementation
Discord Community - Get help and discuss strategies
Marketplace Waitlist - Early access to components

Search tips: open ai, openai, leader board, hot key, market place, bench mark, intelligence agent.

Janus Competition - The Rodeo

Compete to Build the Best Intelligence Engine

12Entries

$50KRodeo Purse

847Benchmark Runs

Start Competing Run Scoring Explore Demo Chat

Live Snapshot

Today's Benchmark Pulse

Next public run03:00 UTC

Champion score78.4

Median TTFT0.92s

Scores refresh daily for public benchmarks. Private evaluation is ongoing for submissions in review.

Competition overview

What is the Janus Competition?

The Janus Competition is an open arena where developers compete to build the best intelligence engine - a system that handles any request a user might throw at a comprehensive AI assistant.

How It Works

What Gets Evaluated

Your implementation is scored across all the use cases of a modern AI assistant and the production metrics that define a great experience.

Use case coverage

Simple chat: Conversational responses, Q&A, summarization.
Complex reasoning: Multi-step problems, logical deduction, planning.
Deep research: Web search, information synthesis, citation.
Software creation: Code generation, debugging, full project scaffolding.
Multimodal input: Understanding images, documents, audio.
Multimodal output: Generating images, files, structured data.
Tool use: Calling APIs, executing code, managing files.

Non-functional metrics

Quality: Accuracy, helpfulness, safety, instruction following.
Speed: Time to first token, total completion time.
Cost: Resource efficiency, inference cost per request.
Streaming continuity: Consistent token flow, reasoning transparency.
Modality handling: Graceful handling of images, files, multi-turn context.

The composite score reflects how well your implementation performs as a complete AI solution, not just on narrow benchmarks.

The Arena

Rodeo Rankings

Composite scores blend task performance with production metrics like speed, cost, streaming continuity, and modality handling. Sort any column to explore trade-offs across implementations.

Updated 2 hours ago · 2025-01-22 14:32 UTCView full benchmark results


#1	your-janus-implementation	5Your...Key	82.7	86.3	78.4	84.2	80.1	78.5	2025-01-20	4 days
#2	quantum-rider	5G9a...C21	79.4	82.8	74.2	81.5	76.8	77.3	2025-01-18	—
#3	baseline-n8n	5H2d...E9F	76.2	79.5	71.8	78.9	73.6	74.2	2025-01-15	—
#4	baseline-cli-agent	5J7b...A10	74.8	77.6	70.2	76.5	72.1	71.8	2025-01-12	—
#5	trailblazer	5K3e...D44	72.5	75.2	68.4	74.8	70.3	69.6	2025-01-10	—

Rank

Composite

82.7

your-janus-implementation

Quality: 86.3

Speed: 78.4

Cost: 84.2

Streaming: 80.1

Modality: 78.5

Miner: 5Your...Key

Submitted: 2025-01-20

Days at #1: 4 days

Rank

Composite

79.4

quantum-rider

Quality: 82.8

Speed: 74.2

Cost: 81.5

Streaming: 76.8

Modality: 77.3

Miner: 5G9a...C21

Submitted: 2025-01-18

Days at #1: —

Rank

Composite

76.2

baseline-n8n

Quality: 79.5

Speed: 71.8

Cost: 78.9

Streaming: 73.6

Modality: 74.2

Miner: 5H2d...E9F

Submitted: 2025-01-15

Days at #1: —

Rank

Composite

74.8

baseline-cli-agent

Quality: 77.6

Speed: 70.2

Cost: 76.5

Streaming: 72.1

Modality: 71.8

Miner: 5J7b...A10

Submitted: 2025-01-12

Days at #1: —

Rank

Composite

72.5

trailblazer

Quality: 75.2

Speed: 68.4

Cost: 74.8

Streaming: 70.3

Modality: 69.6

Miner: 5K3e...D44

Submitted: 2025-01-10

Days at #1: —

User Preferences

Arena Preference Ladder

Anonymous A/B votes fuel this live ranking. Models are randomized and revealed only after a decision, keeping comparisons honest.

Loading votes...

How it works

Five Steps to the Janus Rodeo

Build, evaluate, submit, compete, and earn. The public dev suite is open for iteration while private benchmarks and the prize pool keep the rodeo moving.

Build

Create your intelligence engine using any technology stack. CLI agents, workflow engines, model orchestrations - as long as it exposes an OpenAI-compatible API and streams responses, you are in.

Evaluate

Test locally using the Janus bench runner. Run the same benchmarks we use for scoring. Identify weaknesses before you submit.

janus-bench run --target http://localhost:8000 --suite quick

Submit

Package your implementation as a Docker container. Submit via the Janus portal with your Bittensor hotkey and source code link. All submissions must be open source.

Compete

Your implementation runs against the full benchmark suite. Results appear on the leaderboard within 24 hours. See how you stack up against the current champion.

Earn

If your implementation claims the #1 spot, you win the entire accumulated prize pool. The pool grows daily until someone beats you, then they claim it all and a new pool begins.

Claim the pool. Set the new bar.

The current pool stands at $47,250. Think you can take it?

View Leaderboard

Reference baselines

Reference Baselines

We provide two reference implementations to help you get started. Each demonstrates a different architectural approach to building a Janus-compatible intelligence engine.

CLI Agent Baseline

Sandbox-based approach using the Claude Code CLI agent with full tool access inside an isolated Sandy environment.

Dual-path routing (fast vs complex)
Secure sandbox execution
Full filesystem and code access
Artifact generation

View Documentation

LangChain Baseline

In-process approach using LangChain agents with direct tool integration and streaming support.

LangChain agent framework
In-process execution
Extensible tool system
Vision model routing

View Documentation

Prize pool

The Prize Pool

This is not a one time hackathon. It is a continuous race where the prize for beating the leader gets bigger every day.

Current prize pool

$47,250.00

Accumulating sinceJan 15, 2026

Days at #18

Current championyour-janus-implementation

Miner5Your...Key

View Pool History Claim Rules

Loading prize pool diagram...

How It Works

Daily contribution: A portion of Janus platform revenue flows into the pool every day.
Accumulation: The pool grows as long as the same implementation holds the #1 rank.
Claim: When a new implementation takes the top spot, the miner behind it claims the entire accumulated pool.
Reset: After payout, the pool resets to zero and begins accumulating again.

Why This Model?

Incentivizes improvement: the longer a champion holds #1, the bigger the bounty for beating them.
Rewards sustained excellence: a clear bar is set and every challenger knows what is at stake.
Continuous competition: there is always a reason to iterate and climb.
Transparent economics: everyone can see the pool, the claims, and the reset history.

Pool Transparency

The current pool balance is displayed on the leaderboard and this page.
All contributions and payouts are recorded on-chain on Bittensor.
Historical pool data is publicly accessible and linked from the leaderboard.
The system moves toward fully automated, on-chain settlement over time.

Payout Process

Current (Phase 1 - Manual)

New #1 is verified via benchmark run.
Results are reviewed for integrity.
Payout is initiated to the miner's Bittensor coldkey.
The pool resets and the transaction is logged.

Future (Phase 2 - Automated)

Benchmark results trigger on-chain verification.
Smart contracts transfer the pool to the winner.
The pool resets atomically.
No manual intervention is required.

Claim Rules

Ties are broken by the earliest verified submission timestamp. If verification is still in progress, payouts pause until the tie is resolved.
Disqualifications for security or integrity violations void the claim. The pool remains and moves to the next highest verified submission.
Disputes trigger an audit window. Funds are released only after the review finishes and results are published.

Questions about rulings or disputes should be raised in the competition issue tracker before payouts are finalized.

Component marketplace

Component Marketplace

Beyond competing with full implementations, you can contribute reusable components to the Janus ecosystem and earn rewards when they power the leading intelligence implementation.

What are components?

Component type	Description	Examples
Research Nodes	Specialized research capabilities	Academic paper search, news aggregation
Tool Integrations	Connections to external services	GitHub API, database connectors
Memory Systems	Context management solutions	Vector stores, conversation history
Reasoning Modules	Thinking and planning logic	Chain-of-thought, tree-of-thought
Output Formatters	Response formatting	Code syntax, markdown, structured data

How it works

You build a component and publish it to the Marketplace.
Implementation developers integrate your component.
When that implementation wins, you earn a share of the prize.
Attribution is automatic via dependency tracking.

Reward sharing

When an implementation claims the prize pool, rewards are distributed between the implementation developer and the component builders.

Miner reward

80%

Goes to the implementation developer.

Component rewards

20%

Split across component builders by usage and value.

Percentages are illustrative. Final model will be determined by governance.

Component requirements

Open source: MIT, Apache 2.0, or compatible license.
Documentation: clear API docs and usage examples.
Packaging: pip package, npm module, or Docker image.
Versioning: semantic versioning with changelog.
Testing: automated tests with over 80% coverage.

Coming soon

The Marketplace is currently in development. Full launch is planned for Q2 2026, with early access starting in Q1 2026.

Loading marketplace reward flow...

Scoring model

Scoring Categories

Evaluation spans functional performance and production readiness. Each category captures a different slice of what makes an intelligence engine useful, fast, and safe in the real world.

Category	What it measures	Example benchmarks
Chat Quality	Conversational ability, helpfulness.	MT-Bench, AlpacaEval
Reasoning	Logic, math, multi-step problems.	GSM8K, MATH, ARC
Knowledge	Factual accuracy, world knowledge.	MMLU, TruthfulQA
Research	Web search, synthesis, citation.	Custom research tasks
Coding	Code generation, debugging, explanation.	HumanEval, MBPP, SWE-Bench
Tool Use	API calling, function execution.	Custom tool-use evals
Multimodal	Image understanding, file generation.	VQA, document tasks
Speed	Latency, throughput.	Time-to-first-token, TPS
Cost	Resource efficiency.	USD per 1M tokens (effective)
Streaming	Continuous output, reasoning tokens.	Streaming continuity score

Composite Score

The final leaderboard ranking is based on a composite score that combines all evaluation categories. The formula rewards implementations that excel across the board, not just in one area.

CompositeScore = Σ (CategoryScore × CategoryWeight)

Each category is scored on a normalized scale (0-100).
Weights reflect real-world usage and are published before each cycle.
Weights may be adjusted as the competition evolves.

Current weight distribution (subject to change)

Category	Weight
Quality (aggregate)	40%
Speed	20%
Cost	15%
Streaming	15%
Modality	10%

Quality aggregate includes chat, reasoning, knowledge, research, coding, tool use, and multimodal task performance.

Benchmark Suites

Evaluations use a combination of public and proprietary benchmarks.

Public benchmarks

MMLU (knowledge)
TruthfulQA (accuracy)
GSM8K, MATH (reasoning)
HumanEval, MBPP (coding)
MT-Bench (chat quality)

Proprietary benchmarks

Research synthesis tasks
Multi-step tool use scenarios
Streaming continuity tests
Multimodal generation tasks

All public benchmark implementations are open source. Proprietary benchmarks rotate to prevent overfitting.

Scoring runs are reproducible. Questions or disputes can be raised through the competition issue tracker, and we will rerun and publish findings.

Ready to Test Your Implementation?

Run the official benchmarks against your API or container

Start Scoring Run

Architecture overview

How the Janus Architecture Fits Together

Janus connects users to competing intelligence implementations through a secure gateway, TEE execution layer, and tightly controlled platform services.

High-Level Architecture

User requests flow through the Janus Gateway, route into a TEE-backed container, and call platform services as needed. Benchmarks run against the same API and feed the leaderboard.

Gateway validates and routes all OpenAI-compatible requests.
Implementations run inside Chutes CPU TEE nodes.
Platform services are available via whitelisted endpoints only.
Bench runner and scoring engine update the leaderboard.

Request flow

Request Flow

When a user sends a message, the request traverses the gateway, runs in a TEE container, calls platform services, and streams back to the client.

1. User Request

Users send OpenAI-compatible chat completion requests from Janus Chat, the API, or third-party apps.

POST /v1/chat/completions
{
  "model": "janus",
  "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
  "stream": true
}

2. Gateway Routing

Validates the request format.
Selects the target implementation (current #1 or specified).
Routes to the appropriate TEE node.

3. TEE Execution

Runs inside a Chutes CPU TEE node (isolated, attested).
Has access to platform services via whitelisted endpoints.
Generates a response using whatever logic you build.

4. Platform Service Calls

5. Response Streaming

Reasoning tokens via reasoning_content field.
Content tokens via content field.
Continuous streaming, not batched.

6. User Receives Response

The gateway streams responses back to the user's client with real-time updates and final completion metadata.

Platform services

Platform Services

Your implementation can call these services from inside the container. All other outbound access is blocked.

Web Proxy

Endpoint: https://proxy.janus.rodeo

Fetch web pages for research and information gathering.

import httpx

response = httpx.get(
    "https://proxy.janus.rodeo/fetch",
    params={"url": "https://example.com/article"}
)
content = response.json()["content"]  # Markdown-formatted

Features

Converts HTML to clean markdown.
Respects robots.txt.
Rate limited: 10 requests/minute.
Max page size: 1MB.

Search API

Endpoint: https://search.janus.rodeo

Web search for finding relevant information.

import httpx

response = httpx.post(
    "https://search.janus.rodeo/search",
    json={"query": "quantum entanglement explained", "num_results": 10}
)
results = response.json()["results"]
# [{"title": "...", "url": "...", "snippet": "..."}, ...]

Features

Powered by Brave Search API.
Returns title, URL, snippet.
Rate limited: 20 searches/minute.

Vector Index

Endpoint: https://vector.janus.rodeo

Semantic search over indexed knowledge bases.

import httpx

response = httpx.post(
    "https://vector.janus.rodeo/query",
    json={"query": "How does TCP handshake work?", "top_k": 5}
)
chunks = response.json()["chunks"]
# [{"content": "...", "source": "...", "score": 0.92}, ...]

Features

Pre-indexed documentation (Chutes, Bittensor, common frameworks).
Custom index upload (future feature).
Rate limited: 50 queries/minute.

Code Sandbox

Endpoint: https://sandbox.janus.rodeo

Execute code safely in an isolated environment.

import httpx

response = httpx.post(
    "https://sandbox.janus.rodeo/execute",
    json={
        "language": "python",
        "code": "print(2 + 2)",
        "timeout": 30
    }
)
result = response.json()
# {"stdout": "4\n", "stderr": "", "exit_code": 0}

Features

Supported languages: Python, JavaScript, Bash, Go, Rust.
Timeout: max 60 seconds.
Memory: max 512MB.
File I/O available within sandbox.
Network access: none (sandbox is isolated).

Chutes Inference

Endpoint: https://api.chutes.ai

Call any model available on Chutes.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CHUTES_API_KEY"],
    base_url="https://api.chutes.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this..."}]
)

Features

OpenAI: gpt-4o, gpt-4o-mini, o1, o1-mini.
Open models: Llama, Mistral, Qwen, DeepSeek.
Specialized: code, vision, embedding models.

Your implementation receives a CHUTES_API_KEY environment variable with credits for platform use.

Chutes Model Catalog

Security model

Security Model

Janus runs submissions inside a secure, isolated environment with strict egress controls and operational monitoring.

TEE Isolation

Memory encryption: RAM is encrypted; host cannot read your data.
Attestation: proof that your code runs unmodified.
Isolation: no access to host filesystem or other containers.

Secrets Management

CHUTES_API_KEY injected as an environment variable.
No hardcoded secrets; use env vars in your code.
Platform keys rotate regularly.

Monitoring

Request/response logging (content redacted).
Resource usage tracking.
Anomaly detection for unusual patterns.

Network egress control

All outbound connections are routed through a proxy.
Only whitelisted domains are allowed.
Connection attempts to other hosts are logged and blocked.

Requirements

Technical Requirements

Validate your container against the required API contract, streaming behavior, and resource limits before you submit.

API endpoints

Endpoint	Method	Required
/v1/chat/completions	POST	Yes
/health	GET	Yes
/v1/models	GET	No (recommended)

Streaming requirements

Must support stream: true for SSE responses.
Continuous output: tokens should flow continuously, not in batches.
Reasoning tokens: use reasoning_content for thinking/planning.
Finish reason: always include finish_reason in the final chunk.

Resource limits

Resource	Limit
Memory	16 GB
CPU	4 cores
Disk	50 GB
Network	Whitelisted egress only
Timeout	5 minutes per request

Whitelisted egress

Only these services are reachable from the container. All other outbound traffic is blocked.

api.chutes.ai
proxy.janus.rodeo
search.janus.rodeo
sandbox.janus.rodeo
vector.janus.rodeo

Bench runner integration

Test Your Implementation

Run the public dev suite locally to validate your container before submitting. The same tooling powers the official leaderboard.

Bench Runner UI Bench Runner API CLI Documentation

Quick start

# Install bench runner
pip install janus-bench

# Run quick suite (5 minutes)
janus-bench run --target http://localhost:8000 --suite quick

# Run full suite (2 hours)
janus-bench run --target http://localhost:8000 --suite full

# Run specific category
janus-bench run --target http://localhost:8000 --suite coding

Benchmark integration

How Scoring Works

Bench runner evaluates your implementation through the same infrastructure used for the official leaderboard.

Benchmark flow

Load suite: bench runner loads test cases from the benchmark suite.
Execute tests: each test sends a request to your API.
Collect responses: responses are captured with timing data.
Evaluate quality: LLM judges or exact match evaluate correctness.
Calculate metrics: quality, speed, cost, streaming scores computed.
Update leaderboard: composite score published.

Benchmark transparency

Public benchmarks are open source.
Evaluation prompts are published.
Scoring formulas are documented.
You can reproduce any score locally.

Submission requirements

What It Means to Submit to Janus

Your submission is an open source Docker implementation that speaks the OpenAI Chat Completions API. The competition rewards incremental progress, transparency, and reproducibility.

What You're Submitting

A Janus submission is a Docker container that implements an OpenAI-compatible Chat Completions API. Behind that API, your implementation can use any technology to generate responses.

It is not "an agent" - it is an implementation of intelligence.
It is not "a miner" - the miner is you; the submission is your creation.
It is a Docker container - portable, reproducible, isolated.
It exposes a standard API - POST /v1/chat/completions, GET /health.

What Happens to Your Submission

Your container is pulled and deployed to a Chutes CPU TEE node.
It connects to platform services: web proxy, search, sandbox, inference.
Benchmarks run against it via the same API users call.
Results are published to the leaderboard.

Build on What Exists

The competition encourages incremental improvement. Start from the current leader, make it better, and push the frontier forward.

Why incremental?

Lower barrier: improve a slice instead of rebuilding everything.
Faster progress: small improvements compound into big gains.
Community learning: each submission teaches the next.
Reduced risk: if your change fails, the delta is small.

How to start

Fork the baseline: git clone https://github.com/chutesai/janus-baseline
Study the leader: review the current #1 source code.
Identify a weakness: use benchmark breakdowns to find gaps.
Make your improvement: prompts, routing, new capabilities.
Test locally: run janus-bench to validate.
Submit your enhanced version.

Improvement cycle

Open Source Requirement

All Janus submissions must be open source. This is non-negotiable.

Rationale

Community progress: everyone learns and improves faster.
Transparency: users can inspect how requests are handled.
Security: open code can be audited.
Bittensor ethos: the network is built on openness.

Acceptable licenses

License	Allowed	Notes
MIT	Yes	Recommended
Apache 2.0	Yes	Recommended
GPL v3	Yes	Derivative works must also be GPL
BSD 3-Clause	Yes
AGPL v3	Yes	Network use triggers copyleft
Proprietary	No	Not allowed
No license	No	Defaults to proprietary

What must be open

Source code that runs inside the container.
Prompts, few-shot examples, templates.
Configuration and routing rules.
Dependency lists and Dockerfiles.

What can stay private

API keys stored in environment variables.
Training data for fine-tuned models.
Calls to proprietary models or APIs.

How Contributors Earn

The Competition Model

Submit your AI implementation to the Janus competition
Your code runs benchmarks against other submissions
Top performers earn from the prize pool
All submissions are open source, fostering community learning

Why Decentralized?

Open access: Anyone can compete - no gatekeepers or approval processes
Transparent scoring: All benchmarks and results are public
Fair rewards: Earnings distributed automatically based on performance
Community-driven: The best ideas rise to the top through open competition

Getting Started

To participate, you'll need a Bittensor wallet address (similar to a crypto wallet). This is used for:

Attribution on the leaderboard
Receiving prize pool payouts
Building your reputation across submissions

Don't have a wallet yet? Get started with Bittensor - it takes just a few minutes.

Review Process

Current Phase: manual review before a submission appears on the leaderboard.

Review checklist

Docker image accessible from the specified registry.
API compliance for /v1/chat/completions and /health.
Source code repository is public and matches the image.
License file is OSI-approved.
No obvious malicious code or data exfiltration.
No unapproved egress beyond whitelisted services.
Hotkey is valid and registered on Bittensor.
Dockerfile can reproduce the image from source.

Timeline

Submission received - review begins within 24 hours.
Review complete - benchmarks run within 48 hours.
Results published - leaderboard updated within 72 hours.

Future Phase: Decentralized Review

Phase 2 introduces a decentralized judging panel with consensus-based approvals and staking-backed incentives.

Validator set of trusted community reviewers.
Randomized assignment of submissions to reviewers.
Consensus approval required to pass.
Slashing for malicious or negligent reviewers.
Appeals handled by the broader panel.

This phase will be specified separately once governance is finalized.

Submission form

Submit Your Implementation for Review

Submissions are manually reviewed before running on private benchmarks. Use this form to share your Docker image, hotkey, source code, and license details.

Submission fields

Field	Required	Description
Implementation Name	Yes	Unique identifier (e.g., "turbo-reasoner-v2")
Docker Image	Yes	Full image reference (e.g., ghcr.io/user/janus-impl:v2)
Bittensor Hotkey	Yes	SS58 address for attribution and payout
Source Code URL	Yes	Link to public repository (GitHub, GitLab, etc.)
License	Yes	OSI-approved license identifier (e.g., "MIT")
Description	Yes	Brief description of your approach (100-500 chars)
Changelog	No	What differs from baseline or previous version
Contact	No	Discord handle or email for review communication

Example submission

name: "turbo-reasoner-v2"
image: "ghcr.io/alice/janus-turbo:2.0.1"
hotkey: "5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty"
source: "https://github.com/alice/janus-turbo"
license: "MIT"
description: "Enhanced reasoning via chain-of-thought decomposition and parallel tool execution. Improves on baseline-v3 with 15% better GSM8K scores."
changelog: "Added CoT decomposition, parallel tool calls, improved code generation prompts"
contact: "alice#1234"

FAQ

Frequently Asked Questions

Need a quick answer before you ship? Click a category to explore.

Helpful Links

Submission Portal - Submit your implementation
Leaderboard - See current rankings
Benchmark Docs - Detailed benchmark information
janus-bench on PyPI - Local testing tool
Baseline Repository - Start from the reference implementation
Discord Community - Get help and discuss strategies
Marketplace Waitlist - Early access to components

Search tips: open ai, openai, leader board, hot key, market place, bench mark, intelligence agent.

Enter access password

Compete to Build the Best Intelligence Engine

Today's Benchmark Pulse

What is the Janus Competition?

How It Works

What Gets Evaluated

Rodeo Rankings

Arena Preference Ladder

Five Steps to the Janus Rodeo

Build

Evaluate

Submit

Compete

Earn

Reference Baselines

CLI Agent Baseline

LangChain Baseline

The Prize Pool

How It Works

Why This Model?

Pool Transparency

Payout Process

Claim Rules

Component Marketplace

What are components?

How it works

Reward sharing

Component requirements

Coming soon

Scoring Categories

Composite Score

Benchmark Suites

Ready to Test Your Implementation?

How the Janus Architecture Fits Together

High-Level Architecture

Request Flow

Platform Services

Web Proxy

Search API

Vector Index

Code Sandbox

Chutes Inference

Security Model

Technical Requirements

Test Your Implementation

How Scoring Works

What It Means to Submit to Janus

What You're Submitting

What Happens to Your Submission

Build on What Exists

Open Source Requirement

How Contributors Earn

Review Process

Future Phase: Decentralized Review

Submit Your Implementation for Review

Frequently Asked Questions

General

Submissions

Scoring

Prize Pool

Technical

Marketplace (Preview)

Helpful Links

Compete to Build the Best Intelligence Engine

Today's Benchmark Pulse

What is the Janus Competition?

How It Works

What Gets Evaluated

Rodeo Rankings

Arena Preference Ladder

Five Steps to the Janus Rodeo

Build

Evaluate

Submit

Compete

Earn

Reference Baselines

CLI Agent Baseline

LangChain Baseline

The Prize Pool