CLI Agent Baseline
A sandbox-based reference implementation that hands complex work to a Claude Code CLI agent while a fast path serves simple requests.
Architecture
How It Works
- Receives OpenAI-compatible chat completion requests.
- Runs keyword-based complexity detection followed by a Nemotron routing decision.
- Routes simple prompts to fast models and complex prompts to the agent sandbox.
- Executes the Claude Code CLI agent with full tool access.
- Streams responses with reasoning content and artifacts over SSE.
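
The routing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual router: the keyword list, the `RouteDecision` type, and the `llm_says_complex` callback (standing in for the Nemotron routing call) are all hypothetical, and the sketch assumes a keyword hit short-circuits the LLM check.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword list; the real detector's keywords are not documented here.
COMPLEX_KEYWORDS = ("generate an image", "analyze", "research", "write a script")


@dataclass
class RouteDecision:
    path: str    # "fast" or "agent"
    reason: str  # which stage made the decision


def keyword_flags_complex(prompt: str) -> bool:
    """Stage 1: cheap keyword-based complexity detection."""
    text = prompt.lower()
    return any(keyword in text for keyword in COMPLEX_KEYWORDS)


def route(
    prompt: str,
    llm_says_complex: Callable[[str], bool],
    always_use_agent: bool = False,
) -> RouteDecision:
    """Pick a path: forced override, then keywords, then the LLM routing decision."""
    if always_use_agent:
        return RouteDecision("agent", "forced")
    if keyword_flags_complex(prompt):
        # Assumption: a keyword hit is enough to skip the LLM check.
        return RouteDecision("agent", "keyword")
    if llm_says_complex(prompt):
        return RouteDecision("agent", "llm")
    return RouteDecision("fast", "default")
```

Passing the LLM check in as a callable keeps the sketch testable without network access; a real deployment would call the configured routing model there.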
Request Flow
Both paths accept the same OpenAI-compatible payload. Simple requests route through the fast path, while complex tasks go to the CLI agent sandbox.
Simple request
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "baseline",
"messages": [{"role": "user", "content": "What is 2+2?"}],
"stream": true
}'
Complex request
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "baseline",
"messages": [{"role": "user", "content": "Generate an image of a sunset and analyze its colors"}],
"stream": true
}'
Features
Dual-path routing
Keyword checks plus an LLM decision choose both the path and model; metadata can pin a decision.
Sandboxed agent execution
Claude Code runs inside Sandy with full tool access and isolated filesystem permissions.
Streaming with artifacts
SSE responses include reasoning content plus artifact URLs for generated files.
Chutes integrations
LLM, image generation, TTS, and web search are routed through Chutes APIs.
Deep research support
Chutes search with citations powers research-heavy prompts.
Configurable routing
Fine-tune complexity thresholds, routing models, and model router behavior.
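
To illustrate the streaming feature, here is a small sketch of a client-side parser for the SSE lines. It assumes the standard OpenAI-style chunk layout (`data: {...}` lines ending with `data: [DONE]`) and assumes reasoning text arrives in a `reasoning_content` field on each delta; that field name is an assumption, and the sketch does not parse artifact URLs because their location in the chunk is not specified here.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("reasoning" | "content", text) pairs from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Assumed field name for streamed reasoning text.
            if delta.get("reasoning_content"):
                yield ("reasoning", delta["reasoning_content"])
            if delta.get("content"):
                yield ("content", delta["content"])
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response body or from a recorded transcript.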
CLI Agent Capabilities
Set the JANUS_BASELINE_AGENT environment variable or send the X-Baseline-Agent request header to select the CLI agent. Badges reflect currently verified capabilities.
Claude Code
Default CLI agent with full sandbox tooling.
Roo Code CLI
Experimental CLI agent for autonomous workflows.
Cline CLI
Experimental CLI agent with multi-tool support.
OpenCode
Non-interactive CLI runner (capabilities pending validation).
Codex
CLI output capture under investigation.
Aider
Code editing workflows only.
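
As a sketch of per-request agent selection, the helper below builds the pieces of a chat completion request with an `X-Baseline-Agent` header. The helper name `build_chat_request` is hypothetical, and the sketch assumes the header takes the same value as `JANUS_BASELINE_AGENT` (for example `claude-code`); identifiers for the other agents are not listed here, so none are shown.

```python
import json
from typing import Dict, Tuple


def build_chat_request(
    prompt: str,
    agent: str = "claude-code",
    base_url: str = "http://localhost:8080",
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a streaming chat completion request."""
    url = f"{base_url}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        # Assumption: the header value matches the JANUS_BASELINE_AGENT value.
        "X-Baseline-Agent": agent,
    }
    body = json.dumps(
        {
            "model": "baseline",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        }
    ).encode("utf-8")
    return url, headers, body
```

The returned tuple can be passed to any HTTP client, for instance `urllib.request.Request(url, data=body, headers=headers)` from the standard library.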
Configuration
Legacy BASELINE-prefixed variables are still accepted. Use the tables below for the complete list of available settings.
Server configuration
| Variable | Default | Description |
|---|---|---|
| BASELINE_AGENT_CLI_HOST | 0.0.0.0 | Server host binding for the API. |
| BASELINE_AGENT_CLI_PORT | 8080 | Server port for the API. |
| BASELINE_AGENT_CLI_DEBUG | false | Enable debug logging and verbose output. |
Chutes API configuration
Chutes provides OpenAI-compatible inference for LLM and tools.
| Variable | Default | Description |
|---|---|---|
| BASELINE_AGENT_CLI_OPENAI_API_KEY | - | Chutes API key (named for OpenAI client compatibility). |
| BASELINE_AGENT_CLI_OPENAI_BASE_URL | - | Legacy alias for the Chutes API base URL. |
| BASELINE_AGENT_CLI_CHUTES_API_BASE | https://llm.chutes.ai/v1 | Primary Chutes API base URL. |
| BASELINE_AGENT_CLI_MODEL | janus-router | Model name exposed to clients. |
| BASELINE_AGENT_CLI_DIRECT_MODEL | zai-org/GLM-4.7-TEE | Direct model used on the fast path when routing allows. |
Vision routing
| Variable | Default | Description |
|---|---|---|
| BASELINE_AGENT_CLI_VISION_MODEL_PRIMARY | moonshotai/Kimi-K2.5-TEE | Primary vision model for image requests. |
| BASELINE_AGENT_CLI_VISION_MODEL_FALLBACK | moonshotai/Kimi-K2.5-TEE | Fallback vision model for image requests. |
| BASELINE_AGENT_CLI_VISION_MODEL_TIMEOUT | 60.0 | Timeout for vision model requests (seconds). |
| BASELINE_AGENT_CLI_ENABLE_VISION_ROUTING | true | Route image requests to vision models. |
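
A rough sketch of how these settings might be applied: detect image parts in an OpenAI-style message list, then try the primary model before the fallback. The function names are hypothetical, and the boolean parsing of `BASELINE_AGENT_CLI_ENABLE_VISION_ROUTING` (only the string `true`, case-insensitive, enables routing) is an assumption.

```python
import os
from typing import Any, Dict, List, Mapping, Optional

DEFAULT_VISION_MODEL = "moonshotai/Kimi-K2.5-TEE"


def contains_image(messages: List[Dict[str, Any]]) -> bool:
    """True if any message carries an OpenAI-style `image_url` content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    return True
    return False


def vision_models(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the models to try in order, or [] when vision routing is disabled."""
    env = os.environ if env is None else env
    enabled = env.get("BASELINE_AGENT_CLI_ENABLE_VISION_ROUTING", "true")
    if enabled.lower() != "true":  # assumed parsing rule
        return []
    primary = env.get("BASELINE_AGENT_CLI_VISION_MODEL_PRIMARY", DEFAULT_VISION_MODEL)
    fallback = env.get("BASELINE_AGENT_CLI_VISION_MODEL_FALLBACK", DEFAULT_VISION_MODEL)
    return [primary, fallback]
```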
Sandy sandbox
| Variable | Default | Description |
|---|---|---|
| SANDY_BASE_URL | - | Sandy API base URL for agent execution. |
| SANDY_API_KEY | - | Sandy API key for sandbox access. |
| BASELINE_AGENT_CLI_SANDY_TIMEOUT | 300 | Sandbox timeout in seconds. |
| JANUS_ARTIFACT_PORT | 5173 | Sandbox artifact server port. |
| JANUS_ARTIFACTS_DIR | /workspace/artifacts | Filesystem path for sandbox artifacts. |
Agent configuration
| Variable | Default | Description |
|---|---|---|
| BASELINE_AGENT_CLI_AGENT_PACK_PATH | ./agent-pack | Path to agent documentation and prompts. |
| BASELINE_AGENT_CLI_SYSTEM_PROMPT_PATH | ./agent-pack/prompts/system.md | System prompt for the CLI agent. |
| BASELINE_AGENT_CLI_ENABLE_WEB_SEARCH | true | Enable web search tools. |
| BASELINE_AGENT_CLI_ENABLE_CODE_EXECUTION | true | Enable code execution tools. |
| BASELINE_AGENT_CLI_ENABLE_FILE_TOOLS | true | Enable file tooling. |
| JANUS_BASELINE_AGENT | claude-code | CLI agent command invoked in the sandbox. |
Routing configuration
| Variable | Default | Description |
|---|---|---|
| BASELINE_AGENT_CLI_ALWAYS_USE_AGENT | false | Force all requests onto the agent path. |
| BASELINE_AGENT_CLI_LLM_ROUTING_MODEL | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Fast model used for routing decisions. |
| BASELINE_AGENT_CLI_LLM_ROUTING_TIMEOUT | 3.0 | Routing check timeout (seconds). |
| BASELINE_AGENT_CLI_COMPLEXITY_THRESHOLD | 100 | Token threshold for complexity detection. |
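
The threshold and timeout above might interact along these lines. This is a sketch under stated assumptions: tokens are approximated by whitespace splitting, a prompt longer than the threshold goes straight to the agent, and a routing timeout falls back to the fast path. None of these choices is confirmed by the settings table, and `decide_path` and `ask_router` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

ROUTING_TIMEOUT_S = 3.0      # BASELINE_AGENT_CLI_LLM_ROUTING_TIMEOUT default
COMPLEXITY_THRESHOLD = 100   # BASELINE_AGENT_CLI_COMPLEXITY_THRESHOLD default


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


async def decide_path(
    prompt: str,
    ask_router: Callable[[str], Awaitable[bool]],
    timeout: float = ROUTING_TIMEOUT_S,
    threshold: int = COMPLEXITY_THRESHOLD,
) -> str:
    """Return "agent" or "fast" for a prompt."""
    if rough_token_count(prompt) > threshold:
        return "agent"  # assumption: long prompts skip the routing call
    try:
        needs_agent = await asyncio.wait_for(ask_router(prompt), timeout=timeout)
    except asyncio.TimeoutError:
        return "fast"  # assumption: a slow routing model must not block the request
    return "agent" if needs_agent else "fast"
```

Bounding the routing call with `asyncio.wait_for` keeps the worst-case added latency at the configured timeout, which matters because every request passes through this check.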
Model router
| Variable | Default | Description |
|---|---|---|
| BASELINE_AGENT_CLI_USE_MODEL_ROUTER | true | Enable the local composite model router. |
| BASELINE_AGENT_CLI_ROUTER_HOST | 127.0.0.1 | Router host. |
| BASELINE_AGENT_CLI_ROUTER_PORT | 8000 | Router port. |
Container overrides
Alternative environment variables honored by container deployments.
| Variable | Default | Description |
|---|---|---|
| HOST | - | Container host override. |
| PORT | - | Container port override. |
| DEBUG | - | Container debug override. |
| LOG_LEVEL | - | Container log level override. |
| OPENAI_API_KEY | - | Container OpenAI/Chutes API key alias. |
| OPENAI_BASE_URL | - | Container OpenAI/Chutes base URL alias. |
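
One way to resolve a setting that has several accepted names is to check them in a fixed order and fall back to the documented default. The sketch below assumes the prefixed variable wins over the container alias; that precedence is an assumption, and `first_set` and `load_server_settings` are hypothetical helpers.

```python
import os
from typing import Any, Dict, Mapping, Optional, Sequence


def first_set(
    names: Sequence[str],
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty value among `names`, else `default`."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default


def load_server_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Resolve host, port, and API base using the assumed precedence order."""
    return {
        "host": first_set(["BASELINE_AGENT_CLI_HOST", "HOST"], "0.0.0.0", env),
        "port": int(first_set(["BASELINE_AGENT_CLI_PORT", "PORT"], "8080", env)),
        "api_base": first_set(
            [
                "BASELINE_AGENT_CLI_CHUTES_API_BASE",
                "BASELINE_AGENT_CLI_OPENAI_BASE_URL",
                "OPENAI_BASE_URL",
            ],
            "https://llm.chutes.ai/v1",
            env,
        ),
    }
```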
Getting Started
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install with dev dependencies
pip install -e ".[dev]"
# Run the baseline
python -m janus_baseline_agent_cli.main