User Guide
Learn how to navigate and get the most value from our evaluation of AI-powered developer tools.
Getting Started
1. Explore the Radar: Visit the main page to see an interactive visualization
2. Browse Tools: Check the Tools catalog for a complete list
3. Browse the Timeline: See the Industry Timeline for model releases, funding rounds, launches, and shutdowns
4. Read Briefings: Visit Briefings for weekly strategic intelligence
5. Deep Dive: Click any tool to see detailed evaluation rationale
Understanding the Radar
Each tool is scored on six equally weighted ACES v2 dimensions, each on a 1–20 scale:
AI Autonomy
Ability to plan and execute multi-step tasks (assistive → agentic → self-directed)
Integration
Depth of integration into developer workflows (plugin → IDE → platform-native)
Contextual Understanding
Depth of understanding across repos, projects, and systems (file → repo → ecosystem)
Compliance
Enterprise governance: security, audit controls, data residency, access management
Viability
Vendor sustainability: funding, team, roadmap, market position
User Interface
Interaction maturity: keyboard → chat → multimodal ("vibe coding")
Understanding Scores
Three signals, not one composite
Rating (0–100): Capability, computed as the average of the six ACES v2 dimension scores multiplied by 5.
Signal Level: Evidence strength (Validated, Assessed, Tracked, or Detected), indicating how thoroughly the Rating has been validated.
Evidence Grade (A–D): Evaluation quality, meaning how fresh, deep, and hands-on our evidence is.
Why separate signals? A high Rating with a Tracked signal and Grade C evidence tells a different story than the same Rating at Validated + Grade A. Signal v4.0 retires the composite “Adjusted Score” so the tradeoffs stay visible instead of being hidden inside a multiplier.
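The Rating formula can be sketched in a few lines. This is a minimal illustration, not the site's implementation: the dimension keys, the `rating` function name, and the example scores are all hypothetical.

```python
# Hypothetical keys for the six ACES v2 dimensions (names are assumptions).
DIMENSIONS = (
    "ai_autonomy",
    "integration",
    "contextual_understanding",
    "compliance",
    "viability",
    "user_interface",
)


def rating(scores: dict[str, float]) -> float:
    """Average the six equally weighted dimension scores and multiply by 5."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS) * 5


# Made-up scores for illustration only.
example = {
    "ai_autonomy": 16,
    "integration": 14,
    "contextual_understanding": 15,
    "compliance": 10,
    "viability": 12,
    "user_interface": 17,
}
print(rating(example))  # average 14 across six dimensions, times 5 = 70.0
```

Because the weights are equal, a low score on one dimension (Compliance in this example) lowers the Rating by the same amount as a low score on any other.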
Score Interpretation
| Score Range | Interpretation |
|---|---|
| 80–100 | Exceptional — leading capabilities |
| 60–79 | Strong — solid, production-ready |
| 40–59 | Moderate — functional with gaps |
| 20–39 | Limited — basic capabilities |
| 0–19 | Minimal — significant limitations |
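The table above can be read as a simple threshold lookup. The sketch below is an assumption about how the bands apply, in particular that a fractional Rating such as 79.5 falls in the lower band; the `interpret` function and `BANDS` constant are hypothetical names.

```python
# (lower bound, label) pairs taken from the interpretation table, highest first.
BANDS = [
    (80, "Exceptional"),
    (60, "Strong"),
    (40, "Moderate"),
    (20, "Limited"),
    (0, "Minimal"),
]


def interpret(rating: float) -> str:
    """Return the interpretation label for a 0-100 Rating."""
    if not 0 <= rating <= 100:
        raise ValueError("rating must be between 0 and 100")
    for lower_bound, label in BANDS:
        if rating >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


print(interpret(70))  # Strong
```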
Evidence Grades
Each evaluation carries an evidence grade (A–D) reflecting how much direct, recent, hands-on research backs the scoring. Read the grade alongside the Signal Level and Rating — it tells you how much weight to put on the evaluation itself.
| Grade | Profile |
|---|---|
| A | Recent (<30d), thorough evaluation, hands-on tested |
| B | Recent or thorough, with partial hands-on evidence |
| C | Moderate age or depth; limited or no hands-on testing |
| D | Stale (>90d), minimal depth, no hands-on testing |
Using Presets
The radar view offers four parallel-coordinates presets, each with a distinct axis ordering to make different questions visible:
What's Validated?
Compliance-first axis ordering. Surfaces tools with enterprise-ready governance posture before capability peaks.
Capability Leaders
Autonomy-first axis ordering. Highlights the highest-capability tools regardless of evaluation maturity.
Full Landscape
Default axis order across all six dimensions. No brushing or highlighting — a neutral view of the field.
Enterprise Gaps
Compliance-first with the compliance axis highlighted. Designed to make governance trade-offs visible at a glance.
Signal Levels
Signal level describes how much evidence supports a tool's rating, independent of the rating itself. A high-rated Detected tool carries more risk than a moderate-rated Validated tool. The evidence band is the implied evidence-strength range for each level; we show it as a reference, not as a score multiplier.
| Signal Level | Evidence Band | Meaning |
|---|---|---|
| Validated | 85–100% | Production-validated in enterprise environments with named customers or audits |
| Assessed | 65–90% | Active evaluation with substantial internal and third-party evidence |
| Tracked | 50–75% | Monitored tool; research-based assessment without direct testing |
| Detected | 15–50% | Recently identified; minimal evaluation has commenced |
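One way to keep the three signals side by side, rather than folding them into a single number, is sketched below. The `SignalLevel`, `Evaluation`, and `summarize` names are hypothetical; only the band values come from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class SignalLevel(Enum):
    # (low, high) evidence band in percent, for reference only.
    VALIDATED = (85, 100)
    ASSESSED = (65, 90)
    TRACKED = (50, 75)
    DETECTED = (15, 50)


@dataclass(frozen=True)
class Evaluation:
    rating: float        # 0-100 capability Rating
    signal: SignalLevel  # evidence strength
    grade: str           # evidence grade, "A" through "D"


def summarize(ev: Evaluation) -> str:
    """Report the three signals together; the band is never multiplied in."""
    low, high = ev.signal.value
    return (
        f"Rating {ev.rating:g} | "
        f"{ev.signal.name.title()} ({low}-{high}%) | "
        f"Grade {ev.grade}"
    )


print(summarize(Evaluation(80, SignalLevel.TRACKED, "C")))
print(summarize(Evaluation(80, SignalLevel.VALIDATED, "A")))
```

The two printed lines share the same Rating but differ in Signal Level and Evidence Grade, which is the distinction a single composite score would hide.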
Safety Caps
Three retained safety caps limit scores when documented conditions apply. Caps are temporary — removed when the triggering condition is resolved with documented evidence.
| Cap | Trigger | Impact |
|---|---|---|
| critical-security-vuln | Unpatched critical CVE or active security incident | Compliance ≤ 5 |
| community-exodus | Documented mass migration away from the tool | All dimensions ≤ 12 |
| stalled-development | No releases or meaningful updates in 90+ days | All dimensions ≤ 12 |
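The caps in the table can be modeled as per-dimension ceilings. The following is a sketch under stated assumptions: the dimension keys and the `apply_caps` function are hypothetical, and whether the Rating is recomputed from the capped scores is not specified here, so the sketch only caps the dimension scores.

```python
# Hypothetical keys for the six ACES v2 dimensions (names are assumptions).
DIMENSIONS = (
    "ai_autonomy",
    "integration",
    "contextual_understanding",
    "compliance",
    "viability",
    "user_interface",
)

# Cap name -> {dimension: maximum allowed score}, taken from the table above.
CAPS = {
    "critical-security-vuln": {"compliance": 5},
    "community-exodus": {d: 12 for d in DIMENSIONS},
    "stalled-development": {d: 12 for d in DIMENSIONS},
}


def apply_caps(scores: dict[str, float], active: list[str]) -> dict[str, float]:
    """Return a copy of the scores with every active cap's ceiling applied."""
    capped = dict(scores)  # leave the caller's scores untouched
    for cap in active:
        for dimension, ceiling in CAPS[cap].items():
            capped[dimension] = min(capped[dimension], ceiling)
    return capped


# Made-up scores for illustration only.
scores = {d: 15 for d in DIMENSIONS}
scores["compliance"] = 10
print(apply_caps(scores, ["critical-security-vuln"])["compliance"])  # 5
```

Because caps are temporary, the sketch keeps the uncapped scores intact and applies the ceilings on read, so removing a cap restores the original values.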
Frequently Asked Questions
How often are tools re-evaluated?
Tools are re-evaluated in a monthly batch on the 1st of each month. Significant product changes (major releases, security incidents, reliability events) can trigger ad-hoc re-evaluation between batches.
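To estimate when a tool's scores will next be refreshed, the monthly schedule can be computed as below. This is a minimal sketch: `next_batch` is a hypothetical helper, it assumes a batch on the 1st counts if today is the 1st, and it ignores ad-hoc re-evaluations.

```python
from datetime import date


def next_batch(today: date) -> date:
    """Return the date of the next monthly batch (the 1st of a month)."""
    if today.day == 1:
        return today  # assumption: today's batch still counts
    if today.month == 12:
        return date(today.year + 1, 1, 1)  # roll over into January
    return date(today.year, today.month + 1, 1)


print(next_batch(date(2025, 12, 15)))  # 2026-01-01
```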
Why is there no Adjusted Score anymore?
Signal v4.0 retires the composite Adjusted Score (Rating × Confidence) in favor of three independent signals: Rating (capability), Signal Level (evidence strength: Validated/Assessed/Tracked/Detected), and Evidence Grade (evaluation recency, depth, hands-on testing). Collapsing them into one multiplied score hid useful tradeoffs — a Rating 80 at Tracked + Grade C tells a very different story than the same Rating at Validated + Grade A, but the old Adjusted Score obscured that. Read the three together.
Why are some tools showing minimal scores?
Tools at the Detected signal level have only just been identified, and evaluation has not meaningfully begun. They appear in the catalog with provisional scores and a low evidence band (15–50%) until more evidence is gathered.
How can I suggest a tool?
Use the Submit page to suggest a new tool. Provide details about capabilities and your use case.
Can I share my custom tool selection?
Yes — URLs preserve your selection state. Copy and share the URL to let others see the same comparison.
Why do some dimensions have capped scores?
Three safety caps limit scores when specific, documented conditions apply: critical-security-vuln (unpatched critical CVE or active security incident), community-exodus (documented mass migration away from the tool), and stalled-development (no releases or meaningful updates in 90+ days). Caps are temporary: they are removed when the triggering condition is resolved with documented evidence. An active cap is shown on the tool's detail page.