
User Guide

Learn how to navigate and get the most value from our evaluation of AI-powered developer tools.

Getting Started

  1. Explore the Radar: Visit the main page to see an interactive visualization
  2. Browse Tools: Check the Tools catalog for a complete list
  3. Browse the Timeline: See the Industry Timeline for model releases, funding rounds, launches, and shutdowns
  4. Read Briefings: Visit Briefings for weekly strategic intelligence
  5. Deep Dive: Click any tool to see detailed evaluation rationale

Understanding the Radar

Each tool is evaluated across six equally weighted ACES v2 dimensions, each scored on a 1–20 scale:

AI Autonomy

Ability to plan and execute multi-step tasks (assistive → agentic → self-directed)

Integration

Depth of integration into developer workflows (plugin → IDE → platform-native)

Contextual Understanding

Depth of understanding across repos, projects, and systems (file → repo → ecosystem)

Compliance

Enterprise governance: security, audit controls, data residency, access management

Viability

Vendor sustainability: funding, team, roadmap, market position

User Interface

Interaction maturity: keyboard → chat → multimodal ("vibe coding")

Understanding Scores

Three signals, not one composite

Rating (0–100): Capability. Calculated as the average of the six ACES v2 dimension scores, multiplied by 5.

Signal Level: Evidence strength (Validated, Assessed, Tracked, or Detected). Indicates how much evidence backs the rating.

Evidence Grade (A–D): Evaluation quality. Indicates how fresh, deep, and hands-on our evidence is.

Why separate signals? A high Rating with a Tracked signal and Grade C evidence tells a different story than the same Rating at Validated + Grade A. Signal v4.0 retires the composite “Adjusted Score” so the tradeoffs stay visible instead of being hidden inside a multiplier.
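
As a rough illustration of the Rating formula described above, here is a minimal Python sketch. The dimension keys, the `rating` function name, and the example scores are assumptions made for illustration; they are not taken from the radar's actual implementation or data.

```python
# Dimension names follow this guide; the key spelling is an assumption.
DIMENSIONS = (
    "ai_autonomy",
    "integration",
    "contextual_understanding",
    "compliance",
    "viability",
    "user_interface",
)


def rating(scores: dict[str, int]) -> float:
    """Average of the six equally weighted dimension scores (1-20 each), times 5."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 1 <= scores[name] <= 20:
            raise ValueError(f"{name} must be between 1 and 20")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS) * 5


# Hypothetical tool, not a real evaluation.
example = {
    "ai_autonomy": 16,
    "integration": 14,
    "contextual_understanding": 15,
    "compliance": 10,
    "viability": 12,
    "user_interface": 17,
}
print(rating(example))  # 70.0
```

The six example scores sum to 84, so the average is 14 and the Rating is 70.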

Score Interpretation

| Score Range | Interpretation |
| --- | --- |
| 80–100 | Exceptional: leading capabilities |
| 60–79 | Strong: solid, production-ready |
| 40–59 | Moderate: functional with gaps |
| 20–39 | Limited: basic capabilities |
| 0–19 | Minimal: significant limitations |
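
The score ranges above can be expressed as a simple lookup. This is a sketch only: the function name `interpret_rating` is hypothetical, and treating each range's lower bound as a threshold (so that a fractional Rating such as 79.5 falls in "Strong") is our assumption, since the table lists whole-number ranges.

```python
# (lower bound, label) pairs copied from the Score Interpretation table.
BANDS = [
    (80, "Exceptional"),
    (60, "Strong"),
    (40, "Moderate"),
    (20, "Limited"),
    (0, "Minimal"),
]


def interpret_rating(value: float) -> str:
    """Return the interpretation label for a Rating between 0 and 100."""
    if not 0 <= value <= 100:
        raise ValueError("rating must be between 0 and 100")
    # Bands are ordered from highest to lowest, so the first match wins.
    for lower, label in BANDS:
        if value >= lower:
            return label
    raise AssertionError("unreachable: the lowest band starts at 0")


print(interpret_rating(70))  # Strong
print(interpret_rating(19))  # Minimal
```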

Evidence Grades

Each evaluation carries an evidence grade (A–D) reflecting how much direct, recent, hands-on research backs the scoring. Read the grade alongside the Signal Level and Rating — it tells you how much weight to put on the evaluation itself.

| Grade | Profile |
| --- | --- |
| A | Recent (<30d), thorough evaluation, hands-on tested |
| B | Recent or thorough, with partial hands-on evidence |
| C | Moderate age or depth; limited or no hands-on testing |
| D | Stale (>90d), minimal depth, no hands-on testing |

Using Presets

The radar view offers four parallel-coordinates presets, each with a distinct axis ordering to make different questions visible:

What's Validated?

Compliance-first axis ordering. Surfaces tools with enterprise-ready governance posture before capability peaks.

Capability Leaders

Autonomy-first axis ordering. Highlights the highest-capability tools regardless of evaluation maturity.

Full Landscape

Default axis order across all six dimensions. No brushing or highlighting — a neutral view of the field.

Enterprise Gaps

Compliance-first with the compliance axis highlighted. Designed to make governance trade-offs visible at a glance.

Signal Levels

Signal level describes how much evidence supports a tool's rating and is independent of the rating itself. A high-rated Detected tool carries more risk than a moderate-rated Validated tool. The evidence band (also referred to as the confidence band) is the implied evidence-strength range for each level; we show it as a reference, not as a score multiplier.

| Signal Level | Evidence Band | Meaning |
| --- | --- | --- |
| Validated | 85–100% | Production-validated in enterprise environments with named customers or audits |
| Assessed | 65–90% | Active evaluation with substantial internal and third-party evidence |
| Tracked | 50–75% | Monitored tool; research-based assessment without direct testing |
| Detected | 15–50% | Recently identified; minimal evaluation has commenced |
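
One thing the bands above make visible is that adjacent levels overlap, which is part of why the band works as a reference range rather than a way to derive a level. The sketch below is illustrative only: `levels_for` and the idea of a single "evidence percentage" input are assumptions, since this guide does not define how such a percentage would be measured.

```python
# Evidence bands copied from the Signal Levels table, as (low, high) percentages.
SIGNAL_BANDS = {
    "Validated": (85, 100),
    "Assessed": (65, 90),
    "Tracked": (50, 75),
    "Detected": (15, 50),
}


def levels_for(evidence_pct: int) -> list[str]:
    """Return every signal level whose band contains the given percentage."""
    return [
        level
        for level, (low, high) in SIGNAL_BANDS.items()
        if low <= evidence_pct <= high
    ]


print(levels_for(70))  # ['Assessed', 'Tracked']
print(levels_for(95))  # ['Validated']
```

Because 70% sits inside both the Assessed and Tracked bands, the band alone cannot tell you the level; the level comes from the kind of evidence described in the Meaning column.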

Safety Caps

Three retained safety caps limit scores when documented conditions apply. Caps are temporary — removed when the triggering condition is resolved with documented evidence.

| Cap | Trigger | Impact |
| --- | --- | --- |
| critical-security-vuln | Unpatched critical CVE or active security incident | Compliance ≤ 5 |
| community-exodus | Documented mass migration away from the tool | All dimensions ≤ 12 |
| stalled-development | No releases or meaningful updates in 90+ days | All dimensions ≤ 12 |
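
To show how the caps above might combine, here is a minimal sketch that clamps dimension scores for a set of active caps. The `CAPS` layout, the `"*"` wildcard for "all dimensions", the `apply_caps` function, and the example scores are all hypothetical; only the cap names and limits come from the table.

```python
# Cap limits from the Safety Caps table. "*" is our shorthand for "all dimensions".
CAPS = {
    "critical-security-vuln": {"compliance": 5},
    "community-exodus": {"*": 12},
    "stalled-development": {"*": 12},
}


def apply_caps(scores: dict[str, int], active_caps: list[str]) -> dict[str, int]:
    """Return a copy of scores with every active cap applied as an upper limit."""
    capped = dict(scores)
    for cap in active_caps:
        limits = CAPS[cap]
        for dim in capped:
            # A dimension-specific limit takes precedence over the wildcard.
            limit = limits.get(dim, limits.get("*"))
            if limit is not None:
                capped[dim] = min(capped[dim], limit)
    return capped


# Hypothetical scores, not a real evaluation.
scores = {
    "ai_autonomy": 16,
    "integration": 14,
    "contextual_understanding": 15,
    "compliance": 10,
    "viability": 12,
    "user_interface": 17,
}
capped = apply_caps(scores, ["critical-security-vuln", "stalled-development"])
print(capped["compliance"])   # 5
print(capped["ai_autonomy"])  # 12
```

Using `min` means caps only ever lower a score, and applying several caps leaves each dimension at its strictest active limit.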

Frequently Asked Questions

How often are tools re-evaluated?

Monthly batch re-evaluation on the 1st of each month. Significant product changes (major releases, security incidents, reliability events) can trigger ad-hoc re-evaluation between batches.

Why is there no Adjusted Score anymore?

Signal v4.0 retires the composite Adjusted Score (Rating × Confidence) in favor of three independent signals: Rating (capability), Signal Level (evidence strength: Validated/Assessed/Tracked/Detected), and Evidence Grade (evaluation recency, depth, hands-on testing). Collapsing them into one multiplied score hid useful tradeoffs — a Rating 80 at Tracked + Grade C tells a very different story than the same Rating at Validated + Grade A, but the old Adjusted Score obscured that. Read the three together.
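
A small worked example shows how the multiplied score could hide the tradeoff. The confidence percentages below are made up for illustration and do not correspond to any specific signal level or tool.

```python
def adjusted_score(rating: int, confidence_pct: int) -> float:
    """The retired composite: Rating x Confidence, with confidence as a percentage."""
    return rating * confidence_pct / 100


# Two hypothetical tools with very different profiles.
high_rating_weak_evidence = adjusted_score(80, 60)
moderate_rating_strong_evidence = adjusted_score(60, 80)

print(high_rating_weak_evidence)        # 48.0
print(moderate_rating_strong_evidence)  # 48.0
```

Both tools land on 48, even though one is a strong tool with thin evidence and the other is a moderate tool with solid evidence. Keeping Rating, Signal Level, and Evidence Grade separate preserves that distinction.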

Why are some tools showing minimal scores?

Tools at the Detected signal level have only just been identified and evaluation hasn't meaningfully commenced. They appear in the catalog with provisional scores and a low confidence band (15–50%) until more evidence is gathered.

How can I suggest a tool?

Use the Submit page to suggest a new tool. Provide details about capabilities and your use case.

Can I share my custom tool selection?

Yes — URLs preserve your selection state. Copy and share the URL to let others see the same comparison.

Why do some dimensions have capped scores?

Three safety caps limit scores when specific, documented conditions apply: critical-security-vuln (unpatched critical CVE or active security incident), community-exodus (mass migration away from the tool), and stalled-development (no releases or meaningful updates in 90+ days). Caps are temporary and are removed once the triggering condition is resolved with documented evidence. Any active cap is shown on the tool's detail page.
