Release Notes

Complete version history and changelog. Most recent first.

Version 5.1.0 - Comparative Recalibration

June 2026

The ACES scoring framework moved from feature-count anchors to comparative, evidence-gated grading (ACES v3 Phases A+B), and the full catalog was re-scored under the new rubric on June 10. Scores and tiers changed materially across the radar; this release documents what moved and why, plus the pipeline-integrity layer shipped alongside it.

Scoring recalibration (ACES v3)

5-band comparative rubric — every dimension is graded relative to the tool's category cohort, not by counting features. Band 3 must be earned; band 4 requires named comparative evidence vs ≥2 peers; band 5 (17–20) requires hands-on or independent third-party proof.
Full-catalog re-score — all 78 tools re-graded in single-sitting category batches. Tier shifts: Leading thinned from 13 tools to 4 (only evidence-defended tops remain); 25 tools changed tier; mean rating 62.9 → 58.8. Radar shapes are now distinct per tool instead of near-identical hexagons.
Anti-clustering QA gate — a distribution checker (middle-band share, mode mass, within-cohort spread, distinct values) runs before any scoring batch commits; the recalibrated catalog passes on every dimension for the first time.
Hands-on testing program — a 13-tool wave with a published protocol; tested tools become the first eligible for band-5 capability scores.
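
The anti-clustering gate described above can be pictured as a handful of distribution checks run over one dimension's scores within one category cohort. The following is a minimal sketch under stated assumptions, not the project's actual checker: the function name `check_distribution`, every threshold value, the use of population standard deviation for "spread", and the treatment of scores 9 to 12 as the middle band (inferred only from band 5 being 17 to 20) are all hypothetical.

```python
from collections import Counter
from statistics import pstdev


def check_distribution(scores, max_middle_share=0.5, max_mode_mass=0.3,
                       min_spread=2.0, min_distinct=5):
    """Return a list of failure messages; an empty list means the batch passes.

    `scores` holds one dimension's 0-20 scores for a single cohort.
    All thresholds are illustrative assumptions, not published values.
    """
    if not scores:
        return ["no scores supplied"]

    n = len(scores)
    failures = []

    # Middle-band share: fraction of scores sitting in the assumed band 3 (9-12).
    middle_share = sum(1 for s in scores if 9 <= s <= 12) / n
    if middle_share > max_middle_share:
        failures.append(f"middle-band share {middle_share:.2f} exceeds {max_middle_share}")

    # Mode mass: fraction of scores equal to the single most common value.
    mode_mass = Counter(scores).most_common(1)[0][1] / n
    if mode_mass > max_mode_mass:
        failures.append(f"mode mass {mode_mass:.2f} exceeds {max_mode_mass}")

    # Within-cohort spread: population standard deviation of the scores.
    spread = pstdev(scores)
    if spread < min_spread:
        failures.append(f"spread {spread:.2f} below {min_spread}")

    # Distinct values: how many different scores appear at all.
    distinct = len(set(scores))
    if distinct < min_distinct:
        failures.append(f"{distinct} distinct values, need {min_distinct}")

    return failures
```

A scoring batch would commit only when the returned list is empty for every dimension. A tightly clustered cohort such as eight 10s plus an 11 and a 12 trips all four checks in this sketch.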

Pipeline integrity

Fail-closed publication gates — briefing grounding validation blocks publication (unverifiable citations are held, never shipped); tool-data pushes pass schema, cap-ceiling, and snapshot validation before landing.
Calibration loop — a weekly human audit samples sources, briefing items, and timeline promotions; verdicts feed a calibration report measuring the pipeline's automated judgments over time.
Shadow-phase rehearsal — the intelligence pipeline is rehearsing fully-autonomous operation; daily output publishes to an observation branch during the window, so /brief updates pause briefly and resume at the flip.
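
The fail-closed behavior of the publication gates can be illustrated with a small sketch. It is an assumption-laden illustration rather than the pipeline's code: `BriefingItem`, `gate_briefing`, and the `verify` callback are hypothetical names, holding items individually (rather than blocking a whole briefing) is an assumption, and so is the choice to hold items that carry no citations at all.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BriefingItem:
    title: str
    citations: list[str] = field(default_factory=list)


def gate_briefing(items, verify: Callable[[str], bool]):
    """Split items into (publishable, held).

    Fail closed: an item is publishable only if it has at least one
    citation and every citation verifies. A verifier error is treated
    the same as a failed verification, so outages hold items rather
    than letting them through.
    """
    publishable, held = [], []
    for item in items:
        ok = bool(item.citations)  # assumption: uncited items are held
        for url in item.citations:
            try:
                if not verify(url):
                    ok = False
            except Exception:
                ok = False  # cannot verify, so do not ship
        (publishable if ok else held).append(item)
    return publishable, held
```

The design point is the default: nothing reaches the publishable list unless verification positively succeeds, which is the opposite of a gate that publishes unless a check explicitly fails.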

Version 5.0.0 - Public Launch

June 2026

Signal's first full public release. The entire v4 platform goes live on the public domain in one consolidated launch, current as of June 1, 2026. The v4.0–v4.4 line shipped to the signal branch over April–May but was never promoted to the public site; v5.0.0 brings it all live at once, on fresh data.

The platform, live

Dimensional radar — parallel-coordinates comparison of 78 agentic developer tools across the six ACES v2 dimensions, with persona presets, brushing, pinning, and shareable URL state.
Evidence-graded methodology — every score is one click from its rubric; Signal Level (Validated / Assessed / Tracked / Detected) and Evidence Grade (A–D) are visible in-context. All 78 tools freshly evaluated on Claude Opus 4.8.
Intelligence surfaces — /intelligence positions, /briefings, /intelligence/digests, and a live /timeline of industry events, fed by an automated daily pipeline.
MCP server — query the radar's tool data, intelligence, and methodology directly from an MCP client.
Labs — experimental surfaces under /labs/* (instrument-panel aquarium, constellation, history & velocity).

Folded in since the last public deploy

Consolidates everything from v4.0 “Signal Relaunch” through v4.4 “Knowledge-Base Currency” (see entries below), which shipped to the signal branch but was never promoted to the public site.
Plus the immediate pre-launch work: the Opus 4.8 evaluation re-baseline, the restored and auto-publishing /timeline, and a light-only UI (dark mode removed to match the instrument-panel design system).

Version 4.4.0 - Knowledge-Base Currency

June 2026

A pre-release audit on 2026-05-27 found the knowledge layer 91 days stale across positions, partnerships, and pursuits, plus a 3-week hole in /intelligence/digests and a 5-day daily-brief blackout. This release closes the knowledge gaps so every /intelligence/* surface reflects late-May reality, hardens the pipeline (a test vocabulary refresh and a Zod type guard on timeline promotion), and tidies the information architecture (aquarium → /labs). Three further changes landed before the 6/1 release date: all 78 tool evaluations were re-baselined on Claude Opus 4.8; the /timeline view, frozen since 2026-04-30, was restored and set to auto-publish high-signal events; and the aquarium link was removed from the primary nav.

Knowledge Refresh

6 positions refreshed to 2026-05-27 — six parallel research agents surveyed 90 days of intel + decision log + external sources. All six positions "refined, not shifted": core stances validated by evidence. Headlines: MCP became infrastructure (97M monthly downloads, donated to Linux Foundation); production multi-agent platforms shipped (Claude Managed Agents, Microsoft Agent 365 GA, OpenAI Workspace Agents); Anthropic overtook OpenAI in business share (34.4% vs 32.3%); EU AI Act August 2 enforcement locked in; Five Eyes joint agentic guidance published.
3 stale partnerships refreshed to 2026-05-28 — Cognition (now $26B / $492M ARR, reframed as single integrated platform); Glean (expanded to Work AI platform + ADLC governance layer); Microsoft-GitHub (internal Claude-Code cancellation flagged as vendor-lock-in signal; metered-billing 6/1 cutover added as FinOps trigger).
3 stale pursuits refreshed to 2026-05-28 — ai-infrastructure-advisory got 7 named time-bound deliverables; ai-native-engineering-enablement reframed around tool routing + FinOps + parallel-agent workflow training; enterprise-ai-governance-offering documented Big Four competitive entry (Agent 365 GA partners, KPMG-Anthropic alliance) while status held at Proposed pending owner assignment.
3 weekly digests backfilled — W19 (5/4–5/10), W20 (5/11–5/17), W21 (5/18–5/24) generated retroactively to close the 3-week hole in /intelligence/digests. All three carry DRAFT — NOT YET REVIEWED banners; not for ELT redistribution.
Decision log entry — 2026-05-28-v4.4-prep-data-refresh.md records the cross-cutting audit, accepted historical gaps (5/16–5/20 ingest blackout), and surfaced action items (pursuit owner assignments, FinOps framework urgency).

Evaluation Re-baseline

All 78 tool evaluations re-run on Claude Opus 4.8 ([#177](https://github.com/wwtdigital/ai-research/pull/177)) — tools were last scored 2026-05-17 on Opus 4.7; the full /evaluate pipeline (research, score, commit) re-ran on Opus 4.8 so the release ships on current scores. A new optional evalModel field records the evaluating model on each tool JSON and in tools.schema.json. Because this was a full re-research, each tool's movement blends the 4.7-to-4.8 model effect with ~11 days of real-world change — it is not a pure model delta.
Velocity safeguard — the coordinated one-day re-score is recorded in methodology-events.json and surfaced in /labs/history and /labs/history/velocity as a banner with an exclude-toggle, so the re-baseline does not read as a market-wide move. Provenance in decision log 2026-05-28-opus-4.8-eval-rebaseline.md.

Timeline Currency

Timeline restored to current ([#178](https://github.com/wwtdigital/ai-research/pull/178)) — the /timeline view had been frozen at 2026-04-30 because promotion was never wired into the active pipeline and a single out-of-enum timeline_type aborted the whole batch. Promotion was wired into the daily pipeline (new version-controlled /daily-pipeline skill), the pre-flight hardened to skip-and-warn bad types instead of aborting, 7 mis-typed sources corrected, and the ~5-week May gap backfilled (37 events; 252 to 289 total).
High-signal events auto-publish ([#179](https://github.com/wwtdigital/ai-research/pull/179)) — daily promotion now runs with --publish, so significance-4+ events post to the live timeline automatically, gated by the significance threshold, title-and-date dedup, and the valid-type pre-flight. No manual triage step; no silent freeze.
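
The three gates named above (significance threshold, title-and-date dedup, valid-type pre-flight) can be sketched as a single selection pass. This is a hypothetical illustration, not the contents of promote-timeline.py: the function name `select_for_publish`, the dict field names (`title`, `date`, `significance`, `timeline_type`), and the case-insensitive title comparison are all assumptions. The list of valid types is copied from the Pipeline Hardening notes in this release.

```python
VALID_TYPES = {
    "model_release", "product_launch", "business_move", "open_source",
    "pricing_platform", "policy_research", "shutdown",
}


def select_for_publish(candidates, existing, min_significance=4):
    """Return (selected, skipped) for a batch of candidate timeline events.

    `skipped` is a list of (title, reason) pairs, so a bad candidate is
    reported and skipped instead of aborting the whole batch.
    """
    # Dedup key: normalized title plus date (normalization is an assumption).
    seen = {(e["title"].strip().lower(), e["date"]) for e in existing}
    selected, skipped = [], []

    for cand in candidates:
        title = cand.get("title", "")
        if cand.get("timeline_type") not in VALID_TYPES:
            skipped.append((title, "invalid type"))
            continue
        if cand.get("significance", 0) < min_significance:
            skipped.append((title, "below significance threshold"))
            continue
        key = (title.strip().lower(), cand.get("date"))
        if key in seen:
            skipped.append((title, "duplicate title and date"))
            continue
        seen.add(key)  # also dedups within the same batch
        selected.append(cand)

    return selected, skipped
```

Returning the skip reasons alongside the selection keeps the run auditable: one out-of-enum candidate shows up in the report without blocking the events behind it.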

Information Architecture

`/aquarium` → `/labs/aquarium` — the physics aquarium is an experimental surface and belongs in /labs/* with the rest of the experiments. Page moved via git mv; Navbar href updated; 307 redirect added (/aquarium → /labs/aquarium) matching the labs-IA convention from 2026-05-26. Feature flag and component imports unchanged.
Aquarium removed from the primary nav ([#174](https://github.com/wwtdigital/ai-research/pull/174), [#175](https://github.com/wwtdigital/ai-research/pull/175)) — with the experiment living under /labs/*, the top-level Navbar link was redundant; removed it and added a regression test asserting it is gone.

Pipeline Hardening

ACES v2 status-vocab refresh — two stale tests in intel/tests/test_evaluation_triage.py still used legacy Watch/Backlog values; refreshed to Tracked/Detected so the compute_status_signals() contract is covered. Docstring at evaluation-triage.py:197-198 and placeholder values in test_radar_enrich.py updated to match. Turns the advisory test.yml badge green on signal.
Timeline event type Zod guard — intel/scripts/promote-timeline.py now fails loud when a candidate event carries a type outside the Zod enum (model_release, product_launch, business_move, open_source, pricing_platform, policy_research, shutdown). intel/scripts/lib/timeline_types.py is the single Python source of truth, with a regex-parse drift test against radar/scripts/generate-timeline-data.js so CI catches future drift on either side. Closes a latent build break (draft events skip Zod, but would have broken the radar build on first promotion). (Refined in [#178](https://github.com/wwtdigital/ai-research/pull/178) to skip-and-warn on a bad type rather than abort the whole batch, so one out-of-enum candidate can no longer block a month of promotions.)
`generate-tools-data.js` build-date override — new BUILD_DATE_OVERRIDE env var lets release builds stamp the snapshot with the planned release date instead of the build moment. Falls back to new Date() for ordinary builds. Used here to pre-mark the 6/1 snapshot.
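
The regex-parse drift test mentioned in the type-guard entry above might look roughly like the sketch below. It is a guess at the technique, not the repository's test: it assumes the JavaScript side declares the enum as a `z.enum([...])` literal of quoted strings and that the first such literal in the file is the timeline one. The names `parse_js_enum` and `find_drift` are hypothetical. The seven type values are the ones listed in this release.

```python
import re

# Python-side source of truth (values as listed in the release notes).
TIMELINE_TYPES = (
    "model_release", "product_launch", "business_move", "open_source",
    "pricing_platform", "policy_research", "shutdown",
)

# Assumed shape on the JS side: z.enum(['a', 'b', ...])
_ENUM_RE = re.compile(r"z\.enum\(\s*\[(.*?)\]", re.DOTALL)
_LITERAL_RE = re.compile(r"""['"]([a-z_]+)['"]""")


def parse_js_enum(js_source):
    """Extract the string literals from the first z.enum([...]) in the source."""
    match = _ENUM_RE.search(js_source)
    if match is None:
        raise ValueError("no z.enum([...]) literal found")
    return set(_LITERAL_RE.findall(match.group(1)))


def find_drift(js_source):
    """Report values present on one side but not the other."""
    js_types = parse_js_enum(js_source)
    py_types = set(TIMELINE_TYPES)
    return {
        "only_in_python": sorted(py_types - js_types),
        "only_in_js": sorted(js_types - py_types),
    }
```

A CI test would read the JavaScript file, call `find_drift`, and assert that both lists are empty, so adding a type on either side without the other fails the build.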

Version 4.3.0 - Coherence & Maintainability

May 2026

Cleanup pass after the v4.2 surface refinement. Retires residual v3 vocabulary, fixes internal contradictions on the methodology page, and converts the 2466-line hand-typed releases page to a data-driven format over releases.json.

Version 4.2.1 - Beta-Gate Cleanup

May 2026

Closes the two pages that blocked the impeccable 28/40 beta-banner gate after the v4.2 page-by-page refinement. Median heuristic score across the 13 audited surfaces is now 30/40; every page clears 28. NEXT_PUBLIC_SIGNAL_BETA can be flipped to 0.

Version 4.1.0 - Parallel Coordinates Refinement

April 2026

Polish pass on the v4.0 primary visualization based on a UX audit. Persona presets now apply default brushes (not just axis reordering); a Top-N rating filter cuts through 78+ tools when scanning leaders; hover-fade isolates the path under the cursor; evidence grade is encoded in the line via stroke-dasharray (A solid, D dotted); and brush state, hidden categories, highlight dimension, active preset, and Top-N are all persisted to URL query params so a copied URL reproduces the full chart.

Version 4.0.0 - Signal Relaunch

April 2026

Product relaunch with the ACES v2 evaluation framework, parallel coordinates as the primary visualization, and Signal as the new brand identity. Governance split into Compliance and Viability; Collaboration renamed Integration; Status replaced with Signal Levels (Validated / Assessed / Tracked / Detected); Weighted Score replaced by Rating + Signal Level + Evidence Grade. All 78 tools re-scored.

Version 3.0.0 - Dashboard Landing Page

April 2026

New dashboard landing page replaces the cold radar redirect. First-time visitors now see a narrative-driven overview before exploring the chart. The dashboard auto-diffs the two most recent snapshots to surface gainers, decliners, and new tools each evaluation cycle.
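
The snapshot auto-diff can be illustrated with a short sketch. It assumes each snapshot reduces to a mapping from tool name to a numeric rating, which is a simplification. The function name `diff_snapshots` and the ordering rules are hypothetical, not taken from the dashboard code.

```python
def diff_snapshots(previous, current):
    """Compare two {tool_name: rating} snapshots (older first, newer second).

    Returns tool names grouped as gainers, decliners, and new tools.
    Gainers and decliners are ordered by size of change, largest first,
    with the tool name as a tie-breaker.
    """
    # Rating change for every tool present in both snapshots.
    deltas = {name: current[name] - previous[name]
              for name in current if name in previous}

    gainers = sorted((n for n, d in deltas.items() if d > 0),
                     key=lambda n: (-deltas[n], n))
    decliners = sorted((n for n, d in deltas.items() if d < 0),
                       key=lambda n: (deltas[n], n))
    # Tools that appear only in the newer snapshot.
    new_tools = sorted(name for name in current if name not in previous)

    return {"gainers": gainers, "decliners": decliners, "new": new_tools}
```

Tools whose rating is unchanged fall into none of the three groups, and tools dropped from the newer snapshot are ignored in this sketch.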

Version 2.7.3 - April 2026 Evaluation Cycle

April 2026

Full April evaluation cycle — 78 tools scored, 21 new additions, timeline promotion, knowledge validation cleanup, and a ThemeProvider test fix.

Version 2.7.1 - Skills Migration & Tooling

March 2026

Version 2.7.0 - Markdown-to-JSON Migration

March 2026

Version 2.6.2 - Embeddings Perf & Dependency Updates

February 2026

Version 2.6.1 - MCP Transport Reliability Fix

February 2026

Version 2.6.0 - Chat Full Pass & MCP Formatters

February 2026

Version 2.5.0 - Community CTAs & MCP Expansion

February 2026

Version 2.4.1 - UX Bug Fixes & Test Coverage

February 13, 2026

Bug fixes for comparison modal and timeline empty states, plus 130 new tests.

Version 2.4.0 - Industry Timeline & Evaluation Pipeline

February 13, 2026

New Industry Timeline page, unified evaluation pipeline, data quality audit skill, and hybrid confidence architecture.

Version 2.2.1 - February 2026 Evaluations & Dependency Update

February 5, 2026

Monthly tool evaluation data refresh and MCP SDK dependency update.

Version 2.2.0 - Monthly Evaluation Skill

February 3, 2026

New /evaluate-tools skill for systematic monthly tool evaluations with research persistence and Notion integration.

Version 2.1.0 - Category Simplification & Content Quality

February 3, 2026

Simplified category taxonomy for clarity, added new tools, improved content quality across evaluations.

Version 2.0.0 - Category Reorganization & Platform Maturity

February 2, 2026

Tool categories restructured, Chat UX overhauled, and comprehensive Next.js architecture improvements signal platform maturity.

Version 1.13.0 - MCP Server Integration

February 2, 2026

New Model Context Protocol server enables AI assistants like Claude and ChatGPT to query tool evaluations programmatically.

Version 1.12.0 - Visitor Feedback & Site Footer

January 20, 2026

New user-facing feature for visitor feedback with Slack integration, plus consistent site-wide footer branding.

Version 1.11.1 - UX Polish & Error Handling

January 19, 2026

Bug fixes, UX improvements, and enhanced error observability.

Version 1.11.0 - Intelligent Chat & Visual Trends

January 17, 2026

AI-powered chat with tool calling for intelligent discovery, plus sparkline visualizations for trend analysis.

Version 1.10.0 - Performance & Data Robustness

January 15, 2026

Local favicon caching for faster page loads, robust Notion data parsing to prevent build failures, and UI refinements.

Version 1.8.2 - January 2026 Insights Refresh

January 12, 2026

Insights page refreshed with January re-evaluation data, plus Backlog column visibility fix.

Version 1.8.1 - Quality of Life Improvements

January 12, 2026

Bug fixes, documentation updates, and a minor UX enhancement: a reset-preferences button.

Version 1.8.0 - Chat Experience Overhaul

January 8, 2026

Chat elevated to first-class navigation feature with interactive tool cards, comparison view, enhanced streaming UX, and comprehensive E2E testing infrastructure.

Version 1.7.0 - User Documentation & Contextual Help

January 6, 2026

New user documentation system with contextual help tooltips, comprehensive user guide, and streamlined admin documentation.

Version 1.6.0 - Security & Insights Redesign

January 6, 2026

Security hardening with XSS protection and rate limiting, redesigned Insights page with dynamic statistics, and infrastructure upgrade to Notion SDK v5.6.0.

Version 1.4.0 - Dark Mode & Quality

December 28, 2025

Seven new user-facing features including dark mode, mobile optimization, accessibility improvements, admin dashboard, and comprehensive testing infrastructure.

Version 1.3.0 - Export & Notifications

December 19, 2025

Three new user-facing capabilities: CSV/JSON export for data portability, Slack notifications for team communication, and a health check endpoint for system monitoring.

Version 1.2.0 - Comparison View & State Persistence

December 19, 2025

New comparison view enables side-by-side tool analysis, plus comprehensive state persistence saves user preferences across sessions.

Version 1.1.1 - Holiday Easter Egg

December 18, 2025

Time-bound holiday Easter egg with festive animations active December 1 through January 1.

Version 1.1.0 - Interactive Radar Experience

December 17, 2025

Major UX improvements to the radar visualization with clickable tool dots, floating preset pills, dimension wedge highlighting, and a unified pill button design system across all pages.

Version 1.0.2 - Tooltip Fixes & UI Refinements

December 17, 2025

Bug fixes for tooltip clipping issues, UI consistency improvements with icon standardization, and technical infrastructure enhancements.

Version 1.0.1 - Production Hardening

December 12, 2025

Critical bug fixes for production tool submission, improved error handling, and testing infrastructure to prevent regressions.

Version 1.0.0 - Production Ready

December 11, 2025

Version 0.13.0 - Shareable Presets & Architecture Refresh

December 2025

Version 0.12.0 - Submission UX & Radar Controls

December 2025

Version 0.11.0 - Documentation Navigation & Kanban Enhancements

December 2025

Version 0.10.3 - Hotfix: UI Fixes

December 2025

For the complete release history, including versions 0.10.0 and earlier, see the full release notes on GitHub.