Goose
Tracked across 12 snapshots (2026-04-09 → 2026-06-12).
Rating timeline
Dimension trajectory
- Autonomy: 10/20
- Integration: 12/20
- Context: 9/20
- Compliance: 7/20
- Viability: 12/20
- Interface: 12/20
Cap timeline
Notable events
- 2026-04-20: Cap −no-enterprise-features
- 2026-04-20: Cap −unvalidated-benchmarks
- 2026-04-20: Cap −no-codebase-indexing
- 2026-05-16: Cap +unvalidated-benchmarks
- 2026-05-16: Cap +no-codebase-indexing
- 2026-06-10: Drop, −7 rating
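The cap events above can be replayed to see which caps remain in force. The following is a minimal sketch, not the tracker's own logic: it assumes that "+" means a cap was applied and "−" means a cap was lifted, and the `active_caps` helper and the tuple layout are hypothetical names introduced only for illustration.

```python
# Cap events copied from the list above.
# Assumption: "+" = cap applied, "-" = cap lifted.
events = [
    ("2026-04-20", "-", "no-enterprise-features"),
    ("2026-04-20", "-", "unvalidated-benchmarks"),
    ("2026-04-20", "-", "no-codebase-indexing"),
    ("2026-05-16", "+", "unvalidated-benchmarks"),
    ("2026-05-16", "+", "no-codebase-indexing"),
]


def active_caps(events, initial=()):
    """Replay events in date order and return the set of caps still applied."""
    caps = set(initial)
    # ISO dates sort correctly as strings; sorted() is stable for same-day events.
    for _date, sign, cap in sorted(events, key=lambda e: e[0]):
        if sign == "+":
            caps.add(cap)
        else:
            caps.discard(cap)  # lifting a cap that is not applied is a no-op
    return caps


print(sorted(active_caps(events)))
# prints ['no-codebase-indexing', 'unvalidated-benchmarks']
```

Under that reading, the two caps left active are the same two that the upgrade conditions below propose removing.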
What would move this next
Upgrade if:
- (a) Enterprise features ship for the agent: SSO/SAML, audit logs, SOC 2 (would lift compliance out of Band 1).
- (b) Independent SWE-bench results are published (removes the unvalidated-benchmarks cap; autonomy could move to 15-16).
- (c) Native codebase indexing/embeddings ship (removes the no-codebase-indexing cap; context could move to 13-14).
- (d) Sustained release velocity and viability evidence justify viability 13->14.
- (e) Hands-on validation (handsOn -> demo/tested) raises the evidence grade and could support an Assessed signal level.
Downgrade if: AAIF/Block reduces investment, a major security vulnerability remains unpatched, or reliability issues (latency, stuck agents, loops) escalate enough to warrant a reliability-complaints cap.