# Changelog

All notable changes to Arbiter will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.5.0] - 2025-06-16

### Added

- Conversational follow-up system for PR comment Q&A
  - Question detection in PR comments with confidence scoring
  - Agent routing based on keywords, finding references, and context
  - Agent explain() method for providing detailed follow-up explanations
  - Conversation storage in database (ConversationModel, ConversationMessageModel)
  - Webhook handling for GitHub issue_comment and GitLab note events
  - Worker task process_followup for async question processing
- REST API endpoints for conversations
  - `GET /api/conversations` - List conversations with filtering
  - `GET /api/conversations/{id}` - Conversation detail with messages
  - `GET /api/conversations/review/{id}` - Conversation for specific review
- Explain prompt templates for each agent (security, style, complexity)
- Configuration settings for follow-up behaviour
  - followup_enabled - Enable/disable follow-up processing
  - followup_confidence_threshold - Minimum confidence to respond
  - followup_max_tokens_per_response - Token limit for responses
- React dashboard for exploring reviews and monitoring metrics
  - Review list page with filtering (repository, status, verdict, author) and pagination
  - Review detail page with findings grouped by severity and expandable cards
  - Deliberation timeline showing step-by-step decision process
  - Metrics dashboard with charts for verdicts, severities, reviews over time
  - VerdictBadge and SeverityBadge components with color-coded indicators
  - TanStack Query for data fetching and caching
  - Tailwind CSS for styling with responsive layouts
  - Docker configuration with nginx for production builds
- `GET /api/reviews/metrics` endpoint for aggregate statistics
  - Total/completed review counts
  - Average cost per review
  - Verdict and severity distribution
  - Reviews over last 30 days
  - Cost breakdown by agent
- Idempotent comment updates for re-reviews
  - Comment model for representing PR/MR comments
  - `get_comments()` method on platform clients to list PR comments
  - `update_comment()` method to edit existing comments
  - Arbiter marker (``) embedded in review comments
  - Re-reviews now update existing Arbiter comment instead of posting new ones
  - Graceful fallback: posts new comment if fetching comments fails
- Documentation
  - Dashboard README with component architecture and usage
  - Deployment guide with Docker, production, and scaling guidance
  - API reference with endpoint documentation and webhook schemas
  - Environment variable reference (.env.example)
  - Troubleshooting section in main README

### Changed

- Increased test coverage from 74% to 86%+
  - LiteLLMClient initialization and complete method tests
  - Static analysis runner tests with mocked subprocess execution
  - API dependency injection tests (get_db, get_redis, close_redis)
  - Extended test fixtures for platform client mocking
- Version bumped to 0.5.0

## [0.4.0] - 2025-04-13

### Added

- GitHub integration client for fetching diffs and posting review comments
  - Fetch PR diffs via GitHub API
  - Post review comments on pull requests
  - Update commit status checks (pending/success/failure/error)
  - Automatic retry with exponential backoff on transient errors
  - Rate limit monitoring with warnings when limits are low
- GitLab integration client with equivalent functionality
  - Fetch MR diffs via GitLab API
  - Post notes on merge requests
  - Update commit pipeline status
  - URL-encoded project path handling
- Review comment formatter for Markdown output
  - Verdict header with icon (approve/changes/comment)
  - Summary statistics table by severity
  - Findings grouped by severity level
  - Conflicts section for agent disagreements
  - Optional cost/tokens footer
  - Automatic truncation to stay within GitHub's 65535 char limit
- Platform integration exceptions
  - AuthenticationError for 401/403 responses
  - RateLimitError for 429 responses with retry_after
  - NotFoundError for 404 responses
  - PlatformError for 5xx server errors
- Extended configuration settings
  - github_token and gitlab_token for API authentication
  - github_base_url and gitlab_base_url for enterprise instances
  - integration_timeout and integration_max_retries
  - status_check_context for commit status naming
  - post_comments and update_status feature flags
- Worker integration with platform clients
  - Automatic diff fetching when not provided
  - Commit status set to pending on job start
  - Review comment posted on completion
  - Status updated to success/failure based on verdict
  - Graceful degradation on platform API failures

### Changed

- Worker process_review now accepts platform parameter
- Webhook routes pass platform identifier to queue
- Version bumped to 0.4.0

### Dependencies

- Added httpx>=0.26.0 to main dependencies
- Added tenacity>=8.2.0 for retry logic
- Added respx>=0.21.0 to dev dependencies for HTTP mocking

## [0.3.0] - 2025-03-29

### Added

- FastAPI application with lifespan management, CORS, and exception handlers
- PostgreSQL database schema with SQLAlchemy async ORM
  - ReviewModel for storing PR review metadata and results
  - FindingModel for individual agent findings
  - ConflictModel for detected conflicts between agents
  - DeliberationStepModel for audit trail
  - PolicyModel for organisation configurations
- Alembic migrations with async PostgreSQL support
- Redis-backed arq worker for async job processing
  - Review task with full pipeline execution
  - Job deduplication by repository/PR/commit
  - Priority queuing (draft PRs get lower priority)
- REST API endpoints
  - `GET /api/reviews` - List reviews with pagination and filtering
  - `GET /api/reviews/{id}` - Review detail with findings
  - `GET /api/reviews/{id}/deliberation` - Deliberation log
  - `POST /api/reviews` - Trigger manual review
- Webhook routes for GitHub and GitLab
  - `POST /webhooks/github` - GitHub PR events with HMAC-SHA256 signature validation
  - `POST /webhooks/gitlab` - GitLab MR events with token validation
- Health and metrics endpoints
  - `GET /health` - Liveness check
  - `GET /health/ready` - Readiness check (database, Redis)
  - `GET /health/live` - Kubernetes liveness probe
  - `GET /metrics` - Prometheus metrics
- LLM response cache with Redis backend
  - Cache key based on diff, agent, prompt version, and policy
  - Configurable TTL (default 24 hours)
  - Cache statistics tracking
- Cost tracking models
  - ReviewCost for aggregate cost tracking
  - AgentCost for per-agent breakdown
  - CostEstimate for pre-review cost estimation
- Docker configuration
  - Multi-stage Dockerfile for API and worker
  - docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
- Extended configuration settings
  - Database connection (URL, pool size, max overflow)
  - Redis connection (URL, max connections)
  - Webhook secrets (GitHub HMAC, GitLab token)
  - API settings (rate limits, CORS origins)
  - Worker settings (max jobs, timeout, retries)

### Changed

- Configuration now uses `@lru_cache` for settings singleton
- Version bumped to 0.3.0

## [0.2.0] - 2025-03-15

### Added

- Diff parser with line mapping for unified diff format
- Static analysis runners for ruff, mypy, bandit, and radon
- Finding merger with deduplication and proximity-based grouping
- Algorithmic conflict detection between agents
- LLM synthesis for ambiguous semantic conflicts
- Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
- Verdict rules based on severity counts and configurable thresholds
- CLI flags for static analysis (--static-analysis/--no-static-analysis) and work directory (--work-dir)
- Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
- Test suites for deliberation and static analysis (85% coverage)

### Changed

- CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
- Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps

## [0.1.0] - 2025-03-09

### Added

- Initial project scaffold
- Project plan with architecture documentation
- Core data models: Finding, Policy, AgentConfig, ReviewResult, Severity, Verdict enums
- LLM client abstraction with LiteLLM integration
- Prompt template system with versioned templates and registry
- Agent framework with base Agent class and ReviewContext
- Security, Style, and Complexity review agents
- CLI with `arbiter review` command supporting diff files and stdin
- Output formats: rich (terminal), JSON, markdown
- Policy configuration via YAML files
- Model override via CLI flag
- Test suite with 92% coverage

### Changed

- Consolidated to single Python service (removed separate Node.js webhook handler)
- Reduced MVP scope to 3 agents: Security, Style, Complexity
- Added static analysis pre-pass (ruff, mypy, bandit, radon)
- Added cost management section with token budgets and model selection
- Added error handling strategy with circuit breaker
- Added observability section with Prometheus metrics
- Added security section for prompt injection mitigation
- Changed deliberation to use algorithmic conflict detection first
- Specified typed DeliberationStep structure with documented metadata keys
- Updated conflict examples to show real Security vs Complexity trade-offs
- Added prompt_version field to Finding model for traceability
- Moved caching and cost tracking to Phase 3 (requires database)
- Clarified false positive evaluation methodology
- Documented prompts versioning strategy (deployment-time config)