# Changelog

All notable changes to Arbiter will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.5.0] - 2025-06-16

### Added

- Conversational follow-up system for PR comment Q&A
  - Question detection in PR comments with confidence scoring
  - Agent routing based on keywords, finding references, and context
  - Agent explain() method for providing detailed follow-up explanations
  - Conversation storage in database (ConversationModel, ConversationMessageModel)
  - Webhook handling for GitHub issue_comment and GitLab note events
  - Worker task process_followup for async question processing
  - REST API endpoints for conversations
    - `GET /api/conversations` - List conversations with filtering
    - `GET /api/conversations/{id}` - Conversation detail with messages
    - `GET /api/conversations/review/{id}` - Conversation for specific review
  - Explain prompt templates for each agent (security, style, complexity)
  - Configuration settings for follow-up behaviour
    - `followup_enabled` - Enable/disable follow-up processing
    - `followup_confidence_threshold` - Minimum confidence to respond
    - `followup_max_tokens_per_response` - Token limit for responses
- React dashboard for exploring reviews and monitoring metrics
  - Review list page with filtering (repository, status, verdict, author) and pagination
  - Review detail page with findings grouped by severity and expandable cards
  - Deliberation timeline showing step-by-step decision process
  - Metrics dashboard with charts for verdicts, severities, reviews over time
  - VerdictBadge and SeverityBadge components with colour-coded indicators
  - TanStack Query for data fetching and caching
  - Tailwind CSS for styling with responsive layouts
  - Docker configuration with nginx for production builds
- `GET /api/reviews/metrics` endpoint for aggregate statistics
  - Total/completed review counts
  - Average cost per review
  - Verdict and severity distribution
  - Reviews over last 30 days
  - Cost breakdown by agent
- Idempotent comment updates for re-reviews
  - Comment model for representing PR/MR comments
  - `get_comments()` method on platform clients to list PR comments
  - `update_comment()` method to edit existing comments
  - Arbiter marker (`<!-- arbiter-review -->`) embedded in review comments
  - Re-reviews now update the existing Arbiter comment instead of posting a new one
  - Graceful fallback: posts a new comment if fetching existing comments fails
- Documentation
  - Dashboard README with component architecture and usage
  - Deployment guide with Docker, production, and scaling guidance
  - API reference with endpoint documentation and webhook schemas
  - Environment variable reference (.env.example)
  - Troubleshooting section in main README
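
The idempotent comment update listed above can be pictured with a minimal sketch. Only the marker string and the `get_comments()` / `update_comment()` method names come from this changelog; the `Comment` fields, the in-memory client, `post_comment()`, and `upsert_review_comment()` are hypothetical stand-ins, and the real implementation may differ.

```python
from dataclasses import dataclass

ARBITER_MARKER = "<!-- arbiter-review -->"  # marker string quoted in the changelog


@dataclass
class Comment:
    """Hypothetical stand-in for the Comment model; real fields may differ."""
    id: int
    body: str


class InMemoryClient:
    """Hypothetical platform client, used only to make the sketch runnable."""

    def __init__(self, fail_on_list: bool = False) -> None:
        self.comments: list[Comment] = []
        self.fail_on_list = fail_on_list

    def get_comments(self) -> list[Comment]:
        if self.fail_on_list:
            raise RuntimeError("listing comments failed")
        return list(self.comments)

    def update_comment(self, comment_id: int, body: str) -> None:
        for comment in self.comments:
            if comment.id == comment_id:
                comment.body = body

    def post_comment(self, body: str) -> Comment:  # assumed method name
        comment = Comment(id=len(self.comments) + 1, body=body)
        self.comments.append(comment)
        return comment


def upsert_review_comment(client: InMemoryClient, review_markdown: str) -> str:
    """Update the marked comment if one exists, otherwise post a new one."""
    body = f"{ARBITER_MARKER}\n{review_markdown}"
    try:
        existing = next(
            (c for c in client.get_comments() if ARBITER_MARKER in c.body), None
        )
    except Exception:
        existing = None  # graceful fallback: behave as if no comment exists
    if existing is not None:
        client.update_comment(existing.id, body)
        return "updated"
    client.post_comment(body)
    return "created"
```

Calling `upsert_review_comment()` twice against the same client leaves a single comment whose body reflects the latest review, which is the behaviour the entry describes for re-reviews.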

### Changed

- Increased test coverage from 74% to 86%+
  - LiteLLMClient initialization and complete method tests
  - Static analysis runner tests with mocked subprocess execution
  - API dependency injection tests (get_db, get_redis, close_redis)
  - Extended test fixtures for platform client mocking
- Version bumped to 0.5.0

## [0.4.0] - 2025-04-13

### Added

- GitHub integration client for fetching diffs and posting review comments
  - Fetch PR diffs via GitHub API
  - Post review comments on pull requests
  - Update commit status checks (pending/success/failure/error)
  - Automatic retry with exponential backoff on transient errors
  - Rate limit monitoring with warnings when limits are low
- GitLab integration client with equivalent functionality
  - Fetch MR diffs via GitLab API
  - Post notes on merge requests
  - Update commit pipeline status
  - URL-encoded project path handling
- Review comment formatter for Markdown output
  - Verdict header with icon (approve/changes/comment)
  - Summary statistics table by severity
  - Findings grouped by severity level
  - Conflicts section for agent disagreements
  - Optional cost/tokens footer
  - Automatic truncation to stay within GitHub's 65535-character comment limit
- Platform integration exceptions
  - AuthenticationError for 401/403 responses
  - RateLimitError for 429 responses with retry_after
  - NotFoundError for 404 responses
  - PlatformError for 5xx server errors
- Extended configuration settings
  - github_token and gitlab_token for API authentication
  - github_base_url and gitlab_base_url for enterprise instances
  - integration_timeout and integration_max_retries
  - status_check_context for commit status naming
  - post_comments and update_status feature flags
- Worker integration with platform clients
  - Automatic diff fetching when not provided
  - Commit status set to pending on job start
  - Review comment posted on completion
  - Status updated to success/failure based on verdict
  - Graceful degradation on platform API failures
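
As an illustration of the automatic truncation mentioned for the review comment formatter, here is a minimal sketch. The 65535 figure is the one quoted in this changelog; the function name and the notice wording are assumptions, not Arbiter's actual formatter code.

```python
GITHUB_COMMENT_LIMIT = 65535  # limit quoted in the changelog entry
# Hypothetical notice text; the real formatter's wording is not documented here.
TRUNCATION_NOTICE = "\n\n_Output truncated to fit the comment size limit._"


def truncate_comment(body: str, limit: int = GITHUB_COMMENT_LIMIT) -> str:
    """Return body unchanged if it fits, else cut it and append a notice."""
    if len(body) <= limit:
        return body
    if limit <= len(TRUNCATION_NOTICE):
        # Not enough room for the notice; fall back to a plain cut.
        return body[:limit]
    keep = limit - len(TRUNCATION_NOTICE)
    return body[:keep] + TRUNCATION_NOTICE
```

Reserving space for the notice before cutting keeps the final string at or below the limit, so the platform API should not reject the comment for size.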

### Changed

- Worker process_review now accepts platform parameter
- Webhook routes pass platform identifier to queue
- Version bumped to 0.4.0

### Dependencies

- Added httpx>=0.26.0 to main dependencies
- Added tenacity>=8.2.0 for retry logic
- Added respx>=0.21.0 to dev dependencies for HTTP mocking

## [0.3.0] - 2025-03-29

### Added

- FastAPI application with lifespan management, CORS, and exception handlers
- PostgreSQL database schema with SQLAlchemy async ORM
  - ReviewModel for storing PR review metadata and results
  - FindingModel for individual agent findings
  - ConflictModel for detected conflicts between agents
  - DeliberationStepModel for audit trail
  - PolicyModel for organisation configurations
- Alembic migrations with async PostgreSQL support
- Redis-backed arq worker for async job processing
  - Review task with full pipeline execution
  - Job deduplication by repository/PR/commit
  - Priority queuing (draft PRs get lower priority)
- REST API endpoints
  - `GET /api/reviews` - List reviews with pagination and filtering
  - `GET /api/reviews/{id}` - Review detail with findings
  - `GET /api/reviews/{id}/deliberation` - Deliberation log
  - `POST /api/reviews` - Trigger manual review
- Webhook routes for GitHub and GitLab
  - `POST /webhooks/github` - GitHub PR events with HMAC-SHA256 signature validation
  - `POST /webhooks/gitlab` - GitLab MR events with token validation
- Health and metrics endpoints
  - `GET /health` - Liveness check
  - `GET /health/ready` - Readiness check (database, Redis)
  - `GET /health/live` - Kubernetes liveness probe
  - `GET /metrics` - Prometheus metrics
- LLM response cache with Redis backend
  - Cache key based on diff, agent, prompt version, and policy
  - Configurable TTL (default 24 hours)
  - Cache statistics tracking
- Cost tracking models
  - ReviewCost for aggregate cost tracking
  - AgentCost for per-agent breakdown
  - CostEstimate for pre-review cost estimation
- Docker configuration
  - Multi-stage Dockerfile for API and worker
  - docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
- Extended configuration settings
  - Database connection (URL, pool size, max overflow)
  - Redis connection (URL, max connections)
  - Webhook secrets (GitHub HMAC, GitLab token)
  - API settings (rate limits, CORS origins)
  - Worker settings (max jobs, timeout, retries)
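
The HMAC-SHA256 validation noted for `POST /webhooks/github` can be sketched with the standard library. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>` computed over the raw request body; the function name and parameters below are illustrative assumptions rather than Arbiter's actual code.

```python
import hashlib
import hmac


def is_valid_github_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header value against the raw request body."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest runs in constant time, which avoids leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check has to run on the raw bytes of the body, before any JSON parsing, because re-serialising the payload can change its byte representation and invalidate the digest.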

### Changed

- Configuration now uses `@lru_cache` for settings singleton
- Version bumped to 0.3.0
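
For readers unfamiliar with the `@lru_cache` settings pattern, a minimal sketch follows. The `Settings` fields, the environment variable names, the default URLs, and the `get_settings()` name are assumptions made for illustration; only the use of `@lru_cache` comes from this changelog.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; the real class and fields may differ."""
    database_url: str
    redis_url: str


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once; later calls return the same cached instance."""
    return Settings(
        # Environment variable names and defaults are illustrative assumptions.
        database_url=os.environ.get("DATABASE_URL", "postgresql+asyncpg://localhost/arbiter"),
        redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
    )
```

Because the function takes no arguments, the cache holds exactly one entry, which makes it behave like a singleton. Tests that change the environment can call `get_settings.cache_clear()` to force a rebuild.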

## [0.2.0] - 2025-03-15

### Added

- Diff parser with line mapping for unified diff format
- Static analysis runners for ruff, mypy, bandit, and radon
- Finding merger with deduplication and proximity-based grouping
- Algorithmic conflict detection between agents
- LLM synthesis for ambiguous semantic conflicts
- Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
- Verdict rules based on severity counts and configurable thresholds
- CLI flags for static analysis (`--static-analysis`/`--no-static-analysis`) and work directory (`--work-dir`)
- Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
- Test suites for deliberation and static analysis (85% coverage)
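
To make the idea of verdict rules based on severity counts and configurable thresholds concrete, here is a small sketch. The severity names, threshold fields, default values, and verdict strings are all hypothetical; the actual rules in Arbiter are not spelled out in this changelog.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictThresholds:
    """Hypothetical thresholds; names and defaults are illustrative only."""
    max_critical: int = 0
    max_high: int = 2


def determine_verdict(
    severities: list[str],
    thresholds: VerdictThresholds = VerdictThresholds(),
) -> str:
    """Map a list of finding severities to a verdict string."""
    counts = Counter(severities)  # missing severities count as zero
    if counts["critical"] > thresholds.max_critical:
        return "request_changes"
    if counts["high"] > thresholds.max_high:
        return "request_changes"
    if severities:
        return "comment"  # findings exist, but none exceed a threshold
    return "approve"
```

Keeping the thresholds in a separate object means a policy file could override them without touching the rule logic, which is one plausible reading of "configurable thresholds".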

### Changed

- CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
- Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps

## [0.1.0] - 2025-03-09

### Added

- Initial project scaffold
- Project plan with architecture documentation
- Core data models (Finding, Policy, AgentConfig, ReviewResult) and the Severity and Verdict enums
- LLM client abstraction with LiteLLM integration
- Prompt template system with versioned templates and registry
- Agent framework with base Agent class and ReviewContext
- Security, Style, and Complexity review agents
- CLI with `arbiter review` command supporting diff files and stdin
- Output formats: rich (terminal), JSON, markdown
- Policy configuration via YAML files
- Model override via CLI flag
- Test suite with 92% coverage

### Changed

- Consolidated to single Python service (removed separate Node.js webhook handler)
- Reduced MVP scope to 3 agents: Security, Style, Complexity
- Added static analysis pre-pass (ruff, mypy, bandit, radon)
- Added cost management section with token budgets and model selection
- Added error handling strategy with circuit breaker
- Added observability section with Prometheus metrics
- Added security section for prompt injection mitigation
- Changed deliberation to use algorithmic conflict detection first
- Specified typed DeliberationStep structure with documented metadata keys
- Updated conflict examples to show real Security vs Complexity trade-offs
- Added prompt_version field to Finding model for traceability
- Moved caching and cost tracking to Phase 3 (requires database)
- Clarified false positive evaluation methodology
- Documented prompts versioning strategy (deployment-time config)