Changelog

All notable changes to Arbiter will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.5.0] - 2025-06-16

Added

  • Conversational follow-up system for PR comment Q&A
    • Question detection in PR comments with confidence scoring
    • Agent routing based on keywords, finding references, and context
    • Agent explain() method for providing detailed follow-up explanations
    • Conversation storage in database (ConversationModel, ConversationMessageModel)
    • Webhook handling for GitHub issue_comment and GitLab note events
    • Worker task process_followup for async question processing
    • REST API endpoints for conversations
      • GET /api/conversations - List conversations with filtering
      • GET /api/conversations/{id} - Conversation detail with messages
      • GET /api/conversations/review/{id} - Conversation for specific review
    • Explain prompt templates for each agent (security, style, complexity)
    • Configuration settings for follow-up behaviour
      • followup_enabled - Enable/disable follow-up processing
      • followup_confidence_threshold - Minimum confidence to respond
      • followup_max_tokens_per_response - Token limit for responses
  • React dashboard for exploring reviews and monitoring metrics
    • Review list page with filtering (repository, status, verdict, author) and pagination
    • Review detail page with findings grouped by severity and expandable cards
    • Deliberation timeline showing step-by-step decision process
    • Metrics dashboard with charts for verdicts, severities, reviews over time
    • VerdictBadge and SeverityBadge components with color-coded indicators
    • TanStack Query for data fetching and caching
    • Tailwind CSS for styling with responsive layouts
    • Docker configuration with nginx for production builds
  • GET /api/reviews/metrics endpoint for aggregate statistics
    • Total/completed review counts
    • Average cost per review
    • Verdict and severity distribution
    • Reviews over last 30 days
    • Cost breakdown by agent
  • Idempotent comment updates for re-reviews
    • Comment model for representing PR/MR comments
    • get_comments() method on platform clients to list PR comments
    • update_comment() method to edit existing comments
    • Arbiter marker (<!-- arbiter-review -->) embedded in review comments
    • Re-reviews now update the existing Arbiter comment instead of posting a new one
    • Graceful fallback: posts a new comment if fetching comments fails
  • Documentation
    • Dashboard README with component architecture and usage
    • Deployment guide with Docker, production, and scaling guidance
    • API reference with endpoint documentation and webhook schemas
    • Environment variable reference (.env.example)
    • Troubleshooting section in main README
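
The idempotent comment update listed above can be illustrated with a minimal sketch. Only the marker string, `get_comments()`, and `update_comment()` come from this changelog. The `Comment` fields, `post_comment()`, the `InMemoryClient` stand-in, and the `upsert_review_comment` helper are hypothetical names chosen for illustration, not necessarily Arbiter's actual API.

```python
from dataclasses import dataclass

ARBITER_MARKER = "<!-- arbiter-review -->"  # marker string named in the changelog


@dataclass
class Comment:
    """Hypothetical shape of a PR/MR comment (fields are assumptions)."""
    id: int
    body: str


class InMemoryClient:
    """Stand-in for a platform client; a real one would call the GitHub/GitLab API."""

    def __init__(self) -> None:
        self.comments: list[Comment] = []

    def get_comments(self) -> list[Comment]:
        return list(self.comments)

    def update_comment(self, comment_id: int, body: str) -> None:
        for comment in self.comments:
            if comment.id == comment_id:
                comment.body = body

    def post_comment(self, body: str) -> Comment:  # hypothetical method name
        comment = Comment(id=len(self.comments) + 1, body=body)
        self.comments.append(comment)
        return comment


def upsert_review_comment(client, review_markdown: str) -> str:
    """Update the existing Arbiter comment if there is one, else post a new one."""
    body = f"{ARBITER_MARKER}\n{review_markdown}"
    try:
        existing = next(
            (c for c in client.get_comments() if ARBITER_MARKER in c.body), None
        )
    except Exception:
        # Graceful fallback: if listing comments fails, post a new comment.
        existing = None
    if existing is not None:
        client.update_comment(existing.id, body)
        return "updated"
    client.post_comment(body)
    return "posted"
```

Calling `upsert_review_comment` twice against the same client leaves a single comment whose body reflects the latest review, which is the behaviour the re-review entry describes.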

Changed

  • Increased test coverage from 74% to 86%+
    • Tests for LiteLLMClient initialization and the complete() method
    • Static analysis runner tests with mocked subprocess execution
    • API dependency injection tests (get_db, get_redis, close_redis)
    • Extended test fixtures for platform client mocking
  • Version bumped to 0.5.0

[0.4.0] - 2025-04-13

Added

  • GitHub integration client for fetching diffs and posting review comments
    • Fetch PR diffs via GitHub API
    • Post review comments on pull requests
    • Update commit status checks (pending/success/failure/error)
    • Automatic retry with exponential backoff on transient errors
    • Rate limit monitoring with warnings when limits are low
  • GitLab integration client with equivalent functionality
    • Fetch MR diffs via GitLab API
    • Post notes on merge requests
    • Update commit pipeline status
    • URL-encoded project path handling
  • Review comment formatter for Markdown output
    • Verdict header with icon (approve/changes/comment)
    • Summary statistics table by severity
    • Findings grouped by severity level
    • Conflicts section for agent disagreements
    • Optional cost/tokens footer
    • Automatic truncation to stay within GitHub's 65535-character comment limit
  • Platform integration exceptions
    • AuthenticationError for 401/403 responses
    • RateLimitError for 429 responses with retry_after
    • NotFoundError for 404 responses
    • PlatformError for 5xx server errors
  • Extended configuration settings
    • github_token and gitlab_token for API authentication
    • github_base_url and gitlab_base_url for enterprise instances
    • integration_timeout and integration_max_retries
    • status_check_context for commit status naming
    • post_comments and update_status feature flags
  • Worker integration with platform clients
    • Automatic diff fetching when not provided
    • Commit status set to pending on job start
    • Review comment posted on completion
    • Status updated to success/failure based on verdict
    • Graceful degradation on platform API failures
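
The status-code mapping of the platform integration exceptions listed above could look roughly like the sketch below. The exception names and the `retry_after` attribute come from this changelog. The class hierarchy (everything deriving from `PlatformError`), the constructor signatures, and the `raise_for_platform_status` helper are assumptions made for illustration.

```python
from typing import Optional


class PlatformError(Exception):
    """Raised for 5xx responses; used as the base class here by assumption."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class AuthenticationError(PlatformError):
    """401/403 responses."""


class NotFoundError(PlatformError):
    """404 responses."""


class RateLimitError(PlatformError):
    """429 responses, carrying the server's retry hint when present."""

    def __init__(
        self, message: str, status_code: int, retry_after: Optional[float] = None
    ) -> None:
        super().__init__(message, status_code)
        self.retry_after = retry_after


def raise_for_platform_status(
    status_code: int, retry_after: Optional[str] = None
) -> None:
    """Translate an HTTP status into one of the exceptions above (hypothetical helper)."""
    if status_code in (401, 403):
        raise AuthenticationError("authentication failed", status_code)
    if status_code == 404:
        raise NotFoundError("resource not found", status_code)
    if status_code == 429:
        # Assumes Retry-After is given in seconds, not as an HTTP date.
        seconds = float(retry_after) if retry_after is not None else None
        raise RateLimitError("rate limit exceeded", status_code, seconds)
    if 500 <= status_code < 600:
        raise PlatformError("platform server error", status_code)
    # Any other status is left for the caller to handle.
```

A shared base class lets retry logic catch `PlatformError` broadly while still treating `RateLimitError.retry_after` specially.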

Changed

  • Worker process_review now accepts platform parameter
  • Webhook routes pass platform identifier to queue
  • Version bumped to 0.4.0

Dependencies

  • Added httpx>=0.26.0 to main dependencies
  • Added tenacity>=8.2.0 for retry logic
  • Added respx>=0.21.0 to dev dependencies for HTTP mocking

[0.3.0] - 2025-03-29

Added

  • FastAPI application with lifespan management, CORS, and exception handlers
  • PostgreSQL database schema with SQLAlchemy async ORM
    • ReviewModel for storing PR review metadata and results
    • FindingModel for individual agent findings
    • ConflictModel for detected conflicts between agents
    • DeliberationStepModel for audit trail
    • PolicyModel for organisation configurations
  • Alembic migrations with async PostgreSQL support
  • Redis-backed arq worker for async job processing
    • Review task with full pipeline execution
    • Job deduplication by repository/PR/commit
    • Priority queuing (draft PRs get lower priority)
  • REST API endpoints
    • GET /api/reviews - List reviews with pagination and filtering
    • GET /api/reviews/{id} - Review detail with findings
    • GET /api/reviews/{id}/deliberation - Deliberation log
    • POST /api/reviews - Trigger manual review
  • Webhook routes for GitHub and GitLab
    • POST /webhooks/github - GitHub PR events with HMAC-SHA256 signature validation
    • POST /webhooks/gitlab - GitLab MR events with token validation
  • Health and metrics endpoints
    • GET /health - Liveness check
    • GET /health/ready - Readiness check (database, Redis)
    • GET /health/live - Kubernetes liveness probe
    • GET /metrics - Prometheus metrics
  • LLM response cache with Redis backend
    • Cache key based on diff, agent, prompt version, and policy
    • Configurable TTL (default 24 hours)
    • Cache statistics tracking
  • Cost tracking models
    • ReviewCost for aggregate cost tracking
    • AgentCost for per-agent breakdown
    • CostEstimate for pre-review cost estimation
  • Docker configuration
    • Multi-stage Dockerfile for API and worker
    • docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
  • Extended configuration settings
    • Database connection (URL, pool size, max overflow)
    • Redis connection (URL, max connections)
    • Webhook secrets (GitHub HMAC, GitLab token)
    • API settings (rate limits, CORS origins)
    • Worker settings (max jobs, timeout, retries)
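
As an illustration of the HMAC-SHA256 signature validation mentioned for `POST /webhooks/github`, here is a minimal standard-library sketch. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name `verify_github_signature` is hypothetical and not taken from Arbiter's code.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of the raw payload.

    payload must be the raw request body bytes, not a re-serialized JSON object.
    """
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The GitLab route uses a different scheme: per the entry above it validates a shared token, not a body signature.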

Changed

  • Configuration now uses @lru_cache for settings singleton
  • Version bumped to 0.3.0
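
The `@lru_cache` settings singleton noted above follows a common pattern, sketched here under assumptions. The `Settings` fields, environment variable names, and default URLs are hypothetical placeholders, and the real project may use a settings library instead of a plain dataclass.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of settings; field names are assumptions."""
    database_url: str
    redis_url: str


@lru_cache
def get_settings() -> Settings:
    """Build Settings once; later calls return the same cached instance."""
    return Settings(
        database_url=os.environ.get(
            "DATABASE_URL", "postgresql+asyncpg://localhost/arbiter"
        ),
        redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
    )
```

Because the function takes no arguments, the cache holds one entry, so every caller shares one instance. Tests that change environment variables can call `get_settings.cache_clear()` to force a rebuild.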

[0.2.0] - 2025-03-15

Added

  • Diff parser with line mapping for unified diff format
  • Static analysis runners for ruff, mypy, bandit, and radon
  • Finding merger with deduplication and proximity-based grouping
  • Algorithmic conflict detection between agents
  • LLM synthesis for ambiguous semantic conflicts
  • Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
  • Verdict rules based on severity counts and configurable thresholds
  • CLI flags for static analysis (--static-analysis/--no-static-analysis) and work directory (--work-dir)
  • Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
  • Test suites for deliberation and static analysis (85% coverage)
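
To make "verdict rules based on severity counts and configurable thresholds" concrete, here is one possible sketch. The severity levels, verdict values, threshold names, and default numbers are all assumptions made for illustration. The changelog confirms only that `Severity` and `Verdict` enums exist and that verdicts include approve, changes, and comment.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Severity(Enum):  # levels are assumed
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Verdict(Enum):
    APPROVE = "approve"
    COMMENT = "comment"
    REQUEST_CHANGES = "changes"


@dataclass(frozen=True)
class VerdictThresholds:
    """Maximum findings tolerated per severity before requesting changes (hypothetical)."""
    max_critical: int = 0
    max_high: int = 0
    max_medium: int = 3


def determine_verdict(
    severities: Iterable[Severity],
    thresholds: VerdictThresholds = VerdictThresholds(),
) -> Verdict:
    """Map severity counts to a verdict using the configured thresholds."""
    counts = Counter(severities)
    if (
        counts[Severity.CRITICAL] > thresholds.max_critical
        or counts[Severity.HIGH] > thresholds.max_high
        or counts[Severity.MEDIUM] > thresholds.max_medium
    ):
        return Verdict.REQUEST_CHANGES
    if counts:
        # Findings exist but none exceed a threshold: comment without blocking.
        return Verdict.COMMENT
    return Verdict.APPROVE
```

Keeping the thresholds in a separate frozen dataclass would let a policy file override them without touching the rule logic.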

Changed

  • CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
  • Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps

[0.1.0] - 2025-03-09

Added

  • Initial project scaffold
  • Project plan with architecture documentation
  • Core data models: Finding, Policy, AgentConfig, and ReviewResult, plus Severity and Verdict enums
  • LLM client abstraction with LiteLLM integration
  • Prompt template system with versioned templates and registry
  • Agent framework with base Agent class and ReviewContext
  • Security, Style, and Complexity review agents
  • CLI with arbiter review command supporting diff files and stdin
  • Output formats: rich (terminal), JSON, markdown
  • Policy configuration via YAML files
  • Model override via CLI flag
  • Test suite with 92% coverage

Changed

  • Consolidated to single Python service (removed separate Node.js webhook handler)
  • Reduced MVP scope to 3 agents: Security, Style, Complexity
  • Added static analysis pre-pass (ruff, mypy, bandit, radon)
  • Added cost management section with token budgets and model selection
  • Added error handling strategy with circuit breaker
  • Added observability section with Prometheus metrics
  • Added security section for prompt injection mitigation
  • Changed deliberation to use algorithmic conflict detection first
  • Specified typed DeliberationStep structure with documented metadata keys
  • Updated conflict examples to show real Security vs Complexity trade-offs
  • Added prompt_version field to Finding model for traceability
  • Moved caching and cost tracking to Phase 3 (requires database)
  • Clarified false positive evaluation methodology
  • Documented prompt versioning strategy (deployment-time config)