kschappell/arbiter

Kai Chappell 72268ff440

initial project scaffold

2025-03-08 15:18:11 +00:00

Changelog

All notable changes to Arbiter will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.5.0] - 2025-06-16

Added

Conversational follow-up system for PR comment Q&A
- Question detection in PR comments with confidence scoring
- Agent routing based on keywords, finding references, and context
- Agent explain() method for providing detailed follow-up explanations
- Conversation storage in database (ConversationModel, ConversationMessageModel)
- Webhook handling for GitHub issue_comment and GitLab note events
- Worker task process_followup for async question processing
- REST API endpoints for conversations
  - GET /api/conversations - List conversations with filtering
  - GET /api/conversations/{id} - Conversation detail with messages
  - GET /api/conversations/review/{id} - Conversation for a specific review
- Explain prompt templates for each agent (security, style, complexity)
- Configuration settings for follow-up behaviour
  - followup_enabled - Enable/disable follow-up processing
  - followup_confidence_threshold - Minimum confidence to respond
  - followup_max_tokens_per_response - Token limit for responses
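
The question detection and confidence threshold listed above could work roughly as follows. This is only a sketch: the scoring weights, the default threshold value, and the `@arbiter` mention handle are assumptions for illustration and are not taken from the Arbiter codebase.

```python
import re

# Hypothetical default; the changelog names the setting but not its value.
FOLLOWUP_CONFIDENCE_THRESHOLD = 0.6

# Words that commonly open a question (illustrative, not exhaustive).
QUESTION_OPENERS = re.compile(
    r"^\s*(why|how|what|where|when|which|can|could|should|is|are|does|do)\b",
    re.IGNORECASE,
)


def question_confidence(comment: str) -> float:
    """Score how likely a PR comment is a question, from 0.0 to 1.0."""
    score = 0.0
    if "?" in comment:
        score += 0.5
    if QUESTION_OPENERS.search(comment):
        score += 0.3
    if "@arbiter" in comment.lower():  # hypothetical mention handle
        score += 0.2
    return min(score, 1.0)


def should_respond(
    comment: str, threshold: float = FOLLOWUP_CONFIDENCE_THRESHOLD
) -> bool:
    """Respond only when the score reaches the configured threshold."""
    return question_confidence(comment) >= threshold
```

A comment such as "Why is this flagged?" clears the threshold here, while "Looks good to me." does not.
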
React dashboard for exploring reviews and monitoring metrics
- Review list page with filtering (repository, status, verdict, author) and pagination
- Review detail page with findings grouped by severity and expandable cards
- Deliberation timeline showing step-by-step decision process
- Metrics dashboard with charts for verdicts, severities, reviews over time
- VerdictBadge and SeverityBadge components with colour-coded indicators
- TanStack Query for data fetching and caching
- Tailwind CSS for styling with responsive layouts
- Docker configuration with nginx for production builds
GET /api/reviews/metrics endpoint for aggregate statistics
- Total/completed review counts
- Average cost per review
- Verdict and severity distribution
- Reviews over last 30 days
- Cost breakdown by agent
Idempotent comment updates for re-reviews
- Comment model for representing PR/MR comments
- get_comments() method on platform clients to list PR comments
- update_comment() method to edit existing comments
- Arbiter marker embedded in review comments
- Re-reviews now update the existing Arbiter comment instead of posting a new one
- Graceful fallback: posts a new comment if fetching comments fails
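
The idempotent update flow above can be sketched as an "upsert" over the PR's comments. The marker text, the `post_comment()` method name, and the `Comment` fields below are assumptions made for illustration; only `get_comments()` and `update_comment()` are named in this changelog.

```python
from dataclasses import dataclass

# Hypothetical marker; the real marker text is not given in the changelog.
ARBITER_MARKER = "[//]: # (arbiter-review)"


@dataclass
class Comment:
    id: int
    body: str


def upsert_review_comment(client, pr_number: int, body: str) -> str:
    """Update the existing Arbiter comment if one exists, else post a new one.

    `client` is any object with get_comments(), update_comment() and a
    (hypothetical) post_comment() method.
    """
    marked_body = f"{ARBITER_MARKER}\n{body}"
    try:
        comments = client.get_comments(pr_number)
        existing = next((c for c in comments if ARBITER_MARKER in c.body), None)
    except Exception:
        # Graceful fallback: if listing comments fails, post a new comment.
        existing = None
    if existing is not None:
        client.update_comment(existing.id, marked_body)
        return "updated"
    client.post_comment(pr_number, marked_body)
    return "posted"
```

Because the marker travels inside the comment body, no extra database state is needed to find the comment on a re-review.
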
Documentation
- Dashboard README with component architecture and usage
- Deployment guide with Docker, production, and scaling guidance
- API reference with endpoint documentation and webhook schemas
- Environment variable reference (.env.example)
- Troubleshooting section in main README

Changed

Increased test coverage from 74% to 86%+
- LiteLLMClient initialization and complete method tests
- Static analysis runner tests with mocked subprocess execution
- API dependency injection tests (get_db, get_redis, close_redis)
- Extended test fixtures for platform client mocking
Version bumped to 0.5.0

[0.4.0] - 2025-04-13

Added

GitHub integration client for fetching diffs and posting review comments
- Fetch PR diffs via GitHub API
- Post review comments on pull requests
- Update commit status checks (pending/success/failure/error)
- Automatic retry with exponential backoff on transient errors
- Rate limit monitoring with warnings when limits are low
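
For readers unfamiliar with the retry behaviour mentioned above, here is a standard-library sketch of exponential backoff. The project itself lists tenacity as its retry dependency, so this is not the actual implementation; `TransientError`, the delay values, and the jitter amount are assumptions for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, 5xx)."""


def retry_with_backoff(func, max_retries=3, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call func(), retrying on TransientError with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so concurrent workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter is injectable so that tests can record the delays instead of waiting.
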
GitLab integration client with equivalent functionality
- Fetch MR diffs via GitLab API
- Post notes on merge requests
- Update commit pipeline status
- URL-encoded project path handling
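
GitLab's REST API accepts a namespaced project path in place of a numeric ID, provided the path is URL-encoded so that `/` becomes `%2F`. A minimal sketch of that encoding follows; the helper name `merge_request_url` is hypothetical.

```python
from urllib.parse import quote


def merge_request_url(base_url: str, project_path: str, mr_iid: int) -> str:
    """Build a GitLab API URL for a merge request from a project path."""
    # safe="" forces "/" to be encoded as %2F, which the API expects.
    encoded = quote(project_path, safe="")
    return f"{base_url.rstrip('/')}/api/v4/projects/{encoded}/merge_requests/{mr_iid}"
```
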
Review comment formatter for Markdown output
- Verdict header with icon (approve/changes/comment)
- Summary statistics table by severity
- Findings grouped by severity level
- Conflicts section for agent disagreements
- Optional cost/tokens footer
- Automatic truncation to stay within GitHub's 65,535-character comment limit
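
One way the truncation step could work is to cut at the last line break that fits and append a short notice. This is a sketch under assumptions: the notice wording and the function name are hypothetical, and only the limit itself comes from the entry above.

```python
MAX_COMMENT_CHARS = 65535  # limit quoted in this changelog
TRUNCATION_NOTICE = "\n\n_Output truncated._"  # hypothetical wording


def truncate_comment(body: str, limit: int = MAX_COMMENT_CHARS) -> str:
    """Return body unchanged if it fits, else cut it and append a notice."""
    if len(body) <= limit:
        return body
    keep = limit - len(TRUNCATION_NOTICE)
    # Prefer cutting at a line break so Markdown is less likely to break mid-line.
    cut = body.rfind("\n", 0, keep)
    if cut <= 0:
        cut = keep
    return body[:cut] + TRUNCATION_NOTICE
```
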
Platform integration exceptions
- AuthenticationError for 401/403 responses
- RateLimitError for 429 responses with retry_after
- NotFoundError for 404 responses
- PlatformError for 5xx server errors
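
The status-code mapping above might be expressed as a small helper. The exception names come from this changelog; the class hierarchy (everything deriving from `PlatformError`), the constructor signatures, and the `raise_for_status` helper are assumptions for illustration.

```python
from typing import Optional


class PlatformError(Exception):
    """Raised for 5xx responses; assumed here to be the base class too."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message or f"platform returned HTTP {status_code}")
        self.status_code = status_code


class AuthenticationError(PlatformError):
    """401/403 responses."""


class NotFoundError(PlatformError):
    """404 responses."""


class RateLimitError(PlatformError):
    """429 responses, carrying the server's retry hint when present."""

    def __init__(self, status_code: int, retry_after: Optional[float] = None):
        super().__init__(status_code)
        self.retry_after = retry_after


def raise_for_status(status_code: int, retry_after: Optional[float] = None) -> None:
    """Translate an HTTP status code into the matching exception."""
    if status_code in (401, 403):
        raise AuthenticationError(status_code)
    if status_code == 404:
        raise NotFoundError(status_code)
    if status_code == 429:
        raise RateLimitError(status_code, retry_after)
    if 500 <= status_code < 600:
        raise PlatformError(status_code)
```
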
Extended configuration settings
- github_token and gitlab_token for API authentication
- github_base_url and gitlab_base_url for enterprise instances
- integration_timeout and integration_max_retries
- status_check_context for commit status naming
- post_comments and update_status feature flags
Worker integration with platform clients
- Automatic diff fetching when not provided
- Commit status set to pending on job start
- Review comment posted on completion
- Status updated to success/failure based on verdict
- Graceful degradation on platform API failures

Changed

Worker process_review now accepts platform parameter
Webhook routes pass platform identifier to queue
Version bumped to 0.4.0

Dependencies

Added httpx>=0.26.0 to main dependencies
Added tenacity>=8.2.0 for retry logic
Added respx>=0.21.0 to dev dependencies for HTTP mocking

[0.3.0] - 2025-03-29

Added

FastAPI application with lifespan management, CORS, and exception handlers
PostgreSQL database schema with SQLAlchemy async ORM
- ReviewModel for storing PR review metadata and results
- FindingModel for individual agent findings
- ConflictModel for detected conflicts between agents
- DeliberationStepModel for audit trail
- PolicyModel for organisation configurations
Alembic migrations with async PostgreSQL support
Redis-backed arq worker for async job processing
- Review task with full pipeline execution
- Job deduplication by repository/PR/commit
- Priority queuing (draft PRs get lower priority)
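
Deduplication by repository/PR/commit and lower priority for drafts can be illustrated without a queue backend. The key format, the queue names, and the in-memory `seen` set below are assumptions for this sketch; the real worker runs on arq and Redis.

```python
from typing import Optional, Set, Tuple


def review_job_key(repository: str, pr_number: int, commit_sha: str) -> str:
    """Deterministic key: the same repo/PR/commit always maps to one job."""
    return f"review:{repository}:{pr_number}:{commit_sha}"


def plan_job(repository: str, pr_number: int, commit_sha: str,
             is_draft: bool, seen: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (job_key, queue_name), or None if the job is a duplicate."""
    key = review_job_key(repository, pr_number, commit_sha)
    if key in seen:
        return None  # already queued for this exact commit
    seen.add(key)
    # Hypothetical queue names: drafts go to a lower-priority queue.
    queue = "arbiter:low" if is_draft else "arbiter:default"
    return key, queue
```

A new commit on the same PR produces a different key, so it is queued as a fresh review.
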
REST API endpoints
- GET /api/reviews - List reviews with pagination and filtering
- GET /api/reviews/{id} - Review detail with findings
- GET /api/reviews/{id}/deliberation - Deliberation log
- POST /api/reviews - Trigger manual review
Webhook routes for GitHub and GitLab
- POST /webhooks/github - GitHub PR events with HMAC-SHA256 signature validation
- POST /webhooks/gitlab - GitLab MR events with token validation
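
The two validation schemes above differ: GitHub signs the raw request body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`, while GitLab sends the shared secret itself in `X-Gitlab-Token`. A minimal sketch using only the standard library follows; the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, payload: bytes,
                            header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not header or not header.startswith(prefix):
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    provided = header[len(prefix):]
    # compare_digest runs in constant time to avoid timing side channels.
    return hmac.compare_digest(expected.encode(), provided.encode())


def verify_gitlab_token(secret: str, header: Optional[str]) -> bool:
    """Check an X-Gitlab-Token header against the configured secret."""
    return hmac.compare_digest(secret.encode(), (header or "").encode())
```

The signature must be computed over the raw bytes as received; re-serialising parsed JSON can change whitespace and break the comparison.
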
Health and metrics endpoints
- GET /health - Liveness check
- GET /health/ready - Readiness check (database, Redis)
- GET /health/live - Kubernetes liveness probe
- GET /metrics - Prometheus metrics
LLM response cache with Redis backend
- Cache key based on diff, agent, prompt version, and policy
- Configurable TTL (default 24 hours)
- Cache statistics tracking
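
A cache key built from the four inputs above could be derived with a hash. This is a sketch, not the project's code: the key prefix, the function name, and the serialisation of the policy as sorted JSON are assumptions.

```python
import hashlib
import json

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24-hour default noted above


def llm_cache_key(diff: str, agent: str, prompt_version: str, policy: dict) -> str:
    """Derive a stable cache key from diff, agent, prompt version and policy."""
    # sort_keys makes the key independent of dict insertion order.
    policy_blob = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    for part in (diff, agent, prompt_version, policy_blob):
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return f"arbiter:llm:{digest.hexdigest()}"
```

Including the prompt version in the key means a prompt change invalidates old entries without an explicit flush.
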
Cost tracking models
- ReviewCost for aggregate cost tracking
- AgentCost for per-agent breakdown
- CostEstimate for pre-review cost estimation
Docker configuration
- Multi-stage Dockerfile for API and worker
- docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
Extended configuration settings
- Database connection (URL, pool size, max overflow)
- Redis connection (URL, max connections)
- Webhook secrets (GitHub HMAC, GitLab token)
- API settings (rate limits, CORS origins)
- Worker settings (max jobs, timeout, retries)

Changed

Configuration now uses @lru_cache for settings singleton
Version bumped to 0.3.0
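
The `@lru_cache` settings singleton pattern mentioned above looks roughly like this. The `Settings` fields, environment variable names, and default URLs are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str


@lru_cache
def get_settings() -> Settings:
    """Build settings once; later calls return the same cached object."""
    return Settings(
        # Hypothetical variable names and defaults.
        database_url=os.environ.get(
            "DATABASE_URL", "postgresql+asyncpg://localhost/arbiter"
        ),
        redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
    )
```

In tests, `get_settings.cache_clear()` forces the environment to be re-read.
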

[0.2.0] - 2025-03-15

Added

Diff parser with line mapping for unified diff format
Static analysis runners for ruff, mypy, bandit, and radon
Finding merger with deduplication and proximity-based grouping
Algorithmic conflict detection between agents
LLM synthesis for ambiguous semantic conflicts
Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
Verdict rules based on severity counts and configurable thresholds
CLI flags for static analysis (--static-analysis/--no-static-analysis) and work directory (--work-dir)
Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
Test suites for deliberation and static analysis (85% coverage)
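
To illustrate the verdict rules listed above, here is one possible shape for a severity-count rule with configurable thresholds. The enum members, the threshold parameter, and the specific rules are assumptions; the changelog only states that verdicts depend on severity counts and thresholds.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Severity(Enum):  # hypothetical members
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Verdict(Enum):  # hypothetical members
    APPROVE = "approve"
    REQUEST_CHANGES = "request_changes"
    COMMENT = "comment"


def determine_verdict(severities: Iterable[Severity], max_high: int = 0) -> Verdict:
    """Map the severities of merged findings to a single verdict."""
    counts = Counter(severities)
    # Any critical finding, or more high findings than allowed, blocks the PR.
    if counts[Severity.CRITICAL] > 0 or counts[Severity.HIGH] > max_high:
        return Verdict.REQUEST_CHANGES
    # Remaining findings are worth mentioning but do not block.
    if sum(counts.values()) > 0:
        return Verdict.COMMENT
    return Verdict.APPROVE
```
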

Changed

CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps

[0.1.0] - 2025-03-09

Added

Initial project scaffold
Project plan with architecture documentation
Core data models: Finding, Policy, AgentConfig, ReviewResult, Severity, Verdict enums
LLM client abstraction with LiteLLM integration
Prompt template system with versioned templates and registry
Agent framework with base Agent class and ReviewContext
Security, Style, and Complexity review agents
CLI with arbiter review command supporting diff files and stdin
Output formats: rich (terminal), JSON, markdown
Policy configuration via YAML files
Model override via CLI flag
Test suite with 92% coverage

Changed

Consolidated to single Python service (removed separate Node.js webhook handler)
Reduced MVP scope to 3 agents: Security, Style, Complexity
Added static analysis pre-pass (ruff, mypy, bandit, radon)
Added cost management section with token budgets and model selection
Added error handling strategy with circuit breaker
Added observability section with Prometheus metrics
Added security section for prompt injection mitigation
Changed deliberation to use algorithmic conflict detection first
Specified typed DeliberationStep structure with documented metadata keys
Updated conflict examples to show real Security vs Complexity trade-offs
Added prompt_version field to Finding model for traceability
Moved caching and cost tracking to Phase 3 (requires database)
Clarified false positive evaluation methodology
Documented prompts versioning strategy (deployment-time config)
