# Changelog

All notable changes to Arbiter will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.5.0] - 2025-06-16

### Added

- Conversational follow-up system for PR comment Q&A
  - Question detection in PR comments with confidence scoring
  - Agent routing based on keywords, finding references, and context
  - Agent `explain()` method for providing detailed follow-up explanations
  - Conversation storage in the database (ConversationModel, ConversationMessageModel)
  - Webhook handling for GitHub issue_comment and GitLab note events
  - Worker task process_followup for async question processing
  - REST API endpoints for conversations
    - `GET /api/conversations` - List conversations with filtering
    - `GET /api/conversations/{id}` - Conversation detail with messages
    - `GET /api/conversations/review/{id}` - Conversation for a specific review
  - Explain prompt templates for each agent (security, style, complexity)
  - Configuration settings for follow-up behaviour
    - followup_enabled - Enable/disable follow-up processing
    - followup_confidence_threshold - Minimum confidence required to respond
    - followup_max_tokens_per_response - Token limit for responses
- React dashboard for exploring reviews and monitoring metrics
  - Review list page with filtering (repository, status, verdict, author) and pagination
  - Review detail page with findings grouped by severity and expandable cards
  - Deliberation timeline showing the step-by-step decision process
  - Metrics dashboard with charts for verdicts, severities, and reviews over time
  - VerdictBadge and SeverityBadge components with color-coded indicators
  - TanStack Query for data fetching and caching
  - Tailwind CSS for styling with responsive layouts
  - Docker configuration with nginx for production builds
- `GET /api/reviews/metrics` endpoint for aggregate statistics
  - Total/completed review counts
  - Average cost per review
  - Verdict and severity distribution
  - Reviews over the last 30 days
  - Cost breakdown by agent
- Idempotent comment updates for re-reviews
  - Comment model for representing PR/MR comments
  - `get_comments()` method on platform clients to list PR comments
  - `update_comment()` method to edit existing comments
  - Arbiter marker (`<!-- arbiter-review -->`) embedded in review comments
  - Re-reviews now update the existing Arbiter comment instead of posting a new one
  - Graceful fallback: posts a new comment if fetching existing comments fails
- Documentation
  - Dashboard README with component architecture and usage
  - Deployment guide with Docker, production, and scaling guidance
  - API reference with endpoint documentation and webhook schemas
  - Environment variable reference (.env.example)
  - Troubleshooting section in the main README

### Changed

- Increased test coverage from 74% to 86%+
  - LiteLLMClient initialization and `complete` method tests
  - Static analysis runner tests with mocked subprocess execution
  - API dependency injection tests (get_db, get_redis, close_redis)
  - Extended test fixtures for platform client mocking
- Version bumped to 0.5.0

## [0.4.0] - 2025-04-13

### Added

- GitHub integration client for fetching diffs and posting review comments
  - Fetch PR diffs via the GitHub API
  - Post review comments on pull requests
  - Update commit status checks (pending/success/failure/error)
  - Automatic retry with exponential backoff on transient errors
  - Rate limit monitoring with warnings when limits are low
- GitLab integration client with equivalent functionality
  - Fetch MR diffs via the GitLab API
  - Post notes on merge requests
  - Update commit pipeline status
  - URL-encoded project path handling
- Review comment formatter for Markdown output
  - Verdict header with icon (approve/changes/comment)
  - Summary statistics table by severity
  - Findings grouped by severity level
  - Conflicts section for agent disagreements
  - Optional cost/tokens footer
  - Automatic truncation to stay within GitHub's 65535-character limit
- Platform integration exceptions
  - AuthenticationError for 401/403 responses
  - RateLimitError for 429 responses with retry_after
  - NotFoundError for 404 responses
  - PlatformError for 5xx server errors
- Extended configuration settings
  - github_token and gitlab_token for API authentication
  - github_base_url and gitlab_base_url for enterprise instances
  - integration_timeout and integration_max_retries
  - status_check_context for commit status naming
  - post_comments and update_status feature flags
- Worker integration with platform clients
  - Automatic diff fetching when no diff is provided
  - Commit status set to pending on job start
  - Review comment posted on completion
  - Status updated to success/failure based on verdict
  - Graceful degradation on platform API failures

### Changed

- Worker process_review now accepts a platform parameter
- Webhook routes pass the platform identifier to the queue
- Version bumped to 0.4.0

### Dependencies

- Added httpx>=0.26.0 to main dependencies
- Added tenacity>=8.2.0 for retry logic
- Added respx>=0.21.0 to dev dependencies for HTTP mocking

## [0.3.0] - 2025-03-29

### Added

- FastAPI application with lifespan management, CORS, and exception handlers
- PostgreSQL database schema with SQLAlchemy async ORM
  - ReviewModel for storing PR review metadata and results
  - FindingModel for individual agent findings
  - ConflictModel for detected conflicts between agents
  - DeliberationStepModel for the audit trail
  - PolicyModel for organisation configurations
- Alembic migrations with async PostgreSQL support
- Redis-backed arq worker for async job processing
  - Review task with full pipeline execution
  - Job deduplication by repository/PR/commit
  - Priority queuing (draft PRs get lower priority)
- REST API endpoints
  - `GET /api/reviews` - List reviews with pagination and filtering
  - `GET /api/reviews/{id}` - Review detail with findings
  - `GET /api/reviews/{id}/deliberation` - Deliberation log
  - `POST /api/reviews` - Trigger a manual review
- Webhook routes for GitHub and GitLab
  - `POST /webhooks/github` - GitHub PR events with HMAC-SHA256 signature validation
  - `POST /webhooks/gitlab` - GitLab MR events with token validation
- Health and metrics endpoints
  - `GET /health` - Liveness check
  - `GET /health/ready` - Readiness check (database, Redis)
  - `GET /health/live` - Kubernetes liveness probe
  - `GET /metrics` - Prometheus metrics
- LLM response cache with Redis backend
  - Cache key based on diff, agent, prompt version, and policy
  - Configurable TTL (default 24 hours)
  - Cache statistics tracking
- Cost tracking models
  - ReviewCost for aggregate cost tracking
  - AgentCost for per-agent breakdown
  - CostEstimate for pre-review cost estimation
- Docker configuration
  - Multi-stage Dockerfile for API and worker
  - docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
- Extended configuration settings
  - Database connection (URL, pool size, max overflow)
  - Redis connection (URL, max connections)
  - Webhook secrets (GitHub HMAC, GitLab token)
  - API settings (rate limits, CORS origins)
  - Worker settings (max jobs, timeout, retries)

### Changed

- Configuration now uses `@lru_cache` for the settings singleton
- Version bumped to 0.3.0

## [0.2.0] - 2025-03-15

### Added

- Diff parser with line mapping for unified diff format
- Static analysis runners for ruff, mypy, bandit, and radon
- Finding merger with deduplication and proximity-based grouping
- Algorithmic conflict detection between agents
- LLM synthesis for ambiguous semantic conflicts
- Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
- Verdict rules based on severity counts and configurable thresholds
- CLI flags for static analysis (`--static-analysis`/`--no-static-analysis`) and work directory (`--work-dir`)
- Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
- Test suites for deliberation and static analysis (85% coverage)

### Changed

- CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
- Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps

## [0.1.0] - 2025-03-09

### Added

- Initial project scaffold
- Project plan with architecture documentation
- Core data models: Finding, Policy, AgentConfig, ReviewResult, and the Severity and Verdict enums
- LLM client abstraction with LiteLLM integration
- Prompt template system with versioned templates and registry
- Agent framework with base Agent class and ReviewContext
- Security, Style, and Complexity review agents
- CLI with `arbiter review` command supporting diff files and stdin
- Output formats: rich (terminal), JSON, markdown
- Policy configuration via YAML files
- Model override via CLI flag
- Test suite with 92% coverage

### Changed

- Consolidated into a single Python service (removed separate Node.js webhook handler)
- Reduced MVP scope to 3 agents: Security, Style, Complexity
- Added static analysis pre-pass (ruff, mypy, bandit, radon)
- Added cost management section with token budgets and model selection
- Added error handling strategy with circuit breaker
- Added observability section with Prometheus metrics
- Added security section for prompt injection mitigation
- Changed deliberation to use algorithmic conflict detection first
- Specified typed DeliberationStep structure with documented metadata keys
- Updated conflict examples to show real Security vs Complexity trade-offs
- Added prompt_version field to Finding model for traceability
- Moved caching and cost tracking to Phase 3 (requires a database)
- Clarified false positive evaluation methodology
- Documented prompt versioning strategy (deployment-time config)