initial project scaffold
68
.gitignore
vendored
Normal file
@@ -0,0 +1,68 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
.venv/
venv/
ENV/

# Node
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.npm

# Build outputs
*.js.map
.next/
out/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# Environment
.env
.env.local
.env.*.local

# Testing
.coverage
htmlcov/
.pytest_cache/
.mypy_cache/
.ruff_cache/
coverage/

# Database
*.db
*.sqlite3

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/
224
changelog.md
Normal file
@@ -0,0 +1,224 @@
# Changelog

All notable changes to Arbiter will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.5.0] - 2025-06-16

### Added

- Conversational follow-up system for PR comment Q&A
  - Question detection in PR comments with confidence scoring
  - Agent routing based on keywords, finding references, and context
  - Agent explain() method for providing detailed follow-up explanations
  - Conversation storage in database (ConversationModel, ConversationMessageModel)
  - Webhook handling for GitHub issue_comment and GitLab note events
  - Worker task process_followup for async question processing
- REST API endpoints for conversations
  - `GET /api/conversations` - List conversations with filtering
  - `GET /api/conversations/{id}` - Conversation detail with messages
  - `GET /api/conversations/review/{id}` - Conversation for specific review
- Explain prompt templates for each agent (security, style, complexity)
- Configuration settings for follow-up behaviour
  - followup_enabled - Enable/disable follow-up processing
  - followup_confidence_threshold - Minimum confidence to respond
  - followup_max_tokens_per_response - Token limit for responses
- React dashboard for exploring reviews and monitoring metrics
  - Review list page with filtering (repository, status, verdict, author) and pagination
  - Review detail page with findings grouped by severity and expandable cards
  - Deliberation timeline showing step-by-step decision process
  - Metrics dashboard with charts for verdicts, severities, reviews over time
  - VerdictBadge and SeverityBadge components with color-coded indicators
  - TanStack Query for data fetching and caching
  - Tailwind CSS for styling with responsive layouts
  - Docker configuration with nginx for production builds
- `GET /api/reviews/metrics` endpoint for aggregate statistics
  - Total/completed review counts
  - Average cost per review
  - Verdict and severity distribution
  - Reviews over last 30 days
  - Cost breakdown by agent
- Idempotent comment updates for re-reviews
  - Comment model for representing PR/MR comments
  - `get_comments()` method on platform clients to list PR comments
  - `update_comment()` method to edit existing comments
  - Arbiter marker (`<!-- arbiter-review -->`) embedded in review comments
  - Re-reviews now update existing Arbiter comment instead of posting new ones
  - Graceful fallback: posts new comment if fetching comments fails
- Documentation
  - Dashboard README with component architecture and usage
  - Deployment guide with Docker, production, and scaling guidance
  - API reference with endpoint documentation and webhook schemas
  - Environment variable reference (.env.example)
  - Troubleshooting section in main README

### Changed

- Increased test coverage from 74% to 86%+
  - LiteLLMClient initialization and complete method tests
  - Static analysis runner tests with mocked subprocess execution
  - API dependency injection tests (get_db, get_redis, close_redis)
  - Extended test fixtures for platform client mocking
- Version bumped to 0.5.0

## [0.4.0] - 2025-04-13

### Added

- GitHub integration client for fetching diffs and posting review comments
  - Fetch PR diffs via GitHub API
  - Post review comments on pull requests
  - Update commit status checks (pending/success/failure/error)
  - Automatic retry with exponential backoff on transient errors
  - Rate limit monitoring with warnings when limits are low
- GitLab integration client with equivalent functionality
  - Fetch MR diffs via GitLab API
  - Post notes on merge requests
  - Update commit pipeline status
  - URL-encoded project path handling
- Review comment formatter for Markdown output
  - Verdict header with icon (approve/changes/comment)
  - Summary statistics table by severity
  - Findings grouped by severity level
  - Conflicts section for agent disagreements
  - Optional cost/tokens footer
  - Automatic truncation to stay within GitHub's 65535 char limit
- Platform integration exceptions
  - AuthenticationError for 401/403 responses
  - RateLimitError for 429 responses with retry_after
  - NotFoundError for 404 responses
  - PlatformError for 5xx server errors
- Extended configuration settings
  - github_token and gitlab_token for API authentication
  - github_base_url and gitlab_base_url for enterprise instances
  - integration_timeout and integration_max_retries
  - status_check_context for commit status naming
  - post_comments and update_status feature flags
- Worker integration with platform clients
  - Automatic diff fetching when not provided
  - Commit status set to pending on job start
  - Review comment posted on completion
  - Status updated to success/failure based on verdict
  - Graceful degradation on platform API failures

### Changed

- Worker process_review now accepts platform parameter
- Webhook routes pass platform identifier to queue
- Version bumped to 0.4.0

### Dependencies

- Added httpx>=0.26.0 to main dependencies
- Added tenacity>=8.2.0 for retry logic
- Added respx>=0.21.0 to dev dependencies for HTTP mocking

## [0.3.0] - 2025-03-29

### Added

- FastAPI application with lifespan management, CORS, and exception handlers
- PostgreSQL database schema with SQLAlchemy async ORM
  - ReviewModel for storing PR review metadata and results
  - FindingModel for individual agent findings
  - ConflictModel for detected conflicts between agents
  - DeliberationStepModel for audit trail
  - PolicyModel for organisation configurations
- Alembic migrations with async PostgreSQL support
- Redis-backed arq worker for async job processing
  - Review task with full pipeline execution
  - Job deduplication by repository/PR/commit
  - Priority queuing (draft PRs get lower priority)
- REST API endpoints
  - `GET /api/reviews` - List reviews with pagination and filtering
  - `GET /api/reviews/{id}` - Review detail with findings
  - `GET /api/reviews/{id}/deliberation` - Deliberation log
  - `POST /api/reviews` - Trigger manual review
- Webhook routes for GitHub and GitLab
  - `POST /webhooks/github` - GitHub PR events with HMAC-SHA256 signature validation
  - `POST /webhooks/gitlab` - GitLab MR events with token validation
- Health and metrics endpoints
  - `GET /health` - Liveness check
  - `GET /health/ready` - Readiness check (database, Redis)
  - `GET /health/live` - Kubernetes liveness probe
  - `GET /metrics` - Prometheus metrics
- LLM response cache with Redis backend
  - Cache key based on diff, agent, prompt version, and policy
  - Configurable TTL (default 24 hours)
  - Cache statistics tracking
- Cost tracking models
  - ReviewCost for aggregate cost tracking
  - AgentCost for per-agent breakdown
  - CostEstimate for pre-review cost estimation
- Docker configuration
  - Multi-stage Dockerfile for API and worker
  - docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
- Extended configuration settings
  - Database connection (URL, pool size, max overflow)
  - Redis connection (URL, max connections)
  - Webhook secrets (GitHub HMAC, GitLab token)
  - API settings (rate limits, CORS origins)
  - Worker settings (max jobs, timeout, retries)

### Changed

- Configuration now uses `@lru_cache` for settings singleton
- Version bumped to 0.3.0

## [0.2.0] - 2025-03-15

### Added

- Diff parser with line mapping for unified diff format
- Static analysis runners for ruff, mypy, bandit, and radon
- Finding merger with deduplication and proximity-based grouping
- Algorithmic conflict detection between agents
- LLM synthesis for ambiguous semantic conflicts
- Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
- Verdict rules based on severity counts and configurable thresholds
- CLI flags for static analysis (--static-analysis/--no-static-analysis) and work directory (--work-dir)
- Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
- Test suites for deliberation and static analysis (85% coverage)

### Changed

- CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
- Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps

## [0.1.0] - 2025-03-09

### Added

- Initial project scaffold
- Project plan with architecture documentation
- Core data models: Finding, Policy, AgentConfig, ReviewResult, Severity, Verdict enums
- LLM client abstraction with LiteLLM integration
- Prompt template system with versioned templates and registry
- Agent framework with base Agent class and ReviewContext
- Security, Style, and Complexity review agents
- CLI with `arbiter review` command supporting diff files and stdin
- Output formats: rich (terminal), JSON, markdown
- Policy configuration via YAML files
- Model override via CLI flag
- Test suite with 92% coverage

### Changed

- Consolidated to single Python service (removed separate Node.js webhook handler)
- Reduced MVP scope to 3 agents: Security, Style, Complexity
- Added static analysis pre-pass (ruff, mypy, bandit, radon)
- Added cost management section with token budgets and model selection
- Added error handling strategy with circuit breaker
- Added observability section with Prometheus metrics
- Added security section for prompt injection mitigation
- Changed deliberation to use algorithmic conflict detection first
- Specified typed DeliberationStep structure with documented metadata keys
- Updated conflict examples to show real Security vs Complexity trade-offs
- Added prompt_version field to Finding model for traceability
- Moved caching and cost tracking to Phase 3 (requires database)
- Clarified false positive evaluation methodology
- Documented prompts versioning strategy (deployment-time config)
278
readme.md
Normal file
@@ -0,0 +1,278 @@
# Arbiter

A multi-agent code review system that shows its work.

## What is this?

Arbiter is a code review tool where specialised AI agents independently analyse pull
requests, then deliberate to produce unified feedback. Unlike black-box AI reviewers,
Arbiter exposes the reasoning process — you see how agents disagree, weigh trade-offs,
and reach consensus.

## Why?

Current AI code review tools give you a verdict but hide their reasoning. When they
flag something, you can't tell if it's a security expert's concern or a style nitpick.
Arbiter surfaces the editorial board's discussion.

## Features

- **Static analysis pre-pass** — ruff, mypy, bandit, radon run first
- **Specialised agents** — Security, Style, Complexity (LLM-powered)
- **Transparent deliberation** — See how agents reason and resolve conflicts
- **Configurable policies** — Adapt to your team's standards
- **Cost controls** — Token budgets, model selection, response caching
- **GitHub/GitLab integration** — Webhook-driven, posts comments to PRs

## Architecture

```
GitHub/GitLab
      │
      │ Webhook (PR opened/updated)
      ▼
┌─────────────────────────────────────────────┐
│             FastAPI Application             │
│                                             │
│  Webhook ──► Redis Queue ──► Worker         │
│                                │            │
│                                ▼            │
│  ┌───────────────────────────────────────┐  │
│  │          Review Orchestrator          │  │
│  │                                       │  │
│  │  1. Static analysis (ruff, mypy...)   │  │
│  │  2. Agents in parallel                │  │
│  │  3. Deliberation                      │  │
│  │  4. Post results                      │  │
│  │                                       │  │
│  │  ┌──────────┐ ┌───────┐ ┌──────────┐  │  │
│  │  │ Security │ │ Style │ │Complexity│  │  │
│  │  └────┬─────┘ └───┬───┘ └────┬─────┘  │  │
│  │       └───────────┼──────────┘        │  │
│  │                   ▼                   │  │
│  │            ┌─────────────┐            │  │
│  │            │ Coordinator │            │  │
│  │            └─────────────┘            │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
      │
      ├──► PR Comment
      ├──► Database (history)
      └──► Metrics
```
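
The four orchestrator steps in the diagram can be sketched roughly as follows. This is an illustration only, not Arbiter's actual code: `Finding`, `run_agent`, `run_static_analysis`, `deliberate`, and `review` are hypothetical names, the agents are stubs that make no LLM calls, and the verdict rule is a toy.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str     # which agent or tool produced the finding
    severity: str  # "low", "medium" or "high"
    message: str


def run_static_analysis(diff: str) -> list[Finding]:
    """Hypothetical stand-in for the ruff/mypy/bandit/radon pre-pass."""
    return []


async def run_agent(name: str, diff: str) -> list[Finding]:
    """Hypothetical stand-in for one LLM-backed review agent."""
    await asyncio.sleep(0)  # a real agent would await an LLM call here
    return [Finding(agent=name, severity="low", message=f"{name} saw {len(diff)} chars")]


def deliberate(findings: list[Finding]) -> str:
    """Toy verdict rule: any high-severity finding requests changes."""
    if any(f.severity == "high" for f in findings):
        return "request_changes"
    return "approve"


async def review(diff: str) -> tuple[str, list[Finding]]:
    # 1. Static analysis runs first.
    findings = run_static_analysis(diff)
    # 2. The three agents run concurrently; gather() keeps their order.
    agents = ("security", "style", "complexity")
    results = await asyncio.gather(*(run_agent(name, diff) for name in agents))
    for agent_findings in results:
        findings.extend(agent_findings)
    # 3. Deliberation turns the merged findings into a verdict.
    verdict = deliberate(findings)
    # 4. Posting the result is left to the caller in this sketch.
    return verdict, findings


if __name__ == "__main__":
    print(asyncio.run(review("diff --git a/app.py b/app.py")))
```

Running the agents with `asyncio.gather` means the slowest agent, not the sum of all three, bounds the wall-clock time of step 2.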

## Tech Stack

| Component | Technology |
|-----------|------------|
| Backend | Python 3.12, FastAPI |
| Queue | Redis, arq |
| Database | PostgreSQL |
| LLM | LiteLLM (OpenAI, Anthropic, local) |
| Static analysis | ruff, mypy, bandit, radon |

## Quick Start

```bash
# Clone the repository
git clone https://gitea.kschappell.com/kschappell/arbiter.git
cd arbiter

# Start infrastructure
docker compose up -d db redis

# Install dependencies
pip install -e ".[dev]"

# Run migrations
alembic upgrade head

# Start API server
uvicorn src.arbiter.main:app --reload

# Start worker (separate terminal)
arq src.arbiter.worker.tasks.WorkerSettings
```

## CLI Usage

Review a local diff without running the full server:

```bash
# Review a diff file
arbiter review changes.diff --policy .arbiter/policy.yaml

# Review staged changes
git diff --cached | arbiter review - --policy .arbiter/policy.yaml
```

## Configuration

Create `.arbiter/policy.yaml` in your repository:

```yaml
version: "1.0"

static_analysis:
  ruff:
    enabled: true
  mypy:
    enabled: true
  bandit:
    enabled: true
    severity_threshold: medium

agents:
  security:
    enabled: true
    model: "gpt-4o"
    severity_threshold: medium

  style:
    enabled: true
    model: "gpt-4o-mini"
    config:
      naming_convention: snake_case

  complexity:
    enabled: true
    model: "gpt-4o-mini"
    thresholds:
      max_cyclomatic: 10

deliberation:
  conflict_resolution: security_first
  minimum_confidence: 0.7

cost_controls:
  max_tokens_per_review: 50000
  max_cost_per_review_usd: 0.50
  cache_similar_diffs: true
```
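
As a rough illustration of how the `cost_controls` limits might be enforced, the sketch below checks a review's running totals against both budgets. `CostControls` and `within_budget` are hypothetical names chosen for this example; only the two field names and their values come from the policy file above, and how Arbiter itself applies them is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostControls:
    # Defaults mirror the example policy; the class itself is an assumption.
    max_tokens_per_review: int = 50_000
    max_cost_per_review_usd: float = 0.50


def within_budget(controls: CostControls, tokens_used: int, cost_usd: float) -> bool:
    """Return True while a review is still inside both limits."""
    return (
        tokens_used <= controls.max_tokens_per_review
        and cost_usd <= controls.max_cost_per_review_usd
    )


if __name__ == "__main__":
    controls = CostControls()
    print(within_budget(controls, tokens_used=12_000, cost_usd=0.08))
    print(within_budget(controls, tokens_used=60_000, cost_usd=0.08))
```

A check like this could run between agent calls, so a review stops spending once either limit is crossed.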

## Example Output

```markdown
## Arbiter Review

**Verdict:** Request changes (confidence: 92%)

### Static Analysis
- **bandit** B105: Possible hardcoded password (line 52)
- **radon** CC: Function `process_data` has complexity 12 (threshold: 10)

### Agent Findings

🔒 **Security** (High)
Line 47: Endpoint `/api/admin/export` has no authentication decorator.
→ All admin endpoints should use `@require_admin` per project patterns.

📐 **Style** (Low)
Line 23: Function name `getData` doesn't match snake_case convention.

### Deliberation

All agents agree authentication is missing. Static analysis confirms
hardcoded password on line 52. Both issues require resolution.
```

## Dashboard

Arbiter includes a React dashboard for exploring reviews and monitoring metrics:

- **Review List** — Browse all reviews with filtering by repository, status, verdict, and author
- **Review Detail** — View findings grouped by severity with expandable cards
- **Deliberation Explorer** — Step-by-step timeline of how agents reached their verdict
- **Metrics** — Charts showing verdicts, severities, and review trends over time

Start the dashboard:

```bash
cd dashboard
npm install
npm run dev
```

Access at `http://localhost:5173`. Configure the API URL via the `VITE_API_URL` environment variable.

## API Documentation

The API server provides interactive documentation:

- **Swagger UI** — `http://localhost:8000/docs`
- **ReDoc** — `http://localhost:8000/redoc`
- **OpenAPI Schema** — `http://localhost:8000/openapi.json`

For detailed endpoint documentation, see [docs/api.md](docs/api.md).

## Environment Variables

Quick reference of key environment variables (prefix with `ARBITER_`):

| Variable | Description | Default |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection URL | `postgresql+asyncpg://arbiter:arbiter@localhost:5432/arbiter` |
| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` |
| `DEFAULT_MODEL` | LLM model for agents | `gpt-4o` |
| `GITHUB_TOKEN` | GitHub API token | - |
| `GITHUB_WEBHOOK_SECRET` | Webhook HMAC secret | - |
| `GITLAB_TOKEN` | GitLab API token | - |
| `GITLAB_WEBHOOK_TOKEN` | Webhook verification token | - |
| `POST_COMMENTS` | Post review comments to PRs | `true` |
| `UPDATE_STATUS` | Update commit status checks | `true` |

See [.env.example](.env.example) for the complete list.
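
To make the `ARBITER_` prefix concrete, here is a minimal standard-library sketch of reading a few of these variables with their documented defaults and caching the result with `lru_cache`. The `Settings` class, `_env` helper, and `get_settings` function are assumptions made for illustration; Arbiter's real settings module may be organised differently.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    default_model: str
    post_comments: bool


def _env(name: str, default: str) -> str:
    # Every variable is looked up with the ARBITER_ prefix.
    return os.environ.get(f"ARBITER_{name}", default)


@lru_cache
def get_settings() -> Settings:
    """Build the settings once; later calls return the cached instance."""
    return Settings(
        database_url=_env(
            "DATABASE_URL",
            "postgresql+asyncpg://arbiter:arbiter@localhost:5432/arbiter",
        ),
        redis_url=_env("REDIS_URL", "redis://localhost:6379/0"),
        default_model=_env("DEFAULT_MODEL", "gpt-4o"),
        post_comments=_env("POST_COMMENTS", "true").lower() in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    print(get_settings().default_model)
```

Because the result is cached, changes to the environment after the first call are only picked up after `get_settings.cache_clear()`.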

## Deployment

For production deployment instructions, see [docs/deployment.md](docs/deployment.md).

## Troubleshooting

### Worker not processing jobs

Check that Redis is running and accessible:

```bash
redis-cli ping  # Should return PONG
```

Verify the worker is connected:

```bash
arq src.arbiter.worker.tasks.WorkerSettings --check
```

### Webhook not receiving events

1. Verify the webhook URL is publicly accessible
2. Check that webhook secrets match between GitHub/GitLab and your configuration
3. Inspect webhook deliveries in GitHub/GitLab settings for error responses
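
When checking step 2 for GitHub, it can help to recompute the signature by hand. GitHub signs the raw request body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows that check using only the standard library; `verify_github_signature` is a hypothetical helper, not a function taken from Arbiter.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare the X-Hub-Signature-256 header against a locally computed HMAC."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_header)


if __name__ == "__main__":
    body = b'{"action": "opened"}'
    header = "sha256=" + hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
    print(verify_github_signature("s3cret", body, header))
    print(verify_github_signature("wrong", body, header))
```

The digest must be computed over the exact bytes received, so a body that was parsed and re-serialised will not match. GitLab works differently: it sends the configured secret token as-is in the `X-Gitlab-Token` header, so a plain constant-time comparison is enough there.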

### LLM timeouts

Increase the timeout and switch to a smaller, faster model:

```bash
export ARBITER_LLM_TIMEOUT=120
export ARBITER_DEFAULT_MODEL=gpt-4o-mini
```

### Database connection errors

Ensure PostgreSQL is running and the connection URL is correct:

```bash
psql $DATABASE_URL -c "SELECT 1"  # Test connection
alembic upgrade head              # Run pending migrations
```
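
One thing to watch for when testing with `psql`: it expects a plain `postgresql://` URI, so a SQLAlchemy-style URL such as the default `postgresql+asyncpg://...` value may need its driver suffix removed first. A small sketch of that conversion, where `to_libpq_url` is a hypothetical helper written for this example:

```python
def to_libpq_url(sqlalchemy_url: str) -> str:
    """Drop a '+driver' suffix from the URL scheme, leaving the rest untouched."""
    scheme, sep, rest = sqlalchemy_url.partition("://")
    if not sep:
        raise ValueError(f"not a URL: {sqlalchemy_url!r}")
    dialect = scheme.split("+", 1)[0]  # "postgresql+asyncpg" -> "postgresql"
    return f"{dialect}://{rest}"


if __name__ == "__main__":
    print(to_libpq_url("postgresql+asyncpg://arbiter:arbiter@localhost:5432/arbiter"))
```

The converted value can then be passed to `psql` in place of the original URL.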

### Review not appearing in dashboard

1. Check that the API server is running
2. Verify CORS settings include your dashboard URL
3. Check browser console for API errors

## License

MIT