initial project scaffold

2025-03-08 15:18:11 +00:00
commit 72268ff440
3 changed files with 570 additions and 0 deletions
@@ -0,0 +1,68 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+.venv/
+venv/
+ENV/
+
+# Node
+node_modules/
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+.npm
+
+# Build outputs
+*.js.map
+.next/
+out/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+
+# Environment
+.env
+.env.local
+.env.*.local
+
+# Testing
+.coverage
+htmlcov/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+coverage/
+
+# Database
+*.db
+*.sqlite3
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+logs/
@@ -0,0 +1,224 @@
+# Changelog
+
+All notable changes to Arbiter will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [0.5.0] - 2025-06-16
+
+### Added
+
+- Conversational follow-up system for PR comment Q&A
+  - Question detection in PR comments with confidence scoring
+  - Agent routing based on keywords, finding references, and context
+  - Agent explain() method for providing detailed follow-up explanations
+  - Conversation storage in database (ConversationModel, ConversationMessageModel)
+  - Webhook handling for GitHub issue_comment and GitLab note events
+  - Worker task process_followup for async question processing
+  - REST API endpoints for conversations
+    - `GET /api/conversations` - List conversations with filtering
+    - `GET /api/conversations/{id}` - Conversation detail with messages
+    - `GET /api/conversations/review/{id}` - Conversation for specific review
+  - Explain prompt templates for each agent (security, style, complexity)
+  - Configuration settings for follow-up behaviour
+    - `followup_enabled` - Enable/disable follow-up processing
+    - `followup_confidence_threshold` - Minimum confidence to respond
+    - `followup_max_tokens_per_response` - Token limit for responses
+- React dashboard for exploring reviews and monitoring metrics
+  - Review list page with filtering (repository, status, verdict, author) and pagination
+  - Review detail page with findings grouped by severity and expandable cards
+  - Deliberation timeline showing step-by-step decision process
+  - Metrics dashboard with charts for verdicts, severities, reviews over time
+  - VerdictBadge and SeverityBadge components with colour-coded indicators
+  - TanStack Query for data fetching and caching
+  - Tailwind CSS for styling with responsive layouts
+  - Docker configuration with nginx for production builds
+- `GET /api/reviews/metrics` endpoint for aggregate statistics
+  - Total/completed review counts
+  - Average cost per review
+  - Verdict and severity distribution
+  - Reviews over last 30 days
+  - Cost breakdown by agent
+- Idempotent comment updates for re-reviews
+  - Comment model for representing PR/MR comments
+  - `get_comments()` method on platform clients to list PR comments
+  - `update_comment()` method to edit existing comments
+  - Arbiter marker (`<!-- arbiter-review -->`) embedded in review comments
+  - Re-reviews now update existing Arbiter comment instead of posting new ones
+  - Graceful fallback: posts new comment if fetching comments fails
+- Documentation
+  - Dashboard README with component architecture and usage
+  - Deployment guide with Docker, production, and scaling guidance
+  - API reference with endpoint documentation and webhook schemas
+  - Environment variable reference (.env.example)
+  - Troubleshooting section in main README
+
+### Changed
+
+- Increased test coverage from 74% to 86%+
+  - LiteLLMClient initialization and complete method tests
+  - Static analysis runner tests with mocked subprocess execution
+  - API dependency injection tests (get_db, get_redis, close_redis)
+  - Extended test fixtures for platform client mocking
+- Version bumped to 0.5.0
+
+## [0.4.0] - 2025-04-13
+
+### Added
+
+- GitHub integration client for fetching diffs and posting review comments
+  - Fetch PR diffs via GitHub API
+  - Post review comments on pull requests
+  - Update commit status checks (pending/success/failure/error)
+  - Automatic retry with exponential backoff on transient errors
+  - Rate limit monitoring with warnings when limits are low
+- GitLab integration client with equivalent functionality
+  - Fetch MR diffs via GitLab API
+  - Post notes on merge requests
+  - Update commit pipeline status
+  - URL-encoded project path handling
+- Review comment formatter for Markdown output
+  - Verdict header with icon (approve/changes/comment)
+  - Summary statistics table by severity
+  - Findings grouped by severity level
+  - Conflicts section for agent disagreements
+  - Optional cost/tokens footer
+  - Automatic truncation to stay within GitHub's 65535-character comment limit
+- Platform integration exceptions
+  - AuthenticationError for 401/403 responses
+  - RateLimitError for 429 responses with retry_after
+  - NotFoundError for 404 responses
+  - PlatformError for 5xx server errors
+- Extended configuration settings
+  - github_token and gitlab_token for API authentication
+  - github_base_url and gitlab_base_url for enterprise instances
+  - integration_timeout and integration_max_retries
+  - status_check_context for commit status naming
+  - post_comments and update_status feature flags
+- Worker integration with platform clients
+  - Automatic diff fetching when not provided
+  - Commit status set to pending on job start
+  - Review comment posted on completion
+  - Status updated to success/failure based on verdict
+  - Graceful degradation on platform API failures
+
+### Changed
+
+- Worker process_review now accepts platform parameter
+- Webhook routes pass platform identifier to queue
+- Version bumped to 0.4.0
+
+### Dependencies
+
+- Added httpx>=0.26.0 to main dependencies
+- Added tenacity>=8.2.0 for retry logic
+- Added respx>=0.21.0 to dev dependencies for HTTP mocking
+
+## [0.3.0] - 2025-03-29
+
+### Added
+
+- FastAPI application with lifespan management, CORS, and exception handlers
+- PostgreSQL database schema with SQLAlchemy async ORM
+  - ReviewModel for storing PR review metadata and results
+  - FindingModel for individual agent findings
+  - ConflictModel for detected conflicts between agents
+  - DeliberationStepModel for audit trail
+  - PolicyModel for organisation configurations
+- Alembic migrations with async PostgreSQL support
+- Redis-backed arq worker for async job processing
+  - Review task with full pipeline execution
+  - Job deduplication by repository/PR/commit
+  - Priority queuing (draft PRs get lower priority)
+- REST API endpoints
+  - `GET /api/reviews` - List reviews with pagination and filtering
+  - `GET /api/reviews/{id}` - Review detail with findings
+  - `GET /api/reviews/{id}/deliberation` - Deliberation log
+  - `POST /api/reviews` - Trigger manual review
+- Webhook routes for GitHub and GitLab
+  - `POST /webhooks/github` - GitHub PR events with HMAC-SHA256 signature validation
+  - `POST /webhooks/gitlab` - GitLab MR events with token validation
+- Health and metrics endpoints
+  - `GET /health` - Liveness check
+  - `GET /health/ready` - Readiness check (database, Redis)
+  - `GET /health/live` - Kubernetes liveness probe
+  - `GET /metrics` - Prometheus metrics
+- LLM response cache with Redis backend
+  - Cache key based on diff, agent, prompt version, and policy
+  - Configurable TTL (default 24 hours)
+  - Cache statistics tracking
+- Cost tracking models
+  - ReviewCost for aggregate cost tracking
+  - AgentCost for per-agent breakdown
+  - CostEstimate for pre-review cost estimation
+- Docker configuration
+  - Multi-stage Dockerfile for API and worker
+  - docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
+- Extended configuration settings
+  - Database connection (URL, pool size, max overflow)
+  - Redis connection (URL, max connections)
+  - Webhook secrets (GitHub HMAC, GitLab token)
+  - API settings (rate limits, CORS origins)
+  - Worker settings (max jobs, timeout, retries)
+
+### Changed
+
+- Configuration now uses `@lru_cache` for settings singleton
+- Version bumped to 0.3.0
+
+## [0.2.0] - 2025-03-15
+
+### Added
+
+- Diff parser with line mapping for unified diff format
+- Static analysis runners for ruff, mypy, bandit, and radon
+- Finding merger with deduplication and proximity-based grouping
+- Algorithmic conflict detection between agents
+- LLM synthesis for ambiguous semantic conflicts
+- Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
+- Verdict rules based on severity counts and configurable thresholds
+- CLI flags for static analysis (--static-analysis/--no-static-analysis) and work directory (--work-dir)
+- Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
+- Test suites for deliberation and static analysis (85% coverage)
+
+### Changed
+
+- CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
+- Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps
+
+## [0.1.0] - 2025-03-09
+
+### Added
+
+- Initial project scaffold
+- Project plan with architecture documentation
+- Core data models: Finding, Policy, AgentConfig, and ReviewResult, plus the Severity and Verdict enums
+- LLM client abstraction with LiteLLM integration
+- Prompt template system with versioned templates and registry
+- Agent framework with base Agent class and ReviewContext
+- Security, Style, and Complexity review agents
+- CLI with `arbiter review` command supporting diff files and stdin
+- Output formats: rich (terminal), JSON, markdown
+- Policy configuration via YAML files
+- Model override via CLI flag
+- Test suite with 92% coverage
+
+### Changed
+
+- Consolidated to single Python service (removed separate Node.js webhook handler)
+- Reduced MVP scope to 3 agents: Security, Style, Complexity
+- Added static analysis pre-pass (ruff, mypy, bandit, radon)
+- Added cost management section with token budgets and model selection
+- Added error handling strategy with circuit breaker
+- Added observability section with Prometheus metrics
+- Added security section for prompt injection mitigation
+- Changed deliberation to use algorithmic conflict detection first
+- Specified typed DeliberationStep structure with documented metadata keys
+- Updated conflict examples to show real Security vs Complexity trade-offs
+- Added prompt_version field to Finding model for traceability
+- Moved caching and cost tracking to Phase 3 (requires database)
+- Clarified false positive evaluation methodology
+- Documented prompt versioning strategy (deployment-time config)
@@ -0,0 +1,278 @@
+# Arbiter
+
+A multi-agent code review system that shows its work.
+
+## What is this?
+
+Arbiter is a code review tool where specialised AI agents independently analyse pull
+requests, then deliberate to produce unified feedback. Unlike black-box AI reviewers,
+Arbiter exposes the reasoning process — you see how agents disagree, weigh trade-offs,
+and reach consensus.
+
+## Why?
+
+Current AI code review tools give you a verdict but hide their reasoning. When they
+flag something, you can't tell if it's a security expert's concern or a style nitpick.
+Arbiter treats its agents like an editorial board and publishes the board's discussion alongside the verdict.
+
+## Features
+
+- **Static analysis pre-pass** — ruff, mypy, bandit, radon run first
+- **Specialised agents** — Security, Style, Complexity (LLM-powered)
+- **Transparent deliberation** — See how agents reason and resolve conflicts
+- **Configurable policies** — Adapt to your team's standards
+- **Cost controls** — Token budgets, model selection, response caching
+- **GitHub/GitLab integration** — Webhook-driven, posts comments to PRs
+
+## Architecture
+
+```
+GitHub/GitLab
+     │
+     │ Webhook (PR opened/updated)
+     ▼
+┌─────────────────────────────────────────────┐
+│           FastAPI Application               │
+│                                             │
+│  Webhook ──► Redis Queue ──► Worker         │
+│                                │            │
+│                                ▼            │
+│  ┌───────────────────────────────────────┐  │
+│  │        Review Orchestrator            │  │
+│  │                                       │  │
+│  │  1. Static analysis (ruff, mypy...)   │  │
+│  │  2. Agents in parallel                │  │
+│  │  3. Deliberation                      │  │
+│  │  4. Post results                      │  │
+│  │                                       │  │
+│  │  ┌──────────┐ ┌───────┐ ┌──────────┐  │  │
+│  │  │ Security │ │ Style │ │Complexity│  │  │
+│  │  └────┬─────┘ └───┬───┘ └────┬─────┘  │  │
+│  │       └───────────┼──────────┘        │  │
+│  │                   ▼                   │  │
+│  │           ┌─────────────┐             │  │
+│  │           │ Coordinator │             │  │
+│  │           └─────────────┘             │  │
+│  └───────────────────────────────────────┘  │
+└─────────────────────────────────────────────┘
+     │
+     ├──► PR Comment
+     ├──► Database (history)
+     └──► Metrics
+```
+
+## Tech Stack
+
+| Component | Technology |
+|-----------|------------|
+| Backend | Python 3.12, FastAPI |
+| Queue | Redis, arq |
+| Database | PostgreSQL |
+| LLM | LiteLLM (OpenAI, Anthropic, local) |
+| Static analysis | ruff, mypy, bandit, radon |
+
+## Quick Start
+
+```bash
+# Clone the repository
+git clone https://gitea.kschappell.com/kschappell/arbiter.git
+cd arbiter
+
+# Start infrastructure
+docker compose up -d db redis
+
+# Install dependencies
+pip install -e ".[dev]"
+
+# Run migrations
+alembic upgrade head
+
+# Start API server
+uvicorn src.arbiter.main:app --reload
+
+# Start worker (separate terminal)
+arq src.arbiter.worker.tasks.WorkerSettings
+```
+
+## CLI Usage
+
+Review a local diff without running the full server:
+
+```bash
+# Review a diff file
+arbiter review changes.diff --policy .arbiter/policy.yaml
+
+# Review staged changes
+git diff --cached | arbiter review - --policy .arbiter/policy.yaml
+```
+
+## Configuration
+
+Create `.arbiter/policy.yaml` in your repository:
+
+```yaml
+version: "1.0"
+
+static_analysis:
+  ruff:
+    enabled: true
+  mypy:
+    enabled: true
+  bandit:
+    enabled: true
+    severity_threshold: medium
+
+agents:
+  security:
+    enabled: true
+    model: "gpt-4o"
+    severity_threshold: medium
+
+  style:
+    enabled: true
+    model: "gpt-4o-mini"
+    config:
+      naming_convention: snake_case
+
+  complexity:
+    enabled: true
+    model: "gpt-4o-mini"
+    thresholds:
+      max_cyclomatic: 10
+
+deliberation:
+  conflict_resolution: security_first
+  minimum_confidence: 0.7
+
+cost_controls:
+  max_tokens_per_review: 50000
+  max_cost_per_review_usd: 0.50
+  cache_similar_diffs: true
+```
+
+## Example Output
+
+```markdown
+## Arbiter Review
+
+**Verdict:** Request changes (confidence: 92%)
+
+### Static Analysis
+- **bandit** B105: Possible hardcoded password (line 52)
+- **radon** CC: Function `process_data` has complexity 12 (threshold: 10)
+
+### Agent Findings
+
+🔒 **Security** (High)
+Line 47: Endpoint `/api/admin/export` has no authentication decorator.
+→ All admin endpoints should use `@require_admin` per project patterns.
+
+📐 **Style** (Low)
+Line 23: Function name `getData` doesn't match snake_case convention.
+
+### Deliberation
+
+All agents agree authentication is missing. Static analysis confirms
+hardcoded password on line 52. Both issues require resolution.
+```
+
+## Dashboard
+
+Arbiter includes a React dashboard for exploring reviews and monitoring metrics:
+
+- **Review List** — Browse all reviews with filtering by repository, status, verdict, and author
+- **Review Detail** — View findings grouped by severity with expandable cards
+- **Deliberation Explorer** — Step-by-step timeline of how agents reached their verdict
+- **Metrics** — Charts showing verdicts, severities, and review trends over time
+
+Start the dashboard:
+
+```bash
+cd dashboard
+npm install
+npm run dev
+```
+
+Open the dashboard at `http://localhost:5173`. Set the API URL with the `VITE_API_URL` environment variable.
+
+## API Documentation
+
+The API server provides interactive documentation:
+
+- **Swagger UI** — `http://localhost:8000/docs`
+- **ReDoc** — `http://localhost:8000/redoc`
+- **OpenAPI Schema** — `http://localhost:8000/openapi.json`
+
+For detailed endpoint documentation, see [docs/api.md](docs/api.md).
+
+## Environment Variables
+
+Quick reference of key environment variables (prefix with `ARBITER_`):
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `DATABASE_URL` | PostgreSQL connection URL | `postgresql+asyncpg://arbiter:arbiter@localhost:5432/arbiter` |
+| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` |
+| `DEFAULT_MODEL` | LLM model for agents | `gpt-4o` |
+| `GITHUB_TOKEN` | GitHub API token | - |
+| `GITHUB_WEBHOOK_SECRET` | Webhook HMAC secret | - |
+| `GITLAB_TOKEN` | GitLab API token | - |
+| `GITLAB_WEBHOOK_TOKEN` | Webhook verification token | - |
+| `POST_COMMENTS` | Post review comments to PRs | `true` |
+| `UPDATE_STATUS` | Update commit status checks | `true` |
+
+See [.env.example](.env.example) for the complete list.
+
+## Deployment
+
+For production deployment instructions, see [docs/deployment.md](docs/deployment.md).
+
+## Troubleshooting
+
+### Worker not processing jobs
+
+Check that Redis is running and accessible:
+
+```bash
+redis-cli ping  # Should return PONG
+```
+
+Verify the worker is connected:
+
+```bash
+arq src.arbiter.worker.tasks.WorkerSettings --check
+```
+
+### Webhook not receiving events
+
+1. Verify the webhook URL is publicly accessible
+2. Check that webhook secrets match between GitHub/GitLab and your configuration
+3. Inspect webhook deliveries in GitHub/GitLab settings for error responses
+
+### LLM timeouts
+
+Increase the LLM timeout, or switch to a smaller, faster model:
+
+```bash
+export ARBITER_LLM_TIMEOUT=120
+export ARBITER_DEFAULT_MODEL=gpt-4o-mini
+```
+
+### Database connection errors
+
+Ensure PostgreSQL is running and the connection URL is correct:
+
+```bash
+psql "${ARBITER_DATABASE_URL/+asyncpg/}" -c "SELECT 1"  # Test connection (strips the +asyncpg driver suffix, which psql does not accept)
+alembic upgrade head  # Run pending migrations
+```
+
+### Review not appearing in dashboard
+
+1. Check that the API server is running
+2. Verify CORS settings include your dashboard URL
+3. Check browser console for API errors
+
+## License
+
+MIT