initial project scaffold

2025-03-08 15:18:11 +00:00
commit 72268ff440
3 changed files with 570 additions and 0 deletions
@@ -0,0 +1,68 @@
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 .Python
 build/
 develop-eggs/
 dist/
 downloads/
 eggs/
 .eggs/
 lib/
 lib64/
 parts/
 sdist/
 var/
 wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 .venv/
 venv/
 ENV/
 # Node
 node_modules/
 npm-debug.log*
 yarn-debug.log*
 yarn-error.log*
 .npm
 # Build outputs
 *.js.map
 .next/
 out/
 # IDE
 .idea/
 .vscode/
 *.swp
 *.swo
 *~
 # Environment
 .env
 .env.local
 .env.*.local
 # Testing
 .coverage
 htmlcov/
 .pytest_cache/
 .mypy_cache/
 .ruff_cache/
 coverage/
 # Database
 *.db
 *.sqlite3
 # OS
 .DS_Store
 Thumbs.db
 # Logs
 *.log
 logs/
@@ -0,0 +1,224 @@
 # Changelog
 All notable changes to Arbiter will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
 ## [0.5.0] - 2025-06-16
 ### Added
 - Conversational follow-up system for PR comment Q&A
  - Question detection in PR comments with confidence scoring
  - Agent routing based on keywords, finding references, and context
  - Agent explain() method for providing detailed follow-up explanations
  - Conversation storage in database (ConversationModel, ConversationMessageModel)
  - Webhook handling for GitHub issue_comment and GitLab note events
  - Worker task process_followup for async question processing
  - REST API endpoints for conversations
    - `GET /api/conversations` - List conversations with filtering
    - `GET /api/conversations/{id}` - Conversation detail with messages
    - `GET /api/conversations/review/{id}` - Conversation for specific review
  - Explain prompt templates for each agent (security, style, complexity)
  - Configuration settings for follow-up behaviour
    - followup_enabled - Enable/disable follow-up processing
    - followup_confidence_threshold - Minimum confidence to respond
    - followup_max_tokens_per_response - Token limit for responses
 - React dashboard for exploring reviews and monitoring metrics
  - Review list page with filtering (repository, status, verdict, author) and pagination
  - Review detail page with findings grouped by severity and expandable cards
  - Deliberation timeline showing step-by-step decision process
  - Metrics dashboard with charts for verdicts, severities, reviews over time
  - VerdictBadge and SeverityBadge components with color-coded indicators
  - TanStack Query for data fetching and caching
  - Tailwind CSS for styling with responsive layouts
  - Docker configuration with nginx for production builds
 - `GET /api/reviews/metrics` endpoint for aggregate statistics
  - Total/completed review counts
  - Average cost per review
  - Verdict and severity distribution
  - Reviews over last 30 days
  - Cost breakdown by agent
 - Idempotent comment updates for re-reviews
  - Comment model for representing PR/MR comments
  - `get_comments()` method on platform clients to list PR comments
  - `update_comment()` method to edit existing comments
  - Arbiter marker (`<!-- arbiter-review -->`) embedded in review comments
  - Re-reviews now update existing Arbiter comment instead of posting new ones
  - Graceful fallback: posts new comment if fetching comments fails
 - Documentation
  - Dashboard README with component architecture and usage
  - Deployment guide with Docker, production, and scaling guidance
  - API reference with endpoint documentation and webhook schemas
  - Environment variable reference (.env.example)
  - Troubleshooting section in main README
 ### Changed
 - Increased test coverage from 74% to 86%+
  - LiteLLMClient initialization and complete method tests
  - Static analysis runner tests with mocked subprocess execution
  - API dependency injection tests (get_db, get_redis, close_redis)
  - Extended test fixtures for platform client mocking
 - Version bumped to 0.5.0
 ## [0.4.0] - 2025-04-13
 ### Added
 - GitHub integration client for fetching diffs and posting review comments
  - Fetch PR diffs via GitHub API
  - Post review comments on pull requests
  - Update commit status checks (pending/success/failure/error)
  - Automatic retry with exponential backoff on transient errors
  - Rate limit monitoring with warnings when limits are low
 - GitLab integration client with equivalent functionality
  - Fetch MR diffs via GitLab API
  - Post notes on merge requests
  - Update commit pipeline status
  - URL-encoded project path handling
 - Review comment formatter for Markdown output
  - Verdict header with icon (approve/changes/comment)
  - Summary statistics table by severity
  - Findings grouped by severity level
  - Conflicts section for agent disagreements
  - Optional cost/tokens footer
  - Automatic truncation to stay within GitHub's 65535 char limit
 - Platform integration exceptions
  - AuthenticationError for 401/403 responses
  - RateLimitError for 429 responses with retry_after
  - NotFoundError for 404 responses
  - PlatformError for 5xx server errors
 - Extended configuration settings
  - github_token and gitlab_token for API authentication
  - github_base_url and gitlab_base_url for enterprise instances
  - integration_timeout and integration_max_retries
  - status_check_context for commit status naming
  - post_comments and update_status feature flags
 - Worker integration with platform clients
  - Automatic diff fetching when not provided
  - Commit status set to pending on job start
  - Review comment posted on completion
  - Status updated to success/failure based on verdict
  - Graceful degradation on platform API failures
 ### Changed
 - Worker process_review now accepts platform parameter
 - Webhook routes pass platform identifier to queue
 - Version bumped to 0.4.0
 ### Dependencies
 - Added httpx>=0.26.0 to main dependencies
 - Added tenacity>=8.2.0 for retry logic
 - Added respx>=0.21.0 to dev dependencies for HTTP mocking
 ## [0.3.0] - 2025-03-29
 ### Added
 - FastAPI application with lifespan management, CORS, and exception handlers
 - PostgreSQL database schema with SQLAlchemy async ORM
  - ReviewModel for storing PR review metadata and results
  - FindingModel for individual agent findings
  - ConflictModel for detected conflicts between agents
  - DeliberationStepModel for audit trail
  - PolicyModel for organisation configurations
 - Alembic migrations with async PostgreSQL support
 - Redis-backed arq worker for async job processing
  - Review task with full pipeline execution
  - Job deduplication by repository/PR/commit
  - Priority queuing (draft PRs get lower priority)
 - REST API endpoints
  - `GET /api/reviews` - List reviews with pagination and filtering
  - `GET /api/reviews/{id}` - Review detail with findings
  - `GET /api/reviews/{id}/deliberation` - Deliberation log
  - `POST /api/reviews` - Trigger manual review
 - Webhook routes for GitHub and GitLab
  - `POST /webhooks/github` - GitHub PR events with HMAC-SHA256 signature validation
  - `POST /webhooks/gitlab` - GitLab MR events with token validation
 - Health and metrics endpoints
  - `GET /health` - Liveness check
  - `GET /health/ready` - Readiness check (database, Redis)
  - `GET /health/live` - Kubernetes liveness probe
  - `GET /metrics` - Prometheus metrics
 - LLM response cache with Redis backend
  - Cache key based on diff, agent, prompt version, and policy
  - Configurable TTL (default 24 hours)
  - Cache statistics tracking
 - Cost tracking models
  - ReviewCost for aggregate cost tracking
  - AgentCost for per-agent breakdown
  - CostEstimate for pre-review cost estimation
 - Docker configuration
  - Multi-stage Dockerfile for API and worker
  - docker-compose.yml with PostgreSQL, Redis, API, worker, and migrate services
 - Extended configuration settings
  - Database connection (URL, pool size, max overflow)
  - Redis connection (URL, max connections)
  - Webhook secrets (GitHub HMAC, GitLab token)
  - API settings (rate limits, CORS origins)
  - Worker settings (max jobs, timeout, retries)
 ### Changed
 - Configuration now uses `@lru_cache` for settings singleton
 - Version bumped to 0.3.0
 ## [0.2.0] - 2025-03-15
 ### Added
 - Diff parser with line mapping for unified diff format
 - Static analysis runners for ruff, mypy, bandit, and radon
 - Finding merger with deduplication and proximity-based grouping
 - Algorithmic conflict detection between agents
 - LLM synthesis for ambiguous semantic conflicts
 - Deliberation coordinator with merge, conflict detection, synthesis, and verdict determination
 - Verdict rules based on severity counts and configurable thresholds
 - CLI flags for static analysis (--static-analysis/--no-static-analysis) and work directory (--work-dir)
 - Test fixtures for conflict scenarios (security vs complexity, overlapping, contradictory)
 - Test suites for deliberation and static analysis (85% coverage)
 ### Changed
 - CLI review command now outputs deliberation results with verdict, findings, conflicts, and resolution log
 - Output formats (rich, JSON, markdown) updated to display verdict and deliberation steps
 ## [0.1.0] - 2025-03-09
 ### Added
 - Initial project scaffold
 - Project plan with architecture documentation
 - Core data models: Finding, Policy, AgentConfig, ReviewResult, Severity, Verdict enums
 - LLM client abstraction with LiteLLM integration
 - Prompt template system with versioned templates and registry
 - Agent framework with base Agent class and ReviewContext
 - Security, Style, and Complexity review agents
 - CLI with `arbiter review` command supporting diff files and stdin
 - Output formats: rich (terminal), JSON, markdown
 - Policy configuration via YAML files
 - Model override via CLI flag
 - Test suite with 92% coverage
 ### Changed
 - Consolidated to single Python service (removed separate Node.js webhook handler)
 - Reduced MVP scope to 3 agents: Security, Style, Complexity
 - Added static analysis pre-pass (ruff, mypy, bandit, radon)
 - Added cost management section with token budgets and model selection
 - Added error handling strategy with circuit breaker
 - Added observability section with Prometheus metrics
 - Added security section for prompt injection mitigation
 - Changed deliberation to use algorithmic conflict detection first
 - Specified typed DeliberationStep structure with documented metadata keys
 - Updated conflict examples to show real Security vs Complexity trade-offs
 - Added prompt_version field to Finding model for traceability
 - Moved caching and cost tracking to Phase 3 (requires database)
 - Clarified false positive evaluation methodology
 - Documented prompts versioning strategy (deployment-time config)
@@ -0,0 +1,278 @@
 # Arbiter
 A multi-agent code review system that shows its work.
 ## What is this?
 Arbiter is a code review tool where specialised AI agents independently analyse pull
 requests, then deliberate to produce unified feedback. Unlike black-box AI reviewers,
 Arbiter exposes the reasoning process — you see how agents disagree, weigh trade-offs,
 and reach consensus.
 ## Why?
 Current AI code review tools give you a verdict but hide their reasoning. When they
 flag something, you can't tell if it's a security expert's concern or a style nitpick.
 Arbiter surfaces the discussion behind each verdict, as if you could read an editorial board's deliberations.
 ## Features
 - **Static analysis pre-pass** — ruff, mypy, bandit, radon run first
 - **Specialised agents** — Security, Style, Complexity (LLM-powered)
 - **Transparent deliberation** — See how agents reason and resolve conflicts
 - **Configurable policies** — Adapt to your team's standards
 - **Cost controls** — Token budgets, model selection, response caching
 - **GitHub/GitLab integration** — Webhook-driven, posts comments to PRs
 ## Architecture
 ```
 GitHub/GitLab
     │
     │ Webhook (PR opened/updated)
     ▼
 ┌─────────────────────────────────────────────┐
 │           FastAPI Application               │
 │                                             │
 │  Webhook ──► Redis Queue ──► Worker         │
 │                                │            │
 │                                ▼            │
 │  ┌───────────────────────────────────────┐  │
 │  │        Review Orchestrator            │  │
 │  │                                       │  │
 │  │  1. Static analysis (ruff, mypy...)   │  │
 │  │  2. Agents in parallel                │  │
 │  │  3. Deliberation                      │  │
 │  │  4. Post results                      │  │
 │  │                                       │  │
 │  │  ┌──────────┐ ┌───────┐ ┌──────────┐  │  │
 │  │  │ Security │ │ Style │ │Complexity│  │  │
 │  │  └────┬─────┘ └───┬───┘ └────┬─────┘  │  │
 │  │       └───────────┼──────────┘        │  │
 │  │                   ▼                   │  │
 │  │           ┌─────────────┐             │  │
 │  │           │ Coordinator │             │  │
 │  │           └─────────────┘             │  │
 │  └───────────────────────────────────────┘  │
 └─────────────────────────────────────────────┘
     │
     ├──► PR Comment
     ├──► Database (history)
     └──► Metrics
 ```
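The four orchestrator steps in the diagram could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Arbiter's actual code: `run_static_analysis`, `run_agent`, `deliberate`, `review`, and the `Finding` shape are hypothetical names, and the agents are stubs that never call an LLM.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding shape; the real model likely carries more fields."""
    agent: str
    line: int
    severity: str
    message: str


async def run_static_analysis(diff: str) -> list[Finding]:
    # Stub for step 1: a real pre-pass would invoke ruff, mypy, bandit, radon.
    return [Finding("bandit", 52, "medium", "Possible hardcoded password")]


async def run_agent(name: str, diff: str, hints: list[Finding]) -> list[Finding]:
    # Stub for step 2: a real agent would prompt an LLM with the diff and hints.
    await asyncio.sleep(0)  # yield control, as a network call would
    return [Finding(name, 47, "high", f"{name} concern")]


def deliberate(findings: list[Finding]) -> str:
    # Stub for step 3 (assumed rule): any high-severity finding blocks approval.
    if any(f.severity == "high" for f in findings):
        return "request_changes"
    return "approve"


async def review(diff: str) -> tuple[str, list[Finding]]:
    static = await run_static_analysis(diff)
    # Agents run concurrently; gather returns results in argument order.
    per_agent = await asyncio.gather(
        *(run_agent(name, diff, static) for name in ("security", "style", "complexity"))
    )
    findings = static + [f for batch in per_agent for f in batch]
    # Step 4 (posting the result) is left to the caller in this sketch.
    return deliberate(findings), findings


verdict, findings = asyncio.run(review("--- a/app.py\n+++ b/app.py\n"))
print(verdict, len(findings))
```

Running the static pre-pass first lets its output be handed to every agent as context, while `asyncio.gather` keeps the three agent calls from waiting on one another.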
 ## Tech Stack
 | Component | Technology |
 |-----------|------------|
 | Backend | Python 3.12, FastAPI |
 | Queue | Redis, arq |
 | Database | PostgreSQL |
 | LLM | LiteLLM (OpenAI, Anthropic, local) |
 | Static analysis | ruff, mypy, bandit, radon |
 ## Quick Start
 ```bash
 # Clone the repository
 git clone https://gitea.kschappell.com/kschappell/arbiter.git
 cd arbiter
 # Start infrastructure
 docker compose up -d db redis
 # Install dependencies
 pip install -e ".[dev]"
 # Run migrations
 alembic upgrade head
 # Start API server
 uvicorn src.arbiter.main:app --reload
 # Start worker (separate terminal)
 arq src.arbiter.worker.tasks.WorkerSettings
 ```
 ## CLI Usage
 Review a local diff without running the full server:
 ```bash
 # Review a diff file
 arbiter review changes.diff --policy .arbiter/policy.yaml
 # Review staged changes
 git diff --cached | arbiter review - --policy .arbiter/policy.yaml
 ```
 ## Configuration
 Create `.arbiter/policy.yaml` in your repository:
 ```yaml
 version: "1.0"
 static_analysis:
  ruff:
    enabled: true
  mypy:
    enabled: true
  bandit:
    enabled: true
    severity_threshold: medium
 agents:
  security:
    enabled: true
    model: "gpt-4o"
    severity_threshold: medium
  style:
    enabled: true
    model: "gpt-4o-mini"
    config:
      naming_convention: snake_case
  complexity:
    enabled: true
    model: "gpt-4o-mini"
    thresholds:
      max_cyclomatic: 10
 deliberation:
  conflict_resolution: security_first
  minimum_confidence: 0.7
 cost_controls:
  max_tokens_per_review: 50000
  max_cost_per_review_usd: 0.50
  cache_similar_diffs: true
 ```
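To make the `deliberation` keys more concrete, here is one way `minimum_confidence` and `conflict_resolution: security_first` might be applied. The semantics are an assumption for illustration only: findings below the threshold are dropped, and when several findings target the same line a security finding wins. The `Finding` fields, the `resolve` function, and the `highest_confidence` alternative are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str
    line: int
    confidence: float
    message: str


def resolve(findings, minimum_confidence=0.7, conflict_resolution="security_first"):
    """Drop low-confidence findings, then keep one finding per line.

    Assumed semantics: under "security_first" a security finding wins any
    same-line conflict; otherwise the most confident finding wins.
    """
    kept = [f for f in findings if f.confidence >= minimum_confidence]

    by_line = {}
    for f in kept:
        by_line.setdefault(f.line, []).append(f)

    def rank(f):
        is_security = conflict_resolution == "security_first" and f.agent == "security"
        # Tuples compare element-wise: security status first, then confidence.
        return (is_security, f.confidence)

    return [max(group, key=rank) for _, group in sorted(by_line.items())]


findings = [
    Finding("security", 47, 0.75, "Endpoint has no authentication decorator"),
    Finding("complexity", 47, 0.95, "Adding a decorator increases nesting"),
    Finding("style", 23, 0.50, "Name does not match snake_case"),
]
result = resolve(findings)
for f in result:
    print(f.agent, f.line, f.confidence)
```

With the defaults from the policy file, the style finding is filtered out by the 0.7 threshold and the security finding on line 47 takes precedence over the more confident complexity finding.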
 ## Example Output
 ```markdown
 ## Arbiter Review
 **Verdict:** Request changes (confidence: 92%)
 ### Static Analysis
 - **bandit** B105: Possible hardcoded password (line 52)
 - **radon** CC: Function `process_data` has complexity 12 (threshold: 10)
 ### Agent Findings
 🔒 **Security** (High)
 Line 47: Endpoint `/api/admin/export` has no authentication decorator.
 → All admin endpoints should use `@require_admin` per project patterns.
 📐 **Style** (Low)
 Line 23: Function name `getData` doesn't match snake_case convention.
 ### Deliberation
 All agents agree authentication is missing. Static analysis confirms
 hardcoded password on line 52. Both issues require resolution.
 ```
 ## Dashboard
 Arbiter includes a React dashboard for exploring reviews and monitoring metrics:
 - **Review List** — Browse all reviews with filtering by repository, status, verdict, and author
 - **Review Detail** — View findings grouped by severity with expandable cards
 - **Deliberation Explorer** — Step-by-step timeline of how agents reached their verdict
 - **Metrics** — Charts showing verdicts, severities, and review trends over time
 Start the dashboard:
 ```bash
 cd dashboard
 npm install
 npm run dev
 ```
 Access it at `http://localhost:5173`. Configure the API URL via the `VITE_API_URL` environment variable.
 ## API Documentation
 The API server provides interactive documentation:
 - **Swagger UI** — `http://localhost:8000/docs`
 - **ReDoc** — `http://localhost:8000/redoc`
 - **OpenAPI Schema** — `http://localhost:8000/openapi.json`
 For detailed endpoint documentation, see [docs/api.md](docs/api.md).
 ## Environment Variables
 Key environment variables (prefix each name with `ARBITER_`, for example `ARBITER_DEFAULT_MODEL`):
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `DATABASE_URL` | PostgreSQL connection URL | `postgresql+asyncpg://arbiter:arbiter@localhost:5432/arbiter` |
 | `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` |
 | `DEFAULT_MODEL` | LLM model for agents | `gpt-4o` |
 | `GITHUB_TOKEN` | GitHub API token | - |
 | `GITHUB_WEBHOOK_SECRET` | Webhook HMAC secret | - |
 | `GITLAB_TOKEN` | GitLab API token | - |
 | `GITLAB_WEBHOOK_TOKEN` | Webhook verification token | - |
 | `POST_COMMENTS` | Post review comments to PRs | `true` |
 | `UPDATE_STATUS` | Update commit status checks | `true` |
 See [.env.example](.env.example) for the complete list.
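As a rough illustration of how `ARBITER_`-prefixed variables and the defaults in the table could map onto a cached settings object, here is a stdlib-only sketch. The `Settings` class, `get_settings`, and `_env_bool` are hypothetical names; the project's real settings module is not shown in this commit and may work differently.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


def _env_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    default_model: str
    post_comments: bool
    update_status: bool


@lru_cache
def get_settings() -> Settings:
    """Read ARBITER_* variables once; later calls return the cached object."""
    env = os.environ
    return Settings(
        database_url=env.get(
            "ARBITER_DATABASE_URL",
            "postgresql+asyncpg://arbiter:arbiter@localhost:5432/arbiter",
        ),
        redis_url=env.get("ARBITER_REDIS_URL", "redis://localhost:6379/0"),
        default_model=env.get("ARBITER_DEFAULT_MODEL", "gpt-4o"),
        post_comments=_env_bool(env.get("ARBITER_POST_COMMENTS", "true")),
        update_status=_env_bool(env.get("ARBITER_UPDATE_STATUS", "true")),
    )


os.environ["ARBITER_DEFAULT_MODEL"] = "gpt-4o-mini"
settings = get_settings()
print(settings.default_model)
```

Because the result is cached, a change to the environment after the first call is only picked up after `get_settings.cache_clear()`, which is worth remembering in tests.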
 ## Deployment
 For production deployment instructions, see [docs/deployment.md](docs/deployment.md).
 ## Troubleshooting
 ### Worker not processing jobs
 Check that Redis is running and accessible:
 ```bash
 redis-cli ping  # Should return PONG
 ```
 Verify the worker is connected:
 ```bash
 arq src.arbiter.worker.tasks.WorkerSettings --check
 ```
 ### Webhook not receiving events
 1. Verify the webhook URL is publicly accessible
 2. Check that webhook secrets match between GitHub/GitLab and your configuration
 3. Inspect webhook deliveries in GitHub/GitLab settings for error responses
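When checking step 2 for GitHub, it can help to recompute the signature by hand. GitHub signs the raw request body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows that check in isolation; `verify_github_signature` is a hypothetical helper, not a function from Arbiter, and the secret and payload are made-up values.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )


# Made-up values for illustration only.
secret = "example-webhook-secret"
body = b'{"action": "opened", "number": 1}'
header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(verify_github_signature(secret, body, header))
print(verify_github_signature("some-other-secret", body, header))
```

The signature covers the exact bytes received, so verification has to run on the raw body before any JSON parsing or re-serialisation; a mismatch in whitespace alone is enough to fail the check.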
 ### LLM timeouts
 Increase the timeout and switch to a smaller, faster model:
 ```bash
 export ARBITER_LLM_TIMEOUT=120
 export ARBITER_DEFAULT_MODEL=gpt-4o-mini
 ```
 ### Database connection errors
 Ensure PostgreSQL is running and the connection URL is correct:
 ```bash
 psql "${ARBITER_DATABASE_URL/+asyncpg/}" -c "SELECT 1"  # Test connection (psql does not accept the +asyncpg driver suffix)
 alembic upgrade head  # Run pending migrations
 ```
 ### Review not appearing in dashboard
 1. Check that the API server is running
 2. Verify CORS settings include your dashboard URL
 3. Check browser console for API errors
 ## License
 MIT