# CLAUDE.md

Guidelines for working on the Veritext project.

## Project Overview

Veritext is a semantic text validation framework for Python. It validates text outputs against quality criteria using metrics like BLEU, ROUGE, and semantic similarity.

## Directory Structure

```
veritext/
├── src/veritext/           # Package source
│   ├── core/               # Shared types, tokenisation, config
│   ├── metrics/            # BLEU, ROUGE, lexical, readability
│   ├── semantic/           # Optional embedding-based similarity
│   ├── validators/         # Composable validation checks
│   ├── benchmark/          # Quality tracking & regression detection
│   ├── pytest_plugin/      # Native pytest integration
│   └── cli/                # Command-line interface
├── tests/                  # Test suite (mirrors src structure)
├── docs/                   # Project documentation
└── examples/               # Usage examples
```

## Code Style

### Python Conventions

- **Python 3.11+** with modern type hints
- **UK English** in all text (colour, behaviour, summarisation, tokenisation)
- **snake_case** for variables, functions, modules
- **PascalCase** for classes
- Absolute imports from package root: `from veritext.core.types import ...`

### Quality Gates

All must pass with zero issues before any commit:

```bash
uv run ruff check .           # Linting
uv run ruff format --check .  # Formatting
uv run mypy src/              # Type checking
uv run pytest                 # Tests
```

### Documentation

- Docstrings for all public APIs (Google style)
- Type hints on all function signatures
- Keep docstrings concise; let types speak where possible

## Architecture

### Layer Dependencies

```
CLI / pytest_plugin     (presentation)
        ↓
validators / benchmark  (decision logic)
        ↓
metrics                 (pure computation)
        ↓
core                    (shared types, tokenisation)
```

Each layer depends only on layers below it.

### Metrics vs Validators

| Concept | Responsibility | Output |
|---------|----------------|--------|
| **Metric** | Compute a score | Typed result (e.g., `BleuResult`) |
| **Validator** | Make pass/fail decision | `ValidationResult` with diagnostics |

### Edge Case Handling

- Empty text: Metrics return zero scores; validators fail
- Empty reference: Comparison metrics raise `ValueError`
- Whitespace-only: Treated as empty after tokenisation
- Unicode: NFC normalisation by default

## Git Workflow

### Before Starting Work

When starting work from a plan, create a new branch matching the plan's scope before making any changes. Do not reuse an existing branch from previous work, even if related.

### Commits

- Format: `type(scope): description`
- Types: feat, fix, chore, refactor, docs, test
- Atomic: ≤3 new files, ≤150 LOC per commit
- Update changelog.md before completing a task

### Branches

- `feat/kebab-case`: new features
- `fix/kebab-case`: bug fixes
- `chore/`: maintenance
- `refactor/`: code restructure
- `docs/`: documentation only

## Testing

- Test files mirror source structure: `tests/test_core/test_types.py`
- Use pytest fixtures for common setup
- Target ≥80% coverage
- Include edge cases: empty input, Unicode, boundary values

## Pre-Completion Checklist

Before marking ANY task complete:

- [ ] All linting/formatting/type checks pass
- [ ] Tests pass with adequate coverage
- [ ] changelog.md updated if user-facing changes
- [ ] Filenames are lowercase (except CLAUDE.md)
- [ ] Commit follows `type(scope): description` format
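
The last checklist item can be verified with a short script rather than by eye. This is a minimal sketch under stated assumptions: `COMMIT_TYPES`, `COMMIT_PATTERN`, and `is_valid_commit_subject` are hypothetical names, not part of Veritext. It also assumes the scope is mandatory and limited to lowercase letters, digits, hyphens, and underscores, which the guidelines do not spell out.

```python
import re

# Commit types listed under "Commits" in these guidelines.
COMMIT_TYPES = ("feat", "fix", "chore", "refactor", "docs", "test")

# Assumption: the scope is required and uses only [a-z0-9_-].
COMMIT_PATTERN = re.compile(
    rf"(?:{'|'.join(COMMIT_TYPES)})\([a-z0-9_-]+\): \S.*"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if a one-line subject matches `type(scope): description`."""
    return COMMIT_PATTERN.fullmatch(subject) is not None
```

A subject such as `feat(metrics): add ROUGE-L scorer` passes, while `fix: handle empty reference` fails because it has no scope. Relax the pattern if scopes are meant to be optional.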