# CLAUDE.md

Guidelines for working on the Veritext project.

## Project Overview

Veritext is a semantic text validation framework for Python. It validates text outputs
against quality criteria using metrics like BLEU, ROUGE, and semantic similarity.

## Directory Structure

```
veritext/
├── src/veritext/          # Package source
│   ├── core/              # Shared types, tokenisation, config
│   ├── metrics/           # BLEU, ROUGE, lexical, readability
│   ├── semantic/          # Optional embedding-based similarity
│   ├── validators/        # Composable validation checks
│   ├── benchmark/         # Quality tracking & regression detection
│   ├── pytest_plugin/     # Native pytest integration
│   └── cli/               # Command-line interface
├── tests/                 # Test suite (mirrors src structure)
├── docs/                  # Project documentation
└── examples/              # Usage examples
```

## Code Style

### Python Conventions

- **Python 3.11+** with modern type hints
- **UK English** in all text (colour, behaviour, summarisation, tokenisation)
- **snake_case** for variables, functions, modules
- **PascalCase** for classes
- Absolute imports from package root: `from veritext.core.types import ...`

### Quality Gates

All four checks must pass with zero issues before any commit:

```bash
uv run ruff check .              # Linting
uv run ruff format --check .     # Formatting
uv run mypy src/                 # Type checking
uv run pytest                    # Tests
```

### Documentation

- Docstrings for all public APIs (Google style)
- Type hints on all function signatures
- Keep docstrings concise; let types speak where possible
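
As a minimal sketch of these conventions, the helper below pairs a fully typed signature with a concise Google-style docstring. `count_unique_tokens` and its `lowercase` parameter are hypothetical names chosen for illustration; they are not part of the Veritext API.

```python
def count_unique_tokens(text: str, *, lowercase: bool = True) -> int:
    """Count distinct whitespace-separated tokens in a text.

    Hypothetical helper for illustration; not part of the Veritext API.

    Args:
        text: The text to tokenise.
        lowercase: Whether to lowercase the text before splitting.

    Returns:
        The number of distinct tokens found.
    """
    normalised = text.lower() if lowercase else text
    return len(set(normalised.split()))
```

The types already say what goes in and what comes out, so the docstring only needs to describe behaviour the signature cannot express.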

## Architecture

### Layer Dependencies

```
CLI / pytest_plugin  (presentation)
        ↓
validators / benchmark  (decision logic)
        ↓
metrics  (pure computation)
        ↓
core  (shared types, tokenisation)
```

Each layer depends only on layers below it.
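
The rule can be sketched as a small import check. This is an illustration under stated assumptions, not an existing Veritext tool: `LAYER_RANK` and `is_import_allowed` are hypothetical names, the ranks are read off the diagram above, and treating imports between sibling packages in the same layer (for example `validators` and `benchmark`) as disallowed is an assumption. `semantic` is left out because the diagram does not place it.

```python
# Layer ranks read off the diagram above; a higher rank sits nearer the user.
LAYER_RANK: dict[str, int] = {
    "core": 0,
    "metrics": 1,
    "validators": 2,
    "benchmark": 2,
    "cli": 3,
    "pytest_plugin": 3,
}


def is_import_allowed(importer: str, imported: str) -> bool:
    """Return True if package `importer` may import from package `imported`.

    Raises:
        KeyError: If either package is missing from LAYER_RANK.
    """
    if importer == imported:
        # Imports within a single package are always fine.
        return True
    # Assumption: only strictly lower layers may be imported.
    return LAYER_RANK[imported] < LAYER_RANK[importer]
```

For example, `is_import_allowed("validators", "metrics")` is true, while `is_import_allowed("core", "metrics")` is false.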

### Metrics vs Validators

| Concept | Responsibility | Output |
|---------|----------------|--------|
| **Metric** | Compute a score | Typed result (e.g., `BleuResult`) |
| **Validator** | Make a pass/fail decision | `ValidationResult` with diagnostics |
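
The split might look like the sketch below. Everything here is illustrative: `OverlapResult`, `token_overlap`, `validate_overlap`, and the `passed` and `diagnostics` fields are assumed names, and the real `ValidationResult` in Veritext may be shaped differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OverlapResult:
    """Typed result of a hypothetical token-overlap metric."""

    score: float


@dataclass(frozen=True)
class ValidationResult:
    """Pass/fail decision with diagnostics (field names are assumed)."""

    passed: bool
    diagnostics: list[str] = field(default_factory=list)


def token_overlap(candidate: str, reference: str) -> OverlapResult:
    """Metric: fraction of reference tokens that appear in the candidate."""
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        raise ValueError("reference must not be empty")
    shared = reference_tokens & set(candidate.lower().split())
    return OverlapResult(score=len(shared) / len(reference_tokens))


def validate_overlap(
    candidate: str, reference: str, threshold: float = 0.5
) -> ValidationResult:
    """Validator: turn the metric score into a pass/fail decision."""
    result = token_overlap(candidate, reference)
    if result.score >= threshold:
        return ValidationResult(passed=True)
    message = f"overlap {result.score:.2f} is below threshold {threshold:.2f}"
    return ValidationResult(passed=False, diagnostics=[message])
```

The metric knows nothing about thresholds, and the validator does no scoring of its own, so each can be tested and reused separately.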

### Edge Case Handling

- Empty text: Metrics return zero scores; validators fail
- Empty reference: Comparison metrics raise `ValueError`
- Whitespace-only: Treated as empty after tokenisation
- Unicode: NFC normalisation by default
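
A minimal sketch of these rules, using a hypothetical `exact_match` metric and `normalise_tokens` helper (neither is a Veritext API). It assumes a whitespace-only reference counts as an empty reference, which follows from combining the second and third rules but is not stated explicitly.

```python
import unicodedata


def normalise_tokens(text: str) -> list[str]:
    """Apply NFC normalisation, then split on whitespace."""
    return unicodedata.normalize("NFC", text).split()


def exact_match(candidate: str, reference: str) -> float:
    """Hypothetical comparison metric: 1.0 for identical tokens, else 0.0."""
    reference_tokens = normalise_tokens(reference)
    if not reference_tokens:
        # Empty or whitespace-only reference: nothing to compare against.
        raise ValueError("reference must not be empty")
    candidate_tokens = normalise_tokens(candidate)
    if not candidate_tokens:
        # Empty or whitespace-only candidate: zero score, no exception.
        return 0.0
    return 1.0 if candidate_tokens == reference_tokens else 0.0
```

With NFC applied, a decomposed `"cafe\u0301"` and a precomposed `"caf\u00e9"` compare as equal.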

## Git Workflow

### Before Starting Work

When starting work from a plan, create a new branch matching the plan's scope before
making any changes. Do not reuse an existing branch from previous work, even if related.

### Commits

- Format: `type(scope): description`
- Types: feat, fix, chore, refactor, docs, test
- Atomic: ≤3 new files, ≤150 LOC per commit
- Update changelog.md before completing a task
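
The format rule could be checked mechanically along these lines. This is a sketch, not an existing project hook: `is_valid_commit_subject` is a hypothetical name, and the character set allowed inside the scope is an assumption, since the guideline does not define it.

```python
import re

COMMIT_TYPES = ("feat", "fix", "chore", "refactor", "docs", "test")

# Assumption: scopes use lowercase letters, digits, underscores, and hyphens.
COMMIT_PATTERN = re.compile(
    rf"(?:{'|'.join(COMMIT_TYPES)})\([a-z0-9_-]+\): \S.*"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if `subject` matches `type(scope): description`."""
    return COMMIT_PATTERN.fullmatch(subject) is not None
```

For example, `feat(metrics): add ROUGE-L scoring` passes, while `update stuff` and `feature(core): tidy types` do not.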

### Branches

- `feat/kebab-case` — new features
- `fix/kebab-case` — bug fixes
- `chore/` — maintenance
- `refactor/` — code restructure
- `docs/` — documentation only
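
Branch names can be checked the same way. The sketch below assumes the kebab-case suffix applies to every prefix, although the list spells it out only for `feat/` and `fix/`; `is_valid_branch_name` is a hypothetical helper.

```python
import re

# Assumption: every prefix is followed by a kebab-case name.
BRANCH_PATTERN = re.compile(
    r"(?:feat|fix|chore|refactor|docs)/[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` is an allowed prefix plus a kebab-case name."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For example, `feat/rouge-l-metric` is accepted, while `Feat/Rouge` and `fix/trailing-` are rejected.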

## Testing

- Test files mirror source structure: `tests/test_core/test_types.py`
- Use pytest fixtures for common setup
- Target ≥80% coverage
- Include edge cases: empty input, Unicode, boundary values

## Pre-Completion Checklist

Before marking ANY task complete:

- [ ] All linting/formatting/type checks pass
- [ ] Tests pass with adequate coverage
- [ ] changelog.md updated if there are user-facing changes
- [ ] Filenames are lowercase (except CLAUDE.md)
- [ ] Commit follows `type(scope): description` format