project setup: pyproject.toml, deps, tooling

Configure the Python project with pydantic, structlog, typer, and rich as dependencies. Set up ruff, mypy, and pytest tooling, with strict type checking enabled.
2025-03-08 14:03:32 +00:00
commit bf5884cb27
4 changed files with 1924 additions and 0 deletions
--- a/changelog.md
+++ b/changelog.md
@@ -0,0 +1,114 @@
+# Changelog
+
+All notable changes to Veritext will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Changed
+
+- Refactored CLI metric computation to eliminate code duplication
+- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
+- Settings instance is now cached via `@lru_cache` for better performance
+- Documented composite validators' intentional deviation from `Check` protocol return type
+
+### Fixed
+
+- Consolidated redundant empty checks in ROUGE-L computation
+- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
+- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
+- Fixed potential division by zero in readability metric when text has no sentence endings
+- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
+- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
+- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
+- Fixed pytest plugin tests failing with duplicate plugin registration error
+
+### Documentation
+
+- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
+- Updated project plan with portfolio demo section
+
+### Added
+
+- Added `.score` property to `LexicalResult` for API consistency with other result types
+- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
+- Added test coverage for `core/config.py` and `core/logging.py` modules
+
+## [0.1.0] - 2025-05-17
+
+Initial release of Veritext, a semantic text validation framework for Python.
+
+### Added
+
+#### Core
+
+- Project scaffold with pyproject.toml and development tooling
+- Core exception hierarchy (`VeritextError` and subclasses)
+- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
+- Word tokeniser with Unicode normalisation support
+- Configuration module with pydantic-settings
+- Structured logging with structlog
+
+#### Metrics
+
+- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
+- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
+- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
+- Lexical similarity metric (Jaccard similarity and token overlap)
+- Flesch-Kincaid readability metrics (grade level and reading ease)
+- Batch scoring with aggregate statistics for all metrics
+
+#### Validators
+
+- Validators module with `Check` protocol for validation checks
+- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
+- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
+- Composite validators: `AllOf` (every check must pass), `AnyOf` (at least one check must pass)
+- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
+
+#### Semantic Similarity
+
+- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
+- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
+- `SemanticValidator` for threshold-based semantic similarity validation
+- `semantic()` factory function for creating semantic validators
+- Embedding caching for performance optimisation in repeated comparisons
+
+#### Pytest Plugin
+
+- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
+- `validate_text()` assertion function for expressive test assertions
+- `text_validation` marker for filtering validation tests
+- Pytest fixtures: `text_validator` factory and `validation_context` helper
+- Detailed failure messages with text preview and check diagnostics
+
+#### Benchmarking
+
+- Benchmark module for quality tracking and regression detection
+- `Benchmark` class for evaluating text quality over time with metric storage
+- `BenchmarkRun` and `RegressionReport` data models for tracking runs
+- SQLite storage backend with WAL mode for concurrent access
+- Rolling window baseline computation for historical comparison
+- `check_regression()` for statistical comparison against baseline
+- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
+- Customisable tolerance threshold and window size for regression detection
+- Metadata support for tracking git SHA, model versions, etc.
+
+#### CLI
+
+- Command-line interface (CLI) via `veritext` command
+- `veritext validate` command for inline and file-based text validation
+- JSONL input format support for batch validation (`--file` option)
+- Separate candidate/reference file support (`--reference-file` option)
+- Multiple output formats: table (default), JSON, and simple text
+- `veritext benchmark run` command for running evaluations and storing results
+- `veritext benchmark show` command for viewing benchmark history
+- `veritext benchmark check` command for regression detection with exit code 1 on failure
+- Rich-formatted terminal output with tables and coloured panels
+
+#### Documentation
+
+- README with usage examples
+- Example scripts: basic validation, chatbot testing, benchmark regression
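
Two of the fixes listed under [Unreleased] (regex patterns that fail at init time, and mutable list aliasing in the composite validators) can be illustrated with a minimal sketch. This is not Veritext's implementation: `ContainsCheck` and `AllOfChecks` are hypothetical stand-ins for `ContainsValidator` and `AllOf`, and returning a plain `bool` from `check()` is an assumption made to keep the example short.

```python
import re
from typing import Iterable


class ContainsCheck:
    """Hypothetical stand-in: passes when the text matches a regex pattern."""

    def __init__(self, pattern: str) -> None:
        # Compile eagerly so an invalid pattern raises here, at construction,
        # instead of surfacing later on the first call to check().
        try:
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc

    def check(self, text: str) -> bool:
        return self._regex.search(text) is not None


class AllOfChecks:
    """Hypothetical stand-in: passes only when every wrapped check passes."""

    def __init__(self, checks: Iterable[ContainsCheck]) -> None:
        # Copy into a tuple so that later mutation of the caller's list
        # cannot silently change what this validator evaluates.
        self._checks = tuple(checks)

    def check(self, text: str) -> bool:
        return all(c.check(text) for c in self._checks)


# Usage: mutating the original list after construction has no effect.
checks = [ContainsCheck(r"\d+")]
combined = AllOfChecks(checks)
checks.append(ContainsCheck("never-present"))
still_passes = combined.check("order 42")
```

Storing a tuple rather than a reference to the caller's list is one common way to avoid aliasing; copying with `list(checks)` would work equally well if the validator needs to stay mutable internally.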