project setup: pyproject.toml, deps, tooling

Configure Python project with pydantic, structlog, typer, rich dependencies.
Set up ruff, mypy, pytest tooling with strict type checking.
2025-03-08 14:03:32 +00:00
commit bf5884cb27
4 changed files with 1924 additions and 0 deletions

changelog.md

@@ -0,0 +1,114 @@
# Changelog
All notable changes to Veritext will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Changed
- Refactored CLI metric computation to eliminate code duplication
- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
- Settings instance is now cached via `@lru_cache` for better performance
- Documented composite validators' intentional deviation from `Check` protocol return type
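
As an illustration of the cached settings entry above, here is a minimal sketch of the `@lru_cache` pattern. The `Settings` fields, the `get_settings()` name, and the environment variable names are assumptions made for this example, and a plain dataclass stands in for the pydantic-settings model.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the project's settings model."""

    log_level: str = "INFO"
    cache_max_size: int = 1000


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once; later calls return the same cached instance."""
    # Environment variable names are illustrative, not taken from the project.
    return Settings(
        log_level=os.environ.get("VERITEXT_LOG_LEVEL", "INFO"),
        cache_max_size=int(os.environ.get("VERITEXT_CACHE_MAX_SIZE", "1000")),
    )
```

With this pattern, tests that change the environment would need to call `get_settings.cache_clear()` before reading the settings again.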
### Fixed
- Consolidated redundant empty checks in ROUGE-L computation
- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
- Fixed potential division by zero in readability metric when text has no sentence endings
- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
- Fixed pytest plugin tests failing with duplicate plugin registration error
### Documentation
- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
- Updated project plan with portfolio demo section
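
The init-time regex validation fix noted in this release can be pictured with the sketch below. `ContainsPattern` is a hypothetical class written for this example, not the actual `ContainsValidator`; the point is only that compiling the pattern in the constructor surfaces a bad pattern immediately instead of on the first `check()` call.

```python
import re


class ContainsPattern:
    """Hypothetical validator used only to illustrate fail-fast compilation."""

    def __init__(self, pattern: str) -> None:
        try:
            # Compile eagerly so an invalid pattern is rejected at construction.
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc

    def check(self, text: str) -> bool:
        """Return True when the compiled pattern matches anywhere in the text."""
        return self._regex.search(text) is not None
```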
### Added
- Added `.score` property to `LexicalResult` for API consistency with other result types
- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
- Added test coverage for `core/config.py` and `core/logging.py` modules
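
For readers unfamiliar with LRU eviction, the following is a minimal sketch of a bounded cache of the kind the `cache_max_size` parameter suggests. The `LRUCache` class and its method names are assumptions for illustration; the actual `SemanticSimilarity` cache may be structured differently.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache mapping text to an embedding vector."""

    def __init__(self, max_size: int = 1000) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items: OrderedDict[str, list[float]] = OrderedDict()

    def get(self, key: str) -> list[float] | None:
        """Return the cached value and mark it as most recently used."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: list[float]) -> None:
        """Store a value, evicting the least recently used entry when full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)

    def __len__(self) -> int:
        return len(self._items)
```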
## [0.1.0] - 2025-05-17
Initial release of Veritext, a semantic text validation framework for Python.
### Added
#### Core
- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (`VeritextError` and subclasses)
- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog
#### Metrics
- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Lexical similarity metric (Jaccard similarity and token overlap)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics
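
To make two of the metrics above concrete, here is a sketch of Jaccard similarity and a ROUGE-L F-measure over pre-tokenised input. The function names are hypothetical, the F-measure weights precision and recall equally, and returning 0.0 for empty input is a convention chosen for this example rather than documented Veritext behaviour.

```python
from __future__ import annotations


def jaccard(candidate: list[str], reference: list[str]) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(candidate), set(reference)
    if not a and not b:
        return 0.0  # convention for this sketch: two empty inputs score zero
    return len(a & b) / len(a | b)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using one row of DP state."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-L F-measure with a single guard for empty input."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```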
#### Validators
- Validators module with `Check` protocol for validation checks
- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
- Composite validators: `AllOf` (all checks must pass), `AnyOf` (at least one check must pass)
- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
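
The composite validators can be pictured with the simplified sketch below. It is not the library's implementation: `Check` is reduced here to a plain callable returning `bool`, whereas the real `Check` protocol and result types are richer. The classes copy the incoming checks into a tuple, one way to avoid aliasing a caller's mutable list.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Simplified stand-in for the Check protocol (an assumption for this sketch).
Check = Callable[[str], bool]


class AllOf:
    """Passes only when every wrapped check passes."""

    def __init__(self, checks: Iterable[Check]) -> None:
        # Copy into a tuple so later changes to the caller's list have no effect.
        self._checks = tuple(checks)

    def __call__(self, text: str) -> bool:
        return all(check(text) for check in self._checks)


class AnyOf:
    """Passes when at least one wrapped check passes."""

    def __init__(self, checks: Iterable[Check]) -> None:
        self._checks = tuple(checks)

    def __call__(self, text: str) -> bool:
        return any(check(text) for check in self._checks)
```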
#### Semantic Similarity
- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
- `SemanticValidator` for threshold-based semantic similarity validation
- `semantic()` factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons
#### Pytest Plugin
- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
- `validate_text()` assertion function for expressive test assertions
- `text_validation` marker for filtering validation tests
- Pytest fixtures: `text_validator` factory and `validation_context` helper
- Detailed failure messages with text preview and check diagnostics
#### Benchmarking
- Benchmark module for quality tracking and regression detection
- `Benchmark` class for evaluating text quality over time with metric storage
- `BenchmarkRun` and `RegressionReport` data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- `check_regression()` for statistical comparison against baseline
- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.
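
The rolling window baseline and tolerance threshold can be sketched as follows. This assumes the baseline is the mean of the most recent scores and that the tolerance is an absolute margin below that baseline; the changelog does not specify either, and `rolling_baseline()` and `is_regression()` are hypothetical names.

```python
from __future__ import annotations

from statistics import mean


def rolling_baseline(scores: list[float], window: int) -> float | None:
    """Mean of the last `window` scores, or None when there is no history."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = scores[-window:]
    return mean(recent) if recent else None


def is_regression(
    current: float,
    history: list[float],
    window: int = 5,
    tolerance: float = 0.05,
) -> bool:
    """Flag a regression when the score drops below baseline minus tolerance."""
    baseline = rolling_baseline(history, window)
    if baseline is None:
        return False  # nothing to compare against yet
    return current < baseline - tolerance
```

A CI wrapper could raise an error whenever `is_regression()` returns `True`, which is roughly the role the entries above give to `assert_no_regression()`.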
#### CLI
- Command-line interface (CLI) via `veritext` command
- `veritext validate` command for inline and file-based text validation
- JSONL input format support for batch validation (`--file` option)
- Separate candidate/reference file support (`--reference-file` option)
- Multiple output formats: table (default), JSON, and simple text
- `veritext benchmark run` command for running evaluations and storing results
- `veritext benchmark show` command for viewing benchmark history
- `veritext benchmark check` command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels
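
As a rough picture of JSONL batch input, the sketch below reads one JSON object per line and yields candidate/reference pairs. The field names `candidate` and `reference` and the `iter_pairs()` helper are assumptions for illustration; the schema that `--file` actually expects is not stated here.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def iter_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (candidate, reference) pairs from JSONL lines, skipping blanks."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
            # Field names are hypothetical; adjust to the real schema.
            pair = (record["candidate"], record["reference"])
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            raise ValueError(f"line {number}: invalid record") from exc
        yield pair
```

Passing an open file object works as well, since iterating over a text file yields its lines.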
#### Documentation
- README with usage examples
- Example scripts: basic validation, chatbot testing, benchmark regression