Changelog
All notable changes to Veritext will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Changed
- Refactored CLI metric computation to eliminate code duplication
- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
- Settings instance is now cached via `@lru_cache` for better performance
- Documented composite validators' intentional deviation from `Check` protocol return type
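The settings caching noted above can be sketched with `functools.lru_cache`. This is a minimal illustration rather than Veritext's actual code: the `Settings` class, its fields, and the `get_settings()` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; the real fields are not listed in this changelog."""

    log_level: str = "INFO"
    cache_max_size: int = 1000


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once; later calls return the same cached instance."""
    return Settings()


first = get_settings()
second = get_settings()
same_instance = first is second  # the second call is served from the cache

# Tests that need fresh settings can reset the cache explicitly.
get_settings.cache_clear()
```

Caching a zero-argument factory this way avoids re-reading configuration on every call, at the cost of needing `cache_clear()` whenever the environment changes mid-process.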
Fixed
- Consolidated redundant empty checks in ROUGE-L computation
- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
Documentation
- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
- Updated project plan with portfolio demo section
- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
- Fixed potential division by zero in readability metric when text has no sentence endings
- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
- Fixed pytest plugin tests failing with duplicate plugin registration error
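Two of the fixes above (failing on a bad regex at init time, and avoiding mutable list aliasing in composites) follow common Python patterns. The sketch below illustrates those patterns only; `PatternCheck` and `AllChecks` are hypothetical stand-ins, not the actual Veritext classes or signatures.

```python
import re
from typing import Callable, Iterable


class PatternCheck:
    """Hypothetical check that compiles its pattern eagerly."""

    def __init__(self, pattern: str) -> None:
        try:
            # Compiling here surfaces a bad pattern at construction time
            # instead of on the first call to check().
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc

    def check(self, text: str) -> bool:
        return self._regex.search(text) is not None


class AllChecks:
    """Hypothetical composite that snapshots its inputs to avoid aliasing."""

    def __init__(self, checks: Iterable[Callable[[str], bool]]) -> None:
        # tuple() copies the sequence, so later mutation of the caller's
        # list cannot change this composite's behaviour.
        self._checks = tuple(checks)

    def check(self, text: str) -> bool:
        return all(check(text) for check in self._checks)


checks = [PatternCheck(r"\d+").check]
composite = AllChecks(checks)
checks.append(PatternCheck(r"[a-z]+").check)  # does not affect `composite`

digits_only_passes = composite.check("12345")

try:
    PatternCheck("(unclosed")
    bad_pattern_rejected = False
except ValueError:
    bad_pattern_rejected = True
```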
Added
- Added `.score` property to `LexicalResult` for API consistency with other result types
- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
- Added test coverage for `core/config.py` and `core/logging.py` modules
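A bounded embedding cache with LRU eviction, as described for `cache_max_size`, might look roughly like the following. This is a sketch under assumptions: `EmbeddingCache`, its `get()` method, and `fake_embed()` are hypothetical, and the real `SemanticSimilarity` internals may differ.

```python
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """Hypothetical bounded cache with least-recently-used eviction."""

    def __init__(self, max_size: int = 1000) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, text: str, embed: Callable[[str], List[float]]) -> List[float]:
        if text in self._items:
            # Mark the entry as most recently used.
            self._items.move_to_end(text)
            return self._items[text]
        vector = embed(text)
        self._items[text] = vector
        if len(self._items) > self._max_size:
            # Drop the least recently used entry (the first in the ordering).
            self._items.popitem(last=False)
        return vector

    def __contains__(self, text: object) -> bool:
        return text in self._items

    def __len__(self) -> int:
        return len(self._items)


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model; returns a toy one-element vector."""
    return [float(len(text))]


cache = EmbeddingCache(max_size=2)
cache.get("alpha", fake_embed)
cache.get("beta", fake_embed)
cache.get("alpha", fake_embed)  # refreshes "alpha"
cache.get("gamma", fake_embed)  # evicts "beta", the least recently used
```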
[0.1.0] — 2025-05-17
Initial release of Veritext, a semantic text validation framework for Python.
Added
Core
- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (`VeritextError` and subclasses)
- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog
Metrics
- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Lexical similarity metric (Jaccard similarity and token overlap)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics
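For readers unfamiliar with the lexical metric, Jaccard similarity is the size of the token-set intersection divided by the size of the union. The sketch below is a generic illustration, not the Veritext implementation: the `tokenise()` helper, its NFKC normalisation and lowercasing, and the choice to score two empty texts as 1.0 are all assumptions.

```python
import re
import unicodedata
from typing import Set


def tokenise(text: str) -> Set[str]:
    """Assumed tokenisation: NFKC-normalise, lowercase, split on word characters."""
    normalised = unicodedata.normalize("NFKC", text).lower()
    return set(re.findall(r"\w+", normalised))


def jaccard_similarity(candidate: str, reference: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the two token sets."""
    a, b = tokenise(candidate), tokenise(reference)
    if not a and not b:
        # Assumption: two empty texts are treated as identical. This guard
        # also avoids dividing by zero.
        return 1.0
    return len(a & b) / len(a | b)


# Shared tokens: {"the", "cat"}; union: {"the", "cat", "sat", "ran"}.
score = jaccard_similarity("The cat sat", "the cat ran")  # 2 / 4 = 0.5
```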
Validators
- Validators module with `Check` protocol for validation checks
- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
- Composite validators: `AllOf` (all checks must pass), `AnyOf` (any check must pass)
- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
Semantic Similarity
- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
- `SemanticValidator` for threshold-based semantic similarity validation
- `semantic()` factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons
Pytest Plugin
- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
- `validate_text()` assertion function for expressive test assertions
- `text_validation` marker for filtering validation tests
- Pytest fixtures: `text_validator` factory and `validation_context` helper
- Detailed failure messages with text preview and check diagnostics
Benchmarking
- Benchmark module for quality tracking and regression detection
- `Benchmark` class for evaluating text quality over time with metric storage
- `BenchmarkRun` and `RegressionReport` data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- `check_regression()` for statistical comparison against baseline
- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.
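The rolling-window baseline and tolerance threshold can be pictured with a small sketch. It assumes higher scores are better and that the tolerance is an absolute margin below the mean of the most recent runs; the real `check_regression()` may use a different statistic. The function names `rolling_baseline()` and `has_regressed()` are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def rolling_baseline(scores: Sequence[float], window: int = 5) -> Optional[float]:
    """Mean of the most recent `window` scores, or None when there is no history."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = scores[-window:]
    return mean(recent) if recent else None


def has_regressed(
    history: Sequence[float],
    current: float,
    window: int = 5,
    tolerance: float = 0.05,
) -> bool:
    """Flag a regression when `current` falls more than `tolerance` below the baseline."""
    baseline = rolling_baseline(history, window)
    if baseline is None:
        return False  # nothing to compare against yet
    return current < baseline - tolerance


history = [0.80, 0.82, 0.81, 0.79, 0.83]  # baseline is about 0.81
small_dip = has_regressed(history, 0.78)   # within tolerance
large_drop = has_regressed(history, 0.70)  # more than 0.05 below baseline
```

In a CI job, a result like `large_drop` would be the point at which a non-zero exit code or an exception is raised.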
CLI
- Command-line interface (CLI) via `veritext` command
- `veritext validate` command for inline and file-based text validation
- JSONL input format support for batch validation (`--file` option)
- Separate candidate/reference file support (`--reference-file` option)
- Multiple output formats: table (default), JSON, and simple text
- `veritext benchmark run` command for running evaluations and storing results
- `veritext benchmark show` command for viewing benchmark history
- `veritext benchmark check` command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels
Documentation
- README with usage examples
- Example scripts: basic validation, chatbot testing, benchmark regression