Changelog
All notable changes to Veritext will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Fixed
- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
- Fixed potential division by zero in readability metric when text has no sentence endings
- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
- Fixed pytest plugin tests failing with duplicate plugin registration error
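As an illustration of the init-time regex validation fix, here is a minimal sketch of the general technique. The `PatternCheck` class is hypothetical and is not Veritext's actual `ContainsValidator` implementation; it only shows how compiling the pattern in the constructor moves the failure earlier.

```python
import re


class PatternCheck:
    """Hypothetical stand-in for a regex-based validator (not Veritext's class)."""

    def __init__(self, pattern: str) -> None:
        # Compile eagerly so an invalid pattern raises here, at construction,
        # instead of surfacing later on the first check() call.
        try:
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid regex pattern {pattern!r}: {exc}") from exc

    def check(self, text: str) -> bool:
        # The pattern is already compiled, so check() cannot fail on syntax.
        return self._regex.search(text) is not None
```

With this shape, a malformed pattern such as `"(unclosed"` is rejected when the validator is built, which is usually where a configuration mistake is easiest to trace.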
Added
- Added `.score` property to `LexicalResult` for API consistency with other result types
- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
- Added test coverage for `core/config.py` and `core/logging.py` modules
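For readers unfamiliar with LRU eviction, the following is a rough sketch of a size-bounded embedding cache. The class name `LRUEmbeddingCache`, the `embed` callable, and the `get` method are assumptions made for illustration; only the idea of a maximum size defaulting to 1000 entries comes from the entries above.

```python
from collections import OrderedDict
from typing import Callable, List


class LRUEmbeddingCache:
    """Hypothetical bounded cache; names are illustrative, not Veritext's API."""

    def __init__(self, embed: Callable[[str], List[float]], max_size: int = 1000) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._embed = embed
        self._max_size = max_size
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, text: str) -> List[float]:
        if text in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(text)
            return self._entries[text]
        # Cache miss: compute, store, then evict the least recently used entry
        # if the cache has grown past its limit.
        vector = self._embed(text)
        self._entries[text] = vector
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
        return vector

    def __len__(self) -> int:
        return len(self._entries)
```

The cache never holds more than `max_size` entries, so memory use stays bounded no matter how many distinct texts are compared.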
[0.1.0] — 2026-02-03
Initial release of Veritext, a semantic text validation framework for Python.
Added
Core
- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (`VeritextError` and subclasses)
- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog
Metrics
- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Lexical similarity metric (Jaccard similarity and token overlap)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics
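For reference, the standard Flesch formulas behind the readability metrics can be sketched as follows. The function name `flesch_scores` and its count-based signature are hypothetical, and the handling of empty text and of text with no sentence endings (treated here as a single sentence) is an assumption, not necessarily what Veritext does.

```python
from typing import Tuple


def flesch_scores(words: int, sentences: int, syllables: int) -> Tuple[float, float]:
    """Return (reading_ease, grade_level) computed from raw counts."""
    if words <= 0:
        # Assumption: empty text scores zero rather than raising.
        return 0.0, 0.0
    # Assumption: text without sentence endings counts as one sentence,
    # which avoids dividing by zero.
    sentences = max(sentences, 1)
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    # Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level
```

Higher reading ease means easier text, while the grade level approximates the US school grade needed to follow it.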
Validators
- Validators module with `Check` protocol for validation checks
- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
- Composite validators: `AllOf` (all checks must pass), `AnyOf` (any check must pass)
- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
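The composite semantics can be sketched in a few lines. This is a simplified illustration in which a check is just a callable returning a boolean (`CheckFn` is a hypothetical alias); the real `Check` protocol and result types are likely richer.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical simplification: a check is any callable from text to bool.
CheckFn = Callable[[str], bool]


class AllOf:
    """Passes only when every wrapped check passes."""

    def __init__(self, checks: Iterable[CheckFn]) -> None:
        # Copy into a tuple so later mutation of the caller's list
        # cannot silently change this validator.
        self._checks: Tuple[CheckFn, ...] = tuple(checks)

    def check(self, text: str) -> bool:
        return all(c(text) for c in self._checks)


class AnyOf:
    """Passes when at least one wrapped check passes."""

    def __init__(self, checks: Iterable[CheckFn]) -> None:
        self._checks: Tuple[CheckFn, ...] = tuple(checks)

    def check(self, text: str) -> bool:
        return any(c(text) for c in self._checks)
```

Storing an immutable copy of the checks is one common way to avoid the kind of list aliasing a composite validator can otherwise suffer from.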
Semantic Similarity
- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
- `SemanticValidator` for threshold-based semantic similarity validation
- `semantic()` factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons
Pytest Plugin
- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
- `validate_text()` assertion function for expressive test assertions
- `text_validation` marker for filtering validation tests
- Pytest fixtures: `text_validator` factory and `validation_context` helper
- Detailed failure messages with text preview and check diagnostics
Benchmarking
- Benchmark module for quality tracking and regression detection
- `Benchmark` class for evaluating text quality over time with metric storage
- `BenchmarkRun` and `RegressionReport` data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- `check_regression()` for statistical comparison against baseline
- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.
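To make the rolling-window idea concrete, here is a small sketch of one possible regression check. The function `is_regression`, the use of a plain mean as the baseline, and the absolute tolerance are all assumptions for illustration; the actual comparison in `check_regression()` may differ.

```python
from statistics import mean
from typing import Sequence


def is_regression(
    history: Sequence[float],
    current: float,
    window: int = 5,
    tolerance: float = 0.05,
) -> bool:
    """Return True when `current` falls below the recent baseline by more than `tolerance`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Baseline is the mean of the most recent `window` scores (assumed definition).
    recent = list(history)[-window:]
    if not recent:
        # No history yet, so there is nothing to regress against.
        return False
    baseline = mean(recent)
    return current < baseline - tolerance
```

A larger window smooths out noisy runs, while a smaller tolerance makes the check stricter.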
CLI
- Command-line interface (CLI) via `veritext` command
- `veritext validate` command for inline and file-based text validation
- JSONL input format support for batch validation (`--file` option)
- Separate candidate/reference file support (`--reference-file` option)
- Multiple output formats: table (default), JSON, and simple text
- `veritext benchmark run` command for running evaluations and storing results
- `veritext benchmark show` command for viewing benchmark history
- `veritext benchmark check` command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels
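JSONL input means one JSON object per line. The sketch below shows how such a file could be read in general; the helper `iter_jsonl` is hypothetical, and the field names Veritext expects in each record are not stated here, so none are assumed.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per non-empty line, reporting the line number on bad input."""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            # Skip blank lines, which often appear at the end of a file.
            continue
        try:
            record = json.loads(stripped)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        yield record
```

Because the function accepts any iterable of strings, an open file handle can be passed directly and records are processed one at a time.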
Documentation
- Comprehensive readme with usage examples
- Example scripts: basic validation, chatbot testing, benchmark regression