project setup: pyproject.toml, deps, tooling
Configure Python project with pydantic, structlog, typer, rich dependencies. Set up ruff, mypy, pytest tooling with strict type checking.
114
changelog.md
Normal file
@@ -0,0 +1,114 @@
# Changelog

All notable changes to Veritext will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Changed

- Refactored CLI metric computation to eliminate code duplication
- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
- Settings instance is now cached via `@lru_cache` for better performance
- Documented composite validators' intentional deviation from `Check` protocol return type

### Fixed

- Consolidated redundant empty checks in ROUGE-L computation
- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)

### Documentation

- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
- Updated project plan with portfolio demo section
- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
- Fixed potential division by zero in readability metric when text has no sentence endings
- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
- Fixed pytest plugin tests failing with duplicate plugin registration error

### Added

- Added `.score` property to `LexicalResult` for API consistency with other result types
- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
- Added test coverage for `core/config.py` and `core/logging.py` modules

## [0.1.0] - 2025-05-17

Initial release of Veritext, a semantic text validation framework for Python.

### Added

#### Core

- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (`VeritextError` and subclasses)
- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog

#### Metrics

- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Lexical similarity metric (Jaccard similarity and token overlap)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics

#### Validators

- Validators module with `Check` protocol for validation checks
- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
- Composite validators: `AllOf` (all checks must pass), `AnyOf` (any check must pass)
- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)

#### Semantic Similarity

- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
- `SemanticValidator` for threshold-based semantic similarity validation
- `semantic()` factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons

#### Pytest Plugin

- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
- `validate_text()` assertion function for expressive test assertions
- `text_validation` marker for filtering validation tests
- Pytest fixtures: `text_validator` factory and `validation_context` helper
- Detailed failure messages with text preview and check diagnostics

#### Benchmarking

- Benchmark module for quality tracking and regression detection
- `Benchmark` class for evaluating text quality over time with metric storage
- `BenchmarkRun` and `RegressionReport` data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- `check_regression()` for statistical comparison against baseline
- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.

#### CLI

- Command-line interface (CLI) via `veritext` command
- `veritext validate` command for inline and file-based text validation
- JSONL input format support for batch validation (`--file` option)
- Separate candidate/reference file support (`--reference-file` option)
- Multiple output formats: table (default), JSON, and simple text
- `veritext benchmark run` command for running evaluations and storing results
- `veritext benchmark show` command for viewing benchmark history
- `veritext benchmark check` command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels

#### Documentation

- Readme with usage examples
- Example scripts: basic validation, chatbot testing, benchmark regression
121
pyproject.toml
Normal file
@@ -0,0 +1,121 @@
[project]
name = "veritext"
version = "0.1.0-dev"
description = "Semantic text validation framework"
readme = "readme.md"
requires-python = ">=3.11"
license = "MIT"
authors = [{ name = "Kai Chappell", email = "git@kschappell.com" }]
keywords = ["validation", "text", "nlp", "testing", "quality"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Topic :: Software Development :: Testing",
    "Topic :: Text Processing",
    "Typing :: Typed",
]
dependencies = [
    "pydantic>=2.0",
    "pydantic-settings>=2.0",
    "structlog>=23.0",
    "typer>=0.9",
    "rich>=13.0",
]

[project.optional-dependencies]
semantic = ["sentence-transformers>=2.2"]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "mypy>=1.0",
    "ruff>=0.1",
]
all = ["veritext[semantic]"]

[project.scripts]
veritext = "veritext.cli.main:app"

[project.entry-points.pytest11]
veritext = "veritext.pytest_plugin"

[project.urls]
Repository = "https://gitea.kschappell.com/kschappell/veritext"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/veritext"]

[tool.ruff]
line-length = 88
target-version = "py311"
src = ["src", "tests"]

[tool.ruff.lint]
select = [
    "E",    # pycodestyle errors
    "W",    # pycodestyle warnings
    "F",    # pyflakes
    "I",    # isort
    "B",    # flake8-bugbear
    "C4",   # flake8-comprehensions
    "UP",   # pyupgrade
    "ARG",  # flake8-unused-arguments
    "SIM",  # flake8-simplify
    "TCH",  # flake8-type-checking
    "PTH",  # flake8-use-pathlib
    "RUF",  # ruff-specific
]
ignore = [
    "E501",  # line too long (handled by formatter)
]

[tool.ruff.lint.isort]
known-first-party = ["veritext"]

[tool.mypy]
python_version = "3.11"
mypy_path = ["src"]
strict = true
warn_return_any = true
warn_unused_ignores = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_configs = true
show_error_codes = true
files = ["src/veritext"]

[[tool.mypy.overrides]]
module = ["sentence_transformers.*"]
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = ["structlog", "structlog.*"]
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --tb=short"
pythonpath = ["src"]

[tool.coverage.run]
source = ["src/veritext"]
branch = true

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
]
50
readme.md
Normal file
@@ -0,0 +1,50 @@
# Veritext

Semantic text validation framework for Python.

Validates text outputs against quality criteria using metrics like BLEU, ROUGE,
and semantic similarity. Designed for developers building systems that produce
text (chatbots, content generators, summarisation tools) who need automated
quality assurance beyond simple string matching.

## Status

Under active development. See [changelog.md](changelog.md) for progress.

## Installation

```bash
pip install veritext

# With semantic similarity support
pip install veritext[semantic]
```

## Quick Start

```python
from veritext import validators as v
from veritext.core.types import ValidationContext

# Create validators
validator = v.all_of([
    v.bleu(min_score=0.7),
    v.length(max_chars=500),
])

# Validate text
context = ValidationContext(reference="The cat sat on the mat.")
result = validator.check("A cat is sitting on the mat.", context)

if not result.passed:
    print(result.failure_summary)
```
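
To illustrate what "beyond simple string matching" means, here is a minimal, self-contained sketch of a token-overlap (Jaccard) score applied to the two sentences from the example above. It does not import Veritext and is not its implementation: `tokenise` and `jaccard` are hypothetical helper names, and the regex-based tokeniser is an assumption made purely for illustration.

```python
import re


def tokenise(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens (illustrative only)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(candidate: str, reference: str) -> float:
    """Return the size of the token intersection divided by the size of the union."""
    a, b = tokenise(candidate), tokenise(reference)
    if not a and not b:
        return 1.0  # assumption: two empty texts are treated as identical
    return len(a & b) / len(a | b)


reference = "The cat sat on the mat."
candidate = "A cat is sitting on the mat."

# Exact string comparison rejects the paraphrase outright.
print(candidate == reference)         # False

# Token overlap gives partial credit: 4 shared tokens out of 8 distinct ones.
print(jaccard(candidate, reference))  # 0.5
```

A threshold on a score like this is the kind of check a metric-based validator applies, although the real metrics (BLEU, ROUGE, embedding similarity) are more involved than this sketch.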

## Documentation

- [Project Plan](docs/project-plan.md)
- [Implementation Plan](docs/implementation-plan.md)

## Licence

MIT