project setup: pyproject.toml, deps, tooling

Configure Python project with pydantic, structlog, typer, rich dependencies.
Set up ruff, mypy, pytest tooling with strict type checking.
2025-03-08 14:03:32 +00:00
commit bf5884cb27
4 changed files with 1924 additions and 0 deletions

114
changelog.md Normal file

@@ -0,0 +1,114 @@
# Changelog
All notable changes to Veritext will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Changed
- Refactored CLI metric computation to eliminate code duplication
- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
- Settings instance is now cached via `@lru_cache` for better performance
- Documented composite validators' intentional deviation from `Check` protocol return type
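
The cached-settings change can be pictured with a minimal sketch. The `get_settings()` accessor and the stand-in `Settings` dataclass are hypothetical names, and the fields are invented for illustration; the real module builds on pydantic-settings rather than a dataclass.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Stand-in for the real settings model (hypothetical fields)."""

    log_level: str = "INFO"
    cache_max_size: int = 1000


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings object once and reuse it on later calls."""
    return Settings()


first = get_settings()
second = get_settings()
same_instance = first is second  # the cached object is returned again

# Tests that change the environment can reset the cache explicitly.
get_settings.cache_clear()
```

One consequence of this design is that changes to environment variables after the first call are not picked up until `cache_clear()` is called.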
### Fixed
- Consolidated redundant empty checks in ROUGE-L computation
- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
- Fixed potential division by zero in readability metric when text has no sentence endings
- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
- Fixed pytest plugin tests failing with duplicate plugin registration error
### Documentation
- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
- Updated project plan with portfolio demo section
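
The regex fix for `ContainsValidator` and `ExcludesValidator` amounts to compiling the pattern in the constructor. A minimal sketch of that idea follows; `PatternCheck` and `is_valid_pattern` are hypothetical names, and the choice of `ValueError` is an assumption, since the project may raise one of its own `VeritextError` subclasses instead.

```python
import re


class PatternCheck:
    """Hypothetical stand-in for a pattern-based validator."""

    def __init__(self, pattern: str) -> None:
        try:
            # Compile eagerly so a malformed pattern fails here, not in check().
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid regex pattern {pattern!r}: {exc}") from exc

    def check(self, text: str) -> bool:
        """Return True when the compiled pattern occurs anywhere in the text."""
        return self._regex.search(text) is not None


def is_valid_pattern(pattern: str) -> bool:
    """Report whether a validator can be constructed from the pattern."""
    try:
        PatternCheck(pattern)
    except ValueError:
        return False
    return True
```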
### Added
- Added `.score` property to `LexicalResult` for API consistency with other result types
- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
- Added test coverage for `core/config.py` and `core/logging.py` modules
## [0.1.0] - 2025-05-17
Initial release of Veritext, a semantic text validation framework for Python.
### Added
#### Core
- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (`VeritextError` and subclasses)
- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog
#### Metrics
- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Lexical similarity metric (Jaccard similarity and token overlap)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics
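
As an illustration of the lexical similarity metric, Jaccard similarity is the size of the intersection of two token sets divided by the size of their union. The function name below is hypothetical, whitespace splitting stands in for the project's tokeniser, and returning 1.0 for two empty inputs is an assumption rather than documented behaviour.

```python
def jaccard_similarity(candidate: list[str], reference: list[str]) -> float:
    """Jaccard similarity over the unique tokens of two texts."""
    cand, ref = set(candidate), set(reference)
    union = cand | ref
    if not union:
        return 1.0  # assumption: two empty texts count as identical
    return len(cand & ref) / len(union)


# Four shared tokens (cat, sat, on, mat) out of six unique tokens overall.
score = jaccard_similarity(
    "the cat sat on the mat".split(),
    "a cat sat on a mat".split(),
)
```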
#### Validators
- Validators module with `Check` protocol for validation checks
- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
- Composite validators: `AllOf` (all checks must pass), `AnyOf` (any check must pass)
- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
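
The composite validators can be sketched as thin wrappers over a shared protocol. This is a simplified illustration: here `check()` takes only the text and returns a bool, whereas the real `Check` protocol works with `ValidationContext` and result objects, and `MaxLength` and `Contains` are hypothetical stand-ins for the constraint validators.

```python
from collections.abc import Iterable
from typing import Protocol


class Check(Protocol):
    """Simplified protocol: the real one returns a result object."""

    def check(self, text: str) -> bool: ...


class MaxLength:
    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    def check(self, text: str) -> bool:
        return len(text) <= self.max_chars


class Contains:
    def __init__(self, needle: str) -> None:
        self.needle = needle

    def check(self, text: str) -> bool:
        return self.needle in text


class AllOf:
    def __init__(self, checks: Iterable[Check]) -> None:
        # Copy into a tuple so later changes to the caller's list have no effect.
        self._checks = tuple(checks)

    def check(self, text: str) -> bool:
        return all(c.check(text) for c in self._checks)


class AnyOf:
    def __init__(self, checks: Iterable[Check]) -> None:
        self._checks = tuple(checks)

    def check(self, text: str) -> bool:
        return any(c.check(text) for c in self._checks)


checks: list[Check] = [MaxLength(10), Contains("cat")]
both = AllOf(checks)
either = AnyOf(checks)
checks.clear()  # does not affect the composites, which hold their own copies
```

Copying the incoming checks keeps the composite's behaviour fixed at construction time, which avoids the aliasing problem where a caller mutates the list afterwards.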
#### Semantic Similarity
- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
- `SemanticValidator` for threshold-based semantic similarity validation
- `semantic()` factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons
#### Pytest Plugin
- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
- `validate_text()` assertion function for expressive test assertions
- `text_validation` marker for filtering validation tests
- Pytest fixtures: `text_validator` factory and `validation_context` helper
- Detailed failure messages with text preview and check diagnostics
#### Benchmarking
- Benchmark module for quality tracking and regression detection
- `Benchmark` class for evaluating text quality over time with metric storage
- `BenchmarkRun` and `RegressionReport` data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- `check_regression()` for statistical comparison against baseline
- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.
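
A rolling-window baseline with a tolerance threshold can be sketched as follows. The function names, the default window and tolerance, and the use of a plain mean with an absolute tolerance are all assumptions for illustration; the actual `check_regression()` comparison may differ.

```python
from statistics import fmean


def rolling_baseline(history: list[float], window: int) -> float:
    """Mean of the most recent `window` scores."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if not history:
        raise ValueError("history must contain at least one score")
    return fmean(history[-window:])


def has_regressed(
    current: float,
    history: list[float],
    *,
    window: int = 5,
    tolerance: float = 0.05,
) -> bool:
    """True when the current score falls more than `tolerance` below baseline."""
    return current < rolling_baseline(history, window) - tolerance


history = [0.80, 0.82, 0.78, 0.81, 0.79]
baseline = rolling_baseline(history, window=3)  # mean of 0.78, 0.81, 0.79
```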
#### CLI
- Command-line interface (CLI) via `veritext` command
- `veritext validate` command for inline and file-based text validation
- JSONL input format support for batch validation (`--file` option)
- Separate candidate/reference file support (`--reference-file` option)
- Multiple output formats: table (default), JSON, and simple text
- `veritext benchmark run` command for running evaluations and storing results
- `veritext benchmark show` command for viewing benchmark history
- `veritext benchmark check` command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels
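
Reading a JSONL batch file means parsing one JSON object per line. The sketch below uses only the standard library; the `iter_jsonl` helper and the `candidate` and `reference` field names are assumptions, since the changelog does not specify the record schema.

```python
import json
from collections.abc import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one record per non-blank line, reporting the line number on errors."""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(stripped)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        yield record


sample = [
    '{"candidate": "A cat sat.", "reference": "The cat sat."}',
    "",
    '{"candidate": "Hello.", "reference": "Hi."}',
]
records = list(iter_jsonl(sample))
```

The same function accepts an open file handle, because iterating over a text file yields its lines.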
#### Documentation
- Readme with usage examples
- Example scripts: basic validation, chatbot testing, benchmark regression

121
pyproject.toml Normal file

@@ -0,0 +1,121 @@
[project]
name = "veritext"
version = "0.1.0.dev0"
description = "Semantic text validation framework"
readme = "readme.md"
requires-python = ">=3.11"
license = "MIT"
authors = [{ name = "Kai Chappell", email = "git@kschappell.com" }]
keywords = ["validation", "text", "nlp", "testing", "quality"]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Topic :: Software Development :: Testing",
"Topic :: Text Processing",
"Typing :: Typed",
]
dependencies = [
"pydantic>=2.0",
"pydantic-settings>=2.0",
"structlog>=23.0",
"typer>=0.9",
"rich>=13.0",
]
[project.optional-dependencies]
semantic = ["sentence-transformers>=2.2"]
dev = [
"pytest>=7.0",
"pytest-cov>=4.0",
"mypy>=1.0",
"ruff>=0.1",
]
all = ["veritext[semantic]"]
[project.scripts]
veritext = "veritext.cli.main:app"
[project.entry-points.pytest11]
veritext = "veritext.pytest_plugin"
[project.urls]
Repository = "https://gitea.kschappell.com/kschappell/veritext"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/veritext"]
[tool.ruff]
line-length = 88
target-version = "py311"
src = ["src", "tests"]
[tool.ruff.lint]
select = [
"E", # pycodestyle errors
"W", # pycodestyle warnings
"F", # pyflakes
"I", # isort
"B", # flake8-bugbear
"C4", # flake8-comprehensions
"UP", # pyupgrade
"ARG", # flake8-unused-arguments
"SIM", # flake8-simplify
"TCH", # flake8-type-checking
"PTH", # flake8-use-pathlib
"RUF", # ruff-specific
]
ignore = [
"E501", # line too long (handled by formatter)
]
[tool.ruff.lint.isort]
known-first-party = ["veritext"]
[tool.mypy]
python_version = "3.11"
mypy_path = ["src"]
strict = true
warn_return_any = true
warn_unused_ignores = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_configs = true
show_error_codes = true
files = ["src/veritext"]
[[tool.mypy.overrides]]
module = ["sentence_transformers.*"]
ignore_missing_imports = true
[[tool.mypy.overrides]]
module = ["structlog", "structlog.*"]
ignore_missing_imports = true
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --tb=short"
pythonpath = ["src"]
[tool.coverage.run]
source = ["src/veritext"]
branch = true
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"if TYPE_CHECKING:",
"raise NotImplementedError",
]

50
readme.md Normal file

@@ -0,0 +1,50 @@
# Veritext
Semantic text validation framework for Python.
Validates text outputs against quality criteria using metrics like BLEU, ROUGE,
and semantic similarity. Designed for developers building systems that produce
text (chatbots, content generators, summarisation tools) who need automated
quality assurance beyond simple string matching.
## Status
Under active development. See [changelog.md](changelog.md) for progress.
## Installation
```bash
pip install veritext
# With semantic similarity support
pip install "veritext[semantic]"
```
## Quick Start
```python
from veritext import validators as v
from veritext.core.types import ValidationContext

# Create validators
validator = v.all_of([
    v.bleu(min_score=0.7),
    v.length(max_chars=500),
])

# Validate text
context = ValidationContext(reference="The cat sat on the mat.")
result = validator.check("A cat is sitting on the mat.", context)

if not result.passed:
    print(result.failure_summary)
```
## Documentation
- [Project Plan](docs/project-plan.md)
- [Implementation Plan](docs/implementation-plan.md)
## Licence
MIT

1639
uv.lock generated Normal file

File diff suppressed because it is too large