59 Commits

Author SHA1 Message Date
2519641fa3 clean up CLI, misc polish
- Refactor CLI metric computation to eliminate code duplication
- Update version format to PEP 440 compliance (0.1.0.dev0)
- Cache Settings instance via @lru_cache for performance
- Document composite validators' protocol deviation
- Consolidate redundant empty checks in ROUGE-L computation
- Add Phase 10 (Portfolio Demos) to implementation plan
2025-05-25 13:06:51 +00:00
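The `@lru_cache` item in this commit could look roughly like the sketch below. The `Settings` fields, the `get_settings` name, and the `VERITEXT_LOG_LEVEL` variable are assumptions for illustration; the log does not show the real definitions.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    # Hypothetical field; the real settings class is not shown in the log.
    log_level: str = "INFO"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Built once on first call; later calls return the same cached instance.
    return Settings(log_level=os.environ.get("VERITEXT_LOG_LEVEL", "INFO"))
```

Tests that change environment variables would need `get_settings.cache_clear()` before reading the settings again.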
210de7cd28 fix double-registered pytest plugin
The pytest plugin is already loaded via the entry point, so explicitly
declaring it in conftest causes a duplicate registration error.
2025-05-24 12:05:03 +00:00
8da0e34f77 changelog: QA fixes 2025-05-24 10:26:12 +00:00
3c8d599897 wip config + logging tests
Adds tests for VeritextSettings defaults, env var overrides, and the
get_logger/configure_logging functions.
2025-05-24 10:14:26 +00:00
9e9558b937 fix: validate regex at init, not match time
ContainsValidator and ExcludesValidator now pre-compile regex patterns
during initialisation and raise InvalidThresholdError if invalid.
2025-05-22 21:10:44 +00:00
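A minimal sketch of the fail-fast idea in this commit, assuming a constructor that takes a single pattern string. `InvalidThresholdError` is redefined here as a stand-in, and the `matches` method is hypothetical.

```python
import re


class InvalidThresholdError(ValueError):
    """Stand-in for the project's exception; its real base class is not shown."""


class ContainsValidator:
    def __init__(self, pattern: str) -> None:
        # Compile once at construction so a bad pattern fails here,
        # not later when the first text is checked.
        try:
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise InvalidThresholdError(f"invalid regex {pattern!r}: {exc}") from exc

    def matches(self, text: str) -> bool:
        return self._regex.search(text) is not None
```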
7da5a46fe1 misc fixes before release
- Fix README readability example property names
- Add validation for empty references after tokenisation in ROUGE
- Guard against zero sentence count in readability metric
- Implement LRU cache with max size for semantic embeddings
- Add .score property to LexicalResult for API consistency
- Use defensive list copy in composite validators
2025-05-22 20:10:54 +00:00
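The bounded embedding cache mentioned above might be built along these lines. This is a sketch using `OrderedDict`; the `EmbeddingCache` name, the `embed` callable, and the default size are assumptions, not the project's code.

```python
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    def __init__(self, embed: Callable[[str], List[float]], max_size: int = 128) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._embed = embed
        self._max_size = max_size
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, text: str) -> List[float]:
        if text in self._store:
            # Mark as most recently used.
            self._store.move_to_end(text)
            return self._store[text]
        vector = self._embed(text)
        self._store[text] = vector
        if len(self._store) > self._max_size:
            # Evict the least recently used entry.
            self._store.popitem(last=False)
        return vector
```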
4f9a480e26 changelog for 0.1.0
Initial release with metrics, validators, pytest plugin, benchmark
module, CLI, and comprehensive documentation.
2025-05-17 13:21:33 +00:00
0ab10d6812 write up README
Expands README with detailed coverage of metrics, validators, pytest
plugin, benchmark module, CLI commands, and development setup.
2025-05-17 12:18:34 +00:00
0ea6adbbf4 example: benchmark regression
Demonstrates benchmark quality tracking with historical comparison and
CI integration using assert_no_regression() for exit code control.
2025-05-17 11:02:05 +00:00
9cf968ad36 example: chatbot testing
Demonstrates pytest integration for chatbot QA with validate_text()
assertions, fixtures, and parametrised content safety tests.
2025-05-14 20:00:17 +00:00
d726f360c1 example: basic validation
Demonstrates core Veritext functionality: metrics, validators, composites,
and constraint validators with runnable code.
2025-05-14 19:09:23 +00:00
96ff86d4a7 changelog: CLI
Document command-line interface including validate command,
benchmark subcommands, and output formatting options.
2025-05-11 14:54:57 +00:00
8511594697 cli tests
Add comprehensive test suite for validate command, benchmark commands,
input readers, and output formatters using Typer CliRunner.
2025-05-11 14:13:30 +00:00
5f619a626b cli benchmark subcommands
Add benchmark run, show, and check commands for quality tracking
with regression detection supporting CI integration.
2025-05-10 12:01:08 +00:00
b02023c8f6 cli validate command
Implement validate command with inline and file-based modes
supporting BLEU, ROUGE, and lexical metrics with multiple output formats.
2025-05-10 10:46:49 +00:00
c765cea93c rich output formatting
Add formatters for validation results (table/json/simple) and
benchmark history display with regression report panels.
2025-05-10 10:12:17 +00:00
7f2e82494c feat: JSONL and directory input readers
Add TextPair dataclass and read_jsonl/read_paired_jsonl functions
for parsing candidate-reference pairs from JSONL files.
2025-05-07 21:17:16 +00:00
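As an illustration of the reader described in this commit, here is a sketch that assumes each JSONL line is an object with `candidate` and `reference` keys. Those key names and the error handling are guesses; only `TextPair` and `read_jsonl` are named in the log.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Union


@dataclass(frozen=True)
class TextPair:
    candidate: str
    reference: str


def read_jsonl(path: Union[str, Path]) -> Iterator[TextPair]:
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
                yield TextPair(record["candidate"], record["reference"])
            except (json.JSONDecodeError, KeyError, TypeError) as exc:
                raise ValueError(f"{path}:{line_number}: bad record") from exc
```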
ffa8658189 cli skeleton + version command
Initialise Typer app with --version flag and help text.
2025-05-07 20:02:07 +00:00
434f40f69b changelog: benchmark module
Document benchmark module features in changelog.
2025-04-20 15:52:08 +00:00
9afa499af3 benchmark tests
Comprehensive tests for models, storage, regression detection, and runner.
2025-04-20 15:04:33 +00:00
127eb9cac6 wire up benchmark exports
Public API exports for the benchmark module.
2025-04-19 12:14:37 +00:00
73a65656a1 benchmark runner
Main Benchmark class for evaluating text quality and tracking regressions.
2025-04-19 12:08:56 +00:00
32fec2b6d5 regression detection logic
Rolling window baseline computation and statistical regression detection.
2025-04-19 11:13:01 +00:00
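One way to read "rolling window baseline" plus "statistical regression detection" is a z-score test against the last few runs. The sketch below assumes that reading; the function name, window size, and threshold are illustrative and may differ from the actual logic.

```python
from statistics import mean, stdev
from typing import Sequence


def is_regression(
    history: Sequence[float],
    current: float,
    window: int = 5,
    z_threshold: float = 2.0,
) -> bool:
    # Baseline is computed over the most recent `window` scores only.
    recent = list(history)[-window:]
    if len(recent) < 2:
        return False  # not enough history to estimate spread
    baseline = mean(recent)
    spread = stdev(recent)
    if spread == 0:
        # Flat history: any drop below the baseline counts.
        return current < baseline
    # Flag only drops (higher scores are assumed to be better).
    return (baseline - current) / spread > z_threshold
```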
e6f55c2781 sqlite storage for benchmarks
Persistent storage for benchmark history with WAL mode for concurrent access.
2025-04-15 19:06:55 +00:00
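Enabling WAL mode with the standard `sqlite3` module takes one pragma. The sketch below assumes a hypothetical `runs` table; the real schema is not shown in the log.

```python
import sqlite3


def open_history_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer appends.
    # The setting is persistent for file-backed databases.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, score REAL NOT NULL)"
    )
    conn.commit()
    return conn
```

WAL applies only to on-disk databases; an in-memory database keeps its `memory` journal mode.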
aad6c3497d benchmark data models
Data models for benchmark runs and regression reports using Pydantic.
2025-04-09 20:42:03 +00:00
4b0a05d00d changelog: pytest plugin 2025-04-09 20:10:02 +00:00
4d103cbe52 test pytest plugin
Cover validate_text assertions, fixture factories, marker registration,
and pytest integration using pytester for subprocess testing.
2025-04-06 15:50:41 +00:00
9f9a3da6cc validate_text assertion helper
Primary API for text validation in pytest with keyword arguments
for BLEU, ROUGE, semantic similarity, length, readability, and
pattern matching. Includes detailed failure formatting.
2025-04-06 15:13:34 +00:00
7528c44dcc pytest plugin: hooks and markers
Register text_validation marker via pytest_configure hook.
2025-04-06 14:03:06 +00:00
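Registering a marker from a plugin usually comes down to one call in the `pytest_configure` hook. In this sketch the marker description text is an assumption; `addinivalue_line` is pytest's documented way to register markers.

```python
def pytest_configure(config) -> None:
    # Registering the marker avoids unknown-marker warnings and makes it
    # show up in `pytest --markers`.
    config.addinivalue_line(
        "markers",
        "text_validation: mark a test as a text-quality validation test",
    )
```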
4d9a2f3e0c changelog: semantic similarity 2025-04-05 12:35:31 +00:00
06124c12ae test semantic similarity 2025-04-05 11:51:16 +00:00
0dffaa4817 semantic validator 2025-04-05 11:09:54 +00:00
b6c4bad96a feat: semantic similarity metric 2025-04-05 10:03:52 +00:00
40674929b9 semantic result type 2025-04-03 19:58:08 +00:00
7c508a592f changelog: validators
Document validators module with Check protocol, metric validators,
constraint validators, composite validators, and factory functions.
2025-04-03 19:05:21 +00:00
7f23586406 validator tests
Add comprehensive tests for metric validators, constraint validators,
and composite validators covering pass/fail cases and error handling.
2025-03-29 13:30:23 +00:00
8fd1dc4cd3 validator factory functions
Export all validators and provide factory functions for clean API:
bleu(), rouge(), lexical(), length(), readability(), contains(),
excludes(), all_of(), any_of().
2025-03-29 13:12:53 +00:00
62d78ab699 composite validators (all/any/pipeline)
Implement AllOf and AnyOf for combining multiple checks into
composite validation rules.
2025-03-26 18:54:51 +00:00
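A simplified sketch of the composite idea, treating each check as a plain callable that returns `True` or `False`. The project's checks return richer results, so this shows only the combination logic, not the real classes.

```python
from typing import Callable, Iterable

# Simplified stand-in: a check is any callable from text to bool.
Check = Callable[[str], bool]


class AllOf:
    def __init__(self, checks: Iterable[Check]) -> None:
        # Copy so later changes to the caller's list do not affect us.
        self._checks = list(checks)

    def __call__(self, text: str) -> bool:
        return all(check(text) for check in self._checks)


class AnyOf:
    def __init__(self, checks: Iterable[Check]) -> None:
        self._checks = list(checks)

    def __call__(self, text: str) -> bool:
        return any(check(text) for check in self._checks)
```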
067cd74566 constraint validators (length, regex, contains)
Implement LengthValidator, ReadabilityValidator, ContainsValidator, and
ExcludesValidator for text constraints without reference text.
2025-03-26 18:06:03 +00:00
3ef262d357 metric validators (threshold checks)
Implement BleuValidator, RougeValidator, and LexicalValidator for
validating text against reference using metric thresholds.
2025-03-22 11:59:22 +00:00
d17d3de06d validator protocol + base types
Define the Check protocol for validation checks that compute a score
and return pass/fail results with diagnostics.
2025-03-22 10:48:29 +00:00
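A structural protocol like the one described here could be sketched with `typing.Protocol`. The `CheckResult` fields, the `check` method name, and the `MinLength` example are assumptions; the log names only the `Check` protocol itself.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class CheckResult:
    passed: bool
    score: float
    message: str = ""  # diagnostics for failure output


@runtime_checkable
class Check(Protocol):
    def check(self, text: str) -> CheckResult: ...


class MinLength:
    """Satisfies Check structurally, without inheriting from it."""

    def __init__(self, minimum: int) -> None:
        self.minimum = minimum

    def check(self, text: str) -> CheckResult:
        length = len(text)
        return CheckResult(
            passed=length >= self.minimum,
            score=float(length),
            message=f"length {length}, minimum {self.minimum}",
        )
```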
9f53446ca7 changelog: ROUGE and readability 2025-03-22 10:04:35 +00:00
ec48eb5bf5 tests for ROUGE and readability 2025-03-20 20:32:41 +00:00
5c2d626208 export ROUGE + readability 2025-03-20 20:13:05 +00:00
2ef8265754 readability metrics (flesch, gunning fog, etc) 2025-03-16 16:03:32 +00:00
0032f89b17 feat: ROUGE-L scorer 2025-03-16 15:50:45 +00:00
d2a9f28335 ROUGE and readability result types 2025-03-16 15:00:13 +00:00
856bcbccbb changelog: metrics module 2025-03-15 13:30:08 +00:00
afb39cf177 test BLEU and lexical metrics
Add comprehensive tests for BLEU and lexical metrics including edge
cases, batch scoring, and aggregate statistics.
2025-03-15 12:14:45 +00:00
f26e14bf20 lexical similarity (jaccard, overlap, cosine)
Implement Jaccard similarity and token overlap metrics with batch
scoring support.
2025-03-15 12:09:50 +00:00
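For reference, Jaccard similarity and the overlap coefficient on token sets can be computed as below. Lowercased whitespace tokenisation and the empty-input conventions are assumptions of this sketch, not confirmed behaviour of the project.

```python
def _tokens(text: str) -> set:
    # Assumed tokenisation: lowercase, split on whitespace.
    return set(text.lower().split())


def jaccard(candidate: str, reference: str) -> float:
    a, b = _tokens(candidate), _tokens(reference)
    if not a and not b:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(a & b) / len(a | b)


def overlap(candidate: str, reference: str) -> float:
    a, b = _tokens(candidate), _tokens(reference)
    if not a or not b:
        return 0.0
    # Overlap coefficient: shared tokens relative to the smaller set.
    return len(a & b) / min(len(a), len(b))
```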