veritext

Author	SHA1	Message	Date
kschappell	2519641fa3	clean up CLI, misc polish - Refactor CLI metric computation to eliminate code duplication - Update version format to PEP 440 compliance (0.1.0.dev0) - Cache Settings instance via @lru_cache for performance - Document composite validators' protocol deviation - Consolidate redundant empty checks in ROUGE-L computation - Add Phase 10 (Portfolio Demos) to implementation plan	2025-05-25 13:06:51 +00:00
kschappell	210de7cd28	fix double-registered pytest plugin The pytest plugin is already loaded via the entry point, so explicitly declaring it in conftest causes a duplicate registration error.	2025-05-24 12:05:03 +00:00
kschappell	8da0e34f77	changelog: QA fixes	2025-05-24 10:26:12 +00:00
kschappell	3c8d599897	wip config + logging tests Adds tests for VeritextSettings defaults, env var overrides, and the get_logger/configure_logging functions.	2025-05-24 10:14:26 +00:00
kschappell	9e9558b937	fix: validate regex at init, not match time ContainsValidator and ExcludesValidator now pre-compile regex patterns during initialisation and raise InvalidThresholdError if invalid.	2025-05-22 21:10:44 +00:00
kschappell	7da5a46fe1	misc fixes before release - Fix README readability example property names - Add validation for empty references after tokenisation in ROUGE - Guard against zero sentence count in readability metric - Implement LRU cache with max size for semantic embeddings - Add .score property to LexicalResult for API consistency - Use defensive list copy in composite validators	2025-05-22 20:10:54 +00:00
kschappell	4f9a480e26	changelog for 0.1.0 Initial release with metrics, validators, pytest plugin, benchmark module, CLI, and comprehensive documentation.	2025-05-17 13:21:33 +00:00
kschappell	0ab10d6812	write up README Expands readme with detailed coverage of metrics, validators, pytest plugin, benchmark module, CLI commands, and development setup.	2025-05-17 12:18:34 +00:00
kschappell	0ea6adbbf4	example: benchmark regression Demonstrates benchmark quality tracking with historical comparison and CI integration using assert_no_regression() for exit code control.	2025-05-17 11:02:05 +00:00
kschappell	9cf968ad36	example: chatbot testing Demonstrates pytest integration for chatbot QA with validate_text() assertions, fixtures, and parametrised content safety tests.	2025-05-14 20:00:17 +00:00
kschappell	d726f360c1	example: basic validation Demonstrates core Veritext functionality: metrics, validators, composites, and constraint validators with runnable code.	2025-05-14 19:09:23 +00:00
kschappell	96ff86d4a7	changelog: CLI Document command-line interface including validate command, benchmark subcommands, and output formatting options.	2025-05-11 14:54:57 +00:00
kschappell	8511594697	cli tests Add comprehensive test suite for validate command, benchmark commands, input readers, and output formatters using Typer CliRunner.	2025-05-11 14:13:30 +00:00
kschappell	5f619a626b	cli benchmark subcommands Add benchmark run, show, and check commands for quality tracking with regression detection supporting CI integration.	2025-05-10 12:01:08 +00:00
kschappell	b02023c8f6	cli validate command Implement validate command with inline and file-based modes supporting BLEU, ROUGE, and lexical metrics with multiple output formats.	2025-05-10 10:46:49 +00:00
kschappell	c765cea93c	rich output formatting Add formatters for validation results (table/json/simple) and benchmark history display with regression report panels.	2025-05-10 10:12:17 +00:00
kschappell	7f2e82494c	feat: JSONL and directory input readers Add TextPair dataclass and read_jsonl/read_paired_jsonl functions for parsing candidate-reference pairs from JSONL files.	2025-05-07 21:17:16 +00:00
kschappell	ffa8658189	cli skeleton + version command Initialise Typer app with --version flag and help text.	2025-05-07 20:02:07 +00:00
kschappell	434f40f69b	changelog: benchmark module Document benchmark module features in changelog.	2025-04-20 15:52:08 +00:00
kschappell	9afa499af3	benchmark tests Comprehensive tests for models, storage, regression detection, and runner.	2025-04-20 15:04:33 +00:00
kschappell	127eb9cac6	wire up benchmark exports Public API exports for the benchmark module.	2025-04-19 12:14:37 +00:00
kschappell	73a65656a1	benchmark runner Main Benchmark class for evaluating text quality and tracking regressions.	2025-04-19 12:08:56 +00:00
kschappell	32fec2b6d5	regression detection logic Rolling window baseline computation and statistical regression detection.	2025-04-19 11:13:01 +00:00
kschappell	e6f55c2781	sqlite storage for benchmarks Persistent storage for benchmark history with WAL mode for concurrent access.	2025-04-15 19:06:55 +00:00
kschappell	aad6c3497d	benchmark data models Data models for benchmark runs and regression reports using Pydantic.	2025-04-09 20:42:03 +00:00
kschappell	4b0a05d00d	changelog: pytest plugin	2025-04-09 20:10:02 +00:00
kschappell	4d103cbe52	test pytest plugin Cover validate_text assertions, fixture factories, marker registration, and pytest integration using pytester for subprocess testing.	2025-04-06 15:50:41 +00:00
kschappell	9f9a3da6cc	validate_text assertion helper Primary API for text validation in pytest with keyword arguments for BLEU, ROUGE, semantic similarity, length, readability, and pattern matching. Includes detailed failure formatting.	2025-04-06 15:13:34 +00:00
kschappell	7528c44dcc	pytest plugin: hooks and markers Register text_validation marker via pytest_configure hook.	2025-04-06 14:03:06 +00:00
kschappell	4d9a2f3e0c	changelog: semantic similarity	2025-04-05 12:35:31 +00:00
kschappell	06124c12ae	test semantic similarity	2025-04-05 11:51:16 +00:00
kschappell	0dffaa4817	semantic validator	2025-04-05 11:09:54 +00:00
kschappell	b6c4bad96a	feat: semantic similarity metric	2025-04-05 10:03:52 +00:00
kschappell	40674929b9	semantic result type	2025-04-03 19:58:08 +00:00
kschappell	7c508a592f	changelog: validators Document validators module with Check protocol, metric validators, constraint validators, composite validators, and factory functions.	2025-04-03 19:05:21 +00:00
kschappell	7f23586406	validator tests Add comprehensive tests for metric validators, constraint validators, and composite validators covering pass/fail cases and error handling.	2025-03-29 13:30:23 +00:00
kschappell	8fd1dc4cd3	validator factory functions Export all validators and provide factory functions for clean API: bleu(), rouge(), lexical(), length(), readability(), contains(), excludes(), all_of(), any_of().	2025-03-29 13:12:53 +00:00
kschappell	62d78ab699	composite validators (all/any/pipeline) Implement AllOf and AnyOf for combining multiple checks into composite validation rules.	2025-03-26 18:54:51 +00:00
kschappell	067cd74566	constraint validators (length, regex, contains) Implement LengthValidator, ReadabilityValidator, ContainsValidator, and ExcludesValidator for text constraints without reference text.	2025-03-26 18:06:03 +00:00
kschappell	3ef262d357	metric validators (threshold checks) Implement BleuValidator, RougeValidator, and LexicalValidator for validating text against reference using metric thresholds.	2025-03-22 11:59:22 +00:00
kschappell	d17d3de06d	validator protocol + base types Define the Check protocol for validation checks that compute a score and return pass/fail results with diagnostics.	2025-03-22 10:48:29 +00:00
kschappell	9f53446ca7	changelog: ROUGE and readability	2025-03-22 10:04:35 +00:00
kschappell	ec48eb5bf5	tests for ROUGE and readability	2025-03-20 20:32:41 +00:00
kschappell	5c2d626208	export ROUGE + readability	2025-03-20 20:13:05 +00:00
kschappell	2ef8265754	readability metrics (flesch, gunning fog, etc)	2025-03-16 16:03:32 +00:00
kschappell	0032f89b17	feat: ROUGE-L scorer	2025-03-16 15:50:45 +00:00
kschappell	d2a9f28335	ROUGE and readability result types	2025-03-16 15:00:13 +00:00
kschappell	856bcbccbb	changelog: metrics module	2025-03-15 13:30:08 +00:00
kschappell	afb39cf177	test BLEU and lexical metrics Add comprehensive tests for BLEU and lexical metrics including edge cases, batch scoring, and aggregate statistics.	2025-03-15 12:14:45 +00:00
kschappell	f26e14bf20	lexical similarity (jaccard, overlap, cosine) Implement Jaccard similarity and token overlap metrics with batch scoring support.	2025-03-15 12:09:50 +00:00

59 Commits
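
Commit 9e9558b937 describes moving regex validation from match time to initialisation. The sketch below shows that general pattern and is not the repository's code. The class body, the `check` method name, and the `ValueError` base class are assumptions for illustration. Only the names `ContainsValidator` and `InvalidThresholdError` come from the commit message.

```python
import re


class InvalidThresholdError(ValueError):
    """Stand-in for the error named in the commit message (real definition not shown)."""


class ContainsValidator:
    """Illustrative validator: passes when the pattern occurs in the text."""

    def __init__(self, pattern: str) -> None:
        try:
            # Compile once at construction so a malformed pattern fails here,
            # not later when the first text is checked.
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise InvalidThresholdError(
                f"invalid regex {pattern!r}: {exc}"
            ) from exc

    def check(self, text: str) -> bool:
        # Hypothetical method name; reuses the pre-compiled pattern.
        return self._regex.search(text) is not None
```

With this shape, a misconfigured pattern fails when the validator is constructed, for example during test collection, instead of midway through a run.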