refactor: CLI cleanup and documentation updates

- Refactor CLI metric computation to eliminate code duplication
- Update version string to a PEP 440-compliant format (0.1.0.dev0)
- Cache Settings instance via @lru_cache for performance
- Document composite validators' protocol deviation
- Consolidate redundant empty checks in ROUGE-L computation
- Add Phase 10 (Portfolio Demos) to implementation plan
2026-02-04 15:38:46 +00:00
parent 7de4505e31
commit 0699e97e1d
8 changed files with 224 additions and 66 deletions


@@ -488,3 +488,47 @@ benchmark.assert_no_regression(tolerance=0.03)
5. **Natural portfolio narrative** — "I was building X and needed a better way to test
it, so I built this tool." Every interviewer has faced similar problems.
---
## Portfolio Demos (Future)
Interactive demos that showcase Veritext without requiring visitors to install anything.
### Streamlit Demo
A quick interactive web UI for general visitors and recruiters.
**Features:**
- Text input boxes (candidate + reference)
- Metric selector (BLEU, ROUGE, lexical, readability)
- Threshold sliders for pass/fail validation
- Results table with scores and status
**Deployment:** Self-hosted on a home server (e.g., `veritext.kschappell.com`)
**Effort:** ~30 minutes
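The pass/fail logic behind the threshold sliders and results table could look roughly like the sketch below. It is illustrative only: `ResultRow`, `build_results`, and `unigram_overlap` are hypothetical names, and the overlap metric is a stand-in rather than one of Veritext's real metrics, whose API is not shown in this plan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ResultRow:
    """One line of the results table (hypothetical structure)."""
    metric: str
    score: float
    threshold: float
    status: str  # "pass" or "fail"


def unigram_overlap(candidate: str, reference: str) -> float:
    """Placeholder metric, not part of Veritext: the share of reference
    words that also appear in the candidate."""
    ref_words = reference.lower().split()
    if not ref_words:
        return 0.0
    cand_words = set(candidate.lower().split())
    return sum(word in cand_words for word in ref_words) / len(ref_words)


def build_results(
    candidate: str,
    reference: str,
    metrics: Dict[str, Callable[[str, str], float]],
    thresholds: Dict[str, float],
) -> List[ResultRow]:
    """Score the candidate with each selected metric and compare the score
    against its threshold (assumed here to be a minimum acceptable value)."""
    rows = []
    for name, metric_fn in metrics.items():
        score = metric_fn(candidate, reference)
        threshold = thresholds.get(name, 0.0)
        status = "pass" if score >= threshold else "fail"
        rows.append(ResultRow(name, round(score, 3), threshold, status))
    return rows


rows = build_results(
    "the cat sat",
    "the cat sat on the mat",
    metrics={"overlap": unigram_overlap},
    thresholds={"overlap": 0.5},
)
```

In a Streamlit app, the two texts would typically come from `st.text_area`, the thresholds from `st.slider`, and the rows would be shown with `st.dataframe`. Keeping the scoring in a plain function like this makes it testable without running the UI.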
### Jupyter Notebook Collection
Deep-dive notebooks targeting data science and ML recruiters.
**Notebooks:**
| Notebook | Purpose |
|----------|---------|
| `01-metrics-overview.ipynb` | Introduction to each metric with visualisations |
| `02-batch-evaluation.ipynb` | Evaluating model outputs at scale, statistical analysis |
| `03-regression-detection.ipynb` | Tracking quality over time, detecting degradation |
| `04-chatbot-validation.ipynb` | Real-world use case: validating chatbot responses |
**Hosting:** JupyterLite (static files, runs in browser via WebAssembly)
**Deployment:** Self-hosted alongside Streamlit demo
**Why both:**
| Demo Type | Audience | Value |
|-----------|----------|-------|
| Streamlit | General visitors | Quick, interactive, no friction |
| Notebooks | Data/ML recruiters | Shows analytical depth, speaks their language |