When idle shutdown triggered _stop_server(), it created a new event
loop and called server.stop() on that loop, while the daemon thread kept
running loop.run_forever() on the original event loop. The original loop
never stopped, so its sockets stayed bound and the next start failed with
"address already in use".
Fix by storing references to the server's event loop and thread, then
using call_soon_threadsafe(loop.stop) to signal the correct loop to exit.
The thread join ensures sockets are released before the next server starts.
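
A minimal sketch of the stop sequence described above, assuming a plain
asyncio TCP server. LoopThread and its attributes are hypothetical names
for illustration, not the project's actual classes:

```python
import asyncio
import socket
import threading


class LoopThread:
    """Hypothetical holder for a server, its event loop, and its thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.server = None
        self.port = None
        self._ready = threading.Event()

    async def _handle(self, reader, writer) -> None:
        writer.close()

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        # Port 0 lets the OS pick a free port for this sketch.
        self.server = self.loop.run_until_complete(
            asyncio.start_server(self._handle, "127.0.0.1", 0)
        )
        self.port = self.server.sockets[0].getsockname()[1]
        self._ready.set()
        try:
            self.loop.run_forever()
        finally:
            # Runs in the loop's own thread once loop.stop() takes effect.
            self.server.close()
            self.loop.close()

    def start(self) -> None:
        self.thread.start()
        self._ready.wait(timeout=5)

    def stop(self) -> None:
        # Signal the loop that is actually running, from another thread.
        self.loop.call_soon_threadsafe(self.loop.stop)
        # Joining guarantees the listening socket is closed before returning.
        self.thread.join(timeout=5)
```

Calling stop() and then binding the same port again should succeed,
because the join returns only after the listening socket has been closed.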
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace pause-on-idle with full server shutdown after IDLE_SHUTDOWN_SECONDS
(default 5 minutes). Next visitor gets a fresh simulation instance.
- Idle checker stops server and clears st.cache_resource
- init_session_state detects stopped server and recreates fresh state
- Clears instruments and history for clean restart
Configurable via IDLE_SHUTDOWN_SECONDS environment variable.
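
The idle check could look roughly like the sketch below. Only the
IDLE_SHUTDOWN_SECONDS variable and its 5 minute default come from this
commit; IdleTracker, read_idle_shutdown_seconds, and the injectable clock
are assumptions made for illustration:

```python
import os
import time

DEFAULT_IDLE_SHUTDOWN_SECONDS = 300.0  # 5 minutes, per the commit message


def read_idle_shutdown_seconds(environ=os.environ) -> float:
    """Read the timeout from the environment, falling back to the default."""
    raw = environ.get("IDLE_SHUTDOWN_SECONDS")
    return float(raw) if raw else DEFAULT_IDLE_SHUTDOWN_SECONDS


class IdleTracker:
    """Hypothetical helper that records the time of the last visitor activity."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        # Call whenever a visitor is seen.
        self._last_activity = self._clock()

    def should_shut_down(self) -> bool:
        # True once the idle period reaches the configured timeout.
        return self._clock() - self._last_activity >= self._timeout
```

An idle checker would poll should_shut_down() and, when it returns True,
stop the server and clear the cached resource.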
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace module-level singleton with @st.cache_resource decorator.
The cached server reference survives Streamlit reruns, which prevents
"port already in use" errors when refreshing the browser in Docker.
The cache is tied to the Streamlit process lifecycle, so when the
process restarts, both the cache and daemon threads are cleared
together.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Check port availability before consulting the singleton state, so that
orphan servers left by previous processes are detected. When the ports
are in use but the singleton is None, wait up to 5 seconds for the orphan
to shut down before failing with a clear error message.
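
One way to express the orphan check, as a sketch only. port_is_free and
wait_for_ports are hypothetical helper names, and probing with a trial
bind is an assumption about how availability is tested:

```python
import socket
import time


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a trial bind to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def wait_for_ports(ports, timeout: float = 5.0, interval: float = 0.1) -> None:
    """Wait until every port is free, or raise with a clear message."""
    deadline = time.monotonic() + timeout
    while True:
        busy = [p for p in ports if not port_is_free(p)]
        if not busy:
            return
        if time.monotonic() >= deadline:
            raise RuntimeError(
                f"ports still in use after {timeout:.0f}s: {busy}; "
                "an orphan server from a previous process may be running"
            )
        time.sleep(interval)
```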
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add reuse_address=True to TCP server start to allow quick rebind
after process restart (TIME_WAIT state)
- Add _is_server_responsive() check to verify server is actually
responding, not just trusting the is_running flag which can be stale
if the server thread died unexpectedly
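
A sketch of both ideas at the socket level. The real _is_server_responsive()
is not shown in this commit, so is_server_responsive below is an assumed
shape (a TCP connect with a short timeout), and make_listener is a
hypothetical helper showing the SO_REUSEADDR option that asyncio's
reuse_address=True sets on the listening socket:

```python
import socket


def make_listener(host: str, port: int) -> socket.socket:
    """Create a listening socket that may rebind a port left in TIME_WAIT."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    return listener


def is_server_responsive(host: str, port: int, timeout: float = 0.5) -> bool:
    """Probe the port instead of trusting a possibly stale is_running flag."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat the server as dead.
        return False
```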
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Streamlit refreshes/reruns, session state is lost but the old
simulation server thread keeps running on ports 5001-5003. This caused
"address already in use" errors when trying to start a new server.
Solution: Use a module-level singleton for the simulation server that
persists across Streamlit reruns. The get_or_create_server() function
checks if a server is already running before creating a new one.
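
The singleton pattern described above, sketched with a stand-in server.
FakeServer and the factory parameter are hypothetical; only the
get_or_create_server() name comes from this commit:

```python
import threading


class FakeServer:
    """Hypothetical stand-in for the simulation server."""

    def __init__(self) -> None:
        self.is_running = False

    def start(self) -> None:
        self.is_running = True


# Module-level state persists across reruns of the script body
# as long as the module itself is not re-imported.
_server = None
_server_lock = threading.Lock()


def get_or_create_server(factory=FakeServer):
    """Return the running server, creating and starting one only if needed."""
    global _server
    with _server_lock:
        if _server is None or not _server.is_running:
            _server = factory()
            _server.start()
        return _server
```

Repeated calls return the same object, so a rerun does not try to bind
the ports a second time.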
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update libgdk-pixbuf2.0-0 to libgdk-pixbuf-2.0-0 for Debian Trixie
- Remove bundled nginx container in favor of host nginx
- Use host networking for cloudflared to reach host nginx
- Expose streamlit on localhost:8080 for host nginx proxy
- Physics pauses after IDLE_PAUSE_SECONDS (default 30s) of inactivity
- Resumes instantly when someone views the dashboard
- No container restart needed - just pauses the simulation loop
- CPU usage drops to ~0% when paused
- IDLE_TIMEOUT_MINUTES env var (default 30 min)
- Set restart policy to "no" so the container stays stopped after shutdown
- Optional wakeup service for auto-restart
- Document three restart options in readme
- IDLE_TIMEOUT_MINUTES env var to configure shutdown after inactivity
- Background thread monitors activity and exits when timeout reached
- Activity tracked via simulation_display fragment (runs while page open)
- Set to 0 (default) to disable auto-shutdown
- Wrap simulation controls in form to prevent page reruns on change
- Fix TempCo test configs to use 2+ temperature points
- Add Installation, Quick Start, and usage examples to README
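
For the IDLE_TIMEOUT_MINUTES monitor listed above, a background watcher
might be structured as follows. This is a sketch under assumptions:
start_idle_monitor, get_last_activity, and on_timeout are hypothetical
names, and the real monitor may exit the process directly rather than
call a callback:

```python
import threading
import time


def start_idle_monitor(get_last_activity, timeout_minutes, on_timeout,
                       poll_seconds=1.0, clock=time.monotonic):
    """Start a daemon thread that fires on_timeout after a period of inactivity.

    A timeout of 0 (or less) disables the monitor and returns None.
    """
    if timeout_minutes <= 0:
        return None
    timeout_seconds = timeout_minutes * 60.0

    def watch() -> None:
        # Poll until the last recorded activity is older than the timeout.
        while clock() - get_last_activity() < timeout_seconds:
            time.sleep(poll_seconds)
        on_timeout()

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

The activity timestamp would be refreshed by whatever code runs while the
page is open, for example the simulation_display fragment.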
Redesign integration test architecture to eliminate async/sync deadlock:
- Run SimulationServer in dedicated background thread with own event loop
- Rewrite TempCo tests as fully synchronous (no @pytest.mark.asyncio)
- Add ServerThread fixture in tests/integration/conftest.py
- Fix Unicode encoding errors (replace degree, micro, and plus-minus
  symbols with ASCII equivalents)
- Optimize temperature points for faster settling (23C, 25C, 27C)
All 3 TempCo integration tests now pass in ~5 minutes total.
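
The threading arrangement can be sketched as below, assuming a simple
asyncio echo server in place of SimulationServer. server_thread and
echo_once are hypothetical names; in a test suite the context manager
could be wrapped in a pytest fixture in conftest.py:

```python
import asyncio
import contextlib
import socket
import threading


async def _echo(reader, writer) -> None:
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()


@contextlib.contextmanager
def server_thread(host: str = "127.0.0.1"):
    """Run an asyncio server on its own loop in a background thread."""
    loop = asyncio.new_event_loop()
    ready = threading.Event()
    info = {}

    def run() -> None:
        asyncio.set_event_loop(loop)
        server = loop.run_until_complete(asyncio.start_server(_echo, host, 0))
        info["port"] = server.sockets[0].getsockname()[1]
        ready.set()
        try:
            loop.run_forever()
        finally:
            server.close()
            loop.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    if not ready.wait(timeout=5):
        raise RuntimeError("server thread did not start")
    try:
        yield info["port"]
    finally:
        # Stop the server's own loop, then wait for the thread to finish.
        loop.call_soon_threadsafe(loop.stop)
        thread.join(timeout=5)


def echo_once(port: int, payload: bytes, host: str = "127.0.0.1") -> bytes:
    """Fully synchronous client call, usable from a plain test function."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(payload)
        return sock.recv(1024)
```

Because the server owns its loop and thread, the test body stays
synchronous and never waits on a loop that it is also blocking.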