Limitations
ReTraced is a learning-focused job scheduler designed to explore retry behavior and failure scenarios.
It is functional but, by design, not production-hardened.
Core Limitations
1. At-Least-Once Delivery Only
If a worker crashes while processing a job, the job can be requeued and run again, so a job may execute more than once.
Impact
- Duplicate execution is possible (for example, sending the same email twice).
Guidance
- Job handlers must be idempotent.
- Handlers should verify whether work has already been completed before executing side effects.
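A minimal sketch of the guidance above, assuming a hypothetical `EmailJob` shape and a `completed` set that stands in for a durable store such as a Redis set or a database table. None of these names come from ReTraced itself, and the example is kept synchronous and in-memory so that it runs standalone. A real handler would most likely be async.

```typescript
// Hypothetical job shape for illustration; ReTraced's actual job type may differ.
interface EmailJob {
  id: string;
  to: string;
  subject: string;
}

// Wraps a side effect so that a redelivered job is skipped instead of re-run.
// `completed` stands in for a durable store (a Redis set or a database table).
function makeIdempotentHandler(
  completed: Set<string>,
  sendEmail: (to: string, subject: string) => void
): (job: EmailJob) => "executed" | "skipped" {
  return (job) => {
    // Guard: has this job's side effect already been performed?
    if (completed.has(job.id)) {
      return "skipped";
    }
    sendEmail(job.to, job.subject);
    // Record completion only after the side effect succeeds.
    completed.add(job.id);
    return "executed";
  };
}

// Usage: the same job delivered twice sends only one email.
const sent: string[] = [];
const handle = makeIdempotentHandler(new Set<string>(), (to) => {
  sent.push(to);
});
const job: EmailJob = { id: "job-1", to: "user@example.com", subject: "Welcome" };
const first = handle(job);
const second = handle(job);
```

This check narrows the duplicate window but does not close it: a crash after `sendEmail` returns and before the job ID is recorded still leads to a second send on redelivery.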
Future Improvements
- Idempotency keys
- Result deduplication mechanisms
2. Single Point of Failure (Redis)
All queue state lives in Redis.
If Redis crashes without persistence enabled, every queued and in-flight job is lost permanently.
Impact
- Redis downtime equals system downtime
- No automatic failover
- Risk of data loss without persistence
Current Mitigation
- Enable Redis persistence (RDB snapshots and AOF logging)
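As a rough illustration of that mitigation, a `redis.conf` fragment enabling both mechanisms might look like the following. The snapshot thresholds and the fsync policy are illustrative assumptions, not values recommended by ReTraced, and should be tuned to how much job loss is tolerable.

```conf
# RDB: snapshot if at least 1 key changed in 900 s, or 10 keys in 300 s
save 900 1
save 300 10

# AOF: append every write to a log and fsync it to disk once per second
appendonly yes
appendfsync everysec
```

With `appendfsync everysec`, a crash can still lose roughly the last second of writes.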
Future Improvements
- Redis Sentinel for high availability
- Redis Cluster support
3. No PostgreSQL Fallback or Long-Term Storage
Job history exists only in Redis memory.
Impact
- No long-term job history queries
- Limited audit trail
- No historical retry analytics
Current Mitigation
- Use Redis persistence to reduce data loss
- Export critical job data via logs
- Retain completed jobs for a limited duration
Future Improvements
- PostgreSQL persistence layer for job history and analytics
4. No High Availability or Coordination
Workers operate independently with no leader election or coordination.
Impact
- No automatic failover
- Manual scaling required
- Multiple watchdogs may race to requeue stuck jobs
Current Mitigation
- Run multiple workers for redundancy
- Use container restart policies
Future Improvements
- Leader election
- Coordinated watchdog execution
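One way coordinated watchdog execution could be approached is a short-lived lock, which in Redis is commonly taken with `SET key value NX PX ttl`. The sketch below models that behavior in memory so it runs without a Redis server. `TtlLock` and `watchdogPass` are hypothetical names, and the clock is passed in as `now` to keep the example deterministic.

```typescript
// In-memory stand-in for a Redis lock taken with `SET key value NX PX ttl`.
class TtlLock {
  private holder: string | null = null;
  private expiresAt = 0;

  // Succeeds only when no unexpired holder exists, mirroring NX plus PX.
  tryAcquire(ownerId: string, ttlMs: number, now: number): boolean {
    if (this.holder !== null && now < this.expiresAt) {
      return false;
    }
    this.holder = ownerId;
    this.expiresAt = now + ttlMs;
    return true;
  }
}

// Hypothetical watchdog pass: requeue stuck jobs only while holding the lock.
// Returns the number of jobs requeued (0 when another watchdog holds the lock).
function watchdogPass(
  lock: TtlLock,
  watchdogId: string,
  stuckJobIds: string[],
  requeue: (jobId: string) => void,
  now: number,
  ttlMs: number = 30000
): number {
  if (!lock.tryAcquire(watchdogId, ttlMs, now)) {
    return 0;
  }
  for (const jobId of stuckJobIds) {
    requeue(jobId);
  }
  return stuckJobIds.length;
}

// Usage: two watchdogs see the same stuck jobs, but only one requeues them.
const sharedLock = new TtlLock();
const requeued: string[] = [];
const stuck = ["job-1", "job-2"];
const byA = watchdogPass(sharedLock, "watchdog-a", stuck, (id) => {
  requeued.push(id);
}, 1000);
const byB = watchdogPass(sharedLock, "watchdog-b", stuck, (id) => {
  requeued.push(id);
}, 2000);
```

The TTL matters: if the lock holder crashes mid-pass, the lock expires on its own and another watchdog can take over on a later pass.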
5. No Built-in Observability
ReTraced does not provide built-in metrics, dashboards, or tracing.
Impact
- No real-time visibility into queue depth or latency
- No alerting on failures
- Debugging relies on logs and Redis inspection
Current Mitigation
- Structured JSON logging
- Manual Redis monitoring
- Retry history stored in job metadata
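To make the logging mitigation concrete, here is a sketch of what a structured log line carrying retry history might look like. The field names (`ts`, `jobId`, `retryCount`, and so on) are assumptions chosen for illustration and are not ReTraced's actual log schema.

```typescript
// Hypothetical shapes; field names are illustrative, not ReTraced's schema.
interface RetryAttempt {
  attempt: number;
  error: string;
  at: string; // ISO 8601 timestamp of the failed attempt
}

interface JobLogEntry {
  level: "info" | "warn" | "error";
  message: string;
  jobId: string;
  queue: string;
  retries: RetryAttempt[];
}

// Serializes one entry as a single JSON line so log tools can filter by field.
function formatLogLine(entry: JobLogEntry, timestamp: string): string {
  return JSON.stringify({
    ts: timestamp,
    level: entry.level,
    message: entry.message,
    jobId: entry.jobId,
    queue: entry.queue,
    retryCount: entry.retries.length,
    retries: entry.retries,
  });
}

// Usage: one failed job with a single recorded retry.
const logLine = formatLogLine(
  {
    level: "error",
    message: "job failed",
    jobId: "job-42",
    queue: "emails",
    retries: [{ attempt: 1, error: "ETIMEDOUT", at: "2024-01-01T00:00:00.000Z" }],
  },
  "2024-01-01T00:00:05.000Z"
);
```

Keeping each entry on one line lets standard tools filter by `jobId` or `queue` when debugging from logs alone.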
Future Improvements
- Prometheus metrics endpoint
- Grafana dashboards (planned)
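As a sketch of what a metrics endpoint could eventually return, the function below renders per-queue depths in the Prometheus text exposition format. The metric name `retraced_queue_depth` and the `queue` label are hypothetical, and the depths are passed in directly instead of being read from Redis.

```typescript
// Escapes a label value per the Prometheus text format: backslash, quote, newline.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Renders per-queue depths as one gauge in Prometheus text exposition format.
// The metric name `retraced_queue_depth` is a hypothetical choice.
function renderQueueDepth(depths: Record<string, number>): string {
  const lines = [
    "# HELP retraced_queue_depth Number of jobs waiting in each queue.",
    "# TYPE retraced_queue_depth gauge",
  ];
  for (const queue of Object.keys(depths)) {
    lines.push(`retraced_queue_depth{queue="${escapeLabel(queue)}"} ${depths[queue]}`);
  }
  return lines.join("\n") + "\n";
}

// Usage: two queues, one of them empty.
const metricsText = renderQueueDepth({ emails: 12, webhooks: 0 });
```

A gauge fits queue depth because the value rises and falls. Counts that only grow, such as total retries, would more naturally be exposed as a counter.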
Missing Features
- No UI dashboard
- No published npm package
- Limited production testing
- No cron or scheduled jobs
- No rate limiting per queue
- No job dependencies or workflows
Production Readiness Checklist
Before considering production usage:
Critical
- Enable Redis persistence (RDB and AOF)
- Test Redis restart and recovery scenarios
- Ensure all job handlers are idempotent
- Add monitoring and alerting
- Load test with large job volumes
Recommended
- Run multiple workers for redundancy
- Use structured logging
- Configure Redis backups
- Maintain operational runbooks
When to Use ReTraced
Suitable For
- Learning distributed systems concepts
- Side projects and internal tools
- Webhook forwarding
- Non-critical background jobs
- Debugging retry behavior and failure modes
Not Suitable For
- Mission-critical production systems
- Financial or payment processing
- Very high throughput workloads
- Exactly-once delivery requirements
- Compliance or audit-heavy environments
For production-grade systems, consider mature alternatives such as BullMQ, Temporal, or Agenda.
Summary
ReTraced prioritizes retry visibility, transparency, and educational value over production guarantees.
Intentional Tradeoffs
- At-least-once delivery semantics
- Single Redis instance
- No long-term persistence layer
- Manual operational control
Use ReTraced to learn, experiment, and understand failure behavior.
For production workloads, choose a battle-tested scheduler.