
Limitations

ReTraced is a learning-focused job scheduler designed to explore retry behavior and failure scenarios.
It is functional but, by design, not production-hardened.


Core Limitations

1. At-Least-Once Delivery Only

Jobs may execute more than once if a worker crashes during processing.

Impact

  • Duplicate execution is possible (for example, sending the same email twice).

Guidance

  • Job handlers must be idempotent.
  • Handlers should verify whether work has already been completed before executing side effects.
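
The guidance above can be illustrated with a minimal sketch. It is not ReTraced's actual API: the `EmailJob` shape, `handleEmailJob`, and the in-memory `completedJobIds` set are hypothetical names chosen for illustration, and the handler is synchronous only to keep the example short. A real handler would check a durable store rather than process memory.

```typescript
// Hypothetical job shape; ReTraced's real job type may differ.
interface EmailJob {
  id: string;
  to: string;
}

// Stand-in for a durable store (for example a database table or a Redis set).
// An in-memory Set is used only to keep the sketch self-contained.
const completedJobIds = new Set<string>();

// Records every side effect so the example can count how many emails went out.
const sentEmails: string[] = [];

function sendEmail(to: string): void {
  sentEmails.push(to);
}

// Idempotent handler: a redelivered job is detected and skipped
// instead of repeating the side effect.
function handleEmailJob(job: EmailJob): "sent" | "skipped" {
  if (completedJobIds.has(job.id)) {
    return "skipped";
  }
  sendEmail(job.to);
  completedJobIds.add(job.id);
  return "sent";
}

// Simulate at-least-once delivery: the same job arrives twice.
const emailJob: EmailJob = { id: "job-1", to: "user@example.com" };
const first = handleEmailJob(emailJob);
const second = handleEmailJob(emailJob);

console.log(first, second, sentEmails.length); // prints: sent skipped 1
```

Note that a crash between `sendEmail` and `completedJobIds.add` would still allow a duplicate on redelivery. Closing that gap generally requires the downstream service to accept an idempotency key, which is listed under Future Improvements below.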

Future Improvements

  • Idempotency keys
  • Result deduplication mechanisms

2. Single Point of Failure (Redis)

All queue state lives in Redis.
If Redis crashes without persistence enabled, jobs can be permanently lost.

Impact

  • Redis downtime equals system downtime
  • No automatic failover
  • Risk of data loss without persistence

Current Mitigation

  • Enable Redis persistence (RDB snapshots and AOF logging)

Future Improvements

  • Redis Sentinel for high availability
  • Redis Cluster support

3. No PostgreSQL Fallback or Long-Term Storage

Job history exists only in Redis memory.

Impact

  • No long-term job history queries
  • Limited audit trail
  • No historical retry analytics

Current Mitigation

  • Use Redis persistence to reduce data loss
  • Export critical job data via logs
  • Retain completed jobs for a limited duration

Future Improvements

  • PostgreSQL persistence layer for job history and analytics

4. No High Availability or Coordination

Workers operate independently with no leader election or coordination.

Impact

  • No automatic failover
  • Manual scaling required
  • Multiple watchdogs may race to requeue stuck jobs

Current Mitigation

  • Run multiple workers for redundancy
  • Use container restart policies

Future Improvements

  • Leader election
  • Coordinated watchdog execution
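
One possible shape for coordinated watchdog execution is a short-lived lease: only the watchdog that acquires the lease requeues stuck jobs, and the lease expires on its own if that watchdog dies. The sketch below is an assumption about how this could look, not something ReTraced implements. `LeaseStore`, `runWatchdogPass`, and the `watchdog:lease` key are hypothetical, and the in-memory map stands in for Redis, where the same acquire-if-absent-with-expiry behavior is available through `SET key value NX PX ttl`.

```typescript
// In-memory stand-in for a Redis key with an expiry.
class LeaseStore {
  private leases = new Map<string, { owner: string; expiresAt: number }>();

  // Grants the lease only if nobody currently holds an unexpired one.
  tryAcquire(key: string, owner: string, ttlMs: number, now: number): boolean {
    const current = this.leases.get(key);
    if (current !== undefined && current.expiresAt > now) {
      return false;
    }
    this.leases.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }
}

// Hypothetical watchdog pass: requeue only when this instance holds the lease.
// `now` is passed in explicitly so the behavior is deterministic in the example.
function runWatchdogPass(
  store: LeaseStore,
  watchdogId: string,
  stuckJobIds: string[],
  requeued: string[],
  now: number,
): boolean {
  if (!store.tryAcquire("watchdog:lease", watchdogId, 5000, now)) {
    return false; // another watchdog holds the lease for this pass
  }
  requeued.push(...stuckJobIds);
  return true;
}

const leaseStore = new LeaseStore();
const requeuedJobs: string[] = [];

// Two watchdogs notice the same stuck jobs within the lease window.
const ranA = runWatchdogPass(leaseStore, "watchdog-a", ["job-7", "job-9"], requeuedJobs, 1000);
const ranB = runWatchdogPass(leaseStore, "watchdog-b", ["job-7", "job-9"], requeuedJobs, 1200);

// After the 5000 ms lease has expired, another watchdog can take over.
const ranAfterExpiry = runWatchdogPass(leaseStore, "watchdog-b", ["job-12"], requeuedJobs, 7000);

console.log(ranA, ranB, ranAfterExpiry, requeuedJobs);
```

A lease reduces duplicate requeues but does not remove the need for idempotent handlers: a watchdog that stalls past its lease can still overlap with its successor.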

5. No Built-in Observability

ReTraced does not provide built-in metrics, dashboards, or tracing.

Impact

  • No real-time visibility into queue depth or latency
  • No alerting on failures
  • Debugging relies on logs and Redis inspection

Current Mitigation

  • Structured JSON logging
  • Manual Redis monitoring
  • Retry history stored in job metadata
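
As a rough illustration of the first and last mitigations, the sketch below appends each failure to a job's metadata and emits one JSON log line per failure. The field names (`jobId`, `queue`, `retries`, `event`, and so on) and the `recordFailure` helper are assumptions made for this example; ReTraced's actual metadata and log schema may differ.

```typescript
// Hypothetical shapes for retry history kept in job metadata.
interface RetryAttempt {
  attempt: number;
  error: string;
  failedAt: string; // ISO 8601 timestamp
}

interface JobMetadata {
  jobId: string;
  queue: string;
  retries: RetryAttempt[];
}

// Appends the failure to the job's retry history and returns a single-line
// JSON log entry that log tooling can filter by jobId, queue, or attempt.
function recordFailure(meta: JobMetadata, error: string, failedAt: Date): string {
  const attempt = meta.retries.length + 1;
  const timestamp = failedAt.toISOString();
  meta.retries.push({ attempt, error, failedAt: timestamp });
  return JSON.stringify({
    level: "warn",
    event: "job_failed",
    jobId: meta.jobId,
    queue: meta.queue,
    attempt,
    error,
    timestamp,
  });
}

const jobMeta: JobMetadata = { jobId: "job-42", queue: "emails", retries: [] };

const firstLine = recordFailure(jobMeta, "SMTP timeout", new Date("2024-01-01T00:00:00.000Z"));
const secondLine = recordFailure(jobMeta, "SMTP timeout", new Date("2024-01-01T00:00:05.000Z"));

console.log(firstLine);
console.log(secondLine);
```

Because every line carries the job ID and attempt number, the retry timeline of a single job can be reconstructed from logs alone, even after the job has been removed from Redis.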

Future Improvements

  • Prometheus metrics endpoint
  • Grafana dashboards

Missing Features

  • No UI dashboard
  • No published npm package
  • Limited production testing
  • No cron or scheduled jobs
  • No rate limiting per queue
  • No job dependencies or workflows

Production Readiness Checklist

Before considering production usage:

Critical

  • Enable Redis persistence (RDB and AOF)
  • Test Redis restart and recovery scenarios
  • Ensure all job handlers are idempotent
  • Add monitoring and alerting
  • Load test with large job volumes
  • Run multiple workers for redundancy
  • Use structured logging
  • Configure Redis backups
  • Maintain operational runbooks

When to Use ReTraced

Suitable For

  • Learning distributed systems concepts
  • Side projects and internal tools
  • Webhook forwarding
  • Non-critical background jobs
  • Debugging retry behavior and failure modes

Not Suitable For

  • Mission-critical production systems
  • Financial or payment processing
  • Very high throughput workloads
  • Exactly-once delivery requirements
  • Compliance or audit-heavy environments

For production-grade systems, consider mature alternatives such as BullMQ, Temporal, or Agenda.


Summary

ReTraced prioritizes retry visibility, transparency, and educational value over production guarantees.

Intentional Tradeoffs

  • At-least-once delivery semantics
  • Single Redis instance
  • No long-term persistence layer
  • Manual operational control

Use ReTraced to learn, experiment, and understand failure behavior.
For production workloads, choose a battle-tested scheduler.