## Comparison
ReTraced takes a different approach from mature job schedulers: where they optimize for production throughput and abstraction, ReTraced prioritizes visibility and learning by exposing retry and failure behavior as explicit, queryable data.
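To make "retry behavior as data" concrete, here is a minimal TypeScript sketch of what a job record with a per-attempt `history[]` array might look like and how it can be queried. The per-attempt fields mirror the attributes listed in the table below (timestamp, error code, trigger type, result), but the exact field names, the `JobRecord` type and the `summarize` helper are hypothetical and are not ReTraced's actual schema.

```typescript
// Hypothetical shapes: only `history`, the AUTO/MANUAL trigger values and the
// status names come from the comparison table; every other name is an assumption.
type Trigger = "AUTO" | "MANUAL";

interface AttemptRecord {
  attempt: number;          // 1-based attempt number
  timestamp: string;        // ISO-8601 time the attempt finished
  errorCode: string | null; // null when the attempt succeeded
  trigger: Trigger;         // what started the attempt: scheduler (AUTO) or operator (MANUAL)
  result: "success" | "failure";
}

type JobStatus =
  | "pending" | "processing" | "delayed" | "dead"
  | "poisoned" | "completed" | "failed";

interface JobRecord {
  id: string;
  status: JobStatus;
  history: AttemptRecord[];
}

// With retries stored as plain data, debugging questions become simple queries.
function summarize(job: JobRecord) {
  const failures = job.history.filter((a) => a.result === "failure");
  return {
    attempts: job.history.length,
    manualRetries: job.history.filter((a) => a.trigger === "MANUAL").length,
    distinctErrors: Array.from(new Set(failures.map((a) => a.errorCode))),
    lastError: failures.length > 0 ? failures[failures.length - 1].errorCode : null,
  };
}

// Example: two automatic timeouts, then a successful manual retry.
const job: JobRecord = {
  id: "job-42",
  status: "completed",
  history: [
    { attempt: 1, timestamp: "2024-01-01T00:00:00Z", errorCode: "ETIMEDOUT", trigger: "AUTO", result: "failure" },
    { attempt: 2, timestamp: "2024-01-01T00:00:02Z", errorCode: "ETIMEDOUT", trigger: "AUTO", result: "failure" },
    { attempt: 3, timestamp: "2024-01-01T00:05:00Z", errorCode: null, trigger: "MANUAL", result: "success" },
  ],
};

console.log(summarize(job));
```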
| Feature / Capability | ReTraced | BullMQ | Sidekiq | AWS SQS + Lambda |
|---|---|---|---|---|
| Primary Design Goal | Explicit retry & failure modeling for learning and debugging | High-throughput Redis-based queues | Battle-tested background jobs for Ruby/Rails | Fully managed event-driven execution |
| Delivery Guarantee | At-least-once | At-least-once | At-least-once | At-least-once |
| Retry Strategy Configuration | ✅ Per-job strategy (linear, fixed, exponential, three-tier) | Per-job or queue-level backoff | Queue-level retry config | Platform-managed retry policies |
| Retry Behavior as Data | ✅ Full history[] array with metadata | Retry count and last error | Retry count | CloudWatch logs |
| Retry Attempt Audit Trail | ✅ Timestamp, error code, trigger type, result per attempt | Retry count, last error | Retry count, last error | CloudWatch logs |
| Manual vs Auto Retry Tracking | ✅ Explicit tracking (AUTO vs MANUAL) | Not tracked separately | Not tracked separately | Not tracked separately |
| Backoff Configuration | ✅ Per-job (baseDelay, factor, jitter, limitOfTries) | Configurable exponential backoff | Configurable backoff | AWS-managed |
| Dead Letter Queue (DLQ) | ✅ Separate DLQ with full job history | DLQ pattern support | Failed job sets | Native DLQ support |
| DLQ Forensics | ✅ Failure type, complete retry history, error classification | Job data + error | Job data + error | Message + error logs |
| Manual Retry from DLQ | ✅ First-class feature with trigger tracking | Manual job requeue | Manual job requeue | DLQ redrive back to source queue |
| Poison Job Handling | ✅ Explicit poisoned status | Handled via DLQ pattern | Handled via dead sets | Handled via DLQ |
| Failure Classification | ✅ Permanent / Temporary / Poison | Error handling via user code | Error handling via user code | Retry vs DLQ based on config |
| Job Lifecycle States | ✅ Detailed (pending, processing, delayed, dead, poisoned, completed, failed) | Waiting, active, completed, failed, delayed | Enqueued, busy, scheduled, retry, dead | Managed by AWS |
| Priority Jobs | ❌ Not implemented | ✅ Supported | ✅ Supported | Not natively supported |
| Scheduling / Cron Jobs | ❌ Not implemented | ✅ Repeatable jobs | ✅ Cron-style scheduling | ✅ EventBridge integration |
| Exactly-Once Semantics | ❌ Not implemented (at-least-once by design) | Not guaranteed | Not guaranteed | Approximate (FIFO + deduplication) |
| Idempotency Support | ❌ Not implemented (user-managed) | User-managed | User-managed | Message deduplication only (FIFO queues, 5-minute window) |
| Persistence Layer | Redis | Redis | Redis | Fully managed by AWS |
| Operational Scale | Educational / small-scale | Production-grade, high-throughput | Production-grade | Massive scale |
| Observability | ❌ Planned (basic logging currently) | ✅ Built-in metrics and monitoring | ✅ Mature monitoring ecosystem | ✅ CloudWatch integration |
| Deployment | ✅ Docker Compose ready | Docker compatible | Docker compatible | Fully managed |
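The per-job backoff parameters in the table (`baseDelay`, `factor`, `jitter`, `limitOfTries`) can be illustrated with a small delay calculator. This is a sketch under assumptions: the formulas, the interpretation of `jitter` as the largest fraction by which a delay may be randomly shortened, and the `nextDelay` function are hypothetical, and the three-tier strategy is not modeled because its rules are not described here.

```typescript
// Hypothetical sketch: parameter names follow the comparison table, but the
// formulas and the meaning of `jitter` are assumptions, not ReTraced's code.
type Strategy = "fixed" | "linear" | "exponential";

interface BackoffConfig {
  strategy: Strategy;
  baseDelay: number;    // milliseconds
  factor: number;       // growth factor, used only by "exponential"
  jitter: number;       // 0..1: largest fraction the delay may be randomly shortened by
  limitOfTries: number; // maximum number of attempts, including the first one
}

// Delay in ms before the next attempt, or null when the job has used up its tries.
// `attemptsMade` counts attempts already run (1 after the first failure).
function nextDelay(
  cfg: BackoffConfig,
  attemptsMade: number,
  random: () => number = Math.random, // injectable for deterministic tests
): number | null {
  if (attemptsMade >= cfg.limitOfTries) return null;

  const raw =
    cfg.strategy === "fixed"
      ? cfg.baseDelay
      : cfg.strategy === "linear"
        ? cfg.baseDelay * attemptsMade
        : cfg.baseDelay * Math.pow(cfg.factor, attemptsMade - 1);

  // Jitter spreads retries out so many failing jobs do not retry in lockstep.
  return Math.round(raw * (1 - cfg.jitter * random()));
}

const exp: BackoffConfig = { strategy: "exponential", baseDelay: 1000, factor: 2, jitter: 0.2, limitOfTries: 4 };
for (let n = 1; n <= 4; n++) {
  // Passing () => 0 disables the random reduction so the raw schedule is visible.
  console.log(n, nextDelay(exp, n, () => 0));
}
```

With the default `Math.random`, each delay lands between roughly `(1 - jitter)` times the raw value and the raw value itself; injecting `random` keeps the schedule reproducible when debugging.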
### When to Use Each
Use ReTraced when you are:
- Debugging complex retry scenarios
- Looking for complete visibility into retry behavior
- Building prototypes that require failure forensics
- Teaching or learning how job schedulers work internally
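For readers using ReTraced to learn scheduler internals, the decision a retrying scheduler makes after each failed attempt can be reduced to a small pure function. The sketch below is hypothetical: the `routeFailure` name, the uppercase failure-type labels, and the rule that dead and poisoned jobs move to the DLQ are assumptions inferred from the comparison table, not ReTraced's documented behavior.

```typescript
// Hypothetical sketch of post-failure routing; names and rules are assumptions
// based on the Permanent / Temporary / Poison classification in the table.
type FailureType = "PERMANENT" | "TEMPORARY" | "POISON";

interface FailureDecision {
  status: "delayed" | "dead" | "poisoned";
  toDLQ: boolean; // assumed: dead and poisoned jobs leave the main queue for the DLQ
}

function routeFailure(type: FailureType, attemptsMade: number, limitOfTries: number): FailureDecision {
  if (type === "POISON") {
    // e.g. a payload that crashes every worker: never retry automatically
    return { status: "poisoned", toDLQ: true };
  }
  if (type === "PERMANENT") {
    // e.g. a validation error: retrying cannot change the outcome
    return { status: "dead", toDLQ: true };
  }
  // TEMPORARY (e.g. a timeout): retry with backoff until the tries run out
  return attemptsMade < limitOfTries
    ? { status: "delayed", toDLQ: false }
    : { status: "dead", toDLQ: true };
}

console.log(routeFailure("TEMPORARY", 1, 3)); // tries remain, so the job is delayed
console.log(routeFailure("TEMPORARY", 3, 3)); // tries exhausted, so the job is dead
console.log(routeFailure("POISON", 1, 3));
```

Keeping the decision free of I/O makes each state transition easy to unit-test and to record in the job's `history[]` next to the attempt that caused it.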