Comparison

ReTraced takes a different approach from mature job queues. They optimize for production throughput and abstraction; ReTraced prioritizes visibility and learning by exposing retry and failure behavior as explicit, queryable data.

Feature / Capability	ReTraced	BullMQ	Sidekiq	AWS SQS + Lambda
Primary Design Goal	Explicit retry & failure modeling for learning and debugging	High-throughput Redis-based queues	Battle-tested background jobs for Rails	Fully managed event-driven execution
Delivery Guarantee	At-least-once	At-least-once	At-least-once	At-least-once
Retry Strategy Configuration	✅ Per-job strategy (linear, fixed, exponential, three-tier)	Per-job or queue-level backoff	Per-worker-class retry config	Platform-managed retry policies
Retry Behavior as Data	✅ Full `history[]` array with metadata	Retry count and last error	Retry count	CloudWatch logs
Retry Attempt Audit Trail	✅ Timestamp, error code, trigger type, result per attempt	Retry count, last error	Retry count, last error	CloudWatch logs
Manual vs Auto Retry Tracking	✅ Explicit tracking (`AUTO` vs `MANUAL`)	Not tracked separately	Not tracked separately	Not tracked separately
Backoff Configuration	✅ Per-job (`baseDelay`, `factor`, `jitter`, `limitOfTries`)	Configurable exponential backoff	Configurable backoff	AWS-managed
Dead Letter Queue (DLQ)	✅ Separate DLQ with full job history	DLQ pattern support	Failed job sets	Native DLQ support
DLQ Forensics	✅ Failure type, complete retry history, error classification	Job data + error	Job data + error	Message + error logs
Manual Retry from DLQ	✅ First-class feature with trigger tracking	Manual job requeue	Manual job requeue	Redrive policy
Poison Job Handling	✅ Explicit `poisoned` status	Handled via DLQ pattern	Handled via dead sets	Handled via DLQ
Failure Classification	✅ Permanent / Temporary / Poison	Error handling via user code	Error handling via user code	Retry vs DLQ based on config
Job Lifecycle States	✅ Detailed (`pending`, `processing`, `delayed`, `dead`, `poisoned`, `completed`, `failed`)	Active, completed, failed, delayed	Queued, processing, completed, failed	Managed by AWS
Priority Jobs	❌ Not implemented	✅ Supported	✅ Supported	Not natively supported
Scheduling / Cron Jobs	❌ Not implemented	✅ Repeatable jobs	✅ Cron-style scheduling	✅ EventBridge integration
Exactly-Once Semantics	❌ Not implemented (at-least-once by design)	Not guaranteed	Not guaranteed	Approximate (FIFO + deduplication)
Idempotency Support	❌ Not implemented (user-managed)	User-managed	User-managed	✅ Native (FIFO queues)
Persistence Layer	Redis	Redis	Redis	Fully managed by AWS
Operational Scale	Educational / small-scale	Production-grade, high-throughput	Production-grade	Massive scale
Observability	❌ Planned (basic logging currently)	✅ Built-in metrics and monitoring	✅ Mature monitoring ecosystem	✅ CloudWatch integration
Deployment	✅ Docker Compose ready	Docker compatible	Docker compatible	Fully managed
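
As a rough illustration of the Backoff Configuration row, the sketch below shows how a per-job delay could be derived from `baseDelay`, `factor`, `jitter`, and `limitOfTries`. Only those four field names come from the table. Their units and semantics, the `BackoffConfig` type, and the `nextDelay` helper are assumptions made for this example, not ReTraced's actual API. The three-tier strategy is left out because its rules are not described here.

```typescript
// Hypothetical config shape: only the field names come from the table above.
interface BackoffConfig {
  baseDelay: number;    // assumed: milliseconds
  factor: number;       // assumed: multiplier used by the exponential strategy
  jitter: number;       // assumed: fraction in [0, 1] of random spread
  limitOfTries: number; // assumed: maximum number of attempts
}

type Strategy = "fixed" | "linear" | "exponential";

// Returns the delay before the next attempt, or null once tries are exhausted.
// `attempt` is the 1-based number of the attempt that just failed.
function nextDelay(
  strategy: Strategy,
  attempt: number,
  cfg: BackoffConfig,
  random: () => number = Math.random,
): number | null {
  if (attempt >= cfg.limitOfTries) return null;

  const base =
    strategy === "fixed"
      ? cfg.baseDelay
      : strategy === "linear"
        ? cfg.baseDelay * attempt
        : cfg.baseDelay * Math.pow(cfg.factor, attempt - 1);

  // Spread the delay by up to plus or minus `jitter` of its value.
  const spread = base * cfg.jitter * (2 * random() - 1);
  return Math.max(0, Math.round(base + spread));
}

const cfg: BackoffConfig = { baseDelay: 1000, factor: 2, jitter: 0.2, limitOfTries: 5 };

// With random() fixed at 0.5 the jitter term is zero, so the third failure
// waits 1000 * 2^2 = 4000 ms.
const thirdDelay = nextDelay("exponential", 3, cfg, () => 0.5);
```

Passing the random source as a parameter keeps the jitter testable. A real worker would use the default `Math.random`.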

When to Use Each

Use ReTraced when you:

Debug complex retry scenarios
Need complete visibility into retry behavior
Build prototypes that require failure forensics
Teach or want to understand how job queues work internally
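
To make "complete visibility into retry behavior" concrete, here is a minimal sketch of what querying a job's retry history might look like. The `history` array, the `AUTO` / `MANUAL` trigger values, and the status names are taken from the comparison table. The record field names (`timestamp`, `errorCode`, `trigger`, `result`) and the `summarizeRetries` helper are hypothetical and may not match ReTraced's real schema.

```typescript
// Hypothetical record shapes: field names other than `history` are assumptions.
type Trigger = "AUTO" | "MANUAL";

interface RetryAttempt {
  timestamp: string; // assumed: ISO 8601 string
  errorCode: string | null; // null when the attempt succeeded
  trigger: Trigger;
  result: "failed" | "completed";
}

interface JobRecord {
  id: string;
  status: "pending" | "processing" | "delayed" | "dead" | "poisoned" | "completed" | "failed";
  history: RetryAttempt[];
}

interface RetrySummary {
  attempts: number;
  auto: number;
  manual: number;
  lastErrorCode: string | null;
}

// Counts attempts by trigger type and reports the most recent error code.
function summarizeRetries(job: JobRecord): RetrySummary {
  const counts: Record<Trigger, number> = { AUTO: 0, MANUAL: 0 };
  let lastErrorCode: string | null = null;

  for (const attempt of job.history) {
    counts[attempt.trigger] += 1;
    if (attempt.result === "failed" && attempt.errorCode !== null) {
      lastErrorCode = attempt.errorCode;
    }
  }

  return {
    attempts: job.history.length,
    auto: counts.AUTO,
    manual: counts.MANUAL,
    lastErrorCode,
  };
}
```

Because each attempt is stored as data rather than only logged, questions such as "how many retries were triggered by hand?" become a simple loop over `history` instead of a log search.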
