How ReTraced Helped Me Build ReTraced

Known Issue: Exponential Backoff Runtime Discrepancy

Problem Discovery

While stress-testing ReTraced by intentionally breaking the system to force failures, I discovered that the actual retry intervals don't match the expected exponential backoff behavior. This is exactly why explicit retry tracking matters: the data itself exposed the bug.

Test Setup

I deliberately configured a job to always fail (network error) to observe the full retry → DLQ lifecycle:

{
  "backoffStrategy": "exponential",
  "backoffConfig": {
    "baseDelaySeconds": 5,
    "maxDelaySeconds": 60,
    "factor": 2,
    "limitOfTries": 5
  }
}
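
For reference, with this configuration each automatic retry should wait roughly baseDelaySeconds × factor^(retryIndex) seconds, capped at maxDelaySeconds. Here's a minimal sketch of that expectation (my own helper, not ReTraced code; the values are hard-coded from the config above):

// A sketch of the schedule the config above implies: retry i (0-indexed)
// waits baseDelaySeconds * factor^i seconds, capped at maxDelaySeconds.
function expectedDelaySeconds(retryIndex: number): number {
  // Values copied from backoffConfig above.
  const baseDelaySeconds = 5;
  const factor = 2;
  const maxDelaySeconds = 60;
  return Math.min(baseDelaySeconds * Math.pow(factor, retryIndex), maxDelaySeconds);
}

// Delays after the immediate first attempt: 5s, 10s, 20s, 40s (~75s total).
const expected = [0, 1, 2, 3].map(expectedDelaySeconds);
console.log(expected, expected.reduce((sum, d) => sum + d, 0));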

The Output

{
  "jobId": "job-845",
  "createdAt": 1769032645464,
  "updatedAt": 1769032673534,
  "queueName": "email",
  "status": "dead",
  "tries": 5,
  "maxTries": 5,

  "jobData": {
    "emailFrom": "noreply@test.com",
    "emailTo": "user@test.com",
    "subject": "Test",
    "body": "Hello"
  },

  "backoffStrategy": "exponential",
  "backoffConfig": {
    "baseDelaySeconds": 5,
    "maxDelaySeconds": 60,
    "factor": 2,
    "limitOfTries": 5
  },

  "retryAttempts": [
    {
      "attemptedAt": 1769032648561,
      "trigger": "AUTO",
      "changesMade": false,
      "result": "PENDING",
      "error": {
        "code": "NETWORK_ERROR",
        "message": "NETWORK_ERROR",
        "failedAt": 1769032648561
      }
    },
    {
      "attemptedAt": 1769032654078,
      "trigger": "AUTO",
      "changesMade": false,
      "result": "PENDING",
      "error": {
        "code": "NETWORK_ERROR",
        "message": "NETWORK_ERROR",
        "failedAt": 1769032654077
      }
    },
    {
      "attemptedAt": 1769032661127,
      "trigger": "AUTO",
      "changesMade": false,
      "result": "PENDING",
      "error": {
        "code": "NETWORK_ERROR",
        "message": "NETWORK_ERROR",
        "failedAt": 1769032661126
      }
    },
    {
      "attemptedAt": 1769032667470,
      "trigger": "AUTO",
      "changesMade": false,
      "result": "PENDING",
      "error": {
        "code": "NETWORK_ERROR",
        "message": "NETWORK_ERROR",
        "failedAt": 1769032667469
      }
    },
    {
      "attemptedAt": 1769032673538,
      "trigger": "AUTO",
      "changesMade": false,
      "result": "PENDING",
      "error": {
        "code": "NETWORK_ERROR",
        "message": "NETWORK_ERROR",
        "failedAt": 1769032673537
      }
    }
  ]
}

Investigation Commands

Access Redis CLI

docker compose exec redis redis-cli

Inspect job data

127.0.0.1:6379> GET job:job-845
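
If you'd rather inspect the record programmatically, here's a minimal sketch using the ioredis client (my choice for illustration; ReTraced itself may use a different client). It only assumes jobs are stored as JSON strings under job:<jobId> keys, as the redis-cli session above shows:

import Redis from "ioredis";

async function inspectJob(jobId: string): Promise<void> {
  const redis = new Redis(); // defaults to 127.0.0.1:6379
  const raw = await redis.get(`job:${jobId}`);
  if (raw) {
    const job = JSON.parse(raw);
    // Print the fields relevant to the retry investigation.
    console.log(job.status, job.tries, job.retryAttempts?.length);
  } else {
    console.log(`no record found for ${jobId}`);
  }
  redis.disconnect();
}

inspectJob("job-845").catch(console.error);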

Expected vs Actual Results

| Attempt | Delay Formula | Delay | Cumulative Time |
|---------|---------------|-------|-----------------|
| 1       | Immediate     | 0s    | 0s              |
| 2       | 5 × 2⁰        | 5s    | 5s              |
| 3       | 5 × 2¹        | 10s   | 15s             |
| 4       | 5 × 2²        | 20s   | 35s             |
| 5       | 5 × 2³        | 40s   | 75s             |

Expected total retry duration: ~75 seconds

Actual Behavior (From Redis Data)

{
  "jobId": "job-845",
  "status": "dead",
  "tries": 5,
  "retryAttempts": [
    { "attemptedAt": 1769032648561 }, // +3.1s
    { "attemptedAt": 1769032654078 }, // +5.5s
    { "attemptedAt": 1769032661127 }, // +7.0s
    { "attemptedAt": 1769032667470 }, // +6.3s
    { "attemptedAt": 1769032673538 }  // +6.1s
  ]
}

Actual intervals: 3.1s, 5.5s, 7.0s, 6.3s, 6.1s
Actual total: ~28 seconds (vs expected 75 seconds)

The delays plateau at ~6 seconds instead of growing exponentially.
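
The intervals above fall straight out of the stored timestamps. Here's a quick sketch of that arithmetic, using createdAt and the attemptedAt values copied from the record:

// Millisecond timestamps copied from the Redis record above.
const createdAt = 1769032645464;
const attemptedAt = [
  1769032648561, 1769032654078, 1769032661127, 1769032667470, 1769032673538,
];

// Interval before each attempt, measured from the previous attempt (or createdAt).
const intervals = attemptedAt.map((t, i) =>
  ((t - (i === 0 ? createdAt : attemptedAt[i - 1])) / 1000).toFixed(1)
);
console.log(intervals); // [ '3.1', '5.5', '7.0', '6.3', '6.1' ]
console.log(((attemptedAt[4] - createdAt) / 1000).toFixed(1)); // ~28.1s total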

Why This Matters

This bug was only discoverable because ReTraced makes retry data explicit:

✅ Timestamps exposed the timing issue
✅ Retry history showed the pattern
✅ Structured data enabled analysis

Most schedulers hide this information, making such bugs invisible.

Takeaway

Intentionally breaking the system revealed bugs that would be hidden in traditional job schedulers. This validates ReTraced's core philosophy: explicit retry data makes systems debuggable and observable.