Rate Limits, Retries, Dedupe: CRM Integration Infrastructure

The most common failure mode we've encountered when taking over integration work from previous engineers is a CRM integration that is reliable at low volume and brittle under real load. Here is the full infrastructure we build for reliable, observable sync.
The Queue
CRM writes are asynchronous. An event (a call, a form submission, a deal stage change) goes into a durable queue, not directly to the CRM API. If the worker crashes, the event is not lost. Each job carries a tenant_id, an event type, a payload, and an idempotency key. The queue is the contract: once an event is enqueued, its effect on the CRM is applied exactly once. Delivery itself is at least once (a worker that crashes mid-job will see that job again), and the idempotency key is what makes the redelivery harmless.
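A minimal sketch of the job record described above. The text does not say how the idempotency key is produced; here we assume every upstream event has its own unique ID (`source_event_id`, a hypothetical field) and hash it together with the tenant and event type, so re-enqueueing the same event yields the same key. The class and function names are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SyncJob:
    """One CRM write as it sits in the queue (field names are illustrative)."""
    tenant_id: str
    event_type: str              # e.g. "call.logged", "deal.stage_changed"
    payload: Dict[str, Any]
    idempotency_key: str


def make_idempotency_key(tenant_id: str, event_type: str, source_event_id: str) -> str:
    """Stable key derived from the event's identity (assumption: the upstream
    system gives every event a unique source_event_id)."""
    raw = json.dumps([tenant_id, event_type, source_event_id])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def build_job(tenant_id: str, event_type: str, source_event_id: str,
              payload: Dict[str, Any]) -> SyncJob:
    key = make_idempotency_key(tenant_id, event_type, source_event_id)
    return SyncJob(tenant_id, event_type, payload, key)
```

Deriving the key from the event rather than generating a random one at enqueue time means a producer that retries its own enqueue cannot create two distinct jobs for one event.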
The Rate Limiter
Each tenant gets a token bucket per target CRM. The bucket refills according to that CRM's documented rate limit. Workers consume tokens before making API calls. If the bucket is empty, the job waits and requeues: the worker doesn't sleep; it releases the job back to the queue with a delay.
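A sketch of the per-tenant, per-CRM token bucket, assuming a simple continuous-refill bucket with an injectable clock. Instead of blocking, `try_acquire` returns how long until a token is available, which is the delay the worker hands back to the queue. `release_with_delay` and `call_crm` are hypothetical hooks, not a real queue or CRM client API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucket:
    capacity: float                       # maximum burst, in API calls
    refill_per_second: float              # derived from the CRM's documented limit
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    last_refill: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity       # start full
        self.last_refill = self.clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

    def try_acquire(self, cost: float = 1.0) -> float:
        """Consume `cost` tokens and return 0.0, or consume nothing and
        return the number of seconds until enough tokens will exist."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return 0.0
        return (cost - self.tokens) / self.refill_per_second


# One bucket per (tenant, target CRM) pair.
_buckets: Dict[Tuple[str, str], TokenBucket] = {}


def bucket_for(tenant_id: str, crm: str, capacity: float, refill_per_second: float) -> TokenBucket:
    key = (tenant_id, crm)
    if key not in _buckets:
        _buckets[key] = TokenBucket(capacity, refill_per_second)
    return _buckets[key]


def process(job, bucket: TokenBucket, release_with_delay, call_crm) -> None:
    """Worker step; `release_with_delay` and `call_crm` are hypothetical hooks."""
    wait = bucket.try_acquire()
    if wait > 0:
        release_with_delay(job, wait)     # hand the job back; the worker never sleeps
        return
    call_crm(job)
```

Returning a wait time rather than sleeping keeps worker slots free for other tenants whose buckets still have capacity.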
We also read rate limit headers from CRM API responses (Zoho's X-ZOHO-API-CALLS-REMAINING, HubSpot's X-HubSpot-RateLimit-Remaining) and use them to correct the local bucket state. If the CRM says we have 0 calls remaining, we trust the CRM, not our local count.
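A sketch of the header correction, using the header names given above and matching them case-insensitively as HTTP requires. One assumption worth flagging: this version only ever lowers the local count. A remaining-calls header may describe a different window than the local bucket models, so raising local tokens from it could overshoot; whether to also correct upward is a design choice the text leaves open.

```python
from typing import Mapping, Optional

# Header names as given in the text; HTTP header names are case-insensitive.
REMAINING_HEADER = {
    "zoho": "x-zoho-api-calls-remaining",
    "hubspot": "x-hubspot-ratelimit-remaining",
}


def reported_remaining(crm: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return the CRM's own remaining-calls count, or None if absent or unparseable."""
    wanted = REMAINING_HEADER.get(crm)
    if wanted is None:
        return None
    for name, value in headers.items():
        if name.lower() == wanted:
            try:
                return int(value.strip())
            except ValueError:
                return None               # malformed header: keep local state
    return None


def reconcile(local_tokens: float, crm: str, headers: Mapping[str, str]) -> float:
    """The CRM wins whenever it reports fewer calls than we think we have.
    Never raising the local count from a header is an assumed, conservative choice."""
    reported = reported_remaining(crm, headers)
    if reported is None:
        return local_tokens
    return min(local_tokens, float(reported))
```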
Retry Logic
On failure: exponential backoff starting at 5 seconds, with jitter (a random multiplier between 0.8 and 1.2 on each delay, so that jobs which failed together do not all retry at the same moment and cause a thundering herd). Transient errors (429 rate limited, 503 service unavailable, network timeout) get a maximum of 5 retries. After 5 retries, the job goes to the dead letter queue with the full retry history attached.
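A sketch of the retry schedule. The text says "exponential" without naming the base, so doubling (5 s, 10 s, 20 s, 40 s, 80 s before jitter) is an assumption. The failure history is a plain list that travels with the job, so the dead-letter entry carries every attempt.

```python
import random
from typing import List, Optional, Tuple

BASE_DELAY_S = 5.0
MAX_RETRIES = 5


def backoff_delay(retry_number: int, rng: random.Random) -> float:
    """Delay before retry `retry_number` (1-based), multiplied by a random
    jitter factor in [0.8, 1.2]. Doubling per retry is an assumption."""
    if not 1 <= retry_number <= MAX_RETRIES:
        raise ValueError(f"retry_number must be in 1..{MAX_RETRIES}")
    base = BASE_DELAY_S * 2 ** (retry_number - 1)
    return base * rng.uniform(0.8, 1.2)


def on_transient_failure(history: List[str], error: str,
                         rng: random.Random) -> Tuple[str, Optional[float]]:
    """Record the failure, then either schedule a retry or dead-letter the job."""
    history.append(error)
    retry_number = len(history)           # failure k schedules retry k
    if retry_number > MAX_RETRIES:
        return ("dead_letter", None)      # original attempt + 5 retries all failed
    return ("retry", backoff_delay(retry_number, rng))
```

Passing in a `random.Random` instance keeps the jitter testable with a fixed seed.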
Non-retryable errors (400 bad request, 404 not found, 403 forbidden) go directly to the dead letter queue on first failure. There is no point retrying a malformed request or one that requires a configuration change.
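A sketch of the classification step that decides between the backoff path and the dead letter queue. The status sets come straight from the text; what to do with codes the text does not list (500, 502, 401, and so on) is not specified, and sending them to the dead letter queue here is an assumption.

```python
from enum import Enum
from typing import Optional


class Disposition(Enum):
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"


RETRYABLE_STATUS = {429, 503}             # transient: back off and retry
NON_RETRYABLE_STATUS = {400, 403, 404}    # needs a code or config fix


def classify(status: Optional[int], timed_out: bool = False) -> Disposition:
    """Map a failed call to a disposition; `status` is None when no response arrived."""
    if timed_out or status in RETRYABLE_STATUS:
        return Disposition.RETRY
    if status in NON_RETRYABLE_STATUS:
        return Disposition.DEAD_LETTER    # retrying a malformed request won't help
    return Disposition.DEAD_LETTER        # unlisted codes: assumed, so a human looks
```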
Deduplication
The idempotency key is the first guard. A job with the same key that has already succeeded is not processed again. The second guard is the CRM-side existence check before write: does a record with our external ID already exist? If yes, update rather than create. The external ID is our own record ID, stored as a custom field in the CRM on first write.
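A sketch of the two guards, run against an in-memory stand-in for a CRM. `FakeCrm`, its methods, and the `external_id` field name are all hypothetical; a real integration would make HTTP calls to the target CRM's search, create, and update endpoints. The idempotency key is recorded only after the write succeeds, so a crash between write and record leads to a harmless re-run that the second guard turns into an update.

```python
from typing import Any, Dict, Optional, Set


class FakeCrm:
    """In-memory stand-in for a CRM API (hypothetical; real calls are HTTP)."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}   # crm_id -> fields
        self._next_id = 1

    def find_by_external_id(self, external_id: str) -> Optional[str]:
        for crm_id, fields in self.records.items():
            if fields.get("external_id") == external_id:
                return crm_id
        return None

    def create(self, fields: Dict[str, Any]) -> str:
        crm_id = f"crm-{self._next_id}"
        self._next_id += 1
        self.records[crm_id] = dict(fields)
        return crm_id

    def update(self, crm_id: str, fields: Dict[str, Any]) -> None:
        self.records[crm_id].update(fields)


def sync_record(idempotency_key: str, external_id: str, fields: Dict[str, Any],
                succeeded_keys: Set[str], crm: FakeCrm) -> str:
    # Guard 1: a job with this key already succeeded, so do nothing.
    if idempotency_key in succeeded_keys:
        return "skipped"
    # Guard 2: look up our own record ID in the CRM before writing.
    crm_id = crm.find_by_external_id(external_id)
    if crm_id is None:
        crm.create({**fields, "external_id": external_id})  # first write stores our ID
        outcome = "created"
    else:
        crm.update(crm_id, fields)
        outcome = "updated"
    succeeded_keys.add(idempotency_key)   # mark success only after the write
    return outcome
```

Check-then-create is not atomic: two workers handling different events for the same record at the same moment could both see "not found". Serializing jobs per external ID, or using an upsert endpoint if the target CRM offers one, would close that gap.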
Monitoring
- Dead letter queue depth exceeding threshold: fires immediately
- Retry rate exceeding 10% of total jobs in any 5-minute window
- Rate limit bucket consistently below 20% (approaching capacity)
- Sync lag exceeding 2 minutes from event to CRM write
These alerts fire before users notice anything. That is the point.
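A sketch of evaluating the four alert rules over one 5-minute window of metrics. The `WindowStats` shape is hypothetical. The text gives no number for the dead letter threshold, so it is a required parameter rather than a guessed default, and "consistently below 20%" is read here as every bucket-fill sample in the window being under 20%, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WindowStats:
    """Metrics for one tenant/CRM over the last 5 minutes (hypothetical shape)."""
    total_jobs: int
    retried_jobs: int
    dlq_depth: int
    bucket_fill_samples: Sequence[float]  # fraction of bucket capacity, 0.0 to 1.0
    max_sync_lag_s: float                 # worst event-to-CRM-write lag seen


def alerts(stats: WindowStats, dlq_threshold: int) -> List[str]:
    fired = []
    if stats.dlq_depth > dlq_threshold:
        fired.append("dlq_depth")
    if stats.total_jobs and stats.retried_jobs / stats.total_jobs > 0.10:
        fired.append("retry_rate")
    # "Consistently" read as: every sample in the window under 20% (assumption).
    if stats.bucket_fill_samples and all(f < 0.20 for f in stats.bucket_fill_samples):
        fired.append("bucket_low")
    if stats.max_sync_lag_s > 120:
        fired.append("sync_lag")
    return fired
```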
