Amazon SQS Deep Dive: The Distributed Message Queue Handling Millions of Messages Daily
You’re running an e-commerce system. Everything works fine at a few thousand requests per minute — until flash sale day. Traffic spikes 20x within seconds. Thousands of orders flood in simultaneously, each needing to be written to the database. The connection pool is exhausted, queries pile up, timeout cascades begin, and the database crashes. Customers see 500 errors, the dev team pulls an all-nighter rolling back, and the business loses revenue.
The root problem here isn’t a weak database — it’s tight coupling. The application writes directly to the database, so write throughput is limited by the database’s processing capacity. Traffic spikes → database bottleneck → the entire system collapses.
The solution? Place a message queue in between. Instead of writing directly to the database, the application simply “throws” requests into the queue. On the other side, workers pull messages and write to the database at a rate the database can handle. The queue acts as a shock absorber — absorbing traffic spikes and releasing them gradually and steadily.
This is exactly the problem that Amazon SQS (Simple Queue Service) solves. A fully managed message queue — no provisioning, no capacity planning. You send messages in, consumers pull them out. AWS handles all the availability, durability, and scaling underneath.
This post will dive deep into SQS internals — from how messages are stored and replicated across distributed servers, the difference between Standard Queue and FIFO Queue (and why Standard Queue delivers messages out of order), visibility timeout, long polling, Dead Letter Queue, to real-world patterns like Auto Scaling Group integration and using SQS as a buffer for database writes.
1. What is SQS?
Amazon SQS (Simple Queue Service) is a fully managed message queue service from AWS — allowing applications to send, store, and receive messages between software components at any volume, without worrying about losing messages or requiring other services to be online simultaneously.
Simply put, SQS works like a post office mailbox. The sender (producer) drops a letter in the box, the receiver (consumer) picks it up when ready. The sender doesn’t need to know who the receiver is, where they are, or whether they’re online — the letter stays safely in the box until it’s picked up.
Key characteristics:
- Fully managed — no server provisioning, no patching, monitoring, or capacity planning needed. AWS manages the entire infrastructure.
- Pay per request — every API call (SendMessage, ReceiveMessage, DeleteMessage…) is billed. The billing unit is a request, where 1 request = 64KB. If you send a 256KB message, it counts as 4 requests. SQS doesn’t charge based on storage volume or message count.
- Near-unlimited scale — Standard Queue can handle a virtually unlimited number of messages per second.
- Message retention up to 14 days — default is 4 days, configurable up to 14 days.
- Maximum message size of 256KB — for larger payloads, use the claim check pattern: store the payload in S3, send the S3 URL via SQS.
SQS offers two queue types: Standard Queue and FIFO Queue. We’ll dive deep into their differences in the next section.
| Criteria | SQS | Kafka | RabbitMQ |
|---|---|---|---|
| Delivery model | Pull — consumer actively polls messages | Pull — consumer pulls from partition | Push — broker pushes to consumer |
| Ordering | Best-effort (Standard) / Strict per group (FIFO) | Strict per partition | Strict per queue |
| Retention | Up to 14 days | Configurable (days/volume) | Until consumed |
| Throughput | Near-unlimited (Standard) | Very high (~1M+ msg/s) | Medium (~50K msg/s) |
| Management | Fully managed | Self-managed or Amazon MSK | Self-managed |
| Primary use case | Task queue, decoupling, buffer | Event streaming, log aggregation | Task queue, request-reply |
2. Internal Architecture — “How Things Appear” vs “How Things Actually Are”
When you look at SQS from the outside, everything seems simple: a queue, messages go in one end, come out the other, in FIFO (First In, First Out) order. This is “How Things Appear” — what you see.
But beneath that simple surface is a complex distributed system. This is “How Things Actually Are”.
2.1. Three-tier Architecture
SQS is built on three main tiers:
Frontend Layer — the request reception tier. A load balancer distributes requests to frontend servers, which handle authentication (via IAM), validation (checking message size, whether the queue exists), and routing requests down to the storage tier.
Backend Storage Layer — the heart of SQS. This is a distributed key-value store — a distributed storage system in key-value format. Messages are stored as key-value pairs, where the key is generated from the queue ID and message metadata, and the value is the message body (up to 256KB) along with its attributes. The key point: messages are sharded (partitioned) across multiple storage nodes based on queue ID for load balancing.
Coordination Layer — the orchestration tier, using a distributed consensus mechanism (similar to Paxos or Raft) to ensure consistency between nodes. For Standard Queue, this layer operates minimally to prioritize throughput. For FIFO Queue, this layer operates more strictly to guarantee ordering and exactly-once delivery.
2.2. How Are Messages Stored?
When a producer sends a message to SQS, here’s what happens behind the scenes:
- The frontend server receives the request, authenticates, and validates.
- The message is synchronously replicated across multiple storage nodes in different Availability Zones (AZs).
- SQS uses quorum-based writes — a write is considered successful when M of N replicas confirm they’ve stored it. This ensures the message survives even if an entire AZ goes down.
- Only after achieving quorum does SQS return a success response to the producer.
Thanks to this mechanism, SQS achieves extremely high data durability — AWS claims 11 nines durability (99.999999999%). Your messages are virtually impossible to lose.
2.3. Why Does Standard Queue Deliver Messages Out of Order?
This is a question many people ask: “SQS is a queue, and queues should be FIFO, right? Why do messages arrive out of order?”
The answer lies in the distributed architecture underneath.
When you send message A, then B, then C to a Standard Queue — they don’t land on the same server. SQS shards messages across multiple storage nodes for load balancing. Message A might land on node 1, B on node 3, C on node 2.
When a consumer calls ReceiveMessage, SQS doesn’t scan all nodes. Instead, it connects randomly to a few nodes and returns whatever messages it finds there. If it hits node 3 first → you receive B before A.
This isn’t a bug — it’s a deliberate trade-off, which is why AWS calls it best-effort ordering. By not enforcing ordering, SQS can distribute messages across more nodes, serve requests from any node, and achieve near-unlimited throughput. If ordering had to be enforced, SQS would need to coordinate between nodes — consuming time, reducing throughput, and limiting scale.
2.4. FIFO Queue — Trading Throughput for Ordering
In the previous section, we saw that Standard Queue loses ordering because messages are randomly distributed across multiple nodes. FIFO Queue solves this by ensuring all messages with the same Message Group ID are always routed to the same partition (the same storage node or group of nodes).
When a producer sends a message with MessageGroupId = "order-123", SQS uses this value to hash and determine the target partition — similar to how Kafka uses partition keys. All messages with the same group ID go to the same partition, where SQS assigns them incrementing sequence numbers. Because everything is on the same partition, maintaining order becomes a local problem — no complex distributed coordination across multiple nodes is needed.
When a consumer calls ReceiveMessage, SQS returns messages in exact sequence number order within each Message Group. And importantly: only one consumer can process messages from a Message Group at any given time. If message #1 of group “order-123” is in-flight (not yet deleted), SQS won’t deliver message #2 of the same group to any consumer — ensuring no two consumers process the same group in parallel.
This is precisely why FIFO Queue has much lower throughput than Standard Queue. Each Message Group ID is effectively a deliberate bottleneck — messages are forced onto a single partition instead of being freely distributed. Fewer Message Group IDs mean fewer partitions are used, resulting in lower throughput. Conversely, if you use many different Message Group IDs (e.g., one per order), SQS can process multiple groups in parallel across multiple partitions — this is how to scale FIFO Queue effectively.
The trade-offs are clear:
| Criteria | Standard Queue | FIFO Queue |
|---|---|---|
| Throughput | Near-unlimited | 300 msg/s (3,000 with high throughput, 30,000 with batching) |
| Ordering | Best-effort | Strict per Message Group ID |
| Delivery | At-least-once (may duplicate) | Exactly-once processing |
| Deduplication | None | 5-minute rolling window |
| Queue name | Unrestricted | Must end with .fifo |
FIFO Queue also has built-in deduplication: each message can include a MessageDeduplicationId. Within a 5-minute window (non-configurable), if SQS receives a message with the same dedup ID, it discards the duplicate but still returns a success response. You can provide the dedup ID yourself, or let SQS compute a SHA-256 hash from the message body — but best practice is to provide it yourself for explicit control over dedup logic.
3. Message Lifecycle
Understanding the lifecycle of a message in SQS is foundational to understanding concepts like visibility timeout and Dead Letter Queue.
A message goes through the following stages:
Step 1 — SendMessage: The producer sends a message. SQS synchronously replicates it across multiple nodes, returning a MessageId once quorum is achieved.
Step 2 — Stored: The message sits in the queue in a visible state — ready for any consumer to poll.
Step 3 — ReceiveMessage: A consumer polls the message. SQS returns the message along with a receipt handle — a “ticket” the consumer needs to keep for further operations (deleting or extending the timeout).
Step 4 — In-flight: The message transitions to an invisible state — other consumers can no longer see this message. This is the visibility timeout in action.
Step 5 — Resolution: Two scenarios:
- The consumer finishes processing → calls
DeleteMessagewith the receipt handle → the message is permanently deleted. - The consumer crashes or processes too slowly → the visibility timeout expires → the message returns to visible state → another consumer can poll it again.
Each time a message is received without being deleted, SQS increments the message’s receive count by 1. This number is crucial — it determines when a message gets moved to the Dead Letter Queue, which we’ll cover in section 6.
4. Visibility Timeout
Visibility timeout is the period during which a message becomes “invisible” to other consumers after being received. Its purpose is to prevent multiple consumers from processing the same message — avoiding duplicate processing.
Think of it this way: you go to a library and take a book off the shelf to read. While you’re reading (visibility timeout), no one else can see that book on the shelf. If you finish reading and return it (delete message), the book disappears from the system. But if you hold it too long without returning it — the library puts the book back on the shelf (message becomes visible again) so someone else can take it.
Configuration
- Default: 30 seconds
- Range: 0 seconds to 12 hours
- Can be set at queue level (applies to all messages) or per-message (when calling
ReceiveMessage)
When is visibility timeout too short?
The consumer hasn’t finished processing, but the timeout has expired → the message becomes visible again → another consumer polls and processes it → duplicate processing. This is the most common reason for duplicates in SQS.
When is visibility timeout too long?
The consumer crashes mid-processing → the message is “stuck” in invisible state → must wait for the timeout to expire before retry → increased latency for that message.
ChangeMessageVisibility
If a consumer realizes it needs more time to process, it can call the ChangeMessageVisibility API to extend the timeout without releasing the message. This is very useful for tasks with variable processing times — for example, video processing, image resizing, or slow external API calls.
Best practice: set visibility timeout to about 6 times the average processing time. For example: if a consumer takes an average of 10 seconds to process → set the timeout to 60 seconds. This provides enough buffer for unusually slow cases, but isn’t too long if the consumer truly crashes.
5. Long Polling vs Short Polling
When a consumer calls ReceiveMessage, there are two ways SQS can return results: short polling and long polling.
Short polling — SQS returns results immediately, even if there are no messages. If the queue is empty, you get an empty response. The consumer must poll continuously at an interval to check for new messages.
Long polling — SQS waits for up to WaitTimeSeconds (maximum 20 seconds) until a message arrives, then returns it. Only if the wait time expires with no messages does it return an empty response.
If you think long polling is pointless — “just poll continuously and you’ll eventually get the message anyway” — you’d be wrong. Long polling provides three important benefits:
5.1. Reduced API Costs
SQS charges per request, not per message count or volume. Every ReceiveMessage call costs money — even when the response is empty. With short polling, if the queue is empty and the consumer polls every 1 second → 86,400 requests/day just to receive… nothing. With long polling (WaitTimeSeconds=20), that drops to 4,320 requests/day — a 95% cost reduction for empty responses.
5.2. Reduced Latency
With short polling, if a message arrives right after the consumer just polled → the consumer must wait until the next poll to receive it. The delay depends on the polling interval.
With long polling, the consumer is already “waiting” on the SQS side. When a message arrives, it’s delivered to the consumer immediately — no waiting for the next poll.
5.3. Avoiding False “Queue Empty” Signals
This is a lesser-known but extremely important benefit, directly related to the distributed architecture we discussed in section 2.
Recall: SQS stores messages across multiple distributed storage nodes. With short polling, SQS only queries a few random nodes and returns the result. If the message is on a node that short polling didn’t hit → you get a “queue empty” response — but in reality, the message is still there, just on a different node.
With long polling, SQS queries all nodes before returning the result. This completely eliminates false “queue empty” signals.
| Criteria | Short Polling | Long Polling |
|---|---|---|
WaitTimeSeconds | 0 (default) | 1–20 seconds |
| Empty responses | Very frequent | Very rare |
| Cost | High (many requests) | Significantly lower |
| Latency | High (depends on poll interval) | Low (immediate delivery when available) |
| SQS node coverage | A few random nodes | All nodes |
Best practice: you should almost always use long polling. Set
ReceiveMessageWaitTimeSeconds=20at the queue level to enable it globally for all consumers.
6. Dead Letter Queue (DLQ)
Some messages are “broken” — wrong data format, consumer bugs, external dependencies down. The message is received, processing fails, it returns to the queue, gets received again, fails again… repeating endlessly, consuming processing resources without ever succeeding. These are called poison messages.
A Dead Letter Queue (DLQ) is the solution: a separate SQS queue where poison messages are “exiled” after failing too many times.
How It Works
When configuring a DLQ, you set maxReceiveCount — the maximum number of times a message can be received before being moved to the DLQ. For example, maxReceiveCount=3:
- Message received attempt 1 → processing fails → returns to queue
- Message received attempt 2 → processing fails → returns to queue
- Message received attempt 3 → processing fails → SQS automatically moves the message to the DLQ
A DLQ is essentially just a normal SQS queue — you create a queue, then configure it as the DLQ target for the source queue.
Handling Messages in the DLQ
Messages in the DLQ are typically used for:
- Debugging — examining what message caused the error and why the consumer couldn’t process it
- Alerting — setting a CloudWatch alarm on the DLQ’s
ApproximateNumberOfMessagesVisiblemetric. A rising DLQ count = something is going wrong - Reprocessing — after fixing the bug, use AWS’s Redrive to source feature to push messages from the DLQ back to the source queue for reprocessing
Important Note About Retention
The retention time of a message in the DLQ is calculated from when the message was first sent to the source queue, not from when it entered the DLQ. For example: if retention is 14 days and a message sat in the source queue for 13 days before being moved to the DLQ → it only has 1 day left in the DLQ before automatic deletion.
Best practice: always set the DLQ’s retention equal to or longer than the source queue’s retention to allow enough time for debugging and reprocessing.
7. Redrive Allow Policy
By default, any queue can configure any other queue as its DLQ. This creates a governance problem — you don’t want another team’s queue to arbitrarily push failed messages into your queue.
Redrive Allow Policy controls which queues are allowed to use this queue as their DLQ. There are three modes:
allowAll (default) — any queue in the same account and region can use this queue as a DLQ.
denyAll — no queue is allowed to use this queue as a DLQ.
byQueue — only explicitly specified queues are allowed.
{
"redrivePermission": "byQueue",
"sourceQueueArns": [
"arn:aws:sqs:ap-southeast-1:123456789:order-processing-queue",
"arn:aws:sqs:ap-southeast-1:123456789:payment-queue"
]
}Best practice: in production, always use
byQueueto explicitly control which queues can redirect failed messages to your DLQ.
8. SQS with Auto Scaling Group (ASG)
One of the most powerful SQS patterns is combining it with an Auto Scaling Group (ASG) — automatically scaling the number of EC2 consumer instances based on queue depth.
Architecture
The flow:
- Messages are continuously sent to the SQS queue.
- CloudWatch monitors the
ApproximateNumberOfMessagesVisiblemetric — the number of messages waiting in the queue. - When the metric exceeds the threshold (e.g., > 1,000 messages), the CloudWatch Alarm transitions to ALARM state.
- The alarm triggers the ASG scaling policy → ASG launches additional EC2 instances.
- New instances start polling SQS, processing messages → queue depth decreases.
- When queue depth drops below the threshold → alarm returns to OK → ASG scales in, terminating excess instances.
Why Use Queue Depth Instead of CPU?
For most queue-based workloads, CPU doesn’t accurately reflect the actual workload. Consumers might be I/O bound (waiting on the database, waiting on external APIs) — CPU is low but thousands of messages are queuing up. Scaling based on CPU won’t trigger, even though the system is overloaded.
Queue depth directly reflects the actual backlog — how much work is waiting to be processed. This is a more accurate metric for scaling decisions.
Custom Metric: Backlog Per Instance
Instead of using raw queue depth, you can create a more precise custom metric:
backlog_per_instance = ApproximateNumberOfMessages / current_instance_countThis metric tells you how many messages each instance is “carrying.” For example: 10,000 messages with 5 instances → each instance carries 2,000 messages. Scaling when backlog_per_instance exceeds an acceptable threshold makes proportionally better decisions than raw queue depth.
9. SQS as a Buffer for Database Writes
This is the pattern we mentioned in the introduction — using SQS as a shock absorber between the application tier and the database, solving the problem of database bottlenecks during traffic spikes.
Architecture
The flow:
- Requests from clients hit EC2 instances (the Enqueue group). These instances validate the request, create a message, and call
SendMessageto push it into SQS. - The SQS queue absorbs all traffic — whether bursty or steady. SQS scales near-infinitely, so there’s never a bottleneck at this layer.
- EC2 instances (the Dequeue group) poll messages from SQS using
ReceiveMessages, then insert into the database at a rate the database can handle. - Both EC2 groups have independent Auto Scaling — the Enqueue group scales based on request traffic, the Dequeue group scales based on queue depth (the pattern from section 8).
Why Is This Pattern Effective?
SQS acts as a decoupling layer — separating the request reception tier from the database write tier. The two tiers scale independently:
- The Enqueue group scales freely with traffic — sending messages to SQS is fast and cheap.
- The Dequeue group scales according to database capacity — never pushing beyond what the database can handle.
- SQS in the middle is “unlimited” — absorbing any traffic spike without failing.
Trade-off
Writes become eventually consistent — there’s a delay between when the request arrives and when data is actually written to the database. This is acceptable for many use cases: analytics, logging, notifications, order processing. But it’s not suitable for cases requiring synchronous confirmation that data has been written — for example, checking an account balance before allowing a transaction.
10. SQS with Lambda
Besides EC2, AWS Lambda is the most popular consumer for SQS. When you configure a Lambda trigger from SQS, AWS automatically manages polling and scaling.
How Lambda Polls SQS
When you create an SQS event source mapping for Lambda, AWS creates a thread pool (default 5 threads). Each thread independently long-polls SQS and retrieves messages in batches (up to 10 messages/batch). The system automatically adjusts the number of threads based on queue message volume and concurrency limits.
Important Failure Scenario
If Lambda’s reserved concurrency is exhausted (e.g., reserved concurrency set to 10, but 10 Lambda instances are already running), the remaining threads will push their batch of messages back to the queue. This action increments the receive count of each message by 1.
The consequence: if the DLQ’s maxReceiveCount is low (e.g., 3), messages may be moved to the DLQ earlier than expected — not because processing failed, but because concurrency was insufficient. This is a common gotcha when using Lambda with SQS.
Best practice: when using Lambda with SQS, set
maxReceiveCounthigh enough to account for throttling scenarios, not just processing failures. And monitor theNumberOfMessagesReceivedmetric againstNumberOfMessagesDeleted— if there’s a large gap, Lambda is likely being throttled.
11. SNS + SQS: Fan-Out Pattern
The previous patterns (ASG, buffer, Lambda) all handle a single flow: producer sends message → one group of consumers processes it. But in practice, a single event often needs to be processed by multiple independent services simultaneously.
For example: when an order is placed, the system needs to simultaneously check for fraud (Fraud Service), handle shipping (Shipping Service), send confirmation emails (Notification Service), and update analytics. If the Buying Service sends messages directly to each service, you create tight coupling — every time you add a new service, you have to modify the Buying Service code.
The fan-out pattern solves this by combining SNS (Simple Notification Service) — a pub/sub (publish-subscribe) service that sends a single message to multiple subscribers at once — with SQS to ensure each subscriber receives and processes messages reliably.
Architecture
How it works:
- Buying Service sends a single message to an SNS Topic — publish once.
- SNS automatically fans out — copies the message and pushes it to all SQS queues subscribed to that topic.
- Each SQS queue operates completely independently — Fraud Service polls from its own queue, Shipping Service polls from its own queue. One service being slow or failing doesn’t affect the others.
Why Combine SNS + SQS Instead of Just SNS?
SNS alone is push-based — when SNS sends a message to a subscriber, if the subscriber is down or unable to process at that moment, the message is lost. SNS doesn’t store messages.
Combining SQS behind each subscriber completely solves this:
- Persistence — messages are safely stored in SQS until the consumer processes and deletes them.
- Delayed processing — each consumer processes at its own pace, not forced by the publish rate. Fraud Service might take 5 seconds per message while Shipping Service takes only 100ms — no problem.
- Retry + DLQ — failed messages automatically return to the queue, combined with Dead Letter Queue for poison messages — all mechanisms discussed in previous sections.
- Decoupled scaling — each subscriber scales independently based on its own queue depth.
Adding New Subscribers
This is the most powerful benefit of fan-out: adding new subscribers without changing the producer. When you need to add an Analytics Service to track orders, just create a new SQS queue, subscribe it to the same SNS topic — done. The Buying Service doesn’t know and doesn’t need to know how many subscribers are listening.
This is the Open-Closed Principle applied at the infrastructure level — the system is open for extension (adding subscribers) but closed for modification (no changes to the producer).
Important Configuration: SQS Access Policy
For SNS to send messages to an SQS queue, the queue needs a resource-based access policy allowing the SNS service principal to perform the sqs:SendMessage action. Without this policy, SNS cannot write to the queue and messages will be silently dropped.
Best practice: always restrict the SQS access policy with an
aws:SourceArncondition pointing to the specific SNS topic — avoid allowing any SNS topic to write to your queue. This is especially important in multi-team environments where multiple SNS topics exist within the same AWS account.
Cross-Region Delivery
SNS supports cross-region delivery — an SNS topic in Region A can fan out messages to SQS queues in Region B or Region C. This is useful for multi-region architectures where you want to replicate events to other regions for local processing, reducing latency for end users.
Practical Application: S3 Events Fan-Out
A common use case for the fan-out pattern is handling S3 event notifications. S3 has an important limitation: for the same combination of event type (e.g., s3:ObjectCreated:*) and prefix (e.g., images/), you can only create one S3 Event notification rule.
This means if you want three different services to react when a file is uploaded to images/ — you cannot create three separate S3 event rules.
The solution: configure S3 to send event notifications to a single SNS Topic. SNS then fans out to multiple SQS queues and/or Lambda functions. Each downstream service receives its own copy of the event and processes independently — one service creates thumbnails, another extracts metadata, a third indexes into a search engine.
Summary
SQS looks simple from the outside — send message, receive message, delete message. But underneath is a carefully designed distributed system with many deliberate trade-offs:
- Standard Queue vs FIFO Queue — near-unlimited throughput by distributing messages across multiple nodes and dropping ordering guarantees, or trading throughput for strict ordering per Message Group ID.
- Visibility Timeout — a mechanism to prevent duplicate processing, managed by a distributed timer system, tunable per-message.
- Long Polling — reduces cost, reduces latency, and eliminates false “queue empty” signals by querying all storage nodes instead of just a few.
- Dead Letter Queue + Redrive Allow Policy — a safety net for poison messages, with governance controls over who can redirect failed messages to your DLQ.
- ASG Integration — queue depth is a more accurate scaling signal than CPU for queue-based workloads.
- Buffer Pattern — SQS serves as a shock absorber between fast producers and capacity-limited databases, allowing the two tiers to scale independently.
- Lambda Integration — AWS manages polling and scaling automatically, but be careful with concurrency limits and their impact on DLQ receive counts.
- Fan-Out Pattern — combining SNS + SQS to fan out a single event to multiple independent subscribers, with persistence, retry, and the ability to add subscribers without changing the producer.
SQS is the right choice when you need a simple, fully managed, infinitely scalable task queue to decouple components in your system. If you need event streaming with replay and long-term retention → Kafka. If you need fan-out of a single event to multiple subscribers → combine SNS + SQS as discussed in section 11. If you need complex routing based on event content → EventBridge. Each tool has its place — SQS excels at simplicity and reliability.