Amazon SQS (Simple Queue Service)

1. What is SQS?

Amazon SQS is a fully managed message queuing service that enables you to decouple and scale distributed systems. Producers send messages to a queue, and consumers poll the queue to process them.

Core Concept

SQS = asynchronous decoupling. Producer sends a message and moves on. Consumer picks it up later and processes it. If the consumer is slow or down, messages wait in the queue. This decouples producers from consumers, improving resilience and scalability.
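The decoupling idea can be illustrated without AWS at all. The sketch below uses Python's in-memory queue.Queue as a stand-in for an SQS queue; the names orders, produce, and consume_all are hypothetical and are not part of any SQS API.

```python
import queue

# In-memory stand-in for an SQS queue (illustration only, not the SQS API).
orders = queue.Queue()

def produce(order_id):
    """Producer: enqueue the message and return immediately."""
    orders.put({"order_id": order_id})

def consume_all():
    """Consumer: drain whatever is waiting, whenever it gets around to polling."""
    processed = []
    while True:
        try:
            message = orders.get_nowait()
        except queue.Empty:
            return processed
        processed.append(message["order_id"])

# The producer runs while the consumer is "down"; the messages simply wait.
for order_id in range(3):
    produce(order_id)

print(consume_all())  # [0, 1, 2]
```

The producer never calls the consumer directly, so a slow or stopped consumer only delays processing; it does not block or fail the producer.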

2. SQS Standard Queue

Nearly unlimited throughput (no fixed limit on messages per second)
At-least-once delivery (a message may be delivered more than once)
Best-effort ordering (messages may arrive out of order)
Default retention: 4 days (configurable 1 minute to 14 days)
Max message size: 256 KB (use Extended Client Library + S3 for larger)
Low latency (<10 ms on publish and receive)
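Because a Standard queue may deliver the same message more than once, consumers are usually written to be idempotent. The following is a minimal sketch of one way to do that, keyed on the MessageId field of a received message. The in-memory set is an assumption made for brevity (a real consumer fleet would need a shared, durable store), and make_idempotent and apply_payment are hypothetical names.

```python
def make_idempotent(handler):
    """Wrap handler so a redelivered message (same MessageId) is skipped."""
    seen_ids = set()  # assumption: in-memory only; real fleets need a shared durable store

    def handle(message):
        message_id = message["MessageId"]
        if message_id in seen_ids:
            return False  # duplicate delivery, side effect already applied
        handler(message)
        seen_ids.add(message_id)
        return True

    return handle

balance = {"total": 0}

def apply_payment(message):
    """Hypothetical side effect that must not run twice for one message."""
    balance["total"] += int(message["Body"])

handle = make_idempotent(apply_payment)
delivery = {"MessageId": "m-1", "Body": "50"}
handle(delivery)
handle(delivery)  # the same message delivered a second time
print(balance["total"])  # 50
```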

3. SQS FIFO Queue

First-In-First-Out ordering guaranteed
Exactly-once processing (deduplication within 5-minute window)
Limited throughput: 300 messages/sec without batching, 3,000/sec with batching (up to 10 messages per batch)
Queue name MUST end with .fifo (e.g., my-queue.fifo)
Message Group ID: messages in the same group are processed in order
Deduplication ID: prevents duplicate messages (content-based or explicit ID)
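As a rough illustration of how Message Group ID and Deduplication ID fit together, the sketch below builds the parameters for a FIFO send. The parameter names (QueueUrl, MessageBody, MessageGroupId, MessageDeduplicationId) are those of the SQS SendMessage action; the helper fifo_send_params and the example URL are hypothetical. When no explicit ID is given, the helper hashes the body with SHA-256, which mirrors what content-based deduplication does on the service side.

```python
import hashlib

def fifo_send_params(queue_url, body, group_id, dedup_id=None):
    """Build SendMessage parameters for a FIFO queue (sketch, no AWS call)."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    if dedup_id is None:
        # Mirror content-based deduplication: SHA-256 of the message body.
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering is kept within this group
        "MessageDeduplicationId": dedup_id,  # repeats within 5 minutes are dropped
    }

# Hypothetical queue URL, for illustration only.
url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo"
first = fifo_send_params(url, '{"order": 1}', group_id="customer-42")
retry = fifo_send_params(url, '{"order": 1}', group_id="customer-42")
print(first["MessageDeduplicationId"] == retry["MessageDeduplicationId"])  # True
```

With boto3, a dictionary like this could be passed as keyword arguments to the SQS client's send_message call. Using one group ID per customer keeps each customer's messages in order while allowing different customers to be processed in parallel.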

4. Key SQS Concepts

Visibility Timeout

After a consumer receives a message, it becomes invisible to other consumers
Default: 30 seconds. Configurable: 0 seconds to 12 hours.
If the consumer doesn’t delete the message before the timeout expires, it becomes visible again and can be received and processed a second time
Set timeout > processing time to prevent duplicate processing
Consumer can call ChangeMessageVisibility to extend the timeout
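The visibility timeout rules above can be modeled with a toy in-memory queue. This is a simplified sketch, not the SQS API: the class TinyQueue is hypothetical, messages are identified by their body, and time is passed in explicitly as a number of seconds so the behavior is easy to follow.

```python
class TinyQueue:
    """Toy model of visibility timeout (illustration only, not the SQS API)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._visible_at = {}  # message body -> time at which it is visible again

    def send(self, body, now):
        self._visible_at[body] = now

    def receive(self, now):
        """Return one visible message and hide it for visibility_timeout seconds."""
        for body, visible_at in self._visible_at.items():
            if visible_at <= now:
                self._visible_at[body] = now + self.visibility_timeout
                return body
        return None

    def change_visibility(self, body, timeout, now):
        """Like ChangeMessageVisibility: the new timeout counts from the call time."""
        self._visible_at[body] = now + timeout

    def delete(self, body):
        self._visible_at.pop(body, None)

q = TinyQueue(visibility_timeout=30)
q.send("job-1", now=0)
print(q.receive(now=0))   # job-1 (now invisible to other consumers)
print(q.receive(now=10))  # None (still inside the visibility timeout)
print(q.receive(now=30))  # job-1 (never deleted, so it is delivered again)
```

A consumer that needs more time calls change_visibility before the timeout expires, and calls delete once processing succeeds so the message is not redelivered.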

Dead-Letter Queue (DLQ)

A separate SQS queue where messages go after failing processing N times
maxReceiveCount (set in the source queue's redrive policy): once a message has been received this many times without being deleted, SQS moves it to the DLQ
Use for: debugging failed messages, isolating poison messages
DLQ must be the same type as the source (Standard → Standard DLQ, FIFO → FIFO DLQ)
Redrive to Source: move messages from DLQ back to source queue for reprocessing
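A DLQ is attached to a source queue through the source queue's RedrivePolicy attribute, which is a JSON string containing deadLetterTargetArn and maxReceiveCount. The sketch below only builds that attribute; the helper name redrive_policy_attributes and the example ARN are hypothetical.

```python
import json

def redrive_policy_attributes(dlq_arn, max_receive_count=5):
    """Build the queue attribute that points a source queue at its DLQ (sketch)."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the policy as a JSON-encoded string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}

# Hypothetical DLQ ARN, for illustration only.
attrs = redrive_policy_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3)
print(attrs["RedrivePolicy"])
```

With boto3, a dictionary like this could be passed as the Attributes argument of the SQS client's set_queue_attributes call on the source queue.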

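Long Polling vs Short Polling

Short polling (default, WaitTimeSeconds=0): ReceiveMessage returns immediately, even if no message is available
Long polling (WaitTimeSeconds 1 to 20 seconds): the call waits until a message arrives or the wait time elapses
Long polling reduces empty responses and the number of API calls, which lowers cost
Enable per request (WaitTimeSeconds) or per queue (ReceiveMessageWaitTimeSeconds)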
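The difference between the two polling modes comes down to the WaitTimeSeconds parameter of ReceiveMessage: 0 means short polling, and 1 to 20 means long polling. The sketch below builds the request parameters for a long-polling receive; the helper name receive_params and the example URL are hypothetical, while WaitTimeSeconds and MaxNumberOfMessages are real ReceiveMessage parameters.

```python
def receive_params(queue_url, wait_seconds=20, max_messages=10):
    """Build ReceiveMessage parameters; wait_seconds > 0 means long polling (sketch)."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be between 0 and 20")
    if not 1 <= max_messages <= 10:
        raise ValueError("MaxNumberOfMessages must be between 1 and 10")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,      # 0 = short polling, 1-20 = long polling
        "MaxNumberOfMessages": max_messages,  # fetch up to 10 messages per call
    }

# Hypothetical queue URL, for illustration only.
params = receive_params("https://sqs.us-east-1.amazonaws.com/123456789012/my-queue")
print(params["WaitTimeSeconds"])  # 20
```

With boto3, these could be passed as keyword arguments to the SQS client's receive_message call.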

SQS + Auto Scaling

Use CloudWatch metric ApproximateNumberOfMessagesVisible to trigger ASG scaling
Custom metric (backlog per instance): ApproximateNumberOfMessagesVisible / number of running instances = messages per instance
Scale out when backlog grows, scale in when backlog shrinks
Common pattern: SQS → EC2 ASG consumer fleet
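The backlog-per-instance arithmetic can be worked through in a few lines. This is a sketch of the calculation only, not of any Auto Scaling API; the function names, the acceptable backlog of 100 messages per instance, and the fleet limits are assumptions chosen for the example.

```python
import math

def backlog_per_instance(visible_messages, running_instances):
    """Queue depth divided by fleet size (treats an empty fleet as one instance)."""
    return visible_messages / max(running_instances, 1)

def desired_instances(visible_messages, acceptable_backlog, min_instances=1, max_instances=10):
    """Fleet size that brings the backlog per instance down to the acceptable level."""
    needed = math.ceil(visible_messages / acceptable_backlog)
    return max(min_instances, min(max_instances, needed))

# 1,500 visible messages on 10 instances is 150 messages per instance.
print(backlog_per_instance(1500, 10))                # 150.0
# If each instance should hold at most 100 messages, the fleet needs 15 instances.
print(desired_instances(1500, 100, max_instances=20))  # 15
```

In practice the per-instance value would be published as a custom CloudWatch metric and used as the target of a scaling policy, so the fleet grows when the backlog per instance exceeds the target and shrinks when it falls below.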

SQS Security

Encryption at rest: SSE-SQS (default, free) or SSE-KMS (AWS managed or customer managed KMS key)
Encryption in transit: HTTPS endpoints
Access control: IAM policies + SQS resource policies (for cross-account access)
VPC Endpoint: Interface Endpoint for private access from VPC
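For cross-account access, the resource policy is attached to the queue as its Policy attribute (a JSON string). The sketch below builds a policy that lets another account send messages to a queue; the helper name cross_account_send_policy, the account ID, and the ARN are hypothetical values for illustration.

```python
import json

def cross_account_send_policy(queue_arn, producer_account_id):
    """Build a queue Policy attribute allowing another account to send (sketch)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountSend",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{producer_account_id}:root"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
            }
        ],
    }
    # Like RedrivePolicy, the Policy attribute is a JSON-encoded string.
    return {"Policy": json.dumps(policy)}

# Hypothetical queue ARN and account ID, for illustration only.
policy_attrs = cross_account_send_policy(
    "arn:aws:sqs:us-east-1:123456789012:my-queue", "111122223333"
)
print(policy_attrs["Policy"])
```

The producer account still needs an IAM policy that allows sqs:SendMessage on this queue; the resource policy only grants permission from the queue owner's side.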

Exam Tip

"Decouple services" = SQS. Standard = nearly unlimited throughput, at-least-once, best-effort order. FIFO = strict order, exactly-once, 300/sec (3,000 batched), name ends .fifo. Visibility Timeout > processing time. DLQ for failed messages (same queue type). Always use Long Polling (WaitTimeSeconds=20). Max message = 256 KB (Extended Client Library for larger).