1. What is SQS?
Amazon SQS is a fully managed message queuing service that enables you to decouple and scale distributed systems. Producers send messages to a queue, and consumers poll the queue to process them.
Core Concept
SQS = asynchronous decoupling. The producer sends a message and moves on; the consumer picks it up later and processes it. If the consumer is slow or down, messages wait in the queue (up to the retention period) until it catches up, so a failure on one side does not block the other. This improves resilience and lets each side scale independently.
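The decoupling described above can be illustrated with an in-process analogy. This is a minimal sketch that uses Python's standard `queue.Queue` as a stand-in for an SQS queue; it is not the SQS API, and the names `produce` and `consume_all` are hypothetical.

```python
import queue

# In-process stand-in for an SQS queue (an analogy only, not the SQS API).
buffer = queue.Queue()

def produce(order_ids):
    """Producer: enqueue work and return immediately, without waiting for a consumer."""
    for order_id in order_ids:
        buffer.put({"order_id": order_id})

def consume_all():
    """Consumer: drain whatever has accumulated, whenever it happens to run."""
    processed = []
    while True:
        try:
            message = buffer.get_nowait()
        except queue.Empty:
            return processed
        processed.append(message["order_id"])

produce([1, 2, 3])        # the consumer is "down"; messages simply wait
backlog = buffer.qsize()  # number of messages buffered while nobody consumes
handled = consume_all()   # the consumer comes back and catches up
```

The producer never calls the consumer directly, which is the property SQS provides across processes and machines.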
2. SQS Standard Queue
- Nearly unlimited throughput (no fixed cap on messages per second)
- At-least-once delivery (a message may be delivered more than once)
- Best-effort ordering (messages may arrive out of order)
- Default retention: 4 days (configurable 1 minute to 14 days)
- Max message size: 256 KB (use Extended Client Library + S3 for larger)
- Low latency (<10 ms on publish and receive)
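Because a Standard queue may deliver the same message more than once, consumers are usually written to be idempotent. The sketch below shows one way to do that, assuming messages are dicts shaped like entries of a ReceiveMessage response. The function name `process_once` and the in-memory `seen_ids` set are illustrative assumptions; a real consumer would likely keep processed IDs in a durable store.

```python
def process_once(messages, handler, seen_ids):
    """Apply handler to each message at most once per MessageId.

    messages: iterable of dicts with "MessageId" and "Body" keys.
    seen_ids: set of already-processed MessageIds (in-memory here, which
              is an assumption made to keep the sketch self-contained).
    """
    results = []
    for msg in messages:
        if msg["MessageId"] in seen_ids:
            continue  # duplicate delivery: skip the side effect
        results.append(handler(msg["Body"]))
        seen_ids.add(msg["MessageId"])
    return results
```

Keying on `MessageId` covers redelivery of the same message. If the producer itself may send the same payload twice, a business key taken from the body is the safer deduplication key.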
3. SQS FIFO Queue
- First-In-First-Out ordering guaranteed
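- Exactly-once processing (a duplicate sent with the same deduplication ID within the 5-minute deduplication interval is accepted but not delivered again)
- Limited throughput: 300 API calls/sec per action without batching; up to 3,000 messages/sec with batching (10 messages per call)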
- Queue name MUST end with .fifo (e.g., my-queue.fifo)
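- Message Group ID: ordering is guaranteed within a group; different groups can be consumed in parallel
- Deduplication ID: prevents duplicate messages (either an explicit ID or, with content-based deduplication enabled, a SHA-256 hash of the message body)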
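The FIFO-specific parameters can be sketched as a helper that builds the keyword arguments for a boto3 `send_message` call. The helper name `fifo_send_params` is hypothetical. When no explicit ID is given, it derives one from a SHA-256 hash of the body, which mirrors what content-based deduplication does on the service side.

```python
import hashlib

def fifo_send_params(queue_url, body, group_id, dedup_id=None):
    """Build keyword arguments for sending one message to a FIFO queue."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    if dedup_id is None:
        # Same idea as content-based deduplication: hash the message body.
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering scope
        "MessageDeduplicationId": dedup_id,  # duplicate-suppression key
    }
```

With a real client this would be used as `sqs.send_message(**fifo_send_params(url, body, "customer-42"))`, where `sqs` is assumed to be `boto3.client("sqs")` and the group ID is an example value.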
4. Key SQS Concepts
Visibility Timeout
- After a consumer receives a message, the message becomes invisible to other consumers
- Default: 30 seconds. Configurable: 0 seconds to 12 hours.
- If the consumer doesn’t delete the message before the timeout expires, the message becomes visible again and can be received and processed a second time
- Set the timeout longer than the expected processing time to prevent duplicate processing
- Consumer can call ChangeMessageVisibility to extend the timeout
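For work that may outlast the timeout, a consumer can extend visibility as it goes. The sketch below assumes `sqs` is a boto3 SQS client (or any object with the same `change_message_visibility` and `delete_message` methods). The function name `process_with_heartbeat` and the idea of splitting work into steps are illustrative assumptions.

```python
def process_with_heartbeat(sqs, queue_url, message, work_steps, extend_to=60):
    """Run each step, resetting the visibility timeout before it starts.

    extend_to is the new timeout in seconds, counted from the time of each
    ChangeMessageVisibility call (not added to the remaining time).
    """
    receipt = message["ReceiptHandle"]
    for step in work_steps:
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt,
            VisibilityTimeout=extend_to,
        )
        step(message["Body"])
    # Delete only after all steps succeed; an exception above leaves the
    # message in the queue so it becomes visible again and is retried.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt)
```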
Dead-Letter Queue (DLQ)
- A separate SQS queue that receives messages the consumer has repeatedly failed to process
- maxReceiveCount: once a message's receive count exceeds this value, SQS moves the message to the DLQ
- Use for: debugging failed messages, isolating poison messages
- DLQ must be the same type as the source (Standard → Standard DLQ, FIFO → FIFO DLQ)
- Redrive to Source: move messages from DLQ back to source queue for reprocessing
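A DLQ is attached to a source queue through the source queue's `RedrivePolicy` attribute. The sketch below builds that attribute and enforces the same-type rule from the list above. The helper name `redrive_attributes`, the default of 5 receives, and the example ARN in the usage note are assumptions.

```python
import json

def redrive_attributes(source_queue_name, dlq_arn, max_receive_count=5):
    """Build the Attributes dict that points a source queue at its DLQ."""
    source_is_fifo = source_queue_name.endswith(".fifo")
    dlq_is_fifo = dlq_arn.endswith(".fifo")
    if source_is_fifo != dlq_is_fifo:
        raise ValueError("DLQ must be the same type as the source queue")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # The attribute value is a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}
```

With a boto3 client this would be passed as `sqs.set_queue_attributes(QueueUrl=source_url, Attributes=redrive_attributes("orders", "arn:aws:sqs:us-east-1:123456789012:orders-dlq"))`.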
Long Polling vs Short Polling
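Short polling (`WaitTimeSeconds=0`) returns immediately, even when no message is found. Long polling makes ReceiveMessage wait up to 20 seconds for a message to arrive, which cuts empty responses and API cost. The consumer loop below is a minimal sketch of long polling, assuming `sqs` is a boto3 SQS client or an object with the same methods. The function name `poll_until_idle` and the stop-after-empty-polls rule are illustrative choices.

```python
def poll_until_idle(sqs, queue_url, handler, max_empty_polls=1):
    """Long-poll the queue, handle and delete messages, stop when idle."""
    empty_polls = 0
    handled = 0
    while empty_polls < max_empty_polls:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # largest batch a single call can return
            WaitTimeSeconds=20,      # long polling; 0 would be short polling
        )
        # boto3 omits the "Messages" key when nothing was received.
        messages = response.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        for message in messages:
            handler(message["Body"])
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            handled += 1
    return handled
```

Long polling can also be made the queue default through the `ReceiveMessageWaitTimeSeconds` queue attribute, so individual callers need not pass `WaitTimeSeconds`.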
SQS + Auto Scaling
- Use CloudWatch metric ApproximateNumberOfMessagesVisible to trigger ASG scaling
- Custom metric: backlog per instance = queue depth / number of running instances
- Scale out when backlog grows, scale in when backlog shrinks
- Common pattern: SQS → EC2 ASG consumer fleet
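The backlog-per-instance arithmetic can be sketched as follows. The function names, the acceptable backlog of 100 messages per instance, and the ASG bounds are assumptions for illustration, not AWS defaults.

```python
import math

def backlog_per_instance(visible_messages, running_instances):
    """Queue depth divided by fleet size (treats an empty fleet as one instance)."""
    return visible_messages / max(running_instances, 1)

def desired_capacity(visible_messages, acceptable_backlog, min_size=1, max_size=10):
    """Instances needed to keep each one at or below the acceptable backlog."""
    needed = math.ceil(visible_messages / acceptable_backlog)
    return max(min_size, min(max_size, needed))

# 1,500 visible messages across 10 instances gives 150 messages per instance.
current = backlog_per_instance(1500, 10)
# With 100 messages per instance acceptable, 15 instances are needed.
target = desired_capacity(1500, 100, max_size=20)
```

In practice the backlog value would be published as a custom CloudWatch metric and used by a target tracking policy, rather than computed inside the consumer.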
SQS Security
- Encryption at rest: SSE-SQS (SQS-managed keys, default, free) or SSE-KMS (keys managed in AWS KMS)
- Encryption in transit: HTTPS endpoints
- Access control: IAM policies + SQS resource policies (for cross-account access)
- VPC Endpoint: Interface Endpoint for private access from VPC
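Cross-account access is granted with a resource policy on the queue. The sketch below builds a policy that lets another account send messages. The helper name `cross_account_send_policy`, the statement ID, and the account ID and ARN in the usage note are placeholders.

```python
import json

def cross_account_send_policy(queue_arn, producer_account_id):
    """Return a queue policy (JSON string) allowing another account to send."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountSend",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{producer_account_id}:root"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
            }
        ],
    }
    return json.dumps(policy)
```

The resulting string would be set as the queue's `Policy` attribute, for example `sqs.set_queue_attributes(QueueUrl=url, Attributes={"Policy": cross_account_send_policy(arn, "111122223333")})`.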
Exam Tip
"Decouple services" = SQS. Standard = nearly unlimited throughput, at-least-once, best-effort order. FIFO = strict order, exactly-once, 300/sec (3,000 batched), name ends in .fifo. Visibility Timeout > processing time. DLQ for failed messages (same queue type as the source). Always use Long Polling (WaitTimeSeconds=20). Max message = 256 KB (Extended Client Library for larger).