1. What is Kinesis?

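Amazon Kinesis is a platform for real-time streaming data. It enables you to collect, process, and analyze data streams in real time. Kinesis comprises four services: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams (not covered in these notes).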

2. Kinesis Data Streams

Collect and process large streams of data records in real time.

  1. Real-time streaming (data available within ~200 ms)
  2. Data organized into shards: each shard = 1 MB/sec in, 2 MB/sec out
  3. Retention: 24 hours (default), up to 365 days
  4. Consumers: Lambda, KCL applications, Kinesis Data Firehose, Kinesis Data Analytics
  5. Supports replay: consumers can re-read data within the retention period
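  6. Data is immutable: once written to a stream, a record cannot be modified or deleted; it is removed only when the retention period expires
  7. Ordering: guaranteed within a shard only (records with the same partition key go to the same shard, so they stay in order)
  8. Provisioned mode: you choose the number of shards. On-demand mode: capacity auto-scales.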
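
To illustrate why a partition key gives per-key ordering, here is a minimal sketch of how a key could be routed to a shard. Kinesis maps the MD5 hash of the partition key to a 128-bit integer and picks the shard whose hash key range contains it. The helper names (`even_shard_ranges`, `shard_for_key`) and the evenly split ranges are assumptions for illustration, not part of any AWS SDK.

```python
import hashlib

# Size of the 128-bit hash key space that shard ranges are carved from.
HASH_SPACE = 2 ** 128


def even_shard_ranges(shard_count):
    """Split the hash key space into contiguous, evenly sized ranges.

    Assumption: real streams can have uneven ranges after splits and merges.
    """
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # 128-bit integer
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(4)
first = shard_for_key("device-42", ranges)
second = shard_for_key("device-42", ranges)  # same key, same shard
```

Because the mapping is deterministic, every record for `"device-42"` lands on one shard and is read back in write order. Records with different keys may land on different shards, where no cross-shard ordering is guaranteed.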


Capacity
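
A rough sizing sketch for provisioned mode, based on the per-shard limits listed under Kinesis Data Streams (1 MB/sec in, 2 MB/sec out). The 1,000 records/sec write limit per shard is an additional limit not stated in these notes, and the function name `shards_needed` is hypothetical. It assumes evenly distributed partition keys and consumers that share the standard (non-enhanced) read throughput.

```python
import math

# Per-shard limits. The MB/sec figures come from the notes; the
# records/sec write limit is an extra limit assumed here.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(ingest_mb_per_sec, records_per_sec, shared_consumers=1):
    """Estimate the shard count from write and shared-read throughput."""
    by_write_bytes = math.ceil(ingest_mb_per_sec / WRITE_MB_PER_SEC)
    by_write_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    # Each shared consumer reads the full stream, so egress = ingest * consumers.
    by_read = math.ceil(ingest_mb_per_sec * shared_consumers / READ_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read)


two_consumers = shards_needed(5, 3000, shared_consumers=2)   # write-bound: 5
four_consumers = shards_needed(5, 3000, shared_consumers=4)  # read-bound: 10
small_records = shards_needed(0.5, 4000)                     # record-rate-bound: 4
```

The three calls show the three ways a stream can hit a limit: bytes written, records written, or bytes read by shared consumers.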

Enhanced Fan-Out

  1. Each consumer gets a dedicated 2 MB/sec throughput per shard (using SubscribeToShard)
  2. Push model: Kinesis pushes data to the consumer via HTTP/2
  3. Without Enhanced Fan-Out: all consumers share 2 MB/sec per shard (pull model)
  4. Use when: multiple consumers need high throughput from the same stream
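
The difference between shared and dedicated throughput is simple arithmetic. The sketch below assumes the shared 2 MB/sec per shard is split evenly between consumers, which is a simplification; the function name is hypothetical.

```python
SHARD_READ_MB_PER_SEC = 2.0  # per-shard read throughput from the notes


def per_consumer_mb_per_sec(shards, consumers, enhanced_fan_out):
    """Approximate read throughput available to each consumer across the stream."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    total = shards * SHARD_READ_MB_PER_SEC
    if enhanced_fan_out:
        # Dedicated 2 MB/sec per shard for every registered consumer.
        return total
    # Shared 2 MB/sec per shard, assumed to be divided evenly.
    return total / consumers


shared = per_consumer_mb_per_sec(shards=4, consumers=4, enhanced_fan_out=False)
dedicated = per_consumer_mb_per_sec(shards=4, consumers=4, enhanced_fan_out=True)
```

With 4 shards and 4 consumers, each shared consumer gets about 2 MB/sec in total, while each Enhanced Fan-Out consumer gets the full 8 MB/sec.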


3. Kinesis Data Firehose

Load streaming data into destinations for storage and analytics. The easiest way to get streaming data into AWS data stores.

  1. Fully managed, serverless. No shards to manage.
  2. Near real-time (60-second buffer minimum, NOT truly real-time)
  3. Auto-scales to match throughput
  4. Can transform data with Lambda before delivery
  5. Pay per GB of data ingested
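
Why Firehose is only near real-time: records wait in a buffer that is flushed when it reaches a size limit or a time limit, whichever comes first. The toy model below illustrates that behavior. It is not the Firehose API; the class name, the tiny 10-byte limit, and the explicit `now` timestamps are assumptions chosen to keep the example short.

```python
class BufferSketch:
    """Toy model of size-or-time buffering before delivery."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # time the first buffered record arrived
        self.delivered = []    # batches flushed to the destination

    def put(self, record, now):
        # Flush first if the time limit already passed for the open buffer.
        self.tick(now)
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self.flush()

    def tick(self, now):
        """Flush if the buffer has been open for at least max_seconds."""
        if self.records and now - self.opened_at >= self.max_seconds:
            self.flush()

    def flush(self):
        if self.records:
            self.delivered.append(self.records)
        self.records, self.size, self.opened_at = [], 0, None


buf = BufferSketch(max_bytes=10, max_seconds=60)
buf.put(b"aaaa", now=0)
buf.put(b"bbbb", now=10)
buf.put(b"cccc", now=20)  # 12 bytes buffered: size limit reached, flush
buf.put(b"dddd", now=30)  # starts a new buffer
buf.tick(now=89)          # 59 s elapsed: nothing delivered yet
buf.tick(now=90)          # 60 s elapsed: time limit reached, flush
```

The last record sat in the buffer for a full minute before delivery, which is the latency the "near real-time" label refers to.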


Firehose Destinations

4. Kinesis Data Analytics

  1. Run SQL queries or Apache Flink applications on streaming data in real time
  2. Input: Kinesis Data Streams or Kinesis Data Firehose
  3. Output: Kinesis Data Streams, Firehose, or Lambda
  4. Use for: real-time dashboards, anomaly detection, aggregation, time-windowed analytics
  5. Fully managed, auto-scales
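
To make "time-windowed analytics" concrete, here is a plain-Python sketch of a tumbling (fixed, non-overlapping) window count. It only illustrates the idea; it is not Kinesis Data Analytics SQL or the Flink API, and the function name and event shape are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples.
    Returns {window_start: {key: count}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [(1, "login"), (30, "error"), (59, "login"), (61, "error"), (119, "error")]
counts = tumbling_window_counts(events, window_seconds=60)
# counts == {0: {"login": 2, "error": 1}, 60: {"error": 2}}
```

A jump from one error per minute to two is the kind of change an anomaly-detection or dashboard query would surface from a stream.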

5. Kinesis Data Streams vs SQS
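
The core difference can be sketched with two toy classes: a stream behaves like an append-only log that several consumers read (and re-read) independently, while a queue hands each message to one consumer, which then deletes it. Both classes are hypothetical simplifications. Real SQS has separate receive and delete calls with a visibility timeout, and real Kinesis readers use shard iterators and sequence numbers.

```python
class ShardLog:
    """Append-only log: records stay in place; each reader tracks its own position."""

    def __init__(self):
        self._records = []

    def put(self, data):
        self._records.append(data)
        return len(self._records) - 1  # stand-in for a sequence number

    def read_from(self, position):
        return self._records[position:]


class SimpleQueue:
    """Queue: a message is gone once a consumer has processed and deleted it."""

    def __init__(self):
        self._messages = []

    def send(self, data):
        self._messages.append(data)

    def receive_and_delete(self):
        return self._messages.pop(0) if self._messages else None


log = ShardLog()
queue = SimpleQueue()
for event in ["a", "b", "c"]:
    log.put(event)
    queue.send(event)

analytics_view = log.read_from(0)  # consumer 1 sees every record
archiver_view = log.read_from(0)   # consumer 2 sees the same records
replayed = log.read_from(1)        # replay from an earlier position

worker_1 = queue.receive_and_delete()  # takes "a"
worker_2 = queue.receive_and_delete()  # takes "b", never sees "a"
```

This is why "multiple consumers on the same data" points to Kinesis and "process and delete individual messages" points to SQS.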

6. Kinesis Data Streams vs Firehose

Exam Tip

  1. "Real-time streaming" = Kinesis Data Streams
  2. "Load streaming data into S3/Redshift" = Firehose
  3. "Real-time SQL on streams" = Kinesis Data Analytics
  4. Streams = real-time (~200 ms), you manage shards, replay OK
  5. Firehose = near real-time (60-second buffer), serverless, no replay
  6. "Multiple consumers on same data" = Kinesis (not SQS)
  7. "Process and delete individual messages" = SQS (not Kinesis)