Back to posts
May 20, 2026
20 min read

Amazon Kinesis Data Streams & Data Firehose: From Real-Time Ingestion to Automated Delivery into S3, Redshift, OpenSearch

You’re building an analytics system for an e-commerce application. Thousands of events pour in every second — clickstream, page views, add-to-cart, purchases — and you need to land all this data into S3 for analytics with Athena, while also pushing it into OpenSearch for the product team’s real-time dashboard.

The simplest approach? Write a consumer application that polls data from the stream, transforms it, then writes to S3. But then you have to manage batching (grouping many records into one large file instead of writing each record individually), compression, format conversion, retry logic when S3 returns errors, monitoring throughput, scaling the consumer when traffic spikes… You spend weeks building the “pipeline” before ever touching the actual analytics work.

This is exactly the problem that the combo of Amazon Kinesis Data Streams and Amazon Data Firehose solves. Kinesis Data Streams handles the ingestion — receiving and storing real-time data streams with linear scalability. Data Firehose handles the delivery — automatically transforming, converting formats, buffering, and writing data to storage without you writing a single line of consumer code.

This post will deep dive into both services — from Kinesis Data Streams’ shard architecture (very similar to Kafka partitions), the fixed limits per shard, fan-out modes, to Data Firehose’s internal pipeline: Lambda transformation, Glue format conversion, buffering internals, and the technical reasons why Firehose must buffer data before writing.


1. What is Kinesis Data Streams?

Amazon Kinesis Data Streams (KDS) is a fully managed real-time data streaming service from AWS — enabling applications to collect, process, and analyze continuous data streams with extremely low latency (under 200ms).

If you’ve read the Apache Kafka deep dive, the architecture of Kinesis Data Streams will feel very familiar. Both are based on the distributed commit log model — data is written sequentially to distributed logs, consumers actively pull data and manage their own read position. The biggest difference is that Kinesis Data Streams is fully managed — you don’t need to manage brokers, worry about ZooKeeper/KRaft, or tune JVM heap or disk I/O. AWS handles all the underlying infrastructure.

Key characteristics:


2. Core Concepts

Since Kinesis Data Streams and Kafka share the same architectural model, the fastest way to understand Kinesis is to map the equivalent concepts:

Kinesis Data StreamsApache KafkaExplanation
StreamTopicLogical channel containing data. Example: stream clickstream, stream orders.
ShardPartitionUnit of parallelism — each shard is a sequential, append-only log. Data is distributed to shards based on partition key.
Data RecordMessage / RecordSmallest unit of data — includes partition key, sequence number, and data blob (max 1 MB).
Partition KeyPartition KeyString used to determine which shard a record goes to, via hash function: MD5(partition_key) → shard. Records with the same partition key always go to the same shard.
Sequence NumberOffsetUnique, incrementing identifier for each record in a shard — helps consumers know where they are in the stream.
KCL ApplicationConsumer GroupLibrary (Kinesis Client Library) that helps multiple consumers coordinate processing data from a stream — automatically distributes shards among consumer instances and manages checkpoints (similar to committing offsets).

Key Difference

With Kafka, each partition’s throughput depends on the broker’s hardware — disk speed, network bandwidth, RAM. You can upgrade broker hardware to increase per-partition throughput.

With Kinesis, each shard’s throughput is a fixed number set by AWS — it cannot be changed. Want more throughput? The only way is to add more shards (horizontal scaling). This is the core design point you need to understand before moving forward.


3. Shard — The Core Unit of Kinesis

3.1. Shard Limits — Fixed Numbers

Each shard in Kinesis Data Streams has fixed, unchangeable throughput:

MetricLimit
Write throughput1 MB/s per shard
Write records1,000 records/s per shard
Max record size1 MB per record
Read throughput (Shared)2 MB/s per shard — shared across all consumers
Read throughput (Enhanced)2 MB/s per shard per consumer — each consumer gets dedicated throughput

This means: if your stream needs to ingest 10 MB/s of data, you need at least 10 shards (10 shards × 1 MB/s = 10 MB/s). Need 100 MB/s? 100 shards. Total throughput scales linearly with the number of shards.

Like Kafka, producers use the partition key to determine which shard a record goes to: MD5(partition_key) produces a 128-bit hash value, which is mapped to the corresponding shard’s hash key range. Each shard owns a distinct range, and all ranges together cover the entire hash space.

3.2. Shared Fan-Out vs Enhanced Fan-Out

Kinesis provides two read modes that directly affect the throughput and latency consumers receive:

Shared Fan-Out (default) — consumers call the GetRecords API to poll data from shards. All consumers share the 2 MB/s per shard bandwidth. If 5 consumers read from the same shard, each gets on average ~400 KB/s. Average latency is about 200ms due to the polling mechanism.

const { KinesisClient, GetShardIteratorCommand, GetRecordsCommand } = require('@aws-sdk/client-kinesis') const client = new KinesisClient({ region: 'ap-southeast-1' }) const { ShardIterator } = await client.send( new GetShardIteratorCommand({ StreamName: 'clickstream', ShardId: 'shardId-000000000000', ShardIteratorType: 'LATEST', }) ) const { Records, NextShardIterator } = await client.send( new GetRecordsCommand({ ShardIterator, Limit: 100 }) )

Enhanced Fan-Out — consumers register via SubscribeToShard and receive data via HTTP/2 push. Each consumer gets a dedicated 2 MB/s per shard, shared with no one. 5 consumers reading the same shard? Each still gets the full 2 MB/s. Latency drops to ~70ms because data is pushed directly to the consumer instead of waiting for polls.

Before subscribing, you need to register a stream consumer — an identity representing your consumer application. Each stream consumer gets a dedicated 2 MB/s per shard, completely isolated from other consumers:

const { KinesisClient, RegisterStreamConsumerCommand, SubscribeToShardCommand } = require('@aws-sdk/client-kinesis') const client = new KinesisClient({ region: 'ap-southeast-1' }) const { Consumer } = await client.send( new RegisterStreamConsumerCommand({ StreamARN: 'arn:aws:kinesis:ap-southeast-1:123456789012:stream/clickstream', ConsumerName: 'analytics-consumer', }) ) const response = await client.send( new SubscribeToShardCommand({ ConsumerARN: Consumer.ConsumerARN, ShardId: 'shardId-000000000000', StartingPosition: { Type: 'LATEST' }, }) ) for await (const event of response.EventStream) { if (event.SubscribeToShardEvent) { const { Records, MillisBehindLatest } = event.SubscribeToShardEvent } }
CriteriaShared Fan-OutEnhanced Fan-Out
Delivery modelPull (GetRecords API)Push (SubscribeToShard, HTTP/2)
Read throughput2 MB/s per shard — shared2 MB/s per shard per consumer
Latency~200ms~70ms
Max consumersSoft limit, but bandwidth is sharedMax 20 consumers per stream
Additional costNoneExtra charge per consumer per shard-hour + data
When to useFew consumers, latency not criticalMany consumers, need low latency

3.3. Data Retention and Replay

Kinesis Data Streams stores data in the stream based on the configured retention:

During the retention period, consumers can replay — re-read data from any point in time. This is just like Kafka: if a consumer has a bug and processes the last 2 hours of data incorrectly, you fix the bug and have the consumer read from 2 hours ago — no records are lost.


4. Scaling Modes

Kinesis Data Streams offers two scaling modes that determine how you manage shards:

4.1. Provisioned Mode

You decide how many shards you need and manage them manually. When traffic increases, you perform shard splitting — splitting 1 shard into 2 new shards (halving the hash key range). When traffic decreases, you merge 2 adjacent shards into 1. Each split/merge operation takes a few seconds to a few minutes.

Billing is per shard-hour — you pay for the number of shards you provision, regardless of whether you use the full throughput.

4.2. On-Demand Mode

AWS automatically scales the number of shards based on actual traffic. You don’t need to manage shards — Kinesis automatically adds or removes shards as throughput changes. Default limit: auto-scales up to 200 MB/s write (equivalent to 200 shards) or double the peak throughput reached in the last 30 days — whichever is larger.

Billing is per stream-hour + per GB of data ingested and retrieved.

CriteriaProvisionedOn-Demand
Shard managementManual (split/merge)Automatic
Scale speedSeconds to minutesAutomatic, minutes
Throughput limitDepends on shards you provision200 MB/s write or 2× peak in 30 days
Pricing modelPer shard-hour (~$0.015/shard-hour)Per stream-hour + per GB ($0.08/GB in, $0.04/GB out)
When to usePredictable traffic, cost optimizationUnpredictable traffic, no capacity management needed

Note: On-Demand mode is great when you don’t know the traffic pattern in advance. But if traffic is stable and predictable, Provisioned mode is typically much cheaper — similar to how EC2 Reserved Instances are cheaper than On-Demand.


5. What is Amazon Data Firehose?

Kinesis Data Streams solves the ingestion problem — receiving and storing real-time data streams. But the next step — delivering data from the stream to storage (S3, Redshift, OpenSearch) — still requires you to write consumer code: poll data, transform, batch, write, handle errors, retry…

Amazon Data Firehose eliminates all that consumer code. It’s a fully managed delivery pipeline — you just configure the source and destination, and Firehose automatically receives data, transforms it (if needed), buffers, and writes to storage. No code to write, no servers to manage, no scaling to worry about.

If you think of Kinesis Data Streams as a highway where data flows continuously, then Data Firehose is the fleet of trucks dispatched to collect goods from the highway, repack them neatly (transform, convert format), and deliver them to the destination warehouse (S3, Redshift, OpenSearch).

Key characteristics:


6. Sources and Destinations

6.1. Sources — Where Does Data Come From?

Firehose receives data from multiple sources:

6.2. Destinations — Where Does Data Go?

AWS Destinations:

3rd-party Destinations:

Custom Destinations:

6.3. S3 Backup

Regardless of the primary destination, Firehose always supports S3 backup — saving a copy of the original data (before transformation) to a separate S3 bucket. Three options:

When the destination is S3 and All or Failed data backup is enabled, Firehose writes both raw data (original, pre-transform) and transformed data to S3 — but at different prefixes. This is useful for audit trails and debugging transformation errors.


7. Firehose Pipeline — Under the Hood

Now let’s dive into Firehose’s internals — what steps does data go through from reception to storage.

7.1. Receiving and Collecting Records

Firehose receives data through two APIs:

When receiving data from Kinesis Data Streams, Firehose automatically polls from all shards of the stream and collects records. You don’t need to manage this process.

7.2. Lambda Transformation

After receiving records, Firehose can invoke an AWS Lambda function to transform data before writing to storage. This step is optional — you only configure it when you want to modify the data.

How it works:

Firehose collects records into a small buffer (configurable, 1–3 MB or 60 seconds — whichever condition is met first), then sends that batch to the Lambda function. Lambda receives the batch of records, processes each one, and returns the result batch. Each record in the result is tagged with one of three statuses:

StatusMeaning
OkRecord was transformed successfully → continues through the pipeline
DroppedRecord was intentionally discarded (e.g., filtering invalid data) → not written
ProcessingFailedLambda couldn’t process the record → record is written to the S3 error bucket

Common use cases:

If the Lambda function fails or times out, Firehose retries up to 3 times. If it still fails, records are tagged as ProcessingFailed and written to the S3 error bucket — ensuring no data is lost.

7.3. Format Conversion — AWS Glue Data Catalog

After Lambda transformation (or immediately after receiving records if no Lambda is configured), Firehose can convert data format from JSON/CSV to Apache Parquet or Apache ORC — two columnar storage formats widely used in analytics.

Why columnar formats?

Traditional formats like JSON or CSV store data by row — each record stores all columns side by side. When an analytics query only needs 3 out of 50 columns, the system still has to read all 50 columns for every record and discard 47 that aren’t needed.

Columnar formats store data by column — all values of the same column are grouped together on disk. When a query only needs 3 columns, the system reads only those 3 columns, completely skipping the other 47. Results:

How Firehose performs conversion:

Firehose uses a schema from the AWS Glue Data Catalog — a centralized metadata store service from AWS where you define data structure (column names, data types). Firehose reads this schema to know how to deserialize the incoming JSON (parse JSON into data fields) and then serialize it back into Parquet/ORC (encode data in columnar format).

Note: Format conversion requires input data to have a consistent structure — all records must match the schema in the Glue Catalog. If a record has a field of the wrong type or is missing a required field, that record will fail conversion.

7.4. Buffering — Why Buffer Before Writing?

This is the most important section for understanding why Firehose is near real-time rather than real-time.

Instead of writing each record individually to storage as soon as it arrives, Firehose groups many records into a large batch before writing once. This grouping process is called buffering, and you can configure it with two criteria:

Firehose flushes when whichever condition is met first. Example: buffer size = 64 MB, buffer interval = 300 seconds. If after 120 seconds the buffer is already full at 64 MB → flush immediately. If after 300 seconds the buffer only has 10 MB → still flush.

Why buffer instead of writing each record immediately?

Four technical reasons:

1. Reduce API call costs. S3 charges $0.005 per 1,000 PUT requests. If you receive 1 million records per hour and write each one individually, that’s 1,000,000 PUT requests = $5/hour = $3,600/month. If buffer groups 1,000 records into 1 file, that’s only 1,000 PUT requests = $0.005/hour = $3.6/month. Saving ~99.9% on S3 PUT costs.

2. Better compression. Compression algorithms work by finding repeating patterns in data. Larger files mean more data to find patterns in, resulting in higher compression ratios. A 64 MB file compressed with Gzip might shrink to 10 MB (6:1). But 64,000 small 1 KB files? Each file is too small for the algorithm to find patterns → compression ratio is nearly negligible.

3. Avoid the “small file problem”. When running analytics queries on S3 with Amazon Athena or Apache Spark, each S3 file needs to be opened, headers read, parsed, then closed. 1,000 large files (64 MB each) process much faster than 1,000,000 small files (64 KB each) — because the overhead of opening/closing files dominates processing time when files are too small. This is a well-known problem in big data called the small file problem.

4. Columnar formats need large batches. Apache Parquet and ORC store data in row groups — a group of rows compressed and stored together. Row groups are most efficient with thousands to millions of rows. With just 1 row per file, Parquet’s metadata overhead can be larger than the data itself, and the columnar compression advantage is almost entirely lost.

Buffer configUse caseDelivery latencyCostFile size
1 MB / 60sNear real-time monitoring, alerting~60 secondsHigherSmall
64 MB / 300sGeneral analytics pipeline~5 minutesMediumMedium
128 MB / 900sCost-optimized data lake, batch analytics~15 minutesLowestLargest

Best practice: for S3 destinations used in analytics, use buffer size 64–128 MB to optimize cost and query performance. Only reduce to 1 MB when near real-time delivery is needed (e.g., monitoring dashboards that need data within 1 minute).

7.5. Writing to Destination

After the buffer is full (or time is up), Firehose writes the batch data to the destination:

When delivery fails after all retries, records are written to the S3 error bucket — ensuring data is never lost, even when the destination is unavailable.

7.6. S3 Backup — Preserving Original Data

Beyond the S3 error bucket for failed records, Firehose also supports source record backup — saving a copy of the original data (before Lambda transformation and format conversion) to a separate S3 bucket.

Why? Three reasons:


8. Important Considerations for Firehose


9. Kinesis Data Streams vs Data Firehose

These two services complement each other, they don’t compete. The comparison table below helps distinguish the role of each service:

CriteriaKinesis Data StreamsAmazon Data Firehose
Primary roleReal-time data ingestion & processingManaged data delivery to storage
LatencyReal-time (~70–200ms)Near real-time (~60s+)
Capacity managementManual (Provisioned) or auto (On-Demand)Fully automatic
Data storageYes (24 hours – 365 days)No
Replay capabilityYesNo
Consumer codeRequired (KCL, Lambda, SDK)Not needed
Data transformationConsumer handles itBuilt-in Lambda + Glue format conversion
DestinationsAny (depends on consumer)S3, Redshift, OpenSearch, 3rd party, HTTP
ScalingAdd shards (Provisioned) or autoFully automatic, no configuration needed
PricingPer shard-hour or per GB (On-Demand)Per GB ingested

Most common pattern: Combine both — Kinesis Data Streams → Data Firehose → S3. KDS receives real-time data, allowing multiple consumers to process it (real-time analytics, alerting). Simultaneously, Firehose consumes from the same KDS stream, automatically transforms, converts format, buffers, and writes to the S3 data lake for batch analytics. You get the best of both worlds: real-time processing and automated delivery.


Summary

Kinesis Data Streams and Data Firehose solve two different problems in a streaming data pipeline:

If you’re coming from the Kafka world, Kinesis Data Streams will feel very familiar — and Data Firehose is essentially the part that in Kafka you’d have to build yourself: consumer code, batching logic, format conversion, S3 writer. AWS has packaged all of that into a managed service.

Related