Redpanda Managed AWS S3 Sink Connector

IntVerse.io
Jul 6, 2023

Redpanda is a distributed streaming platform designed to handle large volumes of data with high throughput and low latency for real-time data processing.

It is built specifically for the needs of modern data streaming applications. Its key characteristics are:

  1. Performance: Redpanda is designed for high throughput and low latency. It replicates data with the Raft consensus algorithm, which provides strong consistency guarantees while still sustaining high throughput.
  2. Scalability: Redpanda is designed to scale horizontally, allowing you to handle increasing data volumes and processing requirements. It can distribute data across multiple nodes in a cluster, providing fault tolerance and high availability.
  3. Compatibility with Kafka: Redpanda is API-compatible with Apache Kafka, so you can use existing Kafka applications, tools, and ecosystems with Redpanda without major code changes. This compatibility makes it easy for Kafka users to migrate to Redpanda if they need better performance or scalability.
  4. Simplified Operations: Redpanda offers a simpler operational experience than Kafka. It does not depend on an external ZooKeeper ensemble; cluster coordination is handled internally by a built-in Raft-based controller. This reduces the complexity of managing and operating the platform.
  5. Observability: Redpanda provides comprehensive observability features, including metrics, logs, and tracing, which help monitor and debug streaming applications. It offers built-in integration with popular observability tools like Prometheus and Grafana.
  6. Community and Support: Redpanda has an active and growing community of users and contributors. Its source code is publicly available (the core is released under the Business Source License), so you can benefit from community-driven enhancements, bug fixes, and support.

Redpanda Managed Connectors

Redpanda Connectors allow Redpanda to integrate with other systems or applications. Connectors make it easy to ingest data from a variety of sources into Redpanda or to stream data from Redpanda to other systems.

Redpanda connectors are important because they allow organizations to seamlessly integrate Redpanda with other systems in their data infrastructure, such as databases, message queues, and data lakes.

By doing so, organizations can build end-to-end data pipelines that can process, transform, and store large volumes of data in real time, making it available for analysis, machine learning, and other data-driven applications.

Redpanda connectors come in different flavors, ranging from open-source connectors built on the Kafka Connect framework to proprietary connectors developed by vendors that specialize in data integration. Regardless of the type of connector, the goal is the same: to let organizations use Redpanda as a streaming platform while minimizing the complexity of data integration.

This blog focuses on the Redpanda Managed AWS S3 Sink Connector, which streams data from a Redpanda cluster to an AWS S3 bucket. The connector is fully managed by Redpanda, so you don’t need to set up or maintain the underlying connector infrastructure yourself.

Prerequisites for the AWS S3 Sink Connector

  1. A Redpanda cluster
  2. Kafka Connect (Redpanda Connectors) enabled on the cluster
  3. An AWS account
  4. An AWS S3 bucket

AWS S3

S3 (Simple Storage Service) is a cloud-based object storage service provided by Amazon Web Services (AWS). An S3 bucket is a container for storing objects (files) within S3.

Redpanda Managed S3 Sink Connector

AWS S3 bucket creation

Log in to the AWS Management Console and go to Services > Storage > S3.

Click Create bucket.

Enter a bucket name (bucket names must be globally unique) and select the AWS Region.

Keep the remaining settings at their defaults and click Create bucket.
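The same two inputs (name and Region) can be checked before creating the bucket programmatically. The sketch below validates a name against a subset of the S3 bucket naming rules and builds the keyword arguments for boto3's `create_bucket`, where every Region other than `us-east-1` has to be passed as a `LocationConstraint`. The helper names and the bucket name `rp-sink-demo` are hypothetical; the console steps above remain the method used in this post.

```python
import ipaddress
import re

# A subset of the S3 general-purpose bucket naming rules: 3-63 characters,
# only lowercase letters, digits, dots and hyphens, and the first and last
# characters must be a letter or a digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check `name` against a subset of the S3 bucket naming rules."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:
        return False  # two adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)  # names like "192.168.5.4" are rejected
    except ValueError:
        return True  # not formatted as an IP address, so the name passes
    return False


def create_bucket_kwargs(name: str, region: str) -> dict:
    """Build keyword arguments for boto3's S3 client `create_bucket` call."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    kwargs = {"Bucket": name}
    # Outside us-east-1, the Region must be given as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With credentials configured, the result could be passed as `boto3.client("s3", region_name="eu-west-1").create_bucket(**create_bucket_kwargs("rp-sink-demo", "eu-west-1"))`. The checks cover only the most common naming rules; the AWS documentation lists a few more reserved prefixes and suffixes.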

The new bucket appears in the bucket list.

Click the bucket name to open it.

Open the Permissions tab and edit the bucket policy.

Update the policy so that the AWS user whose credentials the connector will use can access the bucket, then save the changes.
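The exact policy used in this post is not shown. As a minimal sketch, assuming the connector authenticates as a single IAM user, a bucket policy granting that user access could be built as below. Note that bucket-level actions such as `s3:ListBucket` apply to the bucket ARN, while object-level actions such as `s3:PutObject` apply to `bucket/*`. The account ID, user name, and the precise set of actions the managed connector needs are assumptions to verify against the connector documentation.

```python
import json


def sink_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy that lets one IAM principal write objects.

    `principal_arn` is a placeholder for the IAM user or role whose
    credentials the connector uses.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: the resource is the bucket itself.
                "Sid": "AllowBucketAccess",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: the resource is every key in the bucket.
                "Sid": "AllowObjectReadWrite",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# Hypothetical account ID and user name, for illustration only.
policy = sink_bucket_policy(
    "rp-sink-demo", "arn:aws:iam::123456789012:user/redpanda-s3-sink"
)
print(json.dumps(policy, indent=2))  # paste this JSON into the policy editor
```

Splitting the statements by resource type keeps each action paired with the ARN form it actually applies to; granting `s3:PutObject` on the bare bucket ARN would not allow any uploads.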

Redpanda Console

Redpanda Console provides a simple and efficient way to perform administrative tasks such as creating topics, updating topic configuration, managing partitions, configuring connectors, and monitoring the health of the cluster.

Let’s look at how to configure the S3 Sink connector in Redpanda Console.

AWS S3 Sink Connector

Log in to Redpanda Console and go to the Connectors section.

Click Create connector, select the S3 sink connector, and click Next.

Under configuration, provide the connector name and the key and value converter classes.

Provide the topic and S3 details, such as the bucket name, region, and AWS credentials, and click Next.

Review all the connector properties and click Finish.
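The wizard fields correspond to Kafka Connect configuration properties. The sketch below assembles such a configuration as a Python dictionary. `topics`, `key.converter`, `value.converter`, and `tasks.max` are standard Kafka Connect properties, and the two converter classes ship with Apache Kafka. The S3-specific keys (`aws.s3.bucket.name`, `aws.s3.region`) follow the open-source Aiven S3 sink and are only an assumption about what Redpanda’s managed connector expects; the connector, topic, and bucket names are hypothetical.

```python
# Converter classes shipped with Apache Kafka Connect.
STRING_CONVERTER = "org.apache.kafka.connect.storage.StringConverter"
BYTE_ARRAY_CONVERTER = "org.apache.kafka.connect.converters.ByteArrayConverter"


def s3_sink_config(
    name: str,
    topics: list,
    bucket: str,
    region: str,
    key_converter: str = STRING_CONVERTER,
    value_converter: str = BYTE_ARRAY_CONVERTER,
) -> dict:
    """Assemble a sink connector configuration with string values."""
    if not topics:
        raise ValueError("a sink connector needs at least one topic")
    return {
        # Standard Kafka Connect properties.
        "name": name,
        "topics": ",".join(topics),  # comma-separated list of source topics
        "key.converter": key_converter,
        "value.converter": value_converter,
        "tasks.max": "1",
        # ASSUMPTION: S3-specific names borrowed from the Aiven S3 sink;
        # the managed connector may label these fields differently.
        # AWS credentials are also required but are deliberately omitted here.
        "aws.s3.bucket.name": bucket,
        "aws.s3.region": region,
    }


config = s3_sink_config("s3-sink", ["rptesttopic"], "rp-sink-demo", "eu-west-1")
```

All values are strings because Kafka Connect treats connector configuration as a string-to-string map; `tasks.max` is `"1"`, not `1`.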

If there are network issues, the connector may fail with an error; resolve it by reviewing the network configuration of the cluster.

Otherwise, the connector is displayed in the list of connectors with the “Running” status.

This completes the configuration of the S3 Sink connector in Redpanda.
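The “Running” status shown in the console comes from Kafka Connect, which reports status as a JSON document (the shape its REST API returns from `GET /connectors/{name}/status`). One detail worth knowing: the connector itself can report `RUNNING` while one of its tasks has `FAILED`, so a health check should look at both levels. The sketch below parses such a document. The sample payload in the usage is hypothetical, and whether that REST endpoint is directly reachable in a managed Redpanda deployment is not covered by this post.

```python
import json


def summarize_status(status_json: str) -> dict:
    """Summarize a Kafka Connect connector status document."""
    status = json.loads(status_json)
    tasks = status.get("tasks", [])
    connector_state = status["connector"]["state"]
    # Healthy only if the connector AND every one of its tasks are RUNNING.
    healthy = (
        connector_state == "RUNNING"
        and bool(tasks)
        and all(task.get("state") == "RUNNING" for task in tasks)
    )
    errors = {}
    for task in tasks:
        if task.get("state") == "FAILED":
            trace = task.get("trace", "")
            # The first line of the stack trace usually names the exception.
            errors[task["id"]] = trace.splitlines()[0] if trace else ""
    return {
        "name": status["name"],
        "connector_state": connector_state,
        "healthy": healthy,
        "errors": errors,
    }


# Hypothetical payload: the connector runs, but its only task has failed.
sample = json.dumps({
    "name": "s3-sink",
    "connector": {"state": "RUNNING", "worker_id": "w1"},
    "tasks": [{"id": 0, "state": "FAILED", "worker_id": "w1",
               "trace": "ConnectException: access denied\n\tat ..."}],
    "type": "sink",
})
summary = summarize_status(sample)
print(summary["healthy"], summary["errors"])
```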

Test

Objective: produce a message to a Redpanda topic and confirm that the connector transfers it to the designated AWS S3 bucket.

Publish a message to the topic “rptesttopic”.

Click Publish.

Now, check the target AWS S3 bucket.

Click the newly created object “rptesttopic-0-1” and then click “Open”.
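The object name “rptesttopic-0-1” reads as `<topic>-<partition>-<start offset>`: topic rptesttopic, partition 0, first record at offset 1. The sketch below renders names from a template of that shape. The `{{topic}}-{{partition}}-{{start_offset}}` syntax is borrowed from the open-source Aiven S3 sink and is an assumption about how Redpanda’s managed connector names objects; the renderer is a simplified, hypothetical helper that supports only plain variables.

```python
import re

# ASSUMPTION: template syntax modelled on the Aiven S3 sink's default.
DEFAULT_TEMPLATE = "{{topic}}-{{partition}}-{{start_offset}}"


def render_object_key(template: str, topic: str, partition: int,
                      start_offset: int) -> str:
    """Replace {{variable}} placeholders with record coordinates."""
    values = {
        "topic": topic,
        "partition": str(partition),
        "start_offset": str(start_offset),
    }

    def substitute(match: re.Match) -> str:
        variable = match.group(1)
        if variable not in values:
            raise KeyError(f"unsupported template variable: {variable}")
        return values[variable]

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


print(render_object_key(DEFAULT_TEMPLATE, "rptesttopic", 0, 1))
```

Because the partition and starting offset are part of the name, each flush from each partition lands in a distinct object instead of overwriting the previous one.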

Observations

To replicate the bucket being unavailable or experiencing issues in the cloud, I revoked access to the bucket. As a result, the connector in Redpanda fails, because its AWS user is no longer authorized to access the bucket.

Messages published to the topic in the meantime are still stored in Redpanda, which keeps track of their offsets.

Once access to the bucket is restored, Redpanda does not automatically send the outstanding events to the target, because the failed connector needs a manual restart to return to the Running state.

The outstanding events are only published to the target once a new event is published to the topic. As a result, the target does not receive the outstanding events as separate payloads but as a single payload.

There were 4 events outstanding during the access issue, and all 4 were pushed to the target in a single payload.
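One plausible reading of this behaviour is that a sink task resumes from its last committed offset and writes everything it reads in one flush as a single object. The toy model below illustrates that reading with the four outstanding events from this test. It is a simplified, hypothetical illustration of consumer lag and batching, not the actual implementation of Redpanda or the connector, and it does not model why a new event was needed to trigger the flush.

```python
from dataclasses import dataclass, field


@dataclass
class ToySinkTask:
    """Toy model of a sink task; a simplification, not the real connector."""

    # Offset of the next record the task still has to deliver.
    committed_offset: int = 0
    # Each entry stands for one object written to the bucket.
    objects: list = field(default_factory=list)

    def lag(self, log: list) -> int:
        # Consumer lag: records in the partition not yet delivered.
        return len(log) - self.committed_offset

    def flush(self, log: list) -> None:
        # Everything from the committed offset onward goes into ONE object,
        # then the committed offset moves to the end of the log.
        pending = log[self.committed_offset:]
        if pending:
            self.objects.append(list(pending))
            self.committed_offset = len(log)


log = ["event-0"]
task = ToySinkTask()
task.flush(log)  # normal operation: the first event becomes its own object

# Bucket access revoked: the topic keeps accepting records, nothing is flushed.
log += ["event-1", "event-2", "event-3", "event-4"]
print(task.lag(log))  # 4 outstanding records

# Access restored and connector restarted: the backlog lands in one object.
task.flush(log)
print(len(task.objects), task.objects[-1])
```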

Congratulations! You have now learned how to use the S3 Sink connector to transfer messages published to a Redpanda topic to an AWS S3 bucket.

Thanks for reading!

At IntVerse.io, we are excited to offer services for Redpanda, a modern streaming platform built for mission-critical production workloads.

Our team of experts can help your organization implement, manage, and optimize Redpanda to achieve maximum value and performance across any cloud provider and on-premises.
