Confluent Platform VS Apache Kafka: What are the differences?
Updated: Jan 27
Apache Kafka is open source software originally created at LinkedIn and released as open source in 2011. It is used by companies such as LinkedIn, Uber and Twitter, and by more than one-third of all Fortune 500 companies. It provides a framework for collecting, reading and analysing streaming data.
Apache Kafka works as a distributed publish-subscribe messaging system. In a publish-subscribe system, message producers are called publishers and message consumers are called subscribers. Kafka persists messages on disk and replicates them within the cluster to prevent data loss.
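To make the publish-subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka and does not use any Kafka client; the `InMemoryTopic` class and the subscriber names are hypothetical, and a plain list stands in for the on-disk log. It only shows that messages are kept after being read and that each subscriber tracks its own position.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy stand-in for a single topic: an append-only list of messages."""

    def __init__(self):
        self._log = []                    # messages stay here in arrival order
        self._offsets = defaultdict(int)  # next position to read, per subscriber

    def publish(self, message):
        """Append a message and return the position it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, subscriber):
        """Return the messages this subscriber has not seen yet."""
        start = self._offsets[subscriber]
        batch = self._log[start:]
        self._offsets[subscriber] = len(self._log)
        return batch


topic = InMemoryTopic()
topic.publish("page_view:/home")
topic.publish("page_view:/pricing")

first = topic.poll("analytics")   # both messages
second = topic.poll("billing")    # the same two messages, read independently
topic.publish("page_view:/docs")
third = topic.poll("analytics")   # only the message published since the last poll
```

Because reading does not remove anything from the log, a new subscriber can start later and still see earlier messages, which is the main difference from a classic queue.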
You can use Apache Kafka in many ways, for example to analyze metrics, as a messaging system, or to track activity on a website. You can see a few of the popular use cases here: https://kafka.apache.org/uses
The following are a few benefits of Kafka:
- Reliability: Kafka is distributed, partitioned, replicated and fault tolerant.
- Scalability: The Kafka messaging system scales easily, without downtime, by adding nodes.
- Durability: Kafka uses a distributed commit log, which means messages are persisted to disk as quickly as possible and are therefore durable.
- Performance: Kafka can handle high volumes of data, sustaining a throughput of thousands of messages per second at very low latency.
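The "partitioned" and "replicated" properties in the list above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, `zlib.crc32` is only a stand-in for the hash Kafka's own partitioner uses, and the round-robin replica placement is a simplification of what a real cluster does.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always lands on the same one."""
    return zlib.crc32(key) % num_partitions  # stand-in hash, not Kafka's


def assign_replicas(partition: int, brokers: list, replication_factor: int) -> list:
    """Pick distinct brokers for one partition by walking the broker list in order."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    start = partition % len(brokers)
    return [brokers[(start + i) % len(brokers)] for i in range(replication_factor)]


brokers = ["broker-1", "broker-2", "broker-3"]
partition = choose_partition(b"user-42", num_partitions=6)
replicas = assign_replicas(partition, brokers, replication_factor=2)
```

With a replication factor of 2, each partition lives on two different brokers, so losing one broker does not lose the data. Adding a broker to the list gives new partitions another place to live, which is the intuition behind scaling by adding nodes.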
Now let's see what Confluent is.
Confluent, Inc. is a company founded in 2014 by the original creators of Apache Kafka. It develops Confluent Platform and provides support, consulting, and training.
Confluent Platform includes Apache Kafka and additional (optional) add-ons.
Some components of Confluent Platform are open source or community licensed and are free to use: REST Proxy, Schema Registry, KSQL and some connectors.
Other components of Confluent Platform are not free and fall under the Enterprise license: Control Center, Replicator, Auto Data Balancer and Enterprise Connectors. The Enterprise components are free indefinitely on a single Kafka broker, or available as a free 30-day trial on an unlimited number of Kafka brokers.
Confluent Platform Community Components:
REST Proxy provides access to Apache Kafka over HTTP and operates as a RESTful web API. This means you don't need to use the native Kafka protocol to produce messages, consume messages, view the state of the cluster or perform administrative actions.
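As a rough illustration of producing over HTTP, the Python sketch below builds a request for the REST Proxy's v2 produce endpoint using only the standard library. The base URL `http://localhost:8082`, the topic name `page_views` and the helper `build_produce_request` are assumptions for this example; check the content type and path against the REST Proxy version you run.

```python
import json
import urllib.request

# Assumption: a REST Proxy listening locally on port 8082.
REST_PROXY_URL = "http://localhost:8082"


def build_produce_request(topic, values):
    """Build (but do not send) an HTTP POST that produces JSON records to a topic."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


request = build_produce_request("page_views", [{"path": "/home"}])
# Sending it would be: urllib.request.urlopen(request)
# That call is left out because it needs a running REST Proxy.
```

Any language or tool that can issue an HTTP request can produce to Kafka this way, which is useful where no native Kafka client is available.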
Apache Kafka itself does not validate data. It simply takes bytes as input and redistributes them as output. So if a producer sends bad data, a field gets renamed or the data format changes, consumers break. The Schema Registry addresses this by enforcing data compatibility. It is a separate component that must be able to reject bad data, and both producers and consumers need to be able to talk to it. With the Schema Registry, Kafka clients can process messages that follow an Avro schema without including the schema in every message. Instead, the schemas are stored in the Schema Registry.
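The idea of sending a schema reference instead of the schema itself can be sketched in Python as follows. Confluent's serializers prefix each message with a magic byte and a 4-byte schema ID; the sketch mimics that framing, but the `InMemorySchemaRegistry` class is a hypothetical stand-in for the real service, and the payload is opaque bytes rather than real Avro-encoded data.

```python
import struct

MAGIC_BYTE = 0  # marks a message framed with a schema ID


class InMemorySchemaRegistry:
    """Toy registry: stores each distinct schema once and hands out integer IDs."""

    def __init__(self):
        self._ids_by_schema = {}
        self._schemas_by_id = {}

    def register(self, schema: str) -> int:
        if schema not in self._ids_by_schema:
            schema_id = len(self._ids_by_schema) + 1
            self._ids_by_schema[schema] = schema_id
            self._schemas_by_id[schema_id] = schema
        return self._ids_by_schema[schema]

    def get(self, schema_id: int) -> str:
        return self._schemas_by_id[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with 1 magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown framing")
    return schema_id, message[5:]


registry = InMemorySchemaRegistry()
user_schema = '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'
schema_id = registry.register(user_schema)

message = frame(schema_id, b"encoded-user")  # placeholder bytes, not real Avro
received_id, received_payload = unframe(message)
schema_for_decoding = registry.get(received_id)
```

Each message carries only five extra bytes regardless of how large the schema is, and the consumer looks the schema up by ID when it needs to decode.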
KSQL is the SQL engine for Apache Kafka, used for real-time analysis of the data in a topic. It lets you write SQL queries to explore Kafka topic data easily.
Connectors are software components that write data from an external data system into Kafka, or from Kafka into an external data system. Confluent provides a list of additional supported connectors: BigQuery Connector, Elasticsearch Connector, Amazon S3 Connector, Azure Blob Storage Connector, Cassandra Sink Connector, etc. You can see the entire list here: https://www.confluent.io/hub/
You can use these connectors to import and export data from some of the most commonly used data systems.
Confluent Platform Enterprise Components:
Control Center helps users manage and monitor Apache Kafka with a friendly dashboard and administrative tools. For example, you can use Control Center to monitor brokers, topics and connector configurations, and to optimize Kafka cluster performance.
Replicator lets you replicate data from one cluster to another. It therefore provides disaster recovery protection against data loss.
Confluent Auto Data Balancer monitors your cluster for the number of nodes, the size of partitions, the number of partitions and the number of leaders within the cluster, and shifts data automatically to create an even workload across your cluster.
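To give a feel for what "shifting data to create an even workload" means, here is a small Python sketch of a greedy rebalance. It is not Confluent's algorithm: the `rebalance` function is hypothetical, and it balances only the number of partitions per broker, ignoring partition size and leadership.

```python
def rebalance(assignment: dict) -> dict:
    """Move partitions from the busiest broker to the idlest one until the
    partition counts differ by at most one. Returns a new mapping."""
    balanced = {broker: list(parts) for broker, parts in assignment.items()}
    if not balanced:
        return balanced
    while True:
        busiest = max(balanced, key=lambda b: len(balanced[b]))
        idlest = min(balanced, key=lambda b: len(balanced[b]))
        if len(balanced[busiest]) - len(balanced[idlest]) <= 1:
            return balanced
        balanced[idlest].append(balanced[busiest].pop())


before = {
    "broker-1": ["p0", "p1", "p2", "p3"],
    "broker-2": ["p4"],
    "broker-3": [],  # for example, a node that was just added
}
after = rebalance(before)
counts = sorted(len(parts) for parts in after.values())
```

In a real cluster every move means copying data between brokers, so a production balancer also has to limit how much it moves at once.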
The Confluent Operator provides deployment and management automation for Confluent Platform on Kubernetes (using Helm), including Apache Kafka, ZooKeeper, Schema Registry, Connect, Control Center, Replicator and KSQL. Confluent Operator supports the following environments: Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), OpenShift (3.9 or later), Pivotal Container Service (PKS) and vanilla Kubernetes clusters (1.9 to 1.15).