Message Queues: RabbitMQ vs. Kafka
Message Queues have become a foundational component of large-scale software applications. Most application developers and architects understand the importance of Databases as dedicated infrastructure for Create, Read/Retrieve, Update and Delete (CRUD) operations. Message Queues are less widely understood, yet they enable the scalable, robust and high-performing applications that are vital to today’s software landscape. This article describes what a Message Queue is and why and how it is useful in applications, compares two popular message queues, RabbitMQ and Apache Kafka, and covers alternative options. This deep dive into Message Queues helped us apply the knowledge in the development of our application, and we hope it will be an asset to other software architects as well.
Section 1: What are Message Queues:
Message Queues are dedicated infrastructure software that allows queues to be defined and managed. Applications connect to a Message Queue through the programming interfaces (APIs) it exposes, which include operations to connect, to enqueue messages onto a queue, and to dequeue messages from it. Three terms are commonly used when working with Message Queues: Producers, Consumers and Broker (the terms may vary between implementations).
Producers: The part of an application (typically a microservice closer to the front-end interfaces) that generates the work the system needs to perform. Producers enqueue work items onto a particular queue in the Message Queue based on the type of work request. The queues themselves are typically defined via configuration.
Consumers: The module of the application (typically a backend microservice) that subscribes to a particular queue. An application usually runs multiple Consumer instances subscribed to the same queue. Whenever Producers enqueue work items, one of the Consumers is notified and starts performing the required task.
Message Broker: The third-party software component in which queues are defined. A Message Broker acts as a middleman between the various components of a large-scale application. Along with an interface for defining queues, brokers provide rules for routing different types of messages to different queues. Each Message Broker also has its own set of APIs that producers and consumers use. Two of the most popular on-premises Message Brokers are RabbitMQ and Apache Kafka. Cloud-based message brokers such as Amazon Simple Queue Service (SQS) and Azure Service Bus are also widely used in cloud-based applications.
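To make the three roles concrete, here is a minimal in-memory sketch. `ToyBroker`, `enqueue` and `dequeue` are hypothetical names for illustration, not the API of RabbitMQ, Kafka or any real broker, which runs as a separate networked service. The routing table stands in for the queue configuration mentioned above, and the message types and queue names are made up.

```python
from collections import defaultdict, deque


class ToyBroker:
    """Hypothetical in-memory broker: routes messages to named queues by type."""

    def __init__(self, routes):
        # routes maps a message type (e.g. "email") to a queue name (configuration)
        self.routes = routes
        self.queues = defaultdict(deque)

    def enqueue(self, msg_type, payload):
        # Producer side: the routing rule picks the target queue
        queue_name = self.routes[msg_type]
        self.queues[queue_name].append(payload)
        return queue_name

    def dequeue(self, queue_name):
        # Consumer side: take the oldest message, or None if the queue is empty
        q = self.queues[queue_name]
        return q.popleft() if q else None


broker = ToyBroker({"email": "notifications", "resize": "images"})
broker.enqueue("email", "welcome user 42")
broker.enqueue("resize", "photo.jpg")
print(broker.dequeue("images"))         # photo.jpg
print(broker.dequeue("notifications"))  # welcome user 42
print(broker.dequeue("images"))         # None
```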
Section 2: Why use Message Queues:
Message Queues provide the following benefits to Web-Applications:
• Performance: Message Queues enable asynchronous communication, so Producers can add requests to the queue without waiting for them to be processed. Consumers are notified when a message is available and process it at their own pace. Producers are therefore not stalled waiting for Consumers, which keeps data flowing through the system.
• Reliability: If one part of the system becomes unreachable, the other parts can continue to interact with the queue. Messages already in the queue persist, so the available side can keep reading from or writing to it. The queue itself can also be mirrored (replicated) across broker nodes for higher availability and fault tolerance.
• Scalability: Multiple instances of the application can add requests to the queue concurrently without risk of collision. As the queue grows with incoming requests, the workload can be distributed across a fleet of Consumers. Producers, Consumers and the queue itself can all grow and shrink on demand.
• Decoupling: Message Queues improve modularity and separation of concerns by removing direct dependencies between components, which makes developing, maintaining and troubleshooting the decoupled application significantly simpler.
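The asynchronous and scalability benefits can be seen even within a single process. The sketch below uses Python's standard `queue.Queue` as a stand-in for a broker queue and three threads as competing Consumers; the Producer's `put()` calls return immediately instead of waiting for the work to finish. The squaring task and the 10 ms sleep are placeholders for real work.

```python
import queue
import threading
import time

work = queue.Queue()          # stand-in for a broker-managed queue
results = []
results_lock = threading.Lock()


def consumer(worker_id):
    # Competing Consumers: every worker pulls from the same queue
    while True:
        item = work.get()     # blocks until a message is available
        if item is None:      # sentinel value: shut this worker down
            work.task_done()
            break
        time.sleep(0.01)      # placeholder for slow processing
        with results_lock:
            results.append((worker_id, item * item))
        work.task_done()


workers = [threading.Thread(target=consumer, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

# Producer: put() returns immediately and never waits for processing
for n in range(10):
    work.put(n)

for _ in workers:
    work.put(None)            # one sentinel per worker
work.join()                   # wait until every message has been processed
for w in workers:
    w.join()

print(sorted(r for _, r in results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Scaling out is a matter of starting more Consumers against the same queue; with a real broker these would be separate processes or machines rather than threads.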
Message Queue Features:
Key features of a Message Broker include:
Load Balancing: Multiple Consumers can subscribe to the same queue. When a message arrives, any one of the Consumers can receive and process it; the next message is then processed by the next available Consumer. This messaging pattern is known as the ‘Competing Consumers’ pattern. Some Message Brokers also offer more advanced features, such as assigning ‘Affinity’ to queues so that consumers close to a particular data center or region process the messages.
Fan-out: Each message is delivered to all consumers. Fan-out is useful when the same message needs to be sent to multiple queues whose consumers process it in different ways. Each consumer processes its own copy of the message without affecting the other consumers.
Acknowledgement and Redelivery: Message Brokers provide a mechanism for Consumers to acknowledge that a message has been processed. This helps when a Consumer crashes while processing a message: the broker detects that the Consumer is no longer active (for example, because its connection closed before it sent an acknowledgement), requeues the message and redelivers it to another Consumer. The business logic in the Consumer therefore needs to handle re-processing of the same message, for example by making the processing idempotent.
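The Competing Consumers, acknowledgement and redelivery behaviour described above can be sketched as a queue that records which Consumer holds each unacknowledged delivery. `AckQueue` and its methods are hypothetical names, not RabbitMQ's API. In a real broker the "consumer died" signal usually comes from a closed connection or channel rather than an explicit call, and this sketch simply puts requeued messages back at the front of the queue.

```python
from collections import deque


class AckQueue:
    """Hypothetical queue with manual acknowledgement and redelivery."""

    def __init__(self):
        self.ready = deque()      # messages waiting to be delivered
        self.unacked = {}         # delivery tag -> (consumer, message)
        self.next_tag = 0

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, consumer):
        # Hand the oldest ready message to whichever consumer asks first
        if not self.ready:
            return None
        message = self.ready.popleft()
        self.next_tag += 1
        self.unacked[self.next_tag] = (consumer, message)
        return self.next_tag, message

    def ack(self, tag):
        # Processing finished: the broker can forget the message
        del self.unacked[tag]

    def consumer_died(self, consumer):
        # Requeue, at the front and in delivery order, whatever the dead
        # consumer received but never acknowledged
        lost = [msg for owner, msg in self.unacked.values() if owner == consumer]
        self.unacked = {t: v for t, v in self.unacked.items() if v[0] != consumer}
        self.ready.extendleft(reversed(lost))


q = AckQueue()
for order in ("order 1", "order 2", "order 3"):
    q.publish(order)
tag_a, msg_a = q.deliver("worker-a")   # worker-a takes order 1
tag_b, msg_b = q.deliver("worker-b")   # worker-b competes and takes order 2
q.ack(tag_b)                           # worker-b finishes order 2
q.consumer_died("worker-a")            # worker-a crashes before acking order 1
print(list(q.ready))                   # ['order 1', 'order 3']
tag_c, msg_c = q.deliver("worker-b")   # order 1 is redelivered to worker-b
print(msg_c)                           # order 1
```

Because worker-a may have done part of the work on order 1 before crashing, the Consumer logic should tolerate seeing the same message twice.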
Section 3: Traditional vs Log-Based:
Message Brokers in use fall into two broad categories: Traditional and Log-Based. This classification derives from two different schools of thought: transient, network-packet-style operation vs. database-style operation with a permanent record.
Traditional Message Brokers:
Messaging in the AMQP/JMS style (Traditional Messaging in this discussion) can be thought of as following a network-packet paradigm. When a packet is sent over a network, it exists only transiently in the intermediate routers, with no permanent record. Similarly, once an AMQP-style consumer has processed a message from the broker and acknowledged it, the message is deleted from the broker and is no longer available to other Consumers. A newly added Consumer receives only the messages that arrive after it has registered with the Broker. This messaging style suits cases where processing a message takes a long time and message ordering does not matter; multiple consumers can then process messages in parallel. Each message is assumed to be a fully self-contained unit of work, with no dependency on any other message. Because messages are short-lived, the broker can often keep them in memory, which further improves throughput.
E.g.: RabbitMQ, ActiveMQ, HornetQ, Azure Service Bus, Google Cloud Pub/Sub.
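The traditional behaviour can be modelled as below: each subscriber gets its own queue when it subscribes (which is also how fan-out works, since every subscriber receives a copy of each published message), consuming a message removes it, and a late subscriber sees no history. `TransientTopic` is a hypothetical in-memory class, and consuming here stands in for delivery plus acknowledgement.

```python
from collections import deque


class TransientTopic:
    """Hypothetical traditional broker: messages live only in subscriber queues."""

    def __init__(self):
        self.subscribers = {}     # subscriber name -> deque of messages

    def subscribe(self, name):
        # A new subscriber starts with an empty queue: nothing is replayed
        self.subscribers[name] = deque()

    def publish(self, message):
        # Fan-out: every current subscriber gets a copy.
        # With no subscribers, the message is simply dropped.
        for q in self.subscribers.values():
            q.append(message)

    def consume(self, name):
        # Consuming (and implicitly acknowledging) removes the message for good
        q = self.subscribers[name]
        return q.popleft() if q else None


t = TransientTopic()
t.subscribe("billing")
t.publish("order 1")
t.subscribe("analytics")                               # joins late
t.publish("order 2")
print(t.consume("billing"), t.consume("billing"))      # order 1 order 2
print(t.consume("analytics"), t.consume("analytics"))  # order 2 None
```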
Log-Based Message Brokers:
The idea behind log-based message brokers is to combine the Database paradigm of permanent storage with the notification mechanisms of messaging. The message queue can be thought of as a log file, similar to ‘/var/log/messages’ on Linux/Unix systems. Each consumer simply reads the log file and, once it reaches the end, watches for new data to be appended, much like the ‘tail -f’ command in Linux.
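A log-based consumer can be sketched as nothing more than an append-only list plus a per-consumer offset. `Log` and `LogConsumer` are hypothetical classes; Kafka's real consumer API and offset management are richer, but the idea of reading forward from an offset, like `tail -f`, is the same.

```python
class Log:
    """Hypothetical append-only log: messages are kept, readers track offsets."""

    def __init__(self):
        self.entries = []

    def append(self, message):
        self.entries.append(message)
        return len(self.entries) - 1          # offset of the new message

    def read_from(self, offset, max_messages=10):
        # Like `tail -f`: return whatever exists past `offset` (possibly nothing)
        return self.entries[offset:offset + max_messages]


class LogConsumer:
    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset            # next offset to read

    def poll(self):
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)             # record progress after reading
        return batch


log = Log()
for m in ("a", "b", "c"):
    log.append(m)
early = LogConsumer(log)
print(early.poll())   # ['a', 'b', 'c']
print(early.poll())   # [] (caught up, waiting for appends)
log.append("d")
print(early.poll())   # ['d']
late = LogConsumer(log)
print(late.poll())    # ['a', 'b', 'c', 'd'] (a new consumer sees the history)
```

Unlike the traditional model, reading does not delete anything: both consumers see every message, each at its own offset.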
In Log-Based message brokers, each message is assigned a monotonically increasing sequence number (its offset), and each consumer keeps track of the offset up to which it has processed messages. A new consumer can therefore start by processing all past messages that are still retained. To allow horizontal scalability, the log is split into ‘Partitions’. In Apache Kafka, a ‘Topic’ groups the partitions that carry messages of the same type. A consumer reads a partition sequentially, so messages are delivered in order within each partition. Load balancing is achieved by creating a Consumer Group and subscribing it to a topic; within the group, each partition is assigned to exactly one consumer. One downside of Log-Based Message Brokers is that a message that is expensive to process can cause ‘Head-of-Line’ blocking, because later messages in the same partition wait until it is done. The log-based approach therefore works best when each message is fast to process and message ordering is important.
E.g.: Apache Kafka, Amazon Kinesis Streams.
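Partitioning and consumer-group assignment can be sketched as two small functions. The key-based partitioner uses `zlib.crc32` purely as a stand-in hash (Kafka's Java client hashes keys with murmur2 by default), and the round-robin assignment is just one simple strategy; Kafka supports several. The partition count, consumer names and events are made up for the example.

```python
import zlib


def partition_for(key, num_partitions):
    # Same key -> same partition, so the order of events per key is preserved.
    # crc32 is a stand-in hash for this sketch, not Kafka's partitioner.
    return zlib.crc32(key.encode()) % num_partitions


def assign_partitions(num_partitions, consumers):
    # Within one consumer group, each partition goes to exactly one consumer
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]
events = [("user-1", "login"), ("user-2", "login"),
          ("user-1", "click"), ("user-1", "logout")]
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

print(assign_partitions(NUM_PARTITIONS, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1], 'c3': [2]}
```

All `user-1` events land in the same partition, and that partition is read by a single consumer in the group, so their order (login, click, logout) is preserved. The same property causes Head-of-Line blocking: a slow event holds up everything behind it in its partition.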
Two of the most popular Message Queues in the industry, RabbitMQ and Apache Kafka, use these two different approaches. Table 1 provides a side-by-side comparison of key functional areas that can help decide the best approach for a given application.
Section 4: Message Queue vs Database Queue:
One school of thought asks: instead of adding another piece of infrastructure software, such as a Message Queue, that must be installed, monitored and administered, why not use the Database as a queue? Database Administrators are often far easier to find than people with Message Queue administration expertise.
Using the Database as a queue requires implementing the following features:
Custom code for Producers to insert requests, and for maintaining request states.
Polling of the database by Consumers to detect new messages.
Database locks to handle the case where multiple workers try to process the same request.
Additional logic to mark a request as processed.
For simpler use cases, this added complexity in the application layer may be an acceptable price for not running a separate Message Queue. Table 2 provides a comparison of a Database Queue vs. a traditional Message Queue.
Section 5: Cloud-Based Message Queues:
Amazon, Microsoft and Google have all developed cloud-based Message Brokers. These brokers can be categorized in the same way as on-premises Message Brokers: Traditional vs. Log-Based.
Amazon Simple Queue Service (SQS): SQS offers two types of message queues:
Standard queues offer maximum throughput, best-effort ordering, and at-least-once delivery.
SQS FIFO queues are designed to guarantee that messages are processed exactly once, in the exact order that they are sent.
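The exactly-once guarantee of FIFO queues rests partly on send-side deduplication: SQS FIFO queues discard a message whose deduplication ID was already accepted within a five-minute interval. The toy class below models only that rule and is not the SQS API; real code would call `SendMessage` with a `MessageDeduplicationId` and `MessageGroupId`, for example through an AWS SDK. `ToyFifoQueue` and its injectable `clock` parameter are assumptions added for illustration and testing.

```python
import time


class ToyFifoQueue:
    """Hypothetical model of FIFO-queue send deduplication, not the SQS API.

    A message whose dedup_id was already accepted within `window_seconds`
    is dropped, so a producer retrying a send does not create a duplicate.
    """

    def __init__(self, window_seconds=300.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.seen = {}        # dedup_id -> time it was first accepted
        self.messages = []    # kept in send order (FIFO)

    def send(self, body, dedup_id):
        now = self.clock()
        first = self.seen.get(dedup_id)
        if first is not None and now - first < self.window:
            return False      # duplicate inside the window: dropped
        self.seen[dedup_id] = now
        self.messages.append(body)
        return True


q = ToyFifoQueue()
print(q.send("order 1", dedup_id="o-1"))  # True
print(q.send("order 1", dedup_id="o-1"))  # False (retry deduplicated)
print(q.send("order 2", dedup_id="o-2"))  # True
print(q.messages)                         # ['order 1', 'order 2']
```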
Azure Service Bus:
A multi-tenant cloud messaging service that can be used to send information between applications and services.
Its asynchronous operations provide brokered messaging, first-in, first-out (FIFO) messaging, and publish/subscribe capabilities.
Section 6: Conclusion:
Message Queues have become an integral part of large-scale web applications and are an essential component in developing scalable, resilient and high-performing Web-Applications. In our in-depth review, we researched two popular Message Queue solutions, RabbitMQ and Apache Kafka, for use with our application. We chose RabbitMQ because our primary use case was load distribution: RabbitMQ provided a simpler mechanism to scale the application horizontally, leading to better performance and resiliency.