High Availability Cluster Support

Ju-Ru · August 26, 2020, 1:00pm

Hello everyone,

I am currently evaluating HiveMQ’s cluster support in the enterprise version. First tests show very promising behavior in scenarios where cluster nodes fail and QoS 1 sessions have to fail over for reconnecting clients that use clean_session=false. To build trust in this behavior, I would like to better understand, on a conceptual level, how HiveMQ implements session replication. Would it be possible to share some details here, so I can verify that the behavior fits our needs?

Here are my questions; all of them relate to QoS 1 sessions with replica-count > 1:

Is a successful replication of the session ensured before acknowledging a received message to a publisher? (In other words: Can a publisher be sure that a published message was replicated to at least one other broker node once it receives an acknowledgement?)
What steps need to be taken to achieve an in-order delivery of QoS 1 messages while maintaining high-availability features?

Thanks in advance for your support.

hivemq-support · August 31, 2020, 11:01am

Hi @Ju-Ru,

Welcome to the HiveMQ Community Forum!
If you tell me about your specific needs or scenario, I can shed some light on how HiveMQ handles them.

Regarding your specific questions:

Yes. HiveMQ ensures replication before sending the PUBACK to the sending client.
Messages are delivered in order (per publisher) automatically by HiveMQ. No additional steps are needed.

From the MQTT specification:

An Ordered Topic is a Topic where the Client can be certain that the Application Messages in that Topic from the same Client and at the same QoS are received are in the order they were published. When a Server processes a message that has been published to an Ordered Topic, it MUST send PUBLISH packets to consumers (for the same Topic and QoS) in the order that they were received from any given Client.
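To make the two answers above concrete, here is a toy model of the guarantees described: a message is replicated before the PUBACK is sent, and a surviving replica keeps each publisher's messages in order. This is a hypothetical sketch, not how HiveMQ implements replication internally. `Node`, `ToyCluster`, and `take_over` are illustrative names, `replica_count` here means the total number of stored copies, and the model ignores concurrency and network partitions.

```python
from collections import defaultdict


class Node:
    """Toy broker node. `queues` maps a subscriber session to its pending QoS 1 messages."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.queues = defaultdict(list)

    def store(self, session, msg):
        if not self.alive:
            return False
        self.queues[session].append(msg)  # appending preserves arrival order
        return True


class ToyCluster:
    """Hypothetical model, NOT HiveMQ's implementation: a PUBLISH is acknowledged
    only after it has been stored on `replica_count` live nodes."""

    def __init__(self, node_names, replica_count):
        self.nodes = [Node(n) for n in node_names]
        self.replica_count = replica_count

    def publish(self, session, publisher, packet_id, payload):
        """Return True if a PUBACK may be sent, False if it must be withheld."""
        msg = (publisher, packet_id, payload)
        stored = 0
        for node in self.nodes:
            if node.store(session, msg):
                stored += 1
                if stored == self.replica_count:
                    return True  # enough copies exist: send PUBACK
        return False  # too few live replicas: no PUBACK, the publisher retries

    def take_over(self, session):
        """After a node failure, a surviving replica resumes the session's queue."""
        for node in self.nodes:
            if node.alive and node.queues.get(session):
                return list(node.queues[session])
        return []


cluster = ToyCluster(["node-1", "node-2", "node-3"], replica_count=2)
for pid in (1, 2, 3):
    assert cluster.publish("sub-1", "pub-A", pid, f"m{pid}")
cluster.nodes[0].alive = False  # node-1 fails after all three PUBACKs were sent
print([pid for _, pid, _ in cluster.take_over("sub-1")])  # [1, 2, 3]
```

When the PUBACK is withheld, the publisher retries, so some nodes may end up holding a message twice. That is consistent with QoS 1's at-least-once semantics.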

Looking forward to learning more about your use case to provide additional information.
Best regards,
Florian from the HiveMQ Team

Ju-Ru · September 1, 2020, 7:16am

Hi @hivemq-support,

Thanks a lot for your reply.

We are looking for an MQTT broker that we can run in a multi-node Kubernetes environment as a reliable message bus among application components. It turns out that most cluster implementations focus on scalability first, but cannot meet our requirements for message delivery in case individual nodes in the cluster fail.
As part of the Kubernetes setup, we require a broker cluster that replicates state before acknowledging the receipt of messages. This way, the cluster can tolerate failed nodes: Kubernetes can spawn new instances that immediately take over the sessions of reconnecting clients. Clients would then republish unacknowledged messages, but could rely on already acknowledged messages being delivered to all subscribers.
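The client-side half of this expectation matches MQTT 3.1.1: when a client reconnects with clean_session=false, it must re-send unacknowledged QoS 1 PUBLISH packets with their original packet identifiers, and the DUP flag is set on re-delivery. Client libraries usually handle this internally, so the following is only a minimal sketch of the bookkeeping involved. `Publish` and `InflightStore` are hypothetical names, not the API of any real client library.

```python
from collections import OrderedDict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Publish:
    """Minimal stand-in for an MQTT PUBLISH packet (illustrative only)."""
    packet_id: int
    topic: str
    payload: bytes
    qos: int = 1
    dup: bool = False


class InflightStore:
    """Hypothetical client-side bookkeeping for unacknowledged QoS 1 publishes."""

    def __init__(self):
        self._pending = OrderedDict()  # packet_id -> Publish, kept in send order
        self._next_id = 1

    def send(self, topic, payload):
        # Note: a real client must also skip identifiers that are still in flight.
        pkt = Publish(self._next_id, topic, payload)
        self._pending[pkt.packet_id] = pkt
        self._next_id = self._next_id % 65535 + 1  # identifiers range over 1..65535
        return pkt

    def on_puback(self, packet_id):
        # The broker has acknowledged the message; stop tracking it.
        self._pending.pop(packet_id, None)

    def resend_after_reconnect(self):
        # Re-send every unacknowledged PUBLISH, in original order, with its
        # original packet identifier and the DUP flag set.
        return [replace(p, dup=True) for p in self._pending.values()]


store = InflightStore()
first, second, third = (store.send("app/events", b"e%d" % i) for i in range(3))
store.on_puback(second.packet_id)  # only the second message was acknowledged
# ... the broker node fails, the client reconnects with clean_session=false ...
print([(p.packet_id, p.dup) for p in store.resend_after_reconnect()])  # [(1, True), (3, True)]
```

A message whose PUBACK was lost during the failover is sent again, so subscribers may receive it twice. Since QoS 1 is at-least-once, consumers should be idempotent or de-duplicate.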

It is absolutely clear that there is no free lunch when it comes to these requirements. Strong guarantees on message delivery will come with some performance penalties. Thus, I would like to learn more about the performance impact of, for example, scaling the cluster, and of network performance. I would assume that the overhead is bounded by the configured number of replicas. Can you share any details on performance under different replication settings?
I am also happy to discuss more details with you via email if this is not the right channel.
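The assumption that overhead is bounded by the replica count can be made concrete with a back-of-envelope model. The model is ours, not documented HiveMQ behavior: each message is written to `copies` nodes in total, remote writes are issued in parallel, and the PUBACK waits for all of them. The function name and parameters are hypothetical.

```python
def replication_cost(copies, msg_bytes, msgs_per_sec, replica_rtts_ms, local_write_ms=0.0):
    """Back-of-envelope cost model (an assumption, not measured HiveMQ behavior).

    copies: total copies of each message, including the node that received it.
    replica_rtts_ms: round-trip time from the receiving node to each remote replica.
    Assumes remote writes run in parallel and the PUBACK waits for all of them.
    """
    remote = copies - 1
    if len(replica_rtts_ms) != remote:
        raise ValueError("need exactly one RTT per remote replica")
    # Extra intra-cluster traffic grows with the number of copies, not the cluster size.
    extra_bytes_per_sec = remote * msg_bytes * msgs_per_sec
    # The slowest replica link dominates the acknowledgement latency.
    ack_latency_ms = local_write_ms + (max(replica_rtts_ms) if replica_rtts_ms else 0.0)
    return extra_bytes_per_sec, ack_latency_ms


print(replication_cost(2, 1024, 1000, [1.5]))       # (1024000, 1.5)
print(replication_cost(3, 1024, 1000, [1.5, 4.0]))  # (2048000, 4.0)
```

Under these assumptions, the total number of cluster nodes never enters the formula, while network quality does: one slow link to a replica raises the acknowledgement latency for every message that uses that replica.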
