High Availability Cluster Support

Hello everyone,

I am currently evaluating HiveMQ’s cluster support in the enterprise version. First tests show very promising behavior in scenarios where cluster nodes fail and QoS 1 sessions have to fail over for reconnecting clients (clean_session=false). To build trust in this behavior, I would like to better understand, on a conceptual level, how HiveMQ implements session replication. Would it be possible to share some details here, so I can verify that the behavior fits our needs?

Here are some questions I have - all relate to QoS 1 sessions & replica-count>1:

  • Is a successful replication of the session ensured before acknowledging a received message to a publisher? (In other words: Can a publisher be sure that a published message was replicated to at least one other broker node once it receives an acknowledgement?)
  • What steps need to be taken to achieve an in-order delivery of QoS 1 messages while maintaining high-availability features?
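To make the first question concrete, here is a toy model of the guarantee being asked about. It is only a sketch of the property, not a description of how HiveMQ works internally; `Node`, `publish_qos1`, and the `replica_count` parameter are hypothetical names made up for illustration.

```python
class Node:
    """Hypothetical in-memory stand-in for one broker node (not a HiveMQ API)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.store = {}  # packet_id -> payload

    def persist(self, packet_id, payload):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[packet_id] = payload


def publish_qos1(primary, replicas, packet_id, payload, replica_count=2):
    """Return True (send PUBACK) only once replica_count nodes hold the message."""
    primary.persist(packet_id, payload)
    copies = 1
    for node in replicas:
        if copies >= replica_count:
            break
        try:
            node.persist(packet_id, payload)
            copies += 1
        except ConnectionError:
            continue  # try the next candidate replica
    # False means: no PUBACK, so the publisher has to retransmit.
    return copies >= replica_count


# Usage: the message is acknowledged, then the receiving node fails.
a, b, c = Node("a"), Node("b"), Node("c")
acked = publish_qos1(a, [b, c], packet_id=1, payload=b"hello")
a.alive = False
survivors = [n.name for n in (b, c) if 1 in n.store]
```

In this model, an acknowledged message is still held by another node after the receiving node goes down, which is the behavior the question is about.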

Thanks in advance for your support.

Hi @Ju-Ru,

Welcome to the HiveMQ Community Forum!
If you let me know about your specific needs or scenario, I can surely shed some light on how HiveMQ handles those.

Regarding your specific questions:

  1. Yes. HiveMQ ensures replication before sending the PUBACK to the sending client.
  2. Messages are delivered in order (per publisher) by HiveMQ automatically. No steps need to be taken.

From the MQTT spec:

An Ordered Topic is a Topic where the Client can be certain that the Application Messages in that Topic from the same Client and at the same QoS are received in the order they were published. When a Server processes a message that has been published to an Ordered Topic, it MUST send PUBLISH packets to consumers (for the same Topic and QoS) in the order that they were received from any given Client.
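As a rough illustration of what this guarantee means in a failover test, the following sketch checks a subscriber-side log for per-publisher ordering. It assumes the test harness puts an increasing sequence number into each payload; the function name and the tuple layout are hypothetical. QoS 1 redeliveries of an already seen message are tolerated, since at-least-once delivery allows duplicates.

```python
def per_publisher_order_ok(delivered):
    """Check per-publisher ordering in a subscriber-side log.

    delivered: list of (client_id, topic, qos, seq) tuples in arrival order,
    where seq is an application-level counter set by the publisher (assumption).
    """
    seen = {}     # (client_id, topic, qos) -> set of seq values already received
    highest = {}  # (client_id, topic, qos) -> highest seq received so far
    for client_id, topic, qos, seq in delivered:
        key = (client_id, topic, qos)
        if seq in seen.setdefault(key, set()):
            continue  # duplicate redelivery, allowed with QoS 1
        if key in highest and seq < highest[key]:
            return False  # a new message arrived after a later one
        seen[key].add(seq)
        highest[key] = seq
    return True
```

Note that ordering is only checked within one publisher, topic, and QoS level; messages from different publishers may interleave freely.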

Looking forward to learning more about your use case to provide additional information.
Best regards,
Florian from the HiveMQ Team

Hi @fraschbi,

thanks a lot for your reply.

We are looking for an MQTT broker that we can run in a multi-node Kubernetes environment as a reliable message bus among application components. It turns out that most cluster implementations focus on scalability first, but cannot meet our requirements for message delivery in case individual nodes in the cluster fail.
As part of the Kubernetes setup, we require a broker cluster that replicates state before acknowledging the receipt of messages. This way, the cluster should be able to deal with failed nodes in the sense that Kubernetes can spawn new instances that can immediately take over sessions of reconnecting clients. Here, clients would republish unacknowledged messages but can rely on the delivery of already acknowledged messages to all subscribers.
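The client side of this expectation can be sketched as follows. This is a simplified model of a QoS 1 publisher, not the API of any real MQTT client library: the `Publisher` class and the `send` callback are hypothetical, and packet identifier wrap-around is ignored.

```python
class Publisher:
    """Toy QoS 1 publisher that keeps messages until they are acknowledged."""

    def __init__(self):
        self.next_id = 1
        self.in_flight = {}  # packet_id -> payload, kept in publish order

    def publish(self, send, payload):
        packet_id = self.next_id
        self.next_id += 1
        self.in_flight[packet_id] = payload
        send(packet_id, payload, dup=False)
        return packet_id

    def on_puback(self, packet_id):
        # Once acknowledged, delivery is the cluster's responsibility.
        self.in_flight.pop(packet_id, None)

    def on_reconnect(self, send):
        # Republish everything unacknowledged, in original order, with DUP set.
        for packet_id, payload in list(self.in_flight.items()):
            send(packet_id, payload, dup=True)


# Usage: the second message is never acknowledged, then the client reconnects.
log = []
send = lambda packet_id, payload, dup: log.append((packet_id, payload, dup))
p = Publisher()
first = p.publish(send, b"a")
second = p.publish(send, b"b")
p.on_puback(first)
p.on_reconnect(send)
```

Only the unacknowledged message is sent again after the reconnect, so subscribers may see a duplicate of it but should never miss an acknowledged one.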

It is absolutely clear that there is no free lunch when it comes to these requirements. Strong guarantees on message delivery will come with some performance penalties. Thus, I would like to learn a bit more about how, for example, scaling the cluster and network performance affect throughput and latency. I would assume that the overhead is bounded by the number of replicas configured. Can you share any details on the performance in different replication settings?
I am also happy to discuss more details with you via email if this is not the right channel.