Persistent Session, Retry Message, and Failure Detection

Hi Team,
I am not clear on what a persistent session is or how it works. Can someone explain?
I have tried to implement it several times, but it is still not working as expected.
I also referred to the HiveMQ documentation, but it is not clear to me.

Can you please explain this with a simple example of how to achieve it?
I have attached a screenshot of my config.xml below for reference.

Can you please send a sample of the expected output for failure detection?

Team, can anyone please give me some real-world examples for these topics?

Regards,
Harry

Hi Team,

@Daria_H

I found a solution for the persistent session, but I am still not clear about failure detection.
Can you please help with this?

Regards,
Harry

Hi @Harry, are you looking for this post?

Hi @Daria_H
Yes, I am clear on persistent sessions now, but I am asking about cluster failure detection (heartbeat and TCP health check). I referred to the HiveMQ documentation, but it is still not clear to me, so can you please help me with this?
Regards,
Harry

Hi @Daria_H
Any update on this?

Hi Harry,

The documentation for heartbeat and TCP health check as failure detection concepts is here, and explains “how” they work, but it sounds like you’ve looked at that already. What are you trying to achieve with these concepts?

Thanks,
Seth - HiveMQ Support

Hi @hivemq-support ,
My configuration for the heartbeat and TCP health check is below.

<failure-detection>
    <tcp-health-check>
        <enabled>true</enabled>
        <bind-address>null</bind-address>
        <bind-port>9000</bind-port>
        <port-range>50</port-range>
    </tcp-health-check>

    <heartbeat>
        <enabled>true</enabled>
        <interval>5000</interval>
        <timeout>10000</timeout>
    </heartbeat>
</failure-detection>

My question is: where can I see whether the heartbeat and health check are taking effect?
Can you please share simple output examples for the heartbeat and the health check?

Hi Harry,

If you are running a cluster with more than one broker node, and the brokers’ logging levels are set to DEBUG, you should see something like this when one of the broker nodes leaves the cluster:

2022-12-27 20:29:18,150 INFO  - Cluster nodes found by discovery: [HceI5|16] (1) [HceI5].
2022-12-27 20:29:18,870 DEBUG - Previous running nodes [BBkvg, HceI5]
2022-12-27 20:29:19,252 INFO  - Cluster size = 1, members : [HceI5].
2022-12-27 20:29:19,258 DEBUG - Removed state of node BBkvg.
2022-12-27 20:29:19,259 DEBUG - Node left cluster: 'BBkvg'

In my example, I have a simple cluster containing 2 broker nodes. One node, named “BBkvg”, was shut down; the log of the remaining node, “HceI5”, showed the above output when it realized “BBkvg” had left the cluster.
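
To spot a departure like this without reading the log by eye, the relevant lines can be picked out with a short script. This is only a minimal Python sketch, not a HiveMQ tool: the function name `summarize_membership` is hypothetical, and the patterns assume the exact line format shown in the output above, which may differ between HiveMQ versions.

```python
import re

# Sample lines copied from the log output above.
LOG_LINES = """\
2022-12-27 20:29:18,150 INFO  - Cluster nodes found by discovery: [HceI5|16] (1) [HceI5].
2022-12-27 20:29:18,870 DEBUG - Previous running nodes [BBkvg, HceI5]
2022-12-27 20:29:19,252 INFO  - Cluster size = 1, members : [HceI5].
2022-12-27 20:29:19,258 DEBUG - Removed state of node BBkvg.
2022-12-27 20:29:19,259 DEBUG - Node left cluster: 'BBkvg'
""".splitlines()

# "Cluster size = N, members : [a, b]." is logged at INFO level.
SIZE_RE = re.compile(
    r"^(\S+ \S+) INFO\s+- Cluster size = (\d+), members : \[(.*?)\]\."
)
# "Node left cluster: 'id'" is logged at DEBUG level.
LEFT_RE = re.compile(r"^(\S+ \S+) DEBUG\s+- Node left cluster: '([^']+)'")


def summarize_membership(lines):
    """Collect cluster-size and node-left events in the order they appear."""
    events = []
    for line in lines:
        m = SIZE_RE.match(line)
        if m:
            members = [n.strip() for n in m.group(3).split(",") if n.strip()]
            events.append(("size", m.group(1), int(m.group(2)), members))
            continue
        m = LEFT_RE.match(line)
        if m:
            events.append(("left", m.group(1), m.group(2)))
    return events


for event in summarize_membership(LOG_LINES):
    print(event)
```

The "size" events come from INFO lines, so they would still be found if the broker logs at INFO only; the "left" events require DEBUG logging.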

Later, when a new node “hm276” joins the cluster, the log of “HceI5” shows output like this:

2022-12-27 20:30:49,943 INFO  - Cluster nodes found by discovery: [HceI5|17] (2) [HceI5, hm276].
2022-12-27 20:30:49,951 DEBUG - New node discovered: 'hm276'
2022-12-27 20:30:49,951 DEBUG - State of node hm276 changed to UNKNOWN.
2022-12-27 20:30:49,952 DEBUG - Current cluster node states: {HceI5=RUNNING, hm276=UNKNOWN}
2022-12-27 20:30:51,979 DEBUG - State of node hm276 changed to NOT_JOINED was UNKNOWN.
2022-12-27 20:30:51,979 DEBUG - Current cluster node states: {HceI5=RUNNING, hm276=NOT_JOINED}
2022-12-27 20:30:51,986 DEBUG - Sending NOT_JOINED state notification for hm276 to all nodes.
2022-12-27 20:30:52,308 DEBUG - State of node hm276 changed to JOINING was NOT_JOINED.
2022-12-27 20:30:52,308 DEBUG - Current cluster node states: {HceI5=RUNNING, hm276=JOINING}
2022-12-27 20:30:52,837 DEBUG - Sending JOINING state notification for hm276 to all nodes.
2022-12-27 20:30:53,555 DEBUG - Replication of data to nodes [hm276] finished.
2022-12-27 20:30:54,959 DEBUG - State of node hm276 changed to RUNNING was JOINING.
2022-12-27 20:30:54,960 DEBUG - Current cluster node states: {HceI5=RUNNING, hm276=RUNNING}
2022-12-27 20:30:54,961 INFO  - Cluster size = 2, members : [HceI5, hm276].
2022-12-27 20:30:54,971 DEBUG - Sending RUNNING state notification for hm276 to all nodes.
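
The state changes in a join log like this one can also be extracted into a timeline, which makes it easier to see how long a node took to reach RUNNING. The following is a rough Python sketch under the assumption that the "State of node ... changed to ..." line format matches the output above; `state_timeline` is a hypothetical helper name, not part of HiveMQ.

```python
import re
from datetime import datetime

# The four state-change lines for hm276, copied from the log output above.
LOG_LINES = """\
2022-12-27 20:30:49,951 DEBUG - State of node hm276 changed to UNKNOWN.
2022-12-27 20:30:51,979 DEBUG - State of node hm276 changed to NOT_JOINED was UNKNOWN.
2022-12-27 20:30:52,308 DEBUG - State of node hm276 changed to JOINING was NOT_JOINED.
2022-12-27 20:30:54,959 DEBUG - State of node hm276 changed to RUNNING was JOINING.
""".splitlines()

# Captures timestamp, node id, new state, and (optionally) the previous state.
STATE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG\s+- "
    r"State of node (\S+) changed to (\w+)(?: was (\w+))?\."
)


def state_timeline(lines, node):
    """Return (timestamp, new_state) pairs for one node, in log order."""
    timeline = []
    for line in lines:
        m = STATE_RE.match(line)
        if m and m.group(2) == node:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            timeline.append((ts, m.group(3)))
    return timeline


timeline = state_timeline(LOG_LINES, "hm276")
states = [state for _, state in timeline]
# Time between the first and last recorded state change for this node.
join_seconds = (timeline[-1][0] - timeline[0][0]).total_seconds()
print(states, join_seconds)
```

These lines are logged at DEBUG level, so the sketch finds nothing if the broker is logging at INFO.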

If the logging level is INFO, you would see only the lines in the output above that show “INFO” after the timestamp (which would still give you an idea that nodes had left and joined the cluster).

I encourage you to try this with your test cluster to see the results. You can kill or stop one of the broker nodes in the cluster and observe similar results in the logs of one of the other broker nodes that is still running.

If you are running only one broker, you will not be able to replicate the above test since there are no other nodes for it to interact with via the “heartbeat” or “tcp health check” features.
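
Independently of the logs, one rough way to check that something is listening in the port range given in the configuration (bind port 9000, port range 50) is to attempt plain TCP connections from another machine or from the broker host itself. This Python sketch is an illustration only: it assumes the health check listener binds somewhere between the bind port and the bind port plus the port range, and that it accepts a plain TCP connection. Neither assumption is confirmed in this thread, and `find_open_ports` is a hypothetical helper name.

```python
import socket


def find_open_ports(host, start_port, port_range, timeout=1.0):
    """Return the ports in [start_port, start_port + port_range] that accept a TCP connection."""
    open_ports = []
    for port in range(start_port, start_port + port_range + 1):
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass
    return open_ports
```

For the configuration posted earlier, a call such as `find_open_ports("127.0.0.1", 9000, 50)` on the broker host would scan ports 9000 to 9050. An open port only shows that a listener exists; the cluster logs remain the place to confirm that nodes actually detect each other.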

I hope this helps!

Thanks,
Seth - HiveMQ Support

Hi @hivemq-support
Thanks for your response. I followed the steps you mentioned in your previous message, and now the health check and heartbeat are working as expected.

2022-12-28 14:47:26,961 INFO  - Started HiveMQ in 12449ms
2022-12-28 14:47:28,154 DEBUG - Sent anonymous usage statistics
2022-12-28 14:48:03,734 INFO  - Cluster nodes found by discovery: [Eqawe|1] (2) [Eqawe, J9vyX].
2022-12-28 14:48:03,736 DEBUG - New node discovered: 'J9vyX'
2022-12-28 14:48:03,739 DEBUG - State of node J9vyX changed to UNKNOWN.
2022-12-28 14:48:03,741 DEBUG - Current cluster node states: {Eqawe=RUNNING, J9vyX=UNKNOWN}
2022-12-28 14:48:03,853 DEBUG - State of node J9vyX changed to NOT_JOINED was UNKNOWN.
2022-12-28 14:48:03,854 DEBUG - Current cluster node states: {Eqawe=RUNNING, J9vyX=NOT_JOINED}
2022-12-28 14:48:03,857 DEBUG - Sending NOT_JOINED state notification for J9vyX to all nodes.
2022-12-28 14:48:03,870 DEBUG - State of node J9vyX changed to JOINING was NOT_JOINED.
2022-12-28 14:48:03,871 DEBUG - Current cluster node states: {Eqawe=RUNNING, J9vyX=JOINING}
2022-12-28 14:48:03,875 DEBUG - Sending JOINING state notification for J9vyX to all nodes.
2022-12-28 14:48:03,997 DEBUG - Replication of data to nodes [J9vyX] finished.
2022-12-28 14:48:04,956 DEBUG - State of node J9vyX changed to RUNNING was JOINING.
2022-12-28 14:48:04,956 DEBUG - Current cluster node states: {Eqawe=RUNNING, J9vyX=RUNNING}
2022-12-28 14:48:04,958 INFO  - Cluster size = 2, members : [Eqawe, J9vyX].
2022-12-28 14:48:04,959 DEBUG - Sending RUNNING state notification for J9vyX to all nodes.
2022-12-28 14:48:04,964 DEBUG - Removing redundant persistence entries after join
2022-12-28 14:48:05,026 DEBUG - Removed redundant persistence entries after join in 58ms

Thanks and Regards
Harry.