We have a problem: sometimes, under very high load, one of our services stops receiving the responses to its requests.
It always happens to the same service (the one with the highest load), which I will refer to as Service A. Service A receives a lot of messages via WebSocket. Those requests are sent to another service along with a response topic. At some point the responses no longer arrive at Service A. We can see in the logs that the other services still receive the requests and send their responses to the response topic. We can also see that the responses arrive at the HiveMQ broker. But Service A (while still sending requests) just does not receive those responses anymore.
We do not know what causes this issue, but for some reason the subscriptions of Service A seem to be removed or lost.
To fix the issue we need to restart Service A. After it has booted, it subscribes to the topics again and receives all the messages. No changes are made to the service other than restarting it, which indicates that we subscribe correctly. All the other services in our system are also still able to receive and send messages.
Has anyone faced similar problems? Is it a bug on the HiveMQ broker side? Any kind of help is appreciated. I also created a bug ticket (I'm not sure whether this is an actual HiveMQ bug, though): Responses do not arrive · Issue #495 · hivemq/hivemq-community-edition · GitHub
Hello @lBleibt ,
Thank you for the outreach, and welcome to the HiveMQ Community!
It sounds like we are currently encountering an issue where, when under heavy load, a particular service (Service A) stops receiving expected messages on its subscribed topics, and as a result does not deliver responses to the defined response topics. This can typically be rectified with a restart of the service, which triggers a re-subscription and message flow resumes once more.
With that in mind, I’d like to collect some additional details in order to best assist further:
- From the perspective of the HiveMQ broker, is ‘Service A’ mentioned above a connected client? If so, which client implementation does this service currently utilize?
- Do the HiveMQ broker logs indicate any failures, errors, warnings, or other messages associated with these topics, or with the associated client ID? If possible, providing the client ID for ‘Service A’ and the logs themselves will allow us to review more deeply.
- If a testing client, such as the MQTT CLI tool or the HiveMQ Websocket client, is connected to the HiveMQ broker and subscribed to the same topics as ‘Service A’, are messages successfully received?
Let us know your thoughts, and we will be happy to assist further!
Best,
Aaron from the HiveMQ Team
Hey @AaronTLFranz,
thanks for the response. Unfortunately I did not see the notification and therefore did not answer earlier.
- Yes Service A is a connected client. We currently use the hivemq-mqtt-client in version “1.3.1”
- At least I do not see any strange entries in the HiveMQ logs. It seems like everything works fine.
- We were not able to try it. This behaviour only occurred in our production environment. We did not have the time to keep it in this state long enough to test anything, but instantly restarted the service so that the system would work again. BUT since we have 2-3 instances of this service running to split the load, we noticed that all instances had the same issue at the same time. Therefore I would assume that something happened there that makes a resubscribe necessary. The response topics also always have a unique ID in them, but still all 3 would stop receiving responses at the same time.
Furthermore, this mostly occurred after deployments, when we have a huge load because of many reconnects.
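Since a resubscribe seems to be what restores the message flow, one possible stopgap (not a fix for the root cause) would be to detect the stall inside the service and resubscribe without a full restart. Below is a minimal sketch in plain Java, independent of any MQTT client API. The class `ResponseWatchdog`, its method names, and the idea of tracking requests by a correlation ID are all assumptions made for illustration; they are not part of our service or of the hivemq-mqtt-client.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper: remembers when each request was sent and reports a
 * stall once too many requests have waited longer than the timeout.
 */
public class ResponseWatchdog {
    // correlation ID -> time the request was sent
    private final Map<String, Instant> pending = new ConcurrentHashMap<>();
    private final Duration timeout;
    private final int maxOverdue;

    public ResponseWatchdog(Duration timeout, int maxOverdue) {
        this.timeout = timeout;
        this.maxOverdue = maxOverdue;
    }

    /** Call when a request is published. */
    public void onRequestSent(String correlationId, Instant now) {
        pending.put(correlationId, now);
    }

    /** Call from the response-topic callback. */
    public void onResponseReceived(String correlationId) {
        pending.remove(correlationId);
    }

    /** True if at least maxOverdue requests have been unanswered for longer than the timeout. */
    public boolean isStalled(Instant now) {
        long overdue = pending.values().stream()
                .filter(sentAt -> Duration.between(sentAt, now).compareTo(timeout) > 0)
                .count();
        return overdue >= maxOverdue;
    }

    /** Call after resubscribing so that old requests do not trigger again. */
    public void reset() {
        pending.clear();
    }
}
```

A scheduled task could call `isStalled(Instant.now())` every few seconds and, when it returns true, log the event, run the same subscribe logic the service already executes at startup, and then call `reset()`. The timeout and threshold would have to be tuned to the normal response latency under load.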
If you have a private way for us to contact you: we have the HiveMQ logs, a heap dump, and a backup of the HiveMQ persistence from the last crash. I would love to provide you with all this information, but for obvious reasons I cannot post it publicly.
Best regards,
Lukas