Hi there, in our hivemq-ce-2019.1 installation we observe various log entries and are quite unsure about their meaning. Could you give some information about the following log entries and the consequences of their occurrence:
"Tried to decrement a payload reference counter (123443870) that was already zero." (WARN)
This seems to be connected to message drops (reported in hivemq.messages.dropped.count), which surprises me a little, because decrementing a reference counter should generally only happen after processing is complete.
Decrementing such a counter more often than it was incremented does not sound like something that should result from the usual exceptional circumstances, such as crashed or cancelled tasks. Could this be a bug?
"No message id available for client xyz" (ERROR)
From what I could find out, message IDs seem to be held per client, and their number seems to be limited to 64k.
Now, if this ID pool is exhausted (and client xyz is a subscriber), does that mean there are more than 64k messages that have not yet been dispatched (i.e. are queued) to client xyz?
We are using QoS 0, so I wonder why message IDs are generated at all, since as far as I know they are normally required only for QoS > 0.
What happens when this error occurs? Is the message dropped, or does processing wait until a message ID becomes available?
Any help and information is appreciated.
Greetings & thanks!
Thank you for being such an active community member and trying to help us improve the quality of HiveMQ CE; we really appreciate it.
In regards to your observations:
Generally speaking, your assumptions are correct:
a) A message drop should not lead to decrementing a reference counter
b) QoS=0 messages do not have message IDs
We have not seen either of these occurrences with any HiveMQ CE version so far.
It would be really helpful if you could share some details about your use case so that we can reproduce and investigate this behaviour.
A quick rundown of the characteristics that would certainly help us:
Number of connected clients
Your config file
The reason given for the observed dropped messages
Number (and structure) of the topics that are used
Ratio of Publisher to Subscriber
Usage of Shared Subscriptions
Message Rate
Hardware specs of your machine
Information regarding any extensions that are used
Anything else you could think of that might be helpful to us is a welcome addition.
Thank you in advance!
Florian
Hi @hivemq-support,
first I want to apologize that it took me so long to find the time to gather the information.
And I want to give you a big shoutout for caring so much about the community and the issues it runs into!
That said, here is some info which hopefully helps to reproduce the issue:
Our usage scenario
A large number of hosts report various metrics via MQTT, which are then read by one subscriber (technically split into multiple subscribers) and stored in an InfluxDB.
Number of connected clients
20,000 clients (= TCP connections) coming from 4,000 hosts
config file
We consider our metrics non-critical and would accept data loss in the case of hardware or network failure. To reduce side effects from intensive I/O on the local machine, we switched everything to in-memory.
You can find our config at the end.
The reason given for the observed dropped messages
I'm not quite sure I understand the question. Here is what I do: first, I check the number of dropped messages via the metric ‘com.hivemq.messages.dropped.count’, captured by the ‘hivemq-influxdb-extension’.
In the chart of the corresponding per-minute rate, I see sporadic peaks, e.g. today at 12:02 (293 messages) and at 09:53 (1,961 messages); for the rest of the time the value is zero.
When I then look at the log file, I find the messages ‘Tried to decrement a payload reference counter (…)’ exclusively at these times, and in exactly the same number as the dropped messages (293 and 1,961 in the example).
It is worth mentioning that the number in brackets is not unique; it can repeat several times in successive messages, as you can see in the log example at the end of this post.
Number (and structure) of the topics that are used
We use a unique topic for each time series we send, approximately 500,000 different topics in total. Since we send the values for each series every minute, this fits well with the roughly 450,000 MQTT messages per minute reported by ‘com.hivemq.messages.incoming.total.count’.
Ratio of Publisher to Subscriber
The 20,000 publishers are matched by only a handful of subscribers (currently 8).
Usage of Shared Subscriptions
We tested both with and without shared subscriptions. Currently, the setup is without shared subscriptions; the total range of topics is distributed among 5 subscribers, which subscribe to distinct branches of the topic tree.
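For reference, a shared subscription is just a topic filter with a $share prefix, so during our tests each consumer in a group subscribed to a filter roughly like the following (group and topic names here are only illustrative, not our exact scheme):

$share/metrics-consumers/metrics/#

With such a filter, the broker distributes the matching messages across the members of the group instead of delivering every message to every group member.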
Message Rate
The message rate is around 450,000 messages/minute, according to ‘com.hivemq.messages.incoming.total.count’ and ‘com.hivemq.messages.outgoing.total.count’.
Hardware specs of your machine
It is a VM with 8 cores on an Intel® Xeon® CPU E5-2620 v4 @ 2.10 GHz and 64 GB RAM.
Information regarding any extensions that are used
The only extension is the ‘hivemq-influxdb-extension’, used for collecting the HiveMQ metrics.
our config
<?xml version="1.0"?>
<listeners>
    <tcp-listener>
        <port>1883</port>
        <bind-address>0.0.0.0</bind-address>
    </tcp-listener>
</listeners>
<!-- HiveMQ has an automatic update check which writes to the log file when a new version is available. -->
<!-- The update check will send anonymized data about the HiveMQ installation. -->
<general>
    <update-check-enabled>false</update-check-enabled>
</general>
<anonymous-usage-statistics>
    <enabled>false</enabled>
</anonymous-usage-statistics>
<!-- Disk persistence disabled for max. performance -->
<persistence>
    <client-session>
        <!-- Changing the Client Session Persistence Configuration -->
        <general>
            <mode>in-memory</mode>
        </general>
        <queued-messages>
            <!-- Limit the maximum queued messages per client -->
            <!-- 2019-12-09: changed from 60 to 6000 to check whether less messages will be lost -->
            <max-queued-messages>6000</max-queued-messages>
            <!-- Discard the new message (not the oldest) if a new message arrives -->
            <queued-messages-strategy>discard</queued-messages-strategy>
            <!-- Use in-memory persistence -->
            <mode>in-memory</mode>
        </queued-messages>
        <!-- Changing the Client Session Subscription Persistence Configuration -->
        <subscriptions>
            <mode>in-memory</mode>
        </subscriptions>
    </client-session>
    <message-flow>
        <!-- Changing the incoming message flow -->
        <incoming>
            <mode>in-memory</mode>
        </incoming>
        <!-- Changing the outgoing message flow -->
        <outgoing>
            <mode>in-memory</mode>
        </outgoing>
    </message-flow>
    <!-- Changing the retained message to be in-memory -->
    <retained-messages>
        <mode>in-memory</mode>
    </retained-messages>
    <!-- Changing the publish payload persistence to be in-memory -->
    <publish-payloads>
        <mode>in-memory</mode>
    </publish-payloads>
    <!-- Changing the session attribute persistence to be in-memory -->
    <attribute>
        <mode>in-memory</mode>
    </attribute>
    <!-- Changing the client group persistence to be in-memory -->
    <client-group>
        <mode>in-memory</mode>
    </client-group>
</persistence>
Example of logs
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223082) that was already zero.
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223082) that was already zero.
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223087) that was already zero.
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223087) that was already zero.
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,143 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,144 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,144 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,144 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,144 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
2020-02-03 12:02:04,144 WARN - Tried to decrement a payload reference counter (4370223088) that was already zero.
Please let me know if there is anything unclear or other questions arise.
Greetings & thanks!
Thank you so much for being an active member of the community and collaborating with us to make HiveMQ CE even better.
The information you provided is certainly useful.
I have a few comments/recommendations:
In regards to your config file:
You appear to have based your config on an older HiveMQ version (3.x).
HiveMQ 4.x, of which HiveMQ CE is the core, no longer supports in-memory persistence.
As there is also no clustering in HiveMQ CE, no persistence configuration is necessary or valid.
Check our GitHub Wiki for all configuration options available to HiveMQ CE.
These tags in your config therefore have no effect (a minimal sketch follows below the hint).
Hint: Check out our newest release. We improved our persistence performance by quite a bit!
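To illustrate, a stripped-down config.xml along those lines might look roughly like the following (a sketch only; I carried over the listener and the usage-statistics switch from your file, so please verify the exact element names against the Wiki):

<?xml version="1.0"?>
<hivemq>
    <!-- the only listener: plain TCP on 1883, as in your current setup -->
    <listeners>
        <tcp-listener>
            <port>1883</port>
            <bind-address>0.0.0.0</bind-address>
        </tcp-listener>
    </listeners>
    <!-- opt out of anonymized usage statistics, as before -->
    <anonymous-usage-statistics>
        <enabled>false</enabled>
    </anonymous-usage-statistics>
</hivemq>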
In regards to the dropped messages:
If you check the event.log file, you will find the specific reason given for each dropped message.
I am interested in this information.
Did you assign an -Xms and -Xmx value for the heap of the HiveMQ process?
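If not, for reference, the heap is usually pinned via JVM flags passed to the HiveMQ process; a sketch with placeholder values, assuming the bin/run.sh start script picks up the JAVA_OPTS environment variable (otherwise the flags can be added to the start script directly):

# placeholder sizes - choose values that fit your 64 GB machine
export JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx16g"
./bin/run.sh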
OK, that’s definitely a good hint; I will check the new version and the corresponding config options.
And thanks for pointing me to the event.log; I had totally forgotten about that log file.
After activating the DEBUG level, I found the corresponding number of messages like this one: 2020-02-12 16:34:54,411 DEBUG - Outgoing publish message was dropped. Receiving client: {client_id}, topic: {topic}, qos: 0, reason: The tcp socket was not writable.
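Side note, in case it helps anyone else: DEBUG output of this kind can typically be enabled by raising the root logger level in conf/logback.xml (a sketch, assuming the default logback layout; the existing appender references stay as they are):

<root level="DEBUG">
    <!-- existing appender-ref entries stay unchanged -->
</root>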
It seems our client is not consuming the messages fast enough (a hint I had already received from your colleague Michael in this topic). I definitely need to look into this.
What will happen when we switch back to shared subscriptions? Will the broker then skip a blocked client in the subscriber group and distribute the messages among the other subscribers of the same group?
Finally: No, I didn’t explicitly specify -Xms or -Xmx parameters.