To answer your question:
A publisher is a specific script running on a specific host to collect data. Several such scripts run on each host, and each script publishes multiple metrics. Every metric has its own topic, which contains identifiers for the host, the script, and the metric.
That gives a total of approx. 20k connections to the broker.
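To make the topic layout concrete, here is a minimal sketch of how one topic per metric could be built. The root level `metrics`, the order of the levels, and the example names are assumptions for illustration only, not our actual naming scheme.

```python
def metric_topic(host: str, script: str, metric: str, root: str = "metrics") -> str:
    """Build one MQTT topic per metric from host, script and metric identifiers.

    The level order (root/host/script/metric) is an assumed layout.
    """
    levels = [root, host, script, metric]
    for level in levels:
        # A topic level must not be empty and must not contain the
        # level separator or the MQTT wildcard characters.
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(levels)


# Hypothetical example: one script on one host publishing one metric.
print(metric_topic("host01", "disk_check", "used_bytes"))
```

With a layout like this, a subscriber can select everything from one host with a filter such as `metrics/host01/#`.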
Now, about the progress with analysing the problem:
First, I wanted to bring the broker back to a normal state without restarting it. What I did was:
- removed all unnecessary subscribers (some monitors were simply counting messages)
- changed the required subscriber (data sink) from a shared subscription (group of 12 clients) to distinct topic subscriptions (9 topic branches distributed across 5 clients)
- disabled debug logging (switched to INFO)
However, the broker did not recover and was still losing messages at roughly the same rate.
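For reference, the difference between the two subscription styles can be sketched as follows. The `$share/<group>/<filter>` form is the standard MQTT shared subscription syntax; the group name, the branch names, and the round-robin assignment are hypothetical and only illustrate the idea, not the exact distribution I used.

```python
def shared_filter(group: str, topic_filter: str) -> str:
    """Topic filter for a shared subscription: the broker load-balances
    matching messages across all clients subscribed with the same group."""
    return f"$share/{group}/{topic_filter}"


def distribute(branches: list[str], n_clients: int) -> list[list[str]]:
    """Assign topic branches to clients round-robin (assumed strategy).

    Each client then uses ordinary, non-shared subscriptions for its branches.
    """
    assignment: list[list[str]] = [[] for _ in range(n_clients)]
    for i, branch in enumerate(branches):
        assignment[i % n_clients].append(branch)
    return assignment


# Before: all 12 sink clients subscribe to the same shared filter.
print(shared_filter("sink", "metrics/#"))

# After: 9 hypothetical topic branches spread over 5 clients.
branches = [f"metrics/branch{i}/#" for i in range(9)]
for client_id, filters in enumerate(distribute(branches, 5)):
    print(client_id, filters)
```

In the second variant the broker no longer has to balance messages within a group, but each branch depends on exactly one client being connected.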
On Monday, I finally restarted the broker in the exact same configuration, and it ran fine for the rest of the week.
Today, in an attempt to recreate the problem, I started 5 subscriber clients (monitors) on different hosts, each of which subscribed to the full topic set.
After ~3 hours, the first messages appeared in the log, stating that messages had been dropped because the TCP socket was not writable.
So it seems I am at least able to reproduce the issue.
As you suggested, I downloaded the Enterprise version and set it up. However, the log stated that the unlicensed version only accepts 25 clients, which is not enough to reproduce the problem.
So I didn’t continue there.
Another thing I tried was installing the hivemq-influxdb extension in HiveMQ CE, hoping to get some metrics out of the broker. But I couldn’t get it working. The broker was running and I saw no errors whatsoever in the logs, but I also saw nothing at all from the extension in the logs (and no connection to our InfluxDB). But that’s a different story…
I wish you a merry Christmas!