Just installed HiveMQ community edition. All went well…nice alternative to paho server.
Thanks to HiveMQ for making this version open source. Had a few questions about
metrics.
I installed the prometheus extension jar and had problems getting the extension to start up -
missing servlet class in classpath. Found the note about correcting the gradle build file to
include a dependency for this class and rebuilt the server jar. No problems in log file; but
still no one listening on port 9399 (i.e. extension runs but is not active). Then I noticed
an empty file in the extension directory - “DISABLE”. On a whim, I renamed it and voilà,
HiveMQ is up and Prometheus and Grafana started showing server metrics! Just wanted
to pass that along and would welcome any comments on alternate ways to get the
prometheus extension up and running.
Here’s the reason I’m sending this note. I’ve loaded the grafana dashboard described in
the HiveMQ document. Some of the metrics show up - and make sense - but, others are
missing. The output of the Prometheus endpoint (localhost:9399/metrics) shows, after removing lots
of descriptive text, a set of name value pairs (e.g. “com_hivemq_messages_incoming_total_count 123”). This list matches the code found in
…/com/hivemq/metrics/MetricsHolder.java and HiveMQMetrics.java. But not all values are
being updated.
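For anyone who wants to repeat this check, here is a minimal Python sketch of the "remove the descriptive text" step. It assumes the endpoint returns the usual Prometheus text format, where descriptive lines start with "#" and each sample is "name value" with no trailing timestamp. The function name parse_metrics and the SAMPLE text are made up for illustration.

```python
def parse_metrics(text):
    """Turn Prometheus text-format output into a {name: value} dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the descriptive "# HELP" / "# TYPE" lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space: left side is the name, right side the value.
        name, _, raw = line.rpartition(" ")
        if not name:
            continue
        try:
            values[name] = float(raw)
        except ValueError:
            # Not a numeric sample; ignore it in this sketch.
            continue
    return values


# Made-up sample in the shape described above (not real broker output).
SAMPLE = """\
# HELP com_hivemq_messages_incoming_total_count (descriptive text)
# TYPE com_hivemq_messages_incoming_total_count gauge
com_hivemq_messages_incoming_total_count 123
"""

# To try it against a running broker (assumption: extension listens on 9399):
#   from urllib.request import urlopen
#   text = urlopen("http://localhost:9399/metrics").read().decode("utf-8")
#   print(parse_metrics(text))
```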
It appears that some of the values in the MetricsRegistry are not being updated by the server code and, after
looking at the json for the HiveMQ grafana dashboard, some of the value names formerly
used as data sources for monitoring no longer appear in the MetricsRegistry - and do not
appear in the server code base as well.
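To make the two observations concrete, here is a rough Python sketch of how they could be checked. It assumes two snapshots of the endpoint have already been turned into name/value dictionaries, plus a list of metric names taken by hand from the dashboard JSON. The helper names and the sample values are hypothetical. Note that an unchanged value only shows there was no activity between the two snapshots, not that the metric is never updated.

```python
def unchanged_metrics(before, after):
    """Names present in both snapshots whose value did not change."""
    return sorted(
        name for name in before
        if name in after and before[name] == after[name]
    )


def missing_metrics(expected_names, snapshot):
    """Names the dashboard expects that the endpoint does not export."""
    return sorted(set(expected_names) - set(snapshot))


# Made-up snapshots taken before and after sending some test traffic.
before = {"com_hivemq_messages_incoming_total_count": 123.0,
          "com_hivemq_networking_bytes_read_total": 0.0}
after = {"com_hivemq_messages_incoming_total_count": 150.0,
         "com_hivemq_networking_bytes_read_total": 0.0}

# Hypothetical list of names copied out of a dashboard definition.
dashboard_names = ["com_hivemq_messages_incoming_total_count",
                   "some_metric_the_dashboard_still_references"]

print(unchanged_metrics(before, after))
print(missing_metrics(dashboard_names, after))
```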
While some metrics (e.g. cluster info) don’t make any sense for the community edition,
others, like ping counters and total bytes throughput, would be nice (i.e. “comforting”)
to see when things appear to go wrong. Is this the result of recent code refactoring in
HiveMQ or, maybe, did they just not make the list when creating the community edition code
subset? I’d be glad to help re-insert metric code and contribute it back if it meets HiveMQ’s
plans.
All in all, great job. When the time comes, I now have a migration path to a cloud
environment.
Nice to see your interest in the HiveMQ Community Edition, and thank you for the warm words and praise.
The DISABLE file is generated if HiveMQ encounters an error while loading an extension or during extension execution, such as an unsatisfied class error. I would guess that it was still left over from the earlier extension start. This behavior is documented in the HiveMQ documentation https://www.hivemq.com/docs/latest/extensions/general.html#life-cycle but has not been added to the community edition wiki.
Right now only a few easily understood metrics are available in the community edition. These are:
At the moment we would like to keep it this way. You are correct in noticing that most other metrics do not make sense and others require a lot of knowledge of MQTT/broker internals. This may change in the future.
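As a quick way to spot a leftover marker after fixing the underlying error, something like the following Python sketch could be used. It assumes each extension lives in its own folder under a common extensions directory and that the marker is named “DISABLE” as in this thread; the exact name may differ between versions, so it is a parameter. The function name and the example path are hypothetical.

```python
from pathlib import Path


def disabled_extensions(extensions_dir, marker="DISABLE"):
    """Return the names of extension folders that contain the marker file."""
    root = Path(extensions_dir)
    if not root.is_dir():
        return []
    return sorted(
        child.name for child in root.iterdir()
        # Only look one level deep: <extensions_dir>/<extension>/<marker>
        if child.is_dir() and (child / marker).is_file()
    )


# Example call (the path is an assumption, adjust to your installation):
#   print(disabled_extensions("/opt/hivemq/extensions"))
```

Removing or renaming the marker and restarting the extension is then a manual step, as described above.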
Thanks for your note. “DISABLE” file now makes sense - your explanation exactly matches the sequence I went through. Feels good to understand how things work a bit better.
As for metrics, your list for the community edition looks pretty good. The only additions I might suggest are pingreq and pingresp. But, that’s clearly a matter of your priorities. I
would note that the current contents of MetricsHolder.java aren’t an exact match for the
list you provided. I’m guessing that the MetricsRegistry provides the list of key:value
pairs that the prometheus outputs…at least that seems to be the case from scraping
localhost:9399/metrics results.
I’d be glad to make the Metrics/ classes consistent with the above and see what happens.
But, since there are currently no getXXXCounter() methods for some metrics in your list,
I suspect that not all of them are being updated in the current server code. At least, I
can’t find update accesses for some of these metrics by grepping through /src… Not a
criticism, just trying to add some value to the community edition.
You won’t be able to find metrics like com.hivemq.networking.bytes.read.total by grepping for counter because they are gauges. You can look in the package com.hivemq.metrics[.handler] for them.
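When searching the source for a name seen in the scraped output, it can help to convert between the two spellings that appear in this thread (dotted in the code, underscores in the Prometheus output). The Python sketch below assumes the exporter simply replaces characters that are not valid in Prometheus names with underscores; that is inferred from the names quoted here, not confirmed. The reverse direction is only a guess, since an underscore in the original name would be indistinguishable. Both function names are made up.

```python
import re


def to_prometheus_name(dotted_name):
    """Replace every character outside [a-zA-Z0-9:_] with an underscore."""
    return re.sub(r"[^a-zA-Z0-9:_]", "_", dotted_name)


def to_dotted_guess(prometheus_name):
    """Best-effort reverse mapping, useful as a search term for the source."""
    return prometheus_name.replace("_", ".")


print(to_prometheus_name("com.hivemq.networking.bytes.read.total"))
print(to_dotted_guess("com_hivemq_messages_incoming_total_count"))
```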
I’m not completely sure what a pingreq/pingresp metric would add over the connections.current one. I feel the former would just be harder to understand than the latter.
Additionally, such a metric could be very easily implemented through a custom extension once this PR is merged.
“Not a criticism, just trying to add some value to the community edition.”
Don’t worry, your suggestions are completely welcome and we are very happy to consider the wishes of everybody interested in the project.