S3 cluster discovery extension inside a Docker container

Hello,

I am currently attempting to configure a broker cluster on the AWS cloud. Each broker is running inside a Docker container on an EC2 instance. Unfortunately, I am encountering an issue that prevents the broker from starting. The process stalls at the following log messages:

2023-07-23 21:45:45,199 INFO  - HiveMQ version: 4.16.0
2023-07-23 21:45:45,199 INFO  - HiveMQ home directory: /opt/hivemq-4.16.0
2023-07-23 21:45:45,202 INFO  - Log Configuration was overridden by /opt/hivemq-4.16.0/conf/logback.xml
2023-07-23 21:45:46,095 INFO  - Successfully loaded configuration from '/opt/hivemq-4.16.0/conf/config.xml'.
2023-07-23 21:45:46,268 WARN  - Soft limit for open files (65536) is lower than the recommended limit (1000000). Please increase the open file limit to at least the recommended limit.
2023-07-23 21:45:46,273 WARN  - Hard limit for open files (65536) is lower than the recommended limit (1000000). Please increase the open file limit to at least the recommended limit.
2023-07-23 21:45:46,289 INFO  - This node's ID is VkU1L
2023-07-23 21:45:46,290 INFO  - Clustering is enabled
2023-07-23 21:45:52,584 INFO  - No valid license file found. Using trial license, restricted to 25 connections.
2023-07-23 21:45:53,513 INFO  - This node uses '1' CPU cores.
2023-07-23 21:45:53,519 INFO  - Starting HiveMQ extension system.
2023-07-23 21:45:53,627 INFO  - Starting extension with id "hivemq-allow-all-extension" at /opt/hivemq-4.16.0/extensions/hivemq-allow-all-extension
2023-07-23 21:45:53,656 WARN  - 
################################################################################################################
# This HiveMQ deployment is not secure! You are lacking Authentication and Authorization.                      #
# Right now any MQTT client can connect to the broker with a full set of permissions.                          #
# For production usage, add an appropriate security extension and remove the hivemq-allow-all extension.       #
# You can download security extensions from the HiveMQ Marketplace (https://www.hivemq.com/extensions/).       #
################################################################################################################
2023-07-23 21:45:53,658 INFO  - Extension "Allow All Extension" version 1.0.0 started successfully.
2023-07-23 21:45:53,821 INFO  - Using TCP cluster transport on address 172.31.37.143 and port 7800
2023-07-23 21:45:53,836 INFO  - Using extension cluster discovery

To run the containerized broker, I am using the following command:

docker run -p 8080:8080 -p 8883:8883 -p 7800:7800 --network=host broker

The IP address 172.31.37.143 corresponds to the private IP of the EC2 instance.
This is the config.xml:

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="config.xsd">

    <listeners>
        <tls-tcp-listener>
            <port>8883</port>
            <bind-address>0.0.0.0</bind-address>
            <tls>
                <keystore>
                    <path>conf/keystore.jks</path>
                    <password>password</password>
                    <private-key-password>password</private-key-password>
                </keystore>
                <truststore>
                    <path>conf/truststore.jks</path>
                    <password>password</password>
                </truststore>
                <client-authentication-mode>REQUIRED</client-authentication-mode>
            </tls>
        </tls-tcp-listener>
    </listeners>
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>172.31.37.143</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <extension/>
        </discovery>
    </cluster>
    <anonymous-usage-statistics>
        <enabled>true</enabled>
    </anonymous-usage-statistics>

    <control-center>
        <listeners>
            <http>
                <port>8080</port>
                <bind-address>0.0.0.0</bind-address>
            </http>
        </listeners>
    </control-center>
</hivemq>

Interestingly, the broker works correctly when it is not running inside a container, but inside the container it gets stuck at the 'Using extension cluster discovery' step. This issue is urgent for me, and I would greatly appreciate your help in resolving it. If you need more information, I am happy to provide it.

Thank you in advance for your help.

Hello @Fabio

Please enable the DEBUG log level so we can see verbose information about the broker startup process. From a quick look at the information you shared, the issue seems related to the credentials configured in the S3 cluster discovery extension. I'm not sure which credentials-type you are using for S3, but please check whether the AWS credentials are actually being passed to the Docker container.

For example, if you are using environment variables, you can pass them to the container at runtime. Give it a try with the sample command below:

docker run -p 8080:8080 -p 8883:8883 -p 7800:7800 --network=host -e AWS_ACCESS_KEY_ID=XXXXXXX -e AWS_SECRET_ACCESS_KEY=XXXXXXX broker
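If the environment-variable route is used, a small guard in the launch script can catch unset credentials before the container starts. This is a minimal POSIX shell sketch, assuming the keys are exported in the host shell. The function name check_aws_env is hypothetical, and whether the extension actually reads these variables depends on the credentials-type configured in s3discovery.properties.

```shell
#!/bin/sh
# Hypothetical helper: verify that the standard AWS credential variables are
# exported (docker only sees exported variables) before starting the broker.
# Whether the S3 discovery extension reads them depends on its credentials-type.

# check_aws_env: prints each unset or empty variable to stderr and
# returns 1 if any of them is missing, 0 otherwise.
check_aws_env() {
    missing=0
    for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # printenv outputs nothing for an unexported, unset or empty variable
        if [ -z "$(printenv "$var")" ]; then
            echo "missing environment variable: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_aws_env && docker run -p 8080:8080 -p 8883:8883 -p 7800:7800 --network=host -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY broker. Passing -e NAME without a value makes docker take the value from the host environment, which also keeps the secrets off the command line.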

Kind regards,
Diego from HiveMQ Team

Hello. I enabled the debug logs, and it seems that no cluster discovery callback is ever added:

2023-07-24 14:41:49,754 DEBUG - Simple authenticator added by extension 'hivemq-allow-all-extension'.
2023-07-24 14:41:49,754 INFO  - Extension "Allow All Extension" version 1.0.0 started successfully.
2023-07-24 14:41:49,930 INFO  - Using TCP cluster transport on address 172.17.0.2 and port 7800
2023-07-24 14:41:49,951 INFO  - Using extension cluster discovery
2023-07-24 14:41:50,082 DEBUG - Waiting for cluster discovery callback to be added.
2023-07-24 14:43:50,086 WARN  - No cluster discovery callback added within 2 minutes.
2023-07-24 14:43:50,088 INFO  - zgyal: no members discovered after 120006 ms: creating cluster as first member
2023-07-24 14:43:50,099 INFO  - Cluster nodes found by discovery: [zgyal|0] (1) [zgyal].
2023-07-24 14:43:50,102 DEBUG - State of node zgyal changed to UNKNOWN.
2023-07-24 14:43:50,103 DEBUG - Current cluster node states: {zgyal=UNKNOWN}
2023-07-24 14:43:50,139 DEBUG - State of node zgyal set to RUNNING
2023-07-24 14:43:50,147 DEBUG - Current cluster node states: {zgyal=RUNNING}
2023-07-24 14:43:50,148 INFO  - No user for HiveMQ Control Center configured. Starting with default user
2023-07-24 14:43:50,149 INFO  - Starting HiveMQ Control Center on address 0.0.0.0 and port 8080
2023-07-24 14:43:50,593 INFO  - Control Center Audit Logging started.
2023-07-24 14:43:50,596 INFO  - Started HiveMQ Control Center in 445ms
2023-07-24 14:43:50,915 INFO  - Enabled protocols for TCP Listener with TLS at address 0.0.0.0 and port 8883: [TLSv1.3, TLSv1.2]
2023-07-24 14:43:50,920 INFO  - Enabled cipher suites for TCP Listener with TLS at address 0.0.0.0 and port 8883: [TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384]
2023-07-24 14:43:50,922 INFO  - Starting TLS TCP listener on address 0.0.0.0 and port 8883
2023-07-24 14:43:50,975 INFO  - Started TCP Listener with TLS on address 0.0.0.0 and on port 8883.
2023-07-24 14:43:50,976 INFO  - Started HiveMQ in 129324ms
2023-07-24 14:43:51,609 DEBUG - Sent anonymous usage statistics

The EC2 instance has IAM permissions granting full access to every S3 bucket in my account. The problem only occurs when I run the broker inside a container built from the following Dockerfile:

FROM hivemq/hivemq4:4.16.0

EXPOSE 8080:8080
EXPOSE 8883:8883
EXPOSE 7800:7800

RUN apt-get update && \
    apt-get install -y curl && \
    curl -LO https://releases.hivemq.com/extensions/hivemq-s3-cluster-discovery-extension-4.0.1.zip && \
    unzip hivemq-s3-cluster-discovery-extension-4.0.1.zip && \
    rm hivemq-s3*zip

COPY --chown=hivemq:hivemq s3discovery.properties extensions/hivemq-s3-cluster-discovery-extension/

COPY --chown=hivemq:hivemq  config.xml conf/config.xml

COPY --chown=hivemq:hivemq logback.xml conf/logback.xml

COPY --chown=hivemq:hivemq keystore.jks conf/keystore.jks

COPY --chown=hivemq:hivemq truststore.jks conf/truststore.jks

COPY broker-entrypoint.sh /opt/broker-entrypoint.sh

RUN chmod +x /opt/broker-entrypoint.sh

CMD ["/opt/broker-entrypoint.sh"]

Hey @Fabio

You are using an outdated version of the S3 cluster discovery extension. Could you please update to the latest version and share the results?

Kind regards,
Diego from HiveMQ Team

Hey, I updated the S3 cluster discovery extension, but I still get exactly the same issue. Could it be a Docker network misconfiguration?

These are the full final logs. As you can see, it blocks for 2 minutes on 'Waiting for cluster discovery callback to be added':

2023-07-24 15:58:35,099 DEBUG - Simple authenticator added by extension 'hivemq-allow-all-extension'.
2023-07-24 15:58:35,100 INFO  - Extension "Allow All Extension" version 1.0.0 started successfully.
2023-07-24 15:58:35,303 INFO  - Using TCP cluster transport on address 172.31.7.170 and port 7800
2023-07-24 15:58:35,322 INFO  - Using extension cluster discovery
2023-07-24 15:58:35,452 DEBUG - Waiting for cluster discovery callback to be added.
2023-07-24 16:00:35,456 WARN  - No cluster discovery callback added within 2 minutes.
2023-07-24 16:00:35,462 INFO  - JX1m4: no members discovered after 120006 ms: creating cluster as first member
2023-07-24 16:00:35,482 INFO  - Cluster nodes found by discovery: [JX1m4|0] (1) [JX1m4].
2023-07-24 16:00:35,486 DEBUG - State of node JX1m4 changed to UNKNOWN.
2023-07-24 16:00:35,487 DEBUG - Current cluster node states: {JX1m4=UNKNOWN}
2023-07-24 16:00:35,523 DEBUG - State of node JX1m4 set to RUNNING
2023-07-24 16:00:35,524 DEBUG - Current cluster node states: {JX1m4=RUNNING}
2023-07-24 16:00:35,532 INFO  - No user for HiveMQ Control Center configured. Starting with default user
2023-07-24 16:00:35,533 INFO  - Starting HiveMQ Control Center on address 0.0.0.0 and port 8080
2023-07-24 16:00:35,988 INFO  - Control Center Audit Logging started.
2023-07-24 16:00:35,990 INFO  - Started HiveMQ Control Center in 456ms
2023-07-24 16:00:36,363 INFO  - Enabled protocols for TCP Listener with TLS at address 0.0.0.0 and port 8883: [TLSv1.3, TLSv1.2]
2023-07-24 16:00:36,365 INFO  - Enabled cipher suites for TCP Listener with TLS at address 0.0.0.0 and port 8883: [TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384]
2023-07-24 16:00:36,367 INFO  - Starting TLS TCP listener on address 0.0.0.0 and port 8883
2023-07-24 16:00:36,416 INFO  - Started TCP Listener with TLS on address 0.0.0.0 and on port 8883.
2023-07-24 16:00:36,418 INFO  - Started HiveMQ in 129835ms
2023-07-24 16:00:37,132 DEBUG - Sent anonymous usage statistics

It looks as if it works, but it does not: I checked the S3 bucket, and it is empty.

Hi @Fabio

Please verify whether the S3 discovery extension is enabled in your broker container:

cd /opt/hivemq/extensions/hivemq-s3-cluster-discovery-extension/
ls -la

Check whether a file named DISABLED exists in that folder. If it does, HiveMQ does not load the extension.
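The same check can be wrapped in a small helper that also flags other common reasons an extension folder is ignored. This is a minimal sketch, assuming the usual HiveMQ extension layout: a hivemq-extension.xml descriptor plus the extension JAR inside the folder, and a DISABLED marker file to switch it off. check_extension is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report why HiveMQ might skip an extension folder.
# Assumes the usual layout: hivemq-extension.xml + extension JAR in the folder,
# and a DISABLED marker file that turns the extension off.

# check_extension DIR: prints one line per problem; returns 1 if any is found.
check_extension() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    status=0
    if [ -e "$dir/DISABLED" ]; then
        echo "extension is disabled (DISABLED file present)"
        status=1
    fi
    if [ ! -f "$dir/hivemq-extension.xml" ]; then
        echo "hivemq-extension.xml missing"
        status=1
    fi
    # ls fails when the glob matches nothing, which means no JAR is present
    if ! ls "$dir"/*.jar >/dev/null 2>&1; then
        echo "no extension JAR found"
        status=1
    fi
    return "$status"
}
```

In a shell inside the running container, after defining the function: check_extension /opt/hivemq/extensions/hivemq-s3-cluster-discovery-extension. With the Dockerfile from this thread, that folder would likely contain only s3discovery.properties, so the helper should report both the missing descriptor and the missing JAR.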

Thanks,
Dasha

The problem was in the Dockerfile. Very stupidly, I forgot to move the S3 discovery extension into HiveMQ's extensions folder; I was only copying the s3discovery.properties file!
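The Dockerfile's RUN step runs unzip in the image's working directory, the same directory the relative COPY destinations resolve against, so the extension most likely ended up next to conf/ instead of inside extensions/. This is a minimal sketch for spotting such misplaced folders. It assumes every extension folder contains a hivemq-extension.xml descriptor and that HiveMQ only loads direct children of <HIVEMQ_HOME>/extensions. find_misplaced_extensions is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list extension folders under a HiveMQ home directory
# that are NOT direct children of <home>/extensions and so are never loaded.
# Assumes each extension folder contains a hivemq-extension.xml descriptor.

# find_misplaced_extensions HOME: prints each misplaced extension folder.
find_misplaced_extensions() {
    home=${1%/}   # drop a trailing slash so the path comparison below matches
    # -H follows HOME itself if it is a symlink (e.g. /opt/hivemq -> /opt/hivemq-<version>)
    find -H "$home" -name hivemq-extension.xml -type f | while IFS= read -r descriptor; do
        ext_dir=$(dirname "$descriptor")
        if [ "$(dirname "$ext_dir")" != "$home/extensions" ]; then
            echo "$ext_dir"
        fi
    done
}
```

In the image from this thread, find_misplaced_extensions /opt/hivemq would be expected to print /opt/hivemq/hivemq-s3-cluster-discovery-extension. Extracting with unzip hivemq-s3-cluster-discovery-extension-4.0.1.zip -d extensions/ in the RUN step, assuming the archive's top-level folder is hivemq-s3-cluster-discovery-extension, would place the extension where the later COPY of s3discovery.properties expects it.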

Thank you guys for your patience and help.
