I am designing a solution to manage MQTT devices and have a question about message fan-in, where a large number of clients (100K+) publish messages to a single topic tree.
To manage devices, the solution needs some information about every device. At startup, each client sends a message containing the information the solution needs for device management.
The question is: which topic should each client publish to?
Option 1: Single topic tree
Clients publish to: clientinformation/[client-id]
Concern: Message fan-in.
In this approach, every device publishes to a topic that is matched by a single solution-side client subscribed to the wildcard topic filter ‘clientinformation/+’.
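To make the fan-in concrete, here is a small Python sketch of MQTT topic-filter matching: ‘+’ matches exactly one topic level, ‘#’ matches the current level and everything below it, and a filter that starts with a wildcard does not match topics beginning with ‘$’. It shows that the one filter ‘clientinformation/+’ matches every device's topic. `topic_matches` is a hypothetical helper written for illustration (it does not validate filter syntax), not a broker or client-library API.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # A filter beginning with a wildcard must not match topics starting with '$'.
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' matches this level and any number of levels below it.
            return True
        if i >= len(t_levels):
            return False
        # '+' matches any single level; anything else must match exactly.
        if level != "+" and level != t_levels[i]:
            return False
    # Every filter level matched; the topic must not have extra levels.
    return len(f_levels) == len(t_levels)


# One subscription covers every device's startup message.
for device_topic in ["clientinformation/clienta", "clientinformation/clientb"]:
    print(device_topic, topic_matches("clientinformation/+", device_topic))
```

Every one of the 100K+ device topics matches that single filter, so all of the messages are delivered to the one subscribing client.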
Having a single client process all of these messages raises concerns about latency and rate limiting, and it appears to leave us no way to scale horizontally.
Are these concerns valid? Are there well known solutions to this type of problem?
Option 2: Partitioned topics
Assuming the concerns from Option 1 are valid, I am considering an alternative in which each client is assigned a partition during bootstrapping (when it discovers the broker endpoint).
The partition would be included in the topic path so that messages can be processed concurrently.
Topic template: clientinformation/[partition]/[client-id]
ClientA publishes to clientinformation/1/clienta
ClientB publishes to clientinformation/2/clientb
ClientC publishes to clientinformation/N/clientc
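One way the bootstrap step could pick a partition is to derive it deterministically from the client-id. The Python sketch below assumes exactly that (the post does not say how partitions are assigned); `assign_partition` and `info_topic` are hypothetical names, and partitions are numbered 1..N to match the examples above. It uses CRC-32 instead of Python's built-in `hash()`, because string hashing is salted per process and would not give the same partition across restarts.

```python
import zlib

PREFIX = "clientinformation"


def assign_partition(client_id: str, num_partitions: int) -> int:
    """Hypothetical: map a client-id to a stable partition in 1..num_partitions."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    return zlib.crc32(client_id.encode("utf-8")) % num_partitions + 1


def info_topic(partition: int, client_id: str) -> str:
    """Build 'clientinformation/<partition>/<client-id>' for a device to publish to."""
    # '+' and '#' are not allowed in MQTT topic names, and '/' would add a level.
    if any(ch in client_id for ch in "+#/"):
        raise ValueError(f"client-id not usable as a topic level: {client_id!r}")
    return f"{PREFIX}/{partition}/{client_id}"


topic = info_topic(assign_partition("clienta", 8), "clienta")
print(topic)
```

Because the mapping is a pure function of the client-id and N, a device that re-bootstraps gets the same partition as long as N does not change.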
The solution would then deploy N clients, each subscribed to a single partition:
Solution1 subscribes to clientinformation/1/+
Solution2 subscribes to clientinformation/2/+
SolutionN subscribes to clientinformation/N/+
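On the solution side, each of the N subscribers needs its own filter and a way to recover the partition and client-id from the topic of each message it receives. A minimal Python sketch, with hypothetical helper names (`partition_filter`, `parse_info_topic`, `handle_message`) and without any MQTT client library:

```python
from typing import Tuple

PREFIX = "clientinformation"


def partition_filter(partition: int) -> str:
    """Topic filter that the solution client for `partition` subscribes to."""
    return f"{PREFIX}/{partition}/+"


def parse_info_topic(topic: str) -> Tuple[int, str]:
    """Split 'clientinformation/<partition>/<client-id>' into (partition, client-id).

    Raises ValueError if the topic does not follow the template.
    """
    levels = topic.split("/")
    if len(levels) != 3 or levels[0] != PREFIX:
        raise ValueError(f"unexpected topic: {topic!r}")
    return int(levels[1]), levels[2]  # int() also raises ValueError on bad input


def handle_message(my_partition: int, topic: str) -> str:
    """Return the client-id for a message, checking it belongs to this partition."""
    partition, client_id = parse_info_topic(topic)
    if partition != my_partition:
        # The subscription filter should prevent this; guard against misconfiguration.
        raise ValueError(f"topic {topic!r} is not in partition {my_partition}")
    return client_id


print(partition_filter(2), handle_message(2, "clientinformation/2/clientb"))
```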
My concerns here are:
- Is this a problem we actually need to solve, or is it strictly theoretical?
- Are there well-established solutions to this problem?
- How will we reliably deliver the partition information to devices?
- Will we need to repartition, and if so, how?
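To gauge the repartitioning question, the Python sketch below measures how many clients would change partition when N changes, assuming partitions are assigned by hashing the client-id modulo N (a hypothetical scheme, not something stated above). `assign_partition` and `fraction_moved` are illustrative names, and the client-ids are synthetic.

```python
import zlib
from typing import Sequence


def assign_partition(client_id: str, num_partitions: int) -> int:
    """Hypothetical modulo assignment: stable hash of the client-id, 1..num_partitions."""
    return zlib.crc32(client_id.encode("utf-8")) % num_partitions + 1


def fraction_moved(client_ids: Sequence[str], old_n: int, new_n: int) -> float:
    """Fraction of clients whose partition differs between old_n and new_n partitions."""
    moved = sum(
        1
        for cid in client_ids
        if assign_partition(cid, old_n) != assign_partition(cid, new_n)
    )
    return moved / len(client_ids)


ids = [f"client-{i}" for i in range(10_000)]  # synthetic client-ids
print(f"{fraction_moved(ids, 4, 5):.2f}")
```

With evenly distributed hash values h, a client keeps its partition when going from 4 to 5 partitions only if h mod 4 equals h mod 5, which holds for h mod 20 in {0, 1, 2, 3}. So about 4/20, or 20%, of clients stay put and roughly 80% would have to be told their new partition.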