MQTT Topic Tree Design best practices, tips & examples

By Maximilian Batz | 2019-05-29

Generic MQTT Background

With MQTT the sender and receiver are not aware of each other; the broker handles the messaging. This decouples the two sides in space, time, and intensity. The sender can send at the speed and time it wants, and the receiver can pick up the messages at the speed and time it wants. The sender and receiver do not have to know each other, and they don’t need a direct route set up between them (as would be the case with e.g. a telephone line). MQTT also makes it easy to design multicast setups, where one sender sends to many subscribed receivers.

Clients do not know who has published the message originally – unless it is clear from the topic (and possibly enforced on the broker by ACLs), or from the message content (possibly signed with HMAC)
MQTT is designed for low-power, low-latency communication (HMAC etc. might be expensive)
You can’t get a list of all topics from a broker. The broker discards messages to which no one is subscribed, unless they were published with the retain flag set
You don’t know who subscribed to a topic (use ACLs to limit subscriptions of unauthorized clients)

last will and testament / keep alive

The clients can connect, publish a message and disconnect, or keep a persistent connection to the server. In the case of a persistent connection, keep-alives will be sent. If the client does not send keep-alives in the timeframe which was agreed during the connection setup, the server (broker) can send a so-called last-will and testament message, which the client submitted during connection establishment. This message can be anything and can be published to one single topic per connection. The broker will send it to all clients subscribed on this topic.

If the client disconnects gracefully, the last will and testament will not be published.

The keep alive is especially useful for mobile clients, where the connection might break (e.g. due to bad network coverage, tunnels, etc.)

Also see:

http://www.steves-internet-guide.com/checking-active-mqtt-client-connections/

http://www.steves-internet-guide.com/mqtt-last-will-example/

https://owntracks.org/booklet/tech/mqtt/

Client ID / Client take-over

Each client which connects to an MQTT broker should have a unique client id. If a client connects with a client id that is already in use, the broker closes the existing connection and the new connection takes over. If you have two clients with the same client id, this can lead to a ping-pong of disconnects and reconnects.

Retained messages

The last retained message to a topic is saved and delivered to all new subscribers of this topic. This can be used for communicating device state / configuration / etc.

You set the retained state as a flag per message.

Retained messages can be used for many interesting things, e.g. to ease auto-discovery (see Suggestion 5 below).
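
As a rough illustration of the retain flag, the following sketch models the behaviour in memory. It is not a real MQTT broker or client: the class TinyBroker, its methods and the topic names are made up for this example, and only exact topic matches are handled.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, bytes], None]


class TinyBroker:
    """Toy in-memory model of retained-message handling (exact-match topics only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)
        self._retained: Dict[str, bytes] = {}

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            if payload:
                # Only the most recent retained message per topic is kept.
                self._retained[topic] = payload
            else:
                # An empty retained payload clears the stored message.
                self._retained.pop(topic, None)
        for callback in self._subscribers[topic]:
            callback(topic, payload)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)
        # A new subscriber immediately receives the retained message, if any.
        if topic in self._retained:
            callback(topic, self._retained[topic])


broker = TinyBroker()
broker.publish("home/door/state", b"open", retain=True)
broker.publish("home/door/event", b"knock")  # not retained, nobody is listening yet

received = []
broker.subscribe("home/door/state", lambda t, p: received.append((t, p)))
broker.subscribe("home/door/event", lambda t, p: received.append((t, p)))
print(received)  # only the retained door state arrives
```

The late subscriber sees the door state but not the earlier, non-retained event.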

Quality of Service / QoS

There are three quality of service levels, which determine the delivery guarantee for a message. The higher the level, the more processing overhead.

0: the message will be delivered at most once (no guarantees whatsoever)
1: the message will be delivered at least once (duplicates possible)
2: the message will be delivered exactly once (no duplicates possible)

QoS can be set per message sent to the broker, and during subscription. The lower of the two values will be used for actual message delivery.
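
The "lower of the two values" rule can be written down as a small function. This is only a sketch; the name effective_qos is made up for illustration and is not part of any MQTT library.

```python
def effective_qos(publish_qos: int, subscribe_qos: int) -> int:
    """Return the QoS used for delivery to one subscriber: the lower of the two."""
    for qos in (publish_qos, subscribe_qos):
        if qos not in (0, 1, 2):
            raise ValueError(f"invalid QoS level: {qos}")
    return min(publish_qos, subscribe_qos)


# A message published with QoS 2 reaches a QoS 1 subscriber with QoS 1.
print(effective_qos(2, 1))
```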

Have a look at this for more detail:

https://www.eclipse.org/paho/files/mqttdoc/MQTTClient/html/qos.html

Topics

Facts about topics

Topic names are case sensitive, and UTF-8 strings. It is advisable to limit yourself to printable ASCII characters for easier debugging.

Topic names must consist of at least one character to be valid.

/ is already a topic name

The $SYS topic is for broker usage / statistics.

You have a choice of placing information into the topic structure, or putting it into the message. A JSON structure for the message will aid you in being able to extend it with further data.

Consider that the topic name is sent along in every PUBLISH packet, so try to avoid unnecessary length.

Consider wild cards usage

# matches everything for arbitrary level depth from the current level (multi level wildcard, can only be inserted at the end)
+ matches everything at the current level (single level wildcard)

e.g. if you have multiple temperature sensors, you could easily subscribe to messages for all of them if you design your topic tree correctly.

Wild cards are just for subscribing, they are not allowed for publishing.

You can subscribe to all topics using #

except the $SYS topic. Use $SYS/# to subscribe to all topics for $SYS

Placing the “function” part before the device identifier may allow you to subscribe to certain channels more easily.

For example, if you define your paths as tele/room1/sensorname1, tele/room2/sensorname2, etc. and stat/room1/sensorname1, stat/room1/sensorname2, etc., subscriptions are possible on:

tele/#

stat/#

If you do it the other way around, with sensor1/stat and sensor2/stat, etc. you can use

+/stat

to subscribe to stats, but this will not match at arbitrary depth as in the previous example. That is, the path to your sensor must consist of precisely one component.

This way, if you prefix the action your paths will be more consistent.
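
To make the wildcard rules concrete, here is a small matcher written as a sketch. The function topic_matches is an illustration and not a library API; it assumes the filter itself is well formed and follows the rules described above (+ for one level, # for the rest, no wildcard match on topics starting with $).

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the subscription `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Filters starting with a wildcard do not match topics starting with "$".
    if topic.startswith("$") and filter_levels[0] in ("#", "+"):
        return False

    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level, matches the rest.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


print(topic_matches("tele/#", "tele/room1/sensorname1"))   # True
print(topic_matches("+/stat", "sensor1/stat"))             # True
print(topic_matches("+/stat", "room1/sensor1/stat"))       # False: + is one level only
print(topic_matches("#", "$SYS/broker/uptime"))            # False: $ topics are excluded
```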

See for example https://stevessmarthomeguide.com/setting-up-the-sonoff-tasmota-mqtt-switch/

Do not

· Don’t use a leading slash (e.g. /mytopics ) to start your topic tree – it adds overhead, no value

· Do not use $SYS as the starting topic, this is reserved for the broker

· Do not use spaces in the topic

· Do not use non-printable characters in the topic

· Do not use one MQTT topic for messages with different meanings; create different topics for them

Suggestions / Do

· embed client id into the topic tree

· create specific topics for individual addressing (e.g. of sensors), instead of sending all values over one topic – this will avoid sending messages to recipients that do not need them

· try to keep the topic names short (as they will be sent on every publish and message)

· consider which data will need to be sent how often

o e.g. consider separating metadata out to its own topic, and publishing it with a “retain” flag

· separate out command/cmd and status topics for a device; this way bidirectional communication will be possible, and the device will not have to filter out its own messages.

· use a last-will & testament message to indicate an unexpected client disconnect

Consider multi-tenancy & Bridging

Do you need to support multiple clients / organisations / independent applications? There is the possibility of using, for instance with VerneMQ, different mountpoints. Another alternative is to include multi-tenancy in your topic tree (by using an appropriate prefix at the beginning of the topic tree).

I prefer the former route, as it provides an additional layer of ACL separation – and less opportunity to “screw things up”.

Additionally, you might want to bridge several MQTT brokers together, sharing part of their topic trees (as defined by their ACLs).

See this https://owntracks.org/booklet/guide/bridge/ for more about bridging.

Consider encryption

MQTT can be run over encrypted channels (e.g. TLS, or websockets over TLS) or over unencrypted channels. Additionally, you can consider encrypting and/or signing the payload of your data.

Consider discoverability

Consider building in topics which will aid discoverability of services and capabilities of your devices, e.g. by using retained messages.

Consider feedback about commands

When you publish a command, you do not have an immediate back channel. Therefore it might be useful to establish a hierarchy for specific feedback about specific commands – whether the device received and was able to execute the command, to be able to provide feedback to the user.

One pattern I was thinking of applying here was to give unique command ids as part of the message payload, which are referenced in the feedback channel with a status code.
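
A sketch of how such a command id could be carried in the payload. The key names cmd_id, action, params, status and detail, as well as the use of HTTP-like status codes, are assumptions made for this example and not part of any standard.

```python
import json
import uuid


def build_command(action: str, params: dict) -> str:
    """Create a command payload carrying a unique command id."""
    return json.dumps({"cmd_id": str(uuid.uuid4()), "action": action, "params": params})


def build_feedback(command_payload: str, status_code: int, detail: str = "") -> str:
    """Create a feedback payload that references the id of the received command."""
    command = json.loads(command_payload)
    return json.dumps({"cmd_id": command["cmd_id"], "status": status_code, "detail": detail})


# Publisher side: would be published to e.g. devices/lamp1/cmd (hypothetical topic).
command = build_command("set_power", {"on": True})

# Device side: would be published to e.g. devices/lamp1/feedback (hypothetical topic).
feedback = build_feedback(command, 200, "executed")
print(feedback)
```

The publisher can then match incoming feedback to the command it sent by comparing the cmd_id values.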

Consider multicast scenarios

Do you want several devices to react to a single message you publish? Remember, you can’t publish to wildcard topics – only subscribe to them. In this case, additional topics should be created which allow the clients to react to multicast scenarios. All clients / all clients of a particular group would subscribe to the multicast topic and react to the messages.

If you combine this with my feedback / reference to command id suggestion above, a new requirement of client id as payload is introduced on the (generic) feedback channel. Another possibility is to have the devices respond on their individual feedback channels – where the device id is part of the path already. (That is, they receive a command on the multicast path, but reply in their own exclusive message topic hierarchy).

Consider traffic & privacy, Sensor types

If you are running a commercial system, e.g. collecting sensor data for users, they might want fine-grained control over which sensor values are published at all.

Additionally, automatically publishing all sensor readings will increase traffic and load on your server(s).

Therefore simply pushing all available sensor values to your MQTT broker might not be the best choice. One possibility is to create a retained topic, with a configuration which the client can subscribe to. When the client connects, it will automatically receive the setup on the retained topic, and can adjust the values it publishes according to the setup. Additionally, it can react to configuration changes by reacting to any additional messages which arrive on this configuration topic.

Furthermore, it is important to consider that different types of sensors exist. For example, for a door sensor you would be interested in a status change (closed vs. open), whereas with a temperature sensor, you might want to receive continuous readings. It is important for your design to consider what kind of values the subscribers will be receiving when the client responsible for the sensor readings disconnects. Remember, that you can set a last will & testament only on one topic (for multi-sensor devices). In the case of a disconnect, retained messages might give a wrong impression (e.g. of the door being open whereas it is closed in reality, but not having been updated, etc.).

On the other hand, if, for instance, the web interface which controls the client connects only after the door-sensor client has been connected for a while, and you use non-retained messages, the last door status update will be missed.

There are two possibilities here:

actively request an update of the door status (by e.g. using your own get topic)
use retained messages, and use client-side logic in the web interface to discard the last value, or mark it as stale, in case the sensor client is not connected

In any case, the user should be made aware whether a sensor reading can be trusted, or whether it is possibly incorrect due to a disconnected sensor-client.
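
The second possibility could look roughly like this on the subscriber side. This is a sketch under assumptions: each device is assumed to publish a retained "connected" topic next to its sensor topics ("1" while online, "0" set as last will), and the class and topic names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reading:
    value: str
    stale: bool


class SensorView:
    """Tracks the last value per sensor topic and whether its client is online."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._online: Dict[str, bool] = {}

    def on_message(self, topic: str, payload: str) -> None:
        device, _, leaf = topic.rpartition("/")
        if leaf == "connected":
            # Assumed convention: "1" while online, "0" published as last will.
            self._online[device] = payload == "1"
        else:
            self._values[topic] = payload

    def reading(self, topic: str) -> Optional[Reading]:
        if topic not in self._values:
            return None
        device = topic.rpartition("/")[0]
        # A value is trusted only while the owning client is known to be online.
        return Reading(self._values[topic], stale=not self._online.get(device, False))


view = SensorView()
view.on_message("door1/state", "open")
view.on_message("door1/connected", "1")
print(view.reading("door1/state"))
view.on_message("door1/connected", "0")  # last will arrived
print(view.reading("door1/state"))
```

After the last will arrives, the door state is still shown but flagged as stale, so the user can see that it may no longer be correct.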

Steve notes that combining sensor data from several sensors in a JSON encoded payload does not decrease traffic, but can even slightly increase it (vs. non-encoded payload!)

semantic vs. physical topic path

things meaningful to a human create the topic structure: (“semantic approach”)

home/bathroom/humidity

device paths (to which devices are the sensors attached, how are they addressed?) create the topic structure: (“physical approach”)

pi/00000234898324734/i2c/10/pressure

(the numbers symbolizing the BCM serial, and the I2C address respectively).

republishing

With republishing you combine both approaches, and create an additional service which translates between the two worlds.
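
The core of such a translation service is a mapping from physical to semantic topics. A minimal sketch, assuming a static mapping table; the entries below are hypothetical and reuse the example paths from above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from physical device paths to semantic topics.
PHYSICAL_TO_SEMANTIC: Dict[str, str] = {
    "pi/00000234898324734/i2c/10/pressure": "home/bathroom/pressure",
    "pi/00000234898324734/i2c/11/humidity": "home/bathroom/humidity",
}


def republish(topic: str, payload: bytes) -> Optional[Tuple[str, bytes]]:
    """Return the (semantic topic, payload) to publish, or None if unmapped."""
    semantic = PHYSICAL_TO_SEMANTIC.get(topic)
    if semantic is None:
        return None
    # The payload is passed through unchanged; only the topic is translated.
    return semantic, payload


print(republish("pi/00000234898324734/i2c/11/humidity", b"54"))
print(republish("pi/00000234898324734/i2c/99/unknown", b"0"))
```

In a real service this function would be called from the message callback of an MQTT client subscribed to the physical tree, and its result published back to the broker.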

See: https://tinkerman.cat/post/mqtt-topic-naming-convention

Overview of online design suggestions

Suggestion 1: Raspberry-Valley

from: https://raspberry-valley.azurewebsites.net/MQTT-Topic-Trees/

device-category/device-id/payload-context/payload-differentiator

device-category: e.g. “pi”, “arduino”, etc.
device-id: e.g. BCM serial
payload-context: e.g. temperature
payload-differentiator: e.g. front / back (adding a level of measurement for a given context)

Note the symmetry between device-category/device-id and payload-context/payload-differentiator (generic -> specific; generic -> specific)

Suggestion 2: Steve

(By Steve): http://www.steves-internet-guide.com/mqtt-topic-payload-design-notes/

You can include the following items in your topic tree:

high level topic grouping devices
assigned sensor name
function (e.g. set / status / get / cmd )

Steve’s approach:

use topic name for individual device (e.g. a sensor)
use payload / JSON data for attributes of this sensor
use separate topic for data and commands

Suggestion 3: MQTT-Smarthome

MQTT Smarthome proposal

https://github.com/mqtt-smarthome/mqtt-smarthome

a path looks like this in this proposal:

toplevelname/function/item

knxgateway1/status/Kitchen/Lights/Front Left

first level (knxgateway1) = toplevelname, the concrete gateway being addressed

for multiple similar gateways this name must be settable to avoid collisions in the namespace

second level (status) = function
third level, and further levels -> individual address hierarchy of the particular gateway (item)

The interesting bits are the functions, and their position before the deeper paths (consider the wildcards for subscription: this way you can subscribe to all device status reports easily!)

Available / defined functions in this proposal:

status – for getting status reports
set – for requesting state changes
get – (optional) for actively requesting a state update (for gateways which support it)

the result of the read will be published to the status hierarchy.

The subsequent hierarchies for the individual verbs should match each other, so that you would be addressing the same device / node / parameter.
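
A sketch of how a client might split such a topic into its three parts and derive the matching topic for another verb. The helper names are made up for this example; only the toplevelname/function/item layout and the three functions are taken from the proposal.

```python
from typing import NamedTuple

FUNCTIONS = ("status", "set", "get")


class SmarthomeTopic(NamedTuple):
    toplevelname: str
    function: str
    item: str


def parse_topic(topic: str) -> SmarthomeTopic:
    """Split toplevelname/function/item; the item may contain further slashes."""
    parts = topic.split("/", 2)
    if len(parts) != 3 or parts[1] not in FUNCTIONS:
        # Other topics under the top level are not handled by this sketch.
        raise ValueError(f"not a status/set/get topic: {topic!r}")
    return SmarthomeTopic(*parts)


def counterpart(topic: str, function: str) -> str:
    """Return the topic addressing the same item under another function."""
    if function not in FUNCTIONS:
        raise ValueError(f"unknown function: {function!r}")
    parsed = parse_topic(topic)
    return "/".join((parsed.toplevelname, function, parsed.item))


print(parse_topic("knxgateway1/status/Kitchen/Lights/Front Left"))
print(counterpart("knxgateway1/status/Kitchen/Lights/Front Left", "set"))
```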

There is a special topic “connected” per gateway, which allows the client to set a last will & testament message

toplevelname/connected

simple values are defined for this topic:

0 = disconnected from broker (set as last will)
1 = connected to MQTT, disconnected from hardware
2 = connected to MQTT and hardware (fully operational)

note that the value 0 does not distinguish between a voluntary disconnection and a lost connection (where the keep-alives timed out).

There are also suggestions for the payload:

JSON encoding of the payload is optional.

In the case of JSON encoding, the value will always be in the key “val”. Additionally, the following keys are defined:

ts : timestamp (timestamp, milliseconds since Epoch)

lc : last change of value (timestamp, milliseconds since Epoch)
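
A payload following these suggestions could be built like this. The function name and its parameters are illustrative; only the keys val, ts and lc come from the proposal.

```python
import json
import time
from typing import Any, Optional


def encode_status(value: Any, last_change_ms: Optional[int] = None,
                  now_ms: Optional[int] = None) -> str:
    """Build a JSON payload with the keys val, ts and (if known) lc."""
    # Timestamps are milliseconds since the Unix epoch.
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    payload = {"val": value, "ts": ts}
    if last_change_ms is not None:
        payload["lc"] = last_change_ms
    return json.dumps(payload)


print(encode_status(21.5, last_change_ms=1559120000000, now_ms=1559123456000))
```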

Suggestion 4: Tinkerman

https://tinkerman.cat/post/mqtt-topic-naming-convention

consider whether you want a “semantic” approach (e.g. things meaningful to humans) or a “physical” approach (e.g. actual device paths in which your system is connected).

The physical approach is more machine friendly.

(Please note that Tinkerman’s and my understanding of mountpoints and their usage deviates – for me mountpoints are completely segregated topic trees, e.g. for multi-tenancy)

Metadata can be encoded as postfix paths.

e.g.

/home/bedroom/temperature -> 21

/home/bedroom/temperature/units -> °C

/home/bedroom/temperature/timestamp -> 2012-12-10T12:47:00+01:00

Note: since the units and some other metadata are unlikely to change, they could be sent as retained topics.

An interesting part of this suggestion is also the aggregation (device side):

/home/bedroom/temperature/last -> last temperature value

/home/bedroom/temperature/last/timestamp

/home/bedroom/temperature/last24h/max -> maximum value in the last 24h

/home/bedroom/temperature/last24h/max/timestamp -> timestamp where the temperature was this high

/home/bedroom/temperature/last24h/min

/home/bedroom/temperature/ever/max -> the absolute highest temperature ever measured

Again, these values could be sent as retained messages (specifically “ever”) to save network traffic. The receiving system could discard older retained values (if the sensor is disconnected, as should be evident from another topic, possibly set by last will & testament), or it could show them as “stale” values.
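
A device-side sketch of this aggregation, assuming the readings are kept as (Unix timestamp, value) pairs. The function name is made up, and the leading slash is left off the base topic here, in line with the “Do not” list above.

```python
from typing import Dict, List, Tuple

DAY_SECONDS = 24 * 60 * 60


def aggregate_topics(base: str, readings: List[Tuple[float, float]],
                     now: float) -> Dict[str, str]:
    """Map aggregation topics below `base` to their payloads."""
    if not readings:
        return {}
    last_ts, last_value = max(readings, key=lambda r: r[0])
    _, ever_max = max(readings, key=lambda r: r[1])
    topics = {
        f"{base}/last": str(last_value),
        f"{base}/last/timestamp": str(last_ts),
        f"{base}/ever/max": str(ever_max),
    }
    # Only readings from the last 24 hours count for the last24h topics.
    recent = [r for r in readings if now - r[0] <= DAY_SECONDS]
    if recent:
        max_ts, max_value = max(recent, key=lambda r: r[1])
        _, min_value = min(recent, key=lambda r: r[1])
        topics[f"{base}/last24h/max"] = str(max_value)
        topics[f"{base}/last24h/max/timestamp"] = str(max_ts)
        topics[f"{base}/last24h/min"] = str(min_value)
    return topics


readings = [(0.0, 30.0), (100000.0, 21.0), (150000.0, 23.5)]
for topic, payload in aggregate_topics("home/bedroom/temperature", readings, now=160000.0).items():
    print(topic, "->", payload)
```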

This is an interesting design (for the aggregation), and could serve as an inspiration for allowing this kind of flexibility.

Suggestion 5: Homie IoT Convention

https://homieiot.github.io/

Homie is focused on discoverability of nodes in the IoT home automation setting. Homie does not use JSON encoded messages; it uses a direct representation of the payload for each node. See this example:

Homie defines a strict character subset which you are allowed to use to name your nodes, as special characters (e.g. $ and the underscore _) are used for special purposes.

It defines datatypes, which are published as (retained) metadata (“attributes”) to $datatype of the individual property (see below for definition of property):

String
Integer
Float
Boolean
Enum
Color

(note the usage of the $ to mark special topics).

Last will is used to mark the device as lost:

homie/deviceID/$state -> lost

Homie defines 6 possible states for the device:

init -> the device has connected to MQTT but not published all necessary Homie messages yet
ready -> the device is connected, and has finished setup – all necessary messages have been sent
disconnected -> the device disconnected in a clean way
sleeping -> self-descriptive
lost -> the device has disconnected unexpectedly (set by last will & testament)
alert -> the device is connected, but something is wrong, needs human intervention

Homie recommends using QoS 1 (as the nature of Homie makes it safe for duplicate messages).

A provision for extensions is made using a reverse domain name syntax, e.g. org.mycompany.homie-extension

nodes and devices

In Homie-parlance, a device is the basic hardware unit. For instance, a Raspberry Pi, a coffee machine, a car.

A node is a logical, independent unit of the device. For instance, a car might have a wheels node, an engine node, and a lights node.

Properties represent basic characteristics of the node/device. A car, for instance, could have a “speed” and a “temperature” property for the engine node. Properties can be retained and/or settable.

retained + non-settable: e.g. temperature sensor
retained + settable: node can publish a property and receive commands for the property (e.g. lamp power)
non-retained + non-settable: e.g. a door bell (momentary events)
non-retained + settable: node can receive commands for the property, e.g. brew coffee

attributes:

The attributes are important for Homie autodiscovery. They are defined by the specification and start with a $ to indicate their particular significance and to avoid collisions. They are used to store and update metadata.

For example, the device publishes a retained message on

homie/deviceID/$nodes

with a comma-separated list of the nodes. To indicate that a node is an array (can be used to group e.g. front lights and back lights in a car), the name is specified with square brackets at the end, e.g.

lights[]

An example for the device id subtree:

Homie specifies different statistics for the device out of the box, most of them optional (except the uptime). For example:

With nodes, again autodiscovery is made possible for properties, by specifying them in the $properties attribute:

This way you, as the subscriber, don’t have to guess or subscribe to all events; you can selectively subscribe to the ones of interest to you.

In Homie, the metadata (attributes) are published as a suffix to the path. E.g. for a particular property:

Control in Homie

Homie is state-based. This means that you don’t “turn the light on”, but set the power state of a device to on.

then the device updates its power state to communicate that the command has been executed:
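
A minimal sketch of this state-based flow, assuming the Homie layout homie/deviceID/node/property with a /set suffix for commands and the boolean payloads "true" and "false". The ids kitchen-light, light and power and the class Lamp are illustrative only.

```python
from typing import List, Tuple

Publish = Tuple[str, str, bool]  # (topic, payload, retain)


def property_topic(device_id: str, node_id: str, property_id: str) -> str:
    return f"homie/{device_id}/{node_id}/{property_id}"


class Lamp:
    """Toy device with a single settable, retained boolean property."""

    def __init__(self, device_id: str) -> None:
        self.state_topic = property_topic(device_id, "light", "power")
        self.set_topic = self.state_topic + "/set"
        self.power = False

    def on_message(self, topic: str, payload: str) -> List[Publish]:
        # Ignore anything that is not a valid command for this property.
        if topic != self.set_topic or payload not in ("true", "false"):
            return []
        self.power = payload == "true"
        # Report the new state on the property topic as a retained message.
        return [(self.state_topic, payload, True)]


lamp = Lamp("kitchen-light")
print(lamp.set_topic)                            # where the controller publishes "true"
print(lamp.on_message(lamp.set_topic, "true"))   # what the device publishes in response
```

The controller never publishes to the property topic itself; it only writes to the /set topic and learns about the result from the state the device reports.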

Broadcast

alert is an arbitrary choice – the broadcast can be to any other topic, which the devices can filter for. The messages are broadcast to all devices – devices can react if they want to.

This is the most complete specification I have been able to find in my online research. I personally feel it is very elegantly engineered.

https://homieiot.github.io/specification/

Suggestion 6: Owntracks

https://owntracks.org/booklet/tech/json/

http://owntracks.org/booklet/guide/topics/

Owntracks is an open source application which allows you to track your own physical location, share it with friends, etc. The location is published via HTTP or MQTT, ideally to your own broker.

“It can also detect when you enter or leave a particular region for which you set a so-called waypoint. People use this, say, to control some aspect of their home-automation system. (Everybody left home? We can turn the lights off.)”

If you want to read more about OwnTracks, go to this website: https://owntracks.org/booklet/

About the owntracks topic design:

https://owntracks.org/booklet/guide/topics/

“The principles during the design of the OwnTracks topic-naming scheme were

human readability

traffic minimization

granular access control

“

Base topic name (others are possible):

owntracks/peter/iPhone

here owntracks is a prefix, to allow other applications without collisions on the MQTT broker.

peter is the owner of tracking devices, and iPhone is one of his devices.

commands to devices are sent on: owntracks/peter/iPhone/cmd

the output of the commands is published e.g. to relative topic names step, dump etc.

some other topics are defined (info, waypoint, event). Events are published, for instance, when a user enters a defined area.

This way subscribing to owntracks/+/+/event will allow you to see (for users you are authorized to see) when they enter or leave a defined area.

Owntracks publishes its messages in the JSON format:

https://owntracks.org/booklet/tech/json/

as you can see the functions are after the device.

Since there is a fixed “depth” for user/device known in advance, subscription to multiple users and devices can be done using wildcards:

In the JSON, the type of the particular JSON is passed along in the message:

This is a nice idea! This way, you can verify that your application understands the event correctly, can do type checks, etc.
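
A sketch of such a type check on the receiving side, dispatching on the _type element. The handler functions and their output format are made up for this example; the element names (_type, lat, lon, event, desc) follow the OwnTracks JSON documentation linked above.

```python
import json
from typing import Callable, Dict


def handle_location(msg: dict) -> str:
    return f"location {msg['lat']},{msg['lon']}"


def handle_transition(msg: dict) -> str:
    return f"{msg['event']} {msg.get('desc', '')}".strip()


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "location": handle_location,
    "transition": handle_transition,
}


def dispatch(raw: bytes) -> str:
    """Decode a payload and route it by its _type element."""
    msg = json.loads(raw)
    kind = msg.get("_type") if isinstance(msg, dict) else None
    handler = HANDLERS.get(kind)
    if handler is None:
        # Unknown or missing types are rejected instead of being guessed at.
        raise ValueError(f"unsupported _type: {kind!r}")
    return handler(msg)


print(dispatch(b'{"_type": "location", "lat": 48.1, "lon": 11.5, "tst": 1559123456}'))
print(dispatch(b'{"_type": "transition", "event": "enter", "desc": "garage"}'))
```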

The individual elements are specified in a non-verbose manner. This helps to save traffic, especially for messages which will be published frequently.

e.g.:

acc: accuracy of the reported location in meters (without unit)

many of the elements are optional.

Interestingly, the location type is defined for different devices, which support different elements.

A simple last will and testament message is published. It simply includes a timestamp tst, at which the device first connected.

tst is a Unix epoch timestamp.

an interesting bit about this event type / payload type is that several timestamps are specified:

wtst: Timestamp of waypoint creation
tst: Timestamp at which the event occurred.

A waypoint is a point of interest, e.g. your garage – if you enter or leave your garage a transition event will be triggered:

event: (enter|leave)

A configuration event. If enabled, apps also accept remote configuration messages.

This event includes many possible elements, which are more verbose (e.g. validatecertificatechain). This is sensible: the configuration can be stored as JSON (and can therefore be passed around as an MQTT message and stored!), and since it is not sent often, verbosity is not a big problem, but rather an advantage for users.

e.g. an action with the “action”:”dump” will trigger a publish of the configuration message to the relative path dump

optional subparameters can be given for an action, e.g. a timeframe to be looked at:

Another interesting design idea is that by a certain variation of this cmd message a browser can be opened to a certain URL. E.g., if you enter the garage, a garage control site could be opened, …

an array of a different message type can / must be passed as a parameter here.

The interesting bit about this message is that it also has a field for a Base64 coded PNG image:

the interesting idea here is that an optional _creator item can be passed along, which identifies the entity which created these waypoints.

Also note, that owntracks uses the underscore _ for “special” parameters.

optionally all these messages can be encrypted for transport with a shared symmetric key.

The encrypted payload (the original message) will be in the “data” element, Base64 encoded.

Messages

Message payload is binary and can be anything, there is no requirement that the messages are valid UTF-8.

Therefore you can also publish digital data, e.g. images, as payload on certain topics.

Many current IoT Dashboards take data encoded as JSON.

Generally there seem to be two schools of thought: one where messages are raw values and metadata is encoded in the topic tree, and one where JSON encoded values carry metadata along in the payload.

suggestions

Use JSON format for message payload
bundle attributes into messages, instead of creating individual topics for them
include:

payload data
sensor / device id (if not part of the topic tree)
function
timestamp

Interesting Resources

Awesome MQTT: Collection of MQTT related links:

https://github.com/hobbyquaker/awesome-mqtt

individual projects:

twitter-to-mqtt – A python daemon that uses the Twitter Streaming API to access tweets and republishes them to an MQTT topic.

https://github.com/knolleary/twitter-to-mqtt

fritz2mqtt – Connect FRITZ!Box to MQTT.

https://github.com/akentner/fritz2mqtt

Ref:

Republishing:

Signal K for Marine usage (as possible inspiration for JSON format):

http://signalk.org/specification/1.3.0/doc/signalk.pdf
