Debugging MQTT over websockets on Envoy 1.28.0
I have migrated our Envoy installation from 1.11.1 to 1.28.0, and now also use SNI to select the correct certificate.
A big part of that migration was converting the Envoy configuration syntax from the v2 API to the v3 API.
The upgrade went well, except for our websocket-based MQTT service (based on VerneMQ) not working as expected.
At first I assumed that the issue was in Envoy. After trying many timeout options and reading the Envoy documentation, I decided to experiment with a new route and a different broker (Mosquitto) behind it.
The following configuration works with Mosquitto as a broker, in case someone else stumbles into the same problem.
Here’s an excerpt of my envoy.yaml (the full configuration is over 87,000 lines and is generated by a template script, because using SNI means having individual listeners per domain, as mentioned above):
static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    per_connection_buffer_limit_bytes: 32768 # 32 KiB
    listener_filters:
    - name: tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filter_chain_match:
        server_names: ["picockpit.com","www.picockpit.com","picockpit.com:443","www.picockpit.com:443"]
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: "/certs/letsencrypt/live/picockpit.com/fullchain.pem" }
              private_key: { filename: "/certs/letsencrypt/live/picockpit.com/privkey.pem" }
            alpn_protocols: [ "h2,http/1.1" ]
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          http_filters:
          - name: envoy.filters.http.compressor
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
              compressor_library:
                name: text_optimized
                typed_config:
                  '@type': type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
                  compression_level: BEST_SPEED
                  compression_strategy: DEFAULT_STRATEGY
                  memory_level: 9
                  window_bits: 15
                  chunk_size: 16384
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          common_http_protocol_options:
            idle_timeout: 3600s # 1 hour
          use_remote_address: true
          xff_num_trusted_hops: 0
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["picockpit.com", "www.picockpit.com", "picockpit.com:443", "www.picockpit.com:443"]
              routes:
              - match: { path: "/pidoctor"}
                redirect:
                  path_redirect: "/raspberry-pi/pidoctor-raspberry-pi-system-health-monitor/"
              - match: { prefix: "/pidoctor/"}
                redirect:
                  path_redirect: "/raspberry-pi/pidoctor-raspberry-pi-system-health-monitor/"
              - match: { prefix: "/mqtt/test" }
                route:
                  prefix_rewrite: "/mqtt"
                  cluster: target_test
                  timeout: 0s
                  idle_timeout: 0s
                  upgrade_configs:
                  - upgrade_type: "websocket"
                    enabled: true
              - match: { prefix: "/" }
                route:
                  cluster: target_main
                  timeout: 0s
  clusters:
  - name: target_test
    connect_timeout: 5s
    per_connection_buffer_limit_bytes: 32768 # 32 KiB
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: target_test
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: mosquitto-test.test-network
                port_value: 8025
Note that I have omitted a large part of the configuration (other services and routes), including the target_main cluster definition, because it is irrelevant to MQTT over websockets.
Notice the timeout: 0s value on the route. It is important because it disables the route timeout, so MQTT connections stay open instead of being timed out after 15 seconds, which is the default.
Another part that is, in my opinion, relevant to allowing connections to be upgraded to websockets (so MQTT can be carried through them) is the upgrade_configs entry with upgrade_type: "websocket" on the route. Note also the port numbers being passed in as additional domain matches.
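To check the route independently of any MQTT client, the websocket upgrade itself can be exercised from a script. The following is a minimal sketch using only the Python standard library, not the method used in this article. The function names (`probe`, `upgrade_accepted` and so on) are made up for illustration, and the `/mqtt/test` default path is an assumption taken from the route above. The `mqtt` subprotocol is the one MQTT clients request when connecting over websockets.

```python
import base64
import hashlib
import os
import socket
import ssl

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def new_key() -> str:
    """Random Sec-WebSocket-Key: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value a server must return for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """HTTP/1.1 request asking to switch to websocket with the mqtt subprotocol."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "Sec-WebSocket-Protocol: mqtt",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def upgrade_accepted(response_head: str, key: str) -> bool:
    """True if the response is a 101 with the matching Sec-WebSocket-Accept."""
    lines = response_head.split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(key)


def probe(host: str, port: int = 443, path: str = "/mqtt/test") -> bool:
    """Open a TLS connection (SNI = host) and attempt the websocket upgrade."""
    key = new_key()
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["http/1.1"])  # keep the proxy on HTTP/1.1
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_upgrade_request(host, path, key))
            head = b""
            while b"\r\n\r\n" not in head:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                head += chunk
    return upgrade_accepted(head.split(b"\r\n\r\n", 1)[0].decode("latin-1"), key)
```

Calling `probe("picockpit.com")` should return `True` only if the proxy forwards the upgrade and the broker answers with `101 Switching Protocols`. It says nothing about whether the connection later survives the timeouts, so it complements a long-running client test rather than replacing it.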
Mosquitto docker-compose.yml:
version: '3.6'
services:
  mosquitto:
    image: eclipse-mosquitto
    container_name: mosquitto-test
    hostname: mosquitto-test
    networks:
      - test_net
    restart: "no"
    user: "root:root"
    volumes:
      - type: bind
        source: ./mosquitto.conf
        target: /mosquitto/config/mosquitto.conf
networks:
  test_net:
    external:
      name: test-network
mosquitto.conf:
listener 8025
protocol websockets
allow_anonymous true
log_type all
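Because allow_anonymous is true, a client can connect to this test broker without credentials. As an illustration of what travels through the websocket once the upgrade has succeeded, here is a minimal Python sketch that builds an MQTT 3.1.1 CONNECT packet and wraps it in a masked websocket binary frame. The helper names are hypothetical, and the sketch makes simplifying assumptions: payloads under 126 bytes, and a broker reply that arrives as one unfragmented frame.

```python
import os
import struct
from typing import Optional


def mqtt_connect_packet(client_id: str, keepalive: int = 60) -> bytes:
    """MQTT 3.1.1 CONNECT with clean session and no username/password."""
    cid = client_id.encode("utf-8")
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"  # length-prefixed protocol name
        + bytes([4])                    # protocol level 4 = MQTT 3.1.1
        + bytes([0x02])                 # connect flags: clean session only
        + struct.pack("!H", keepalive)  # keepalive in seconds
    )
    payload = struct.pack("!H", len(cid)) + cid
    remaining = len(variable_header) + len(payload)
    if remaining > 127:
        raise ValueError("sketch only handles a single-byte remaining length")
    return bytes([0x10, remaining]) + variable_header + payload


def ws_binary_frame(payload: bytes, mask: Optional[bytes] = None) -> bytes:
    """Single masked binary frame, as clients must send (RFC 6455)."""
    if len(payload) > 125:
        raise ValueError("sketch only handles payloads up to 125 bytes")
    if mask is None:
        mask = os.urandom(4)
    masked = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    # 0x82 = FIN + binary opcode; 0x80 sets the mask bit on the length byte
    return bytes([0x82, 0x80 | len(payload)]) + mask + masked


def is_connack_accepted(frame: bytes) -> bool:
    """True if frame is an unmasked binary frame carrying CONNACK with code 0."""
    return (
        len(frame) >= 6
        and frame[0] == 0x82             # FIN + binary opcode
        and frame[1] == 4                # unmasked, 4-byte payload
        and frame[2:4] == b"\x20\x02"    # CONNACK fixed header
        and frame[5] == 0                # return code 0 = connection accepted
    )
```

Sending `ws_binary_frame(mqtt_connect_packet("probe"))` on an upgraded connection and passing the reply to `is_connack_accepted` gives a yes/no answer on whether the broker itself accepted the session, separate from whether the proxy accepted the upgrade.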
Tool used to verify the connection: the HiveMQ websocket client (listed under Useful tools below).
Update 13.11.2023
MQTT is online again, with VerneMQ as well.
Although I had restarted VerneMQ several times, I apparently did not wait long enough for it to stabilize. A colleague restarted it today, and it now works. It seems that it takes 10 – 15 minutes (in our setup) to become fully responsive and work properly.
Therefore I can confirm that the Envoy configuration above also works with VerneMQ.
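Since the broker needed several minutes after a restart before it responded, a scripted check that keeps retrying is less misleading than a single manual attempt. The following Python sketch shows one way to do that. `wait_until_ready` and its parameters are hypothetical, and the 20-minute default timeout is only an assumption derived from the 10 – 15 minutes observed in our setup.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 20 * 60,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Call probe() until it returns True and return the seconds waited.

    Raises TimeoutError if another attempt would exceed timeout_s.
    """
    start = clock()
    while True:
        try:
            if probe():
                return clock() - start
        except OSError:
            pass  # connection refused or reset: treat as "not ready yet"
        if clock() - start + interval_s > timeout_s:
            raise TimeoutError(f"not ready after {timeout_s} seconds")
        sleep(interval_s)
```

Here `probe` can be any function that returns `True` once a real MQTT client connection succeeds, for example one that performs a websocket upgrade and an MQTT CONNECT against the broker. The `sleep` and `clock` parameters exist only so the loop can be tested without actually waiting.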
Lesson learned
If something is not working, try to reproduce the problem with a different tool in place of the one it interacts with. If it works there, the problem may not be in the tool you changed, but in the second tool it needs to work with.
Some additional goodies:
Online Documentation
- https://www.envoyproxy.io/docs
- https://www.envoyproxy.io/docs/envoy/v1.28.0/ (specifically for the current envoy version)
- https://github.com/envoyproxy/envoy/blob/main/configs/envoyproxy_io_proxy_http3_downstream.yaml – example HTTP3 configuration
- https://codilime.com/blog/envoy-configuration/ – understanding Envoy
- https://pi3g.com/envoy-docker-and-websockets-debugging-and-configuration/ – about websockets and Envoy (websockets are used in PiCockpit to transport MQTT) – this previous article is based on v1.11.1 of Envoy
Useful tools
- https://http3check.net/ – lets you check for HTTP/3 support
- HiveMQ websocket client