Debugging MQTT over websockets on Envoy 1.28.0

I have migrated our Envoy installation from Envoy 1.11.1 to 1.28.0, and am now also using SNI for selecting the correct certificate.

A big part of that migration was upgrading the configuration syntax for Envoy from the v2 API to the v3 API.

The upgrade went well, except that our websocket-based MQTT service (based on VerneMQ) did not work as expected.

At first I assumed that the issue was in Envoy. After trying many timeout options and reading the Envoy documentation, I decided to experiment with a new route and a different broker (Mosquitto) behind it.

The following configuration works with Mosquitto as a broker, in case someone else stumbles into the same problem.

Here’s an excerpt of my envoy.yaml. The full configuration is over 87,000 lines long and is generated by a template script, because using SNI means having an individual filter chain per domain:

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    per_connection_buffer_limit_bytes: 32768  # 32 KiB    
    listener_filters:
    - name: tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filter_chain_match:
        server_names: ["picockpit.com","www.picockpit.com","picockpit.com:443","www.picockpit.com:443"]
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: "/certs/letsencrypt/live/picockpit.com/fullchain.pem" }
              private_key: { filename: "/certs/letsencrypt/live/picockpit.com/privkey.pem" }
            alpn_protocols: [ "h2,http/1.1" ]
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO

          http_filters:
          - name: envoy.filters.http.compressor
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
              compressor_library:
                name: text_optimized
                typed_config:
                  '@type': type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
                  compression_level: BEST_SPEED
                  compression_strategy: DEFAULT_STRATEGY
                  memory_level: 9
                  window_bits: 15
                  chunk_size: 16384
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          common_http_protocol_options:
            idle_timeout: 3600s # 1 hour

          use_remote_address: true
          xff_num_trusted_hops: 0
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["picockpit.com", "www.picockpit.com", "picockpit.com:443", "www.picockpit.com:443"]
              routes:

              - match: { path: "/pidoctor"}
                redirect:
                  path_redirect: "/raspberry-pi/pidoctor-raspberry-pi-system-health-monitor/"
              - match: { prefix: "/pidoctor/"}
                redirect:
                  path_redirect: "/raspberry-pi/pidoctor-raspberry-pi-system-health-monitor/"


              - match: { prefix: "/mqtt/test" }
                route:
                  prefix_rewrite: "/mqtt"
                  cluster: target_test
                  timeout: 0s
                  idle_timeout: 0s
                  upgrade_configs:
                    - upgrade_type: "websocket"
                      enabled: true
                      
              - match: { prefix: "/" }
                route: 
                  cluster: target_main
                  timeout: 0s
  clusters:

    - name: target_test
      connect_timeout: 5s
      per_connection_buffer_limit_bytes: 32768 # 32 KiB
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: target_test
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: mosquitto-test.test-network
                  port_value: 8025

Note that I omitted a large part of the configuration (other services and routes) and did not include the target_main cluster definition, because it is irrelevant to the MQTT over websockets situation.

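Notice the timeout: 0s value on the routes. It disables the route timeout, which is important for MQTT connections to stay open instead of being timed out after 15 seconds, Envoy's default route timeout. On the MQTT route, idle_timeout: 0s likewise disables the per-route idle timeout, which would otherwise fall back to the connection manager's stream idle timeout.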
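Because the full configuration is generated and very long, it can help to check mechanically that every websocket route has these timeouts disabled. Below is a minimal Python sketch, assuming the routes have already been parsed into plain dicts (for example with a YAML library, which is not part of the standard library). The helper names and the /mqtt/other route are hypothetical; they are not part of Envoy or of the configuration above.

```python
# Hypothetical lint for generated Envoy routes (not part of Envoy itself).
# Assumes routes are plain dicts shaped like the YAML excerpt above.


def is_websocket_route(route: dict) -> bool:
    """True if the route's upgrade_configs enable the "websocket" upgrade."""
    action = route.get("route", {})
    return any(
        cfg.get("upgrade_type", "").lower() == "websocket" and cfg.get("enabled", True)
        for cfg in action.get("upgrade_configs", [])
    )


def check_websocket_routes(routes: list[dict]) -> list[str]:
    """Return one warning per websocket route timeout that is not literally "0s"."""
    warnings = []
    for route in routes:
        if not is_websocket_route(route):
            continue
        action = route["route"]
        match = route.get("match", {})
        where = match.get("prefix") or match.get("path") or "?"
        # Compares the string as written in the YAML; "0s" mirrors the excerpt.
        for key in ("timeout", "idle_timeout"):
            if action.get(key) != "0s":
                value = action.get(key, "unset (Envoy default)")
                warnings.append(f"{where}: {key} is {value}, expected 0s")
    return warnings


routes = [
    {"match": {"prefix": "/mqtt/test"},
     "route": {"cluster": "target_test", "timeout": "0s", "idle_timeout": "0s",
               "upgrade_configs": [{"upgrade_type": "websocket", "enabled": True}]}},
    # Hypothetical route that forgot to disable its timeouts:
    {"match": {"prefix": "/mqtt/other"},
     "route": {"cluster": "target_other",
               "upgrade_configs": [{"upgrade_type": "websocket"}]}},
    {"match": {"prefix": "/"}, "route": {"cluster": "target_main", "timeout": "0s"}},
]

for line in check_websocket_routes(routes):
    print(line)
```

With the sample routes, only the hypothetical /mqtt/other route is reported, once for timeout and once for idle_timeout.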

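Other parts of the excerpt are, in my opinion, also relevant for allowing connections to be upgraded to websockets (so that MQTT can be carried through them), above all the upgrade_configs block with upgrade_type "websocket" on the /mqtt/test route. Note also the port numbers (:443) added as additional domain matches, because a client may send the port as part of the Host header.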
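Since the per-domain parts of the configuration are generated by a template script, here is a minimal Python sketch of what such a generator could look like. The real script is not shown in this post; the function names are hypothetical, and only the domain-list order and the certificate path layout are taken from the excerpt above.

```python
import json

# Hypothetical generator sketch; the actual template script is not shown.
CERT_DIR = "/certs/letsencrypt/live"  # path layout taken from the excerpt


def domain_variants(apex: str) -> list[str]:
    """Bare and www. host names, each also with a :443 suffix (excerpt order)."""
    hosts = [apex, f"www.{apex}"]
    return hosts + [f"{host}:443" for host in hosts]


def filter_chain(apex: str) -> dict:
    """One SNI-selected filter chain for apex (HTTP connection manager omitted)."""
    return {
        "filter_chain_match": {"server_names": domain_variants(apex)},
        "transport_socket": {
            "name": "envoy.transport_sockets.tls",
            "typed_config": {
                "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext",
                "common_tls_context": {
                    "tls_certificates": [{
                        "certificate_chain": {"filename": f"{CERT_DIR}/{apex}/fullchain.pem"},
                        "private_key": {"filename": f"{CERT_DIR}/{apex}/privkey.pem"},
                    }],
                },
            },
        },
    }


def virtual_host(apex: str) -> dict:
    """Matching virtual host; its domains list is where the :443 variants matter."""
    return {"name": "backend", "domains": domain_variants(apex), "routes": []}


if __name__ == "__main__":
    # JSON output; Envoy can also load its bootstrap configuration from a .json file.
    print(json.dumps([filter_chain(d) for d in ["picockpit.com"]], indent=2))
```

A note on the design: SNI itself never carries a port (RFC 6066 defines the server name as a plain DNS host name), so the :443 entries in server_names are harmless but never match. Only in the virtual host domains list can they match, when a client sends a Host header with an explicit port.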

Mosquitto docker-compose.yml:

version: '3.6'

services:
  mosquitto:
    image: eclipse-mosquitto
    container_name: mosquitto-test
    hostname: mosquitto-test
    networks:
      - test_net
    restart: "no"
    user: "root:root"
    volumes:
      - type: bind
        source: ./mosquitto.conf
        target: /mosquitto/config/mosquitto.conf


networks:
  test_net:
    external: true
    name: test-network


mosquitto.conf:

listener 8025
protocol websockets

allow_anonymous true
log_type all

Tool used to verify the connection:

HiveMQ websocket client
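To see what such a client does before any MQTT bytes flow, here is a minimal Python sketch of the WebSocket opening handshake from RFC 6455, requesting the "mqtt" subprotocol. The helper names are hypothetical, and probe() needs network access, so it is only defined here, not called. It sends no ALPN, so the connection to Envoy stays on HTTP/1.1.

```python
import base64
import hashlib
import os
import socket
import ssl

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def make_key() -> str:
    """Random 16-byte nonce, base64-encoded, for the Sec-WebSocket-Key header."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value the broker (relayed by Envoy) must send back."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_request(host: str, path: str, key: str) -> bytes:
    """HTTP/1.1 upgrade request asking for the "mqtt" subprotocol."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "Sec-WebSocket-Protocol: mqtt",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def accepts_upgrade(response: bytes, key: str) -> bool:
    """True if the response is a 101 carrying the matching Sec-WebSocket-Accept."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status, *header_lines = head.split("\r\n")
    if not status.startswith("HTTP/1.1 101"):
        return False
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(key)


def probe(host: str, path: str = "/mqtt/test", timeout: float = 5.0) -> bool:
    """Open TLS to host:443 (SNI = host) and attempt the upgrade on path."""
    key = make_key()
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(handshake_request(host, path, key))
            response = b""
            while b"\r\n\r\n" not in response:  # read until end of headers
                chunk = tls.recv(4096)
                if not chunk:
                    break
                response += chunk
    return accepts_upgrade(response, key)
```

The sample key from RFC 6455 is a handy self-check: expected_accept("dGhlIHNhbXBsZSBub25jZQ==") returns "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".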

Update 13.11.2023

MQTT is online again, with VerneMQ as well:

Although I had restarted VerneMQ several times, apparently I did not wait long enough for it to stabilize. A colleague restarted it today, and it now works. It seems that, in our setup, it takes 10 to 15 minutes to become fully responsive and work properly.

Therefore I can confirm that the Envoy configuration above also works with VerneMQ.
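Since the real problem was not waiting long enough after a restart, a polling loop with a generous deadline avoids giving up too early. Below is a minimal Python sketch; the helper names are hypothetical, and the TCP probe is an assumption for illustration: an open port does not prove that VerneMQ is fully ready, so a stronger probe (such as a websocket handshake through Envoy) may be needed in practice.

```python
import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened (a weak readiness signal)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(probe: Callable[[], bool], deadline_s: float,
                     interval_s: float = 10.0) -> Optional[float]:
    """Call probe() every interval_s seconds; return seconds waited, or None on timeout."""
    start = time.monotonic()
    while True:
        if probe():
            return time.monotonic() - start
        elapsed = time.monotonic() - start
        if elapsed >= deadline_s:
            return None
        # Never sleep past the deadline.
        time.sleep(min(interval_s, deadline_s - elapsed))


# Hypothetical usage (host name and port are placeholders): allow up to 20 minutes,
# since startup took 10 to 15 minutes in our setup.
# waited = wait_until_ready(lambda: tcp_probe("vernemq.example", 8080), deadline_s=20 * 60)
```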

Lesson learned

If something is not working, try to reproduce the problem with a different counterpart tool. If it works there, the problem may not be in the tool you changed (here: Envoy) but in the tool it has to work with (here: VerneMQ).
