WebRTC signaling protocol

WebSocket-based offer/answer exchange between clients (browser, iPad app) and the lokust-stream server. Matches the implementation in session-vm/crates/stream-server/src/signaling/mod.rs.

Connection

Endpoint: ws://<host>:<port>/signal (or wss:// when behind TLS termination).
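
As a small illustration, the endpoint URL could be assembled as follows. The function name and the `secure` flag are hypothetical; only the /signal path and the ws/wss schemes come from the description above.

```typescript
// Hypothetical helper: builds the signaling endpoint URL described above.
function signalingUrl(host: string, port: number, secure: boolean): string {
  // wss:// applies when TLS is terminated in front of the server.
  const scheme = secure ? "wss" : "ws";
  return `${scheme}://${host}:${port}/signal`;
}
```

In a browser, `secure` would typically be derived from `location.protocol === "https:"`, since pages served over HTTPS are generally not allowed to open plain ws:// connections.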

Messages are JSON text frames. Binary frames are ignored.

Message envelope:

{ "type": "<kind>", "...": "kind-specific fields" }

type names use snake_case. Unknown types are logged and ignored by the server.
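
A client can mirror the server's tolerant behaviour when decoding frames. The sketch below is one possible client-side decoder, not code from the repository: the names `ServerMessage` and `parseServerMessage` are hypothetical, and the field names are taken from the message definitions in this document.

```typescript
// Server → client messages as described in this document.
type ServerMessage =
  | { type: "ready"; session_id: string }
  | { type: "answer"; sdp: string }
  | { type: "error"; message: string };

// Returns null for anything that should be ignored: binary frames,
// malformed JSON, unknown types, or known types with missing fields.
function parseServerMessage(frame: unknown): ServerMessage | null {
  if (typeof frame !== "string") return null; // only JSON text frames count

  let value: unknown;
  try {
    value = JSON.parse(frame);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const { type, session_id, sdp, message } = value as Record<string, unknown>;
  if (type === "ready" && typeof session_id === "string") {
    return { type: "ready", session_id };
  }
  if (type === "answer" && typeof sdp === "string") {
    return { type: "answer", sdp };
  }
  if (type === "error" && typeof message === "string") {
    return { type: "error", message };
  }
  return null; // unknown type: ignore rather than fail
}
```

Returning null instead of throwing keeps the caller's message loop simple: it can log and skip anything it does not understand, which also leaves room for new message types later.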

Client → Server

offer

Client initiates WebRTC negotiation with its SDP offer.

{
  "type": "offer",
  "sdp": "v=0\r\n..."
}

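Before sending the offer, the client is expected to have:

  1. Added a recvonly video transceiver to its peer connection.
  2. Created an offer with pc.createOffer().
  3. Set the offer as its local description, which is what starts ICE gathering.
  4. Waited for ICE gathering to complete, so that the SDP already contains all candidates. The server does not support trickle ICE yet, so candidates cannot be sent separately.

The sdp field should therefore be taken from pc.localDescription after gathering has finished, not from the initial createOffer() result.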
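
The preparation steps can be sketched as below. This is a minimal sketch, not the code in client.html: `OfferPeer` and `buildOfferMessage` are hypothetical names, and `OfferPeer` is a hand-written subset of the standard RTCPeerConnection members so the example does not depend on DOM typings. Because ICE gathering only starts once the local description is set, the sketch sets it before waiting.

```typescript
// Subset of RTCPeerConnection used below (hypothetical interface; a browser
// peer connection is expected to fit this shape).
interface OfferPeer {
  iceGatheringState: string;
  localDescription: { sdp: string } | null;
  addTransceiver(kind: string, init: { direction: string }): unknown;
  createOffer(): Promise<{ type: string; sdp?: string }>;
  setLocalDescription(desc: { type: string; sdp?: string }): Promise<void>;
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

async function buildOfferMessage(
  pc: OfferPeer
): Promise<{ type: "offer"; sdp: string }> {
  // Receive-only video: the server sends, the client only watches.
  pc.addTransceiver("video", { direction: "recvonly" });

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer); // ICE gathering starts here

  // Non-trickle: wait until every candidate is in the local description.
  if (pc.iceGatheringState !== "complete") {
    await new Promise<void>((resolve) => {
      const onChange = () => {
        if (pc.iceGatheringState === "complete") {
          pc.removeEventListener("icegatheringstatechange", onChange);
          resolve();
        }
      };
      pc.addEventListener("icegatheringstatechange", onChange);
    });
  }

  // Read the SDP back from the peer connection so it includes candidates.
  const sdp = pc.localDescription ? pc.localDescription.sdp : "";
  if (sdp === "") throw new Error("no local description after ICE gathering");
  return { type: "offer", sdp };
}
```

The returned object can be passed through JSON.stringify and sent as a text frame once the server's ready message has arrived.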

Server → Client

ready

Sent immediately after the WebSocket handshake. Carries the server's view of this connection's session ID.

{
  "type": "ready",
  "session_id": "uuid-v4"
}

The client typically responds to ready by sending its offer.

answer

Sent after the server applies the client's offer and generates a response.

{
  "type": "answer",
  "sdp": "v=0\r\n..."
}

The client sets this as its RTCPeerConnection.remoteDescription.

error

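Reports a server-side failure. The message itself does not close the WebSocket, but depending on the failure the connection may or may not survive, so clients should be prepared for either outcome.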

{
  "type": "error",
  "message": "description"
}

Happy path

Client                                  Server
  │                                       │
  │ ─────── WebSocket Upgrade ──────────→ │
  │                                       │
  │ ←───────── { ready, session_id } ──── │
  │                                       │
  │ ─────── { offer, sdp } ─────────────→ │
  │                                       │
  │ ←───────── { answer, sdp } ────────── │
  │                                       │
  │       (DTLS-SRTP over UDP)            │
  │       (video track RTP flowing)       │
  │                                       │
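
The exchange above can be modelled on the client as a small state machine. This is an illustrative sketch only: the state names, action names, and `onServerMessage` are hypothetical and do not appear in the reference clients.

```typescript
type ServerMessage =
  | { type: "ready"; session_id: string }
  | { type: "answer"; sdp: string }
  | { type: "error"; message: string };

// Hypothetical signaling states for one WebSocket connection.
type SignalingState = "awaiting_ready" | "offer_sent" | "answered";

type Action = "send_offer" | "apply_answer" | "report_error" | "none";

interface Step {
  state: SignalingState;
  action: Action;
}

// Pure transition function: given the current state and a server message,
// return the next state and what the caller should do.
function onServerMessage(state: SignalingState, msg: ServerMessage): Step {
  switch (msg.type) {
    case "ready":
      // ready is the cue to send the offer, but only once per connection.
      return state === "awaiting_ready"
        ? { state: "offer_sent", action: "send_offer" }
        : { state, action: "none" };
    case "answer":
      // An answer only makes sense after an offer went out; the caller then
      // passes msg.sdp to RTCPeerConnection.setRemoteDescription.
      return state === "offer_sent"
        ? { state: "answered", action: "apply_answer" }
        : { state, action: "none" };
    case "error":
      // The connection may or may not survive, so the state is left as-is.
      return { state, action: "report_error" };
  }
}
```

Keeping the transition function free of side effects makes out-of-order messages (for example an answer before any offer) easy to test without a WebSocket or a peer connection.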

Codec negotiation

The server registers a single H.264 codec entry:

mimeType:   video/H264
clock_rate: 90000
sdp_fmtp:   level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
payload:    102

profile-level-id=42e01f is Constrained Baseline profile (profile_idc 0x42 with constraint flags 0xe0), level 3.1 (level_idc 0x1f = 31). It is compatible with essentially all modern browsers and iOS/iPadOS WebRTC implementations.
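
To make the three bytes of profile-level-id concrete, here is a small decoder sketch. The function and type names are hypothetical; the byte layout (profile_idc, constraint flags, level_idc) follows the H.264 RTP payload format, where profile_idc 66 with constraint_set1_flag set denotes Constrained Baseline.

```typescript
// Splits an fmtp parameter string ("a=1;b=2") into a key/value record.
function parseFmtp(line: string): Record<string, string> {
  const params: Record<string, string> = {};
  for (const part of line.split(";")) {
    const eq = part.indexOf("=");
    if (eq > 0) params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return params;
}

interface ProfileLevelId {
  profileIdc: number;      // first byte, e.g. 66 (0x42) = Baseline
  constraintFlags: number; // second byte, constraint_set0..5 flags in the high bits
  levelIdc: number;        // third byte, ten times the level number
  constrainedBaseline: boolean;
}

// Decodes a six-digit hex profile-level-id; returns null if malformed.
function decodeProfileLevelId(hex: string): ProfileLevelId | null {
  if (!/^[0-9a-fA-F]{6}$/.test(hex)) return null;
  const profileIdc = parseInt(hex.slice(0, 2), 16);
  const constraintFlags = parseInt(hex.slice(2, 4), 16);
  const levelIdc = parseInt(hex.slice(4, 6), 16);
  // constraint_set1_flag is bit 0x40 of the second byte.
  const constrainedBaseline = profileIdc === 66 && (constraintFlags & 0x40) !== 0;
  return { profileIdc, constraintFlags, levelIdc, constrainedBaseline };
}
```

For the value registered by the server, 42e01f decodes to profile_idc 66, flags 0xe0, and level_idc 31, that is, level 3.1.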

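The server outputs H.264 in byte-stream format with AU (access unit) alignment. The GStreamer h264parse element runs with config-interval=-1, which re-sends SPS/PPS in-band with every IDR frame, so a newly connected peer can start decoding at the next keyframe.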

Multi-peer

Every WebSocket connection creates an independent RTCPeerConnection and TrackLocalStaticSample. A background task subscribes each peer to the shared encoded-sample broadcast channel and writes samples to the peer's video track.

Consequences:

  • One pipeline, N peers: encoded frames fan out, with no re-encoding per client.
  • If a peer's network is slow, its broadcast receiver lags and the missed samples are dropped with a warning. Other peers are unaffected.
  • No simulcast or layer selection yet: all peers receive the same stream.
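
The drop-on-lag behaviour can be illustrated with a toy model. This is not the server's Rust code: `FanOut` is a hypothetical, simplified stand-in for a bounded broadcast channel, assuming a fixed-size ring of recent samples and one read cursor per peer.

```typescript
// Toy bounded broadcast: one shared ring of samples, one cursor per peer.
class FanOut<T> {
  private readonly capacity: number;
  private readonly ring: T[] = [];
  private next = 0; // sequence number of the next sample to be sent
  private readonly cursors = new Map<string, number>();

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // A new peer starts at the live edge and sees only future samples.
  subscribe(peerId: string): void {
    this.cursors.set(peerId, this.next);
  }

  // The sender never blocks: old samples are overwritten when the ring is full.
  send(sample: T): void {
    this.ring[this.next % this.capacity] = sample;
    this.next += 1;
  }

  // Returns what is still available for this peer. `dropped` counts samples
  // that were overwritten before the peer got to read them.
  drain(peerId: string): { samples: T[]; dropped: number } {
    const cursor = this.cursors.get(peerId);
    if (cursor === undefined) throw new Error("unknown peer " + peerId);
    const oldest = Math.max(0, this.next - this.capacity);
    const start = Math.max(cursor, oldest);
    const samples: T[] = [];
    for (let seq = start; seq < this.next; seq++) {
      samples.push(this.ring[seq % this.capacity] as T);
    }
    this.cursors.set(peerId, this.next);
    return { samples, dropped: start - cursor };
  }
}
```

Because each peer only advances its own cursor, a slow peer loses samples without slowing the sender or affecting what other peers receive.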

Not yet supported

  • Trickle ICE: the client must finish gathering candidates before sending the offer, which adds 0–3 seconds to connection setup on mobile networks.
  • Audio: video only. An audio transceiver can be added when needed.
  • Data channels for input: the iPad app scaffold reserves this, but the server-side handler is not implemented yet. It will follow the input_handler.py protocol from the Selkies reference (touch → uinput events), adapted to Rust.
  • Renegotiation: cleanup-then-reconnect is the current strategy. Renegotiation is expected to be implemented when adaptive bitrate / resolution switching lands.

Reference implementations

  • Server: session-vm/crates/stream-server/src/signaling/mod.rs
  • Browser test client: session-vm/crates/stream-server/src/signaling/client.html (served at /)
  • iPad client: ipad/LokustIpad/Streaming/SignalingClient.swift