[webcodec_registry]: https://w3c.github.io/webcodecs/codec_registry.html
[mse]: https://developer.mozilla.org/en-US/docs/Web/API/Media_Source_Extensions_API
# Generic bidirectional WebSocket-based streaming transport specification
# Goals of this specification
- A simple format that is nearly trivial to implement
- Flexible enough to transport any codec, past, current or future
- TCP-based, with built-in framing
- Streams track data in either direction, or both at once
- Allow the connection to be upgraded later to an external transport (e.g. WebRTC) instead of carrying media in-line
- Compatible with prepackaged / non-raw codec data as well
# Basic WS packet types
- Text packets contain signalling messages in JSON format
- Binary packets contain media data in a format determined by the requested URL path (for raw codec data, each packet starts with a 12-byte header)
## Text packets
Text packets contain UTF-8 encoded JSON. A packet that does not parse to a JSON object, or that has no `type` member, may be ignored.
The `type` member is a string naming the message type. Messages with an unrecognized `type` should be silently ignored.
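
A minimal sketch of how a receiver might apply these rules, assuming a hypothetical `handlers` table keyed by message type. Treating unparseable JSON as ignorable is an assumption of this sketch; the spec only covers non-object payloads and a missing `type`.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical handler table type: message `type` -> callback taking the parsed object.
Handler = Callable[[Dict[str, Any]], None]


def dispatch_text_packet(payload: str, handlers: Dict[str, Handler]) -> Optional[str]:
    """Parse one text packet and call the handler for its `type`.

    Returns the dispatched type, or None if the packet was ignored.
    """
    try:
        msg = json.loads(payload)
    except json.JSONDecodeError:
        return None  # Not valid JSON: ignored (assumption).
    if not isinstance(msg, dict):
        return None  # Not a JSON object: may be ignored.
    msg_type = msg.get("type")
    if not isinstance(msg_type, str):
        return None  # No (string) `type` member: may be ignored.
    handler = handlers.get(msg_type)
    if handler is None:
        return None  # Unrecognized type: silently ignored.
    handler(msg)
    return msg_type
```

Ignoring unknown types instead of failing keeps older peers compatible with newer message types.
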
### Type: `info`
Contains a `msg` member holding an informational (non-error) message.
Other members may carry additional data; their format is undefined.
### Type: `pause`
Request to toggle playback between paused and playing, or a response indicating the resulting state. A message with a `paused` member is a response, and `paused` is a boolean holding the new state; a message without it is a request.
### Type: `hold`
Request to pause, regardless of current playback state.
### Type: `stop`
Request to stop playback. Carries no other payload; the receiving side will usually disconnect after receiving it.
### Type: `play`
Request to start or resume playback. May optionally include a `seek_time` member indicating the desired starting position. If omitted, playback resumes from where it was last paused or, if playback has not started yet, begins at the start of the stream for VoD or at the live point for live streams.
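
As a minimal sketch (not a reference implementation), the control messages above can be built as plain JSON objects. The helper names are hypothetical, and `seek_time` is passed through unchanged because its unit is not fixed here.

```python
import json
from typing import Any, Dict, Optional, Union


def make_pause() -> str:
    # A request has no `paused` member; that member marks a response.
    return json.dumps({"type": "pause"})


def make_hold() -> str:
    # Pauses regardless of the current playback state.
    return json.dumps({"type": "hold"})


def make_stop() -> str:
    # No payload besides `type`; the peer will usually disconnect afterwards.
    return json.dumps({"type": "stop"})


def make_play(seek_time: Optional[Union[int, float]] = None) -> str:
    # Without `seek_time`, the peer resumes where it paused, or starts at
    # the beginning (VoD) / live point (live) if nothing has played yet.
    msg: Dict[str, Any] = {"type": "play"}
    if seek_time is not None:
        msg["seek_time"] = seek_time
    return json.dumps(msg)


def pause_state(msg: Dict[str, Any]) -> Optional[bool]:
    """For a `pause` response, return the new paused state; None for a request."""
    paused = msg.get("paused")
    return paused if isinstance(paused, bool) else None
```
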
### Type: `tracks`
Request to override the track selection. All members are optional:

- `video` is a string containing the new video selector.
- `audio` is a string containing the new audio selector.
- `meta` is a string containing the new meta selector.
- `seek_time` is an integer (or numeric string) containing the target timestamp for a simultaneous seek.

May result in a reply of type `tracks` with the same payload as `codec_data`.
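
A minimal sketch of building a `tracks` request, assuming members that are not being changed are simply omitted. The function name is hypothetical, and the selector strings in any call to it are placeholders, since selector syntax is not defined in this section.

```python
import json
from typing import Any, Dict, Optional, Union


def make_tracks_request(
    video: Optional[str] = None,
    audio: Optional[str] = None,
    meta: Optional[str] = None,
    seek_time: Optional[Union[int, str]] = None,
) -> str:
    """Build a `tracks` request; arguments left as None are omitted."""
    msg: Dict[str, Any] = {"type": "tracks"}
    for key, value in (
        ("video", video),
        ("audio", audio),
        ("meta", meta),
        ("seek_time", seek_time),  # integer or numeric string
    ):
        if value is not None:
            msg[key] = value
    return json.dumps(msg)
```
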
### Type: `fast_forward`
Request to fast-forward playback to a specific timestamp. The `ff_to` member contains the requested timestamp.
### Type: `seek`
A request to seek to a different point in time for playback, or a response to such a request.
If `seek_time` is set, it's a request, and `seek_time` must hold the target timestamp or the string ``"live"`` to indicate the live point.
For responses, `data.play_rate_prev` holds the previous playback rate and `data.play_rate_curr` the current playback rate. Valid values are a numeric multiplier (integer or float: 1 = real time, 2 = double speed, 0.5 = half speed), the string `"auto"` (determine the best speed automatically), and the string `"fast-forward"` (play as quickly as possible).
Optionally, `data.live_point` contains a boolean that is true if the playback is attempting to stick to the live point of a stream.
Optionally, `error` may be set if a problem occurred during the seek, containing a string with the error message.
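
A minimal sketch of both directions of `seek`, with hypothetical helper names. It tells requests and responses apart by the presence of `seek_time`, as described above, and reports absent optional fields as `None`.

```python
import json
from typing import Any, Dict, Union


def make_seek(target: Union[int, float, str]) -> str:
    """Build a `seek` request for a timestamp, or "live" for the live point."""
    if isinstance(target, str) and target != "live":
        raise ValueError('seek target must be a timestamp or "live"')
    return json.dumps({"type": "seek", "seek_time": target})


def parse_seek_response(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the documented fields from a `seek` response.

    Optional fields that are absent come back as None.
    """
    if "seek_time" in msg:
        raise ValueError("this is a seek request, not a response")
    data = msg.get("data") or {}
    return {
        "play_rate_prev": data.get("play_rate_prev"),
        "play_rate_curr": data.get("play_rate_curr"),
        "live_point": data.get("live_point"),
        "error": msg.get("error"),
    }
```
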
### Type: `request_codec_data`
Request for the mapping of track indexes to codecs.
May optionally contain a `supported_codecs` member: an array of strings listing the supported codecs. If it is absent, the supported codec list is guessed or assumed.
### Type: `codec_data`
Response to `request_codec_data`. May also be sent whenever the sender considers it useful (e.g. on initial connection, or when the available track list changes).

- `data.current` contains the current playback time, if available.
- `data.codecs` contains an array of codec name strings.
- `data.tracks` contains an array of the corresponding track indexes: `data.tracks[i]` is the track using codec `data.codecs[i]`.
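
A minimal sketch of the request/response pair, with hypothetical helper names. It turns the two parallel arrays of `codec_data` into a track-to-codec lookup; any codec names used with it are placeholders, since the codec naming scheme is not specified in this section.

```python
import json
from typing import Any, Dict, List, Optional


def make_request_codec_data(supported_codecs: Optional[List[str]] = None) -> str:
    """Build a `request_codec_data` message, optionally listing supported codecs."""
    msg: Dict[str, Any] = {"type": "request_codec_data"}
    if supported_codecs is not None:
        msg["supported_codecs"] = list(supported_codecs)
    return json.dumps(msg)


def track_codecs(msg: Dict[str, Any]) -> Dict[int, str]:
    """Map track index -> codec name from a `codec_data` message.

    The arrays are parallel: data.codecs[i] belongs to data.tracks[i].
    """
    data = msg.get("data") or {}
    codecs: List[str] = data.get("codecs", [])
    tracks: List[int] = data.get("tracks", [])
    if len(codecs) != len(tracks):
        raise ValueError("data.codecs and data.tracks differ in length")
    return dict(zip(tracks, codecs))
```
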
### Type: `set_speed`
Either a request to change speed, or a notification that the speed has changed.
If `play_rate` is set, it's a request (and `play_rate` contains the requested speed). If `data` is set, it's a notification.
For notifications, `data.play_rate_prev` holds the previous playback rate and `data.play_rate_curr` the current playback rate. Valid values are a numeric multiplier (integer or float: 1 = real time, 2 = double speed, 0.5 = half speed), the string `"auto"` (determine the best speed automatically), and the string `"fast-forward"` (play as quickly as possible). Optionally, `data.live_point` contains a boolean that is true if playback is attempting to stick to the live point of a stream.
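
A minimal sketch of validating play rates and building a `set_speed` request, with hypothetical helper names. Rejecting zero and negative multipliers is an assumption of this sketch; the spec does not say whether they are allowed.

```python
import json
from typing import Any, Dict, Union

PlayRate = Union[int, float, str]
SPECIAL_RATES = ("auto", "fast-forward")


def is_valid_play_rate(rate: object) -> bool:
    """True for a positive numeric multiplier or one of the special strings."""
    if isinstance(rate, bool):
        return False  # bool is an int subclass in Python; exclude it explicitly.
    if isinstance(rate, (int, float)):
        return rate > 0  # Assumption: zero/negative rates are rejected.
    return rate in SPECIAL_RATES


def make_set_speed(rate: PlayRate) -> str:
    """Build a `set_speed` request (requests carry `play_rate`)."""
    if not is_valid_play_rate(rate):
        raise ValueError(f"invalid play rate: {rate!r}")
    return json.dumps({"type": "set_speed", "play_rate": rate})


def is_speed_notification(msg: Dict[str, Any]) -> bool:
    # Notifications carry `data`; requests carry `play_rate`.
    return "data" in msg
```
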
### Type: `on_stop`
Indicates that playback has stopped on the sending side. `data.current` is the position at which playback stopped, `data.begin` is the first currently available playback position, and `data.end` is the last currently available playback position.
### Type: `on_time`
While playback is happening, regularly sent to indicate current playback position, speed, and available seek range.

- `data.current` is the current playback position.
- `data.begin` is the first currently available playback position.
- `data.end` is the last currently available playback position.
- `data.next` is the timestamp of the next packet to be transmitted.
- `data.play_rate_prev` holds the previous playback rate and `data.play_rate_curr` the current playback rate. Valid values are a numeric multiplier (integer or float: 1 = real time, 2 = double speed, 0.5 = half speed), the string `"auto"` (determine the best speed automatically), and the string `"fast-forward"` (play as quickly as possible).
- `data.tracks` contains an array of the track indexes being played.
- `data.jitter` contains the estimated data jitter in milliseconds.
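
A minimal sketch of a receiver parsing `on_time` into a typed record. The class and function names are hypothetical, `data.next` is renamed to avoid shadowing a Python builtin, and the seek-range check is just one illustrative use of `begin`/`end`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

PlayRate = Union[int, float, str]


@dataclass
class TimeUpdate:
    """Fields of an `on_time` message; anything absent stays None (or [])."""
    current: Optional[float] = None
    begin: Optional[float] = None
    end: Optional[float] = None
    next_packet: Optional[float] = None  # `data.next`
    play_rate_prev: Optional[PlayRate] = None
    play_rate_curr: Optional[PlayRate] = None
    tracks: List[int] = field(default_factory=list)
    jitter_ms: Optional[float] = None

    def can_seek_to(self, t: float) -> bool:
        """True if `t` lies within the advertised [begin, end] range."""
        if self.begin is None or self.end is None:
            return False
        return self.begin <= t <= self.end


def parse_on_time(msg: Dict[str, Any]) -> TimeUpdate:
    data = msg.get("data") or {}
    return TimeUpdate(
        current=data.get("current"),
        begin=data.get("begin"),
        end=data.get("end"),
        next_packet=data.get("next"),
        play_rate_prev=data.get("play_rate_prev"),
        play_rate_curr=data.get("play_rate_curr"),
        tracks=list(data.get("tracks") or []),
        jitter_ms=data.get("jitter"),
    )
```
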
## Binary packets
The binary data format is determined by the requested URL path. Implementations are not required to support every format: a request on the path of an unimplemented format can be refused by answering it with an HTTP status code outside the 1XX and 2XX ranges.
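
A minimal sketch of path-based format selection on the server side, with hypothetical names. Suffix matching is case-sensitive here, which is an assumption. Answering 404 is just one possible non-1XX/non-2XX refusal; 101 (Switching Protocols) is the status that accepts a WebSocket upgrade.

```python
from typing import Optional, Set
from urllib.parse import urlsplit

# URL path suffix -> binary payload format, following the sections below.
PATH_FORMATS = {
    ".mp4": "mp4",
    ".mkv": "ebml",
    ".webm": "ebml",
    ".raw": "raw",
    ".h264": "raw",
}


def payload_format(url: str) -> Optional[str]:
    """Return the payload format selected by the URL path, or None if unknown."""
    path = urlsplit(url).path  # query string is not part of the path
    for suffix, fmt in PATH_FORMATS.items():
        if path.endswith(suffix):
            return fmt
    return None


def upgrade_status(url: str, implemented: Set[str]) -> int:
    """HTTP status a server might answer the upgrade request with."""
    fmt = payload_format(url)
    if fmt is None or fmt not in implemented:
        return 404  # refuse: format unknown or not implemented
    return 101  # accept the WebSocket upgrade
```
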
### Payload format: MP4 data (`*.mp4` paths)
MP4 data consists of an MP4 mux suitable for feeding into an [MSE][mse] context.
### Payload format: EBML data (`*.mkv`, `*.webm` paths)
EBML data consists of an EBML mux suitable for feeding into an [MSE][mse] context.
### Payload format: Raw data (`*.raw`, `*.h264` paths)
Each packet starts with a 12-byte header:

- 1 byte: track index (0-indexed)
- 1 byte: frame type (0 = regular, 1 = keyframe, 2 = init data)
- 8 bytes: timestamp in milliseconds (network byte order, i.e. big endian)
- 2 bytes: time offset in milliseconds (network byte order, i.e. big endian)

The rest of the packet after the header is raw codec data.
For packets with frame type 2 (init data), the timestamp and offset should be zeroed out when sending and ignored when receiving.
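
A minimal sketch of packing and unpacking this header with Python's `struct` module. It assumes network byte order (big endian) and unsigned fields; the spec does not state signedness. Names such as `RawPacket` and `pack_raw` are hypothetical.

```python
import struct
from typing import NamedTuple

# track (u8), frame type (u8), timestamp ms (u64), offset ms (u16); 12 bytes, no padding.
HEADER = struct.Struct(">BBQH")

FRAME_REGULAR, FRAME_KEY, FRAME_INIT = 0, 1, 2


class RawPacket(NamedTuple):
    track: int
    frame_type: int
    timestamp: int
    offset: int
    payload: bytes


def pack_raw(track: int, frame_type: int, timestamp: int, offset: int, payload: bytes) -> bytes:
    if frame_type == FRAME_INIT:
        timestamp = offset = 0  # zeroed out when sending init data
    return HEADER.pack(track, frame_type, timestamp, offset) + payload


def unpack_raw(packet: bytes) -> RawPacket:
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than the 12-byte header")
    track, frame_type, timestamp, offset = HEADER.unpack_from(packet)
    if frame_type == FRAME_INIT:
        timestamp = offset = 0  # ignored on receipt for init data
    return RawPacket(track, frame_type, timestamp, offset, bytes(packet[HEADER.size:]))
```

For example, `pack_raw(1, FRAME_KEY, 90000, 40, b"\x00\x00\x01")` yields a 15-byte packet: the 12-byte header followed by the 3 payload bytes.
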
For packets with frame type 2, the payload after the header is, for all codecs, the full contents of the codec private data as exposed by libav in its `extradata` field (which is, incidentally, also identical to the `CodecPrivate` element in the WebM/Matroska format).
The binary format is identical to the [WebCodecs registration][webcodec_registry] for the codec in use, where the init data corresponds to the decoder config's `description`.
For codecs without a WebCodecs registration, ffmpeg's codec private data (`extradata`) can be used as init data, and ffmpeg's encoded packet data format for all other packets.