[webcodec_registry]: https://w3c.github.io/webcodecs/codec_registry.html
[mse]: https://developer.mozilla.org/en-US/docs/Web/API/Media_Source_Extensions_API

# Generic bidirectional WebSocket-based streaming transport specification

# Goals of this specification

- A simple format that is nearly trivial to implement
- Flexible enough to transport any codec, past, current or future
- TCP-based, with built-in framing
- Streams track data in either direction, or both at once
- Allow later upgrades to external transports (e.g. WebRTC) rather than in-line
- Compatible with prepackaged / non-raw codec data as well

# Basic WS packet types

- Text packets contain signalling messages in JSON format
- Binary packets contain raw codec data, with a 12-byte header

## Text packets

JSON data in plain-text UTF-8 format.
A message may be ignored if it is not an object, or if it has no `type` member.
The `type` member is a string naming the message type.
Unrecognized `type`s should be (silently) ignored.

### Type: `info`

Contains a `msg` member with an informational-level (non-error) message.
May additionally contain further data in other members; their formatting is undefined.

### Type: `pause`

Request to pause or unpause playback, or a response that indicates playback was paused.
If it is a response, `paused` is a boolean indicating the new state. Otherwise, it is a request.

### Type: `hold`

Request to pause, regardless of current playback state.

### Type: `stop`

Request to stop playback.
There is no other payload; the other side will usually disconnect upon receiving this.

### Type: `play`

Resume or start playback.
Optionally may have a `seek_time` member, which indicates the wanted starting playback position. If omitted, playback should either resume from where it was last paused or, if no playback has occurred yet, start at the beginning for VoD or at the live point for live streams.

### Type: `tracks`

Request to override the track selection. All members are optional.
`video` is a string containing the new video selector.
`audio` is a string containing the new audio selector.
`meta` is a string containing the new meta selector.
`seek_time` is an integer (or numeric string) containing the target timestamp for a simultaneous seek.
May result in a reply of type `tracks` with the same payload as `codec_data`.

### Type: `fast_forward`

Request to fast-forward playback to a specific timestamp.
The `ff_to` member contains the requested timestamp.

### Type: `seek`

A request to seek to a different point in time for playback, or a response to such a request.
If `seek_time` is set, it's a request, and `seek_time` must hold the target timestamp or the string `"live"` to indicate the live point.
For responses, `data.play_rate_prev` has the previous playback rate and `data.play_rate_curr` has the current playback rate. Valid values are a multiplier int or float (1 = real time, 2 = double speed, 0.5 = half speed), the string `"auto"` (determine best speed automatically) and the string `"fast-forward"` (play as quickly as possible).
Optionally, `data.live_point` contains a boolean that is true if the playback is attempting to stick to the live point of a stream.
Optionally, `error` may be set if a problem occurred during the seek, containing a string with the error message.

### Type: `request_codec_data`

Request for the track index to codec mapping.
Optionally contains a `supported_codecs` member containing an array of strings that list the supported codecs (if absent, the supported codec list is guessed/assumed).

### Type: `codec_data`

Response to `request_codec_data`. May also be sent at any time the other end considers useful (e.g. when the available track list changes, or on initial connection).
`data.current` contains the current playback time, if available.
`data.codecs` contains an array of codec name strings.
`data.tracks` contains an array of corresponding track indexes.

### Type: `set_speed`

Either a request to change speed, or a notification that the speed has changed.
If `play_rate` is set, it's a request (and `play_rate` contains the requested speed). If `data` is set, it's a notification.
For notifications, `data.play_rate_prev` has the previous playback rate and `data.play_rate_curr` has the current playback rate. Valid values are a multiplier int or float (1 = real time, 2 = double speed, 0.5 = half speed), the string `"auto"` (determine best speed automatically) and the string `"fast-forward"` (play as quickly as possible).
Optionally, `data.live_point` contains a boolean that is true if the playback is attempting to stick to the live point of a stream.

### Type: `on_stop`

Indicates playback has stopped on the sending side.
`data.current` is the position the stop happened at.
`data.begin` is the currently available first playback position.
`data.end` is the currently available last playback position.

### Type: `on_time`

While playback is happening, this is sent regularly to indicate the current playback position, speed, and available seek range.
`data.current` is the current playback position.
`data.begin` is the currently available first playback position.
`data.end` is the currently available last playback position.
`data.next` is the timestamp of the next-to-be-transmitted packet.
`data.play_rate_prev` has the previous playback rate and `data.play_rate_curr` has the current playback rate. Valid values are a multiplier int or float (1 = real time, 2 = double speed, 0.5 = half speed), the string `"auto"` (determine best speed automatically) and the string `"fast-forward"` (play as quickly as possible).
`data.tracks` contains an array of track indexes that are being played.
`data.jitter` contains the estimated data jitter in milliseconds.

## Binary packets

The binary data format is determined by the requested URL path.
Implementing all types is not required: unimplemented types can be denied by sending an HTTP response that is neither 1XX nor 2XX to requests on their path.

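As a rough illustration of the text/binary split and the ignore rules for text packets described earlier, a receiver might classify incoming WebSocket frames as sketched below. The function name `classify_frame`, the `KNOWN_TYPES` set and the choice to drop unparseable JSON are assumptions made for this sketch, not part of the specification.

```python
import json

# Message types named in this specification (collected here for illustration).
KNOWN_TYPES = {
    "info", "pause", "hold", "stop", "play", "tracks", "fast_forward",
    "seek", "request_codec_data", "codec_data", "set_speed",
    "on_stop", "on_time",
}


def classify_frame(frame):
    """Classify one WebSocket frame (hypothetical helper).

    Returns ("binary", bytes) for binary frames, ("text", dict) for a
    recognized signalling message, or None when the frame is ignored.
    """
    # Binary frames carry codec data; their layout depends on the URL path.
    if isinstance(frame, (bytes, bytearray, memoryview)):
        return ("binary", bytes(frame))

    # Text frames carry JSON. Dropping invalid JSON is an assumption here;
    # the specification only covers non-objects and missing `type` members.
    try:
        msg = json.loads(frame)
    except ValueError:
        return None

    # May be ignored if not an object, or if there is no `type` member.
    if not isinstance(msg, dict):
        return None
    msg_type = msg.get("type")
    if not isinstance(msg_type, str):
        return None

    # Unrecognized types are silently ignored.
    if msg_type not in KNOWN_TYPES:
        return None
    return ("text", msg)
```

A caller would typically switch on the first element of the returned tuple, passing binary frames to the payload handler for the requested path and text messages to a handler selected by `msg["type"]`.
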
### Payload format: MP4 data (`*.mp4` paths)

MP4 data consists of an MP4 mux suitable to be fed into an [MSE][mse] context.

### Payload format: EBML data (`*.mkv`, `*.webm` paths)

EBML data consists of an EBML mux suitable to be fed into an [MSE][mse] context.

### Payload format: Raw data (`*.raw`, `*.h264` paths)

Header format:

- 1 byte track index (0-indexed)
- 1 byte frame type (0 = regular, 1 = keyframe, 2 = init data)
- 8 bytes timestamp in milliseconds (network byte order, i.e. big endian)
- 2 bytes time offset in milliseconds (network byte order, i.e. big endian)

The header is followed by rest-of-the-packet bytes of raw codec data.
The timestamp and offset should be ignored when receiving type 2 packets and should be zeroed out when sending them.

The payload for type 2 packets after the header is, for all codecs, the full contents of libav's `codec_private` field (which is, incidentally, also identical to `CodecPrivate` elements in WebM/Matroska format).

The binary format is identical to the [WebCodec registration][webcodec_registry] for the codec in use, where "init data" is the DecoderConfig.
For codecs that have no WebCodec registration, ffmpeg's `codec_private_data` can be used for init data and ffmpeg's encoded frame data format for the other packets.
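
The 12-byte raw header described above can be packed and unpacked along the following lines. This is a minimal sketch, not a reference implementation: the names `RawPacket`, `parse_raw_packet` and `build_raw_packet` are hypothetical, the time offset is assumed to be unsigned, and the byte order is assumed to be big-endian (the conventional meaning of "network byte order"), with `byteorder="little"` available if a peer turns out to use little-endian fields.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 12  # 1 (track) + 1 (frame type) + 8 (timestamp) + 2 (offset)
FRAME_REGULAR, FRAME_KEY, FRAME_INIT = 0, 1, 2


class RawPacket(NamedTuple):
    track: int         # 0-indexed track index
    frame_type: int    # 0 = regular, 1 = keyframe, 2 = init data
    timestamp_ms: int  # timestamp in milliseconds
    offset_ms: int     # time offset in milliseconds (assumed unsigned)
    payload: bytes     # raw codec data following the header


def _header_struct(byteorder):
    # "BBQH" = two unsigned bytes, one unsigned 64-bit, one unsigned 16-bit.
    # An explicit ">" or "<" prefix disables padding, so the size is 12 bytes.
    prefix = {"big": ">", "little": "<"}[byteorder]
    return struct.Struct(prefix + "BBQH")


def parse_raw_packet(data, byteorder="big"):
    """Split one binary WebSocket message into header fields and payload."""
    if len(data) < HEADER_SIZE:
        raise ValueError("packet shorter than the 12-byte header")
    track, frame_type, ts, off = _header_struct(byteorder).unpack_from(data)
    if frame_type == FRAME_INIT:
        # Timestamp and offset are to be ignored on received init data.
        ts, off = 0, 0
    return RawPacket(track, frame_type, ts, off, bytes(data[HEADER_SIZE:]))


def build_raw_packet(pkt, byteorder="big"):
    """Serialize a RawPacket, zeroing the time fields for init data."""
    if pkt.frame_type == FRAME_INIT:
        ts, off = 0, 0
    else:
        ts, off = pkt.timestamp_ms, pkt.offset_ms
    header = _header_struct(byteorder).pack(pkt.track, pkt.frame_type, ts, off)
    return header + pkt.payload
```

For example, `build_raw_packet(RawPacket(0, FRAME_KEY, 1000, 40, frame_bytes))` yields a message that `parse_raw_packet` turns back into the same fields, where `frame_bytes` stands for whatever encoded frame data the codec in use produces.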