WebRTC MediaStream Identification in the Session Description Protocol
Google
Kungsbron 2
Stockholm 11122
Sweden
harald@alvestrand.no
This document specifies a Session Description Protocol (SDP) grouping
mechanism for RTP media streams that can be used to specify relations
between media streams.
This mechanism is used to signal the association between the SDP
concept of "media description" and the Web Real-Time Communication
(WebRTC) concept of MediaStream/MediaStreamTrack using SDP signaling.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by
the Internet Engineering Steering Group (IESG). Further
information on Internet Standards is available in Section 2 of
RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
() in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
Introduction
Terminology
Structure of This Document
Why a New Mechanism Is Needed
The WebRTC MediaStream
The MSID Mechanism
Procedures
Handling of Nonsignaled Tracks
Detailed Offer/Answer Procedures
Generating the Initial Offer
Answerer Processing of the Offer
Generating the Answer
Offerer Processing of the Answer
Modifying the Session
Example SDP Description
IANA Considerations
Attribute Registration in Existing Registries
Security Considerations
References
Normative References
Informative References
Design Considerations, Rejected Alternatives
Acknowledgements
Author's Address
Introduction
Terminology
This document uses terminology from . In addition, the
following terms are used as described below:
RTP stream:
A stream of RTP packets containing media data.
MediaStream:
An assembly of
MediaStreamTracks. One MediaStream can contain multiple
MediaStreamTracks, of the same or different types.
MediaStreamTrack:
Defined in as a
unidirectional flow of media data (either audio or video, but not
both). Corresponds to the term "source
stream". One MediaStreamTrack can be present in zero, one, or
multiple MediaStreams.
Media description:
Defined in as a set of
fields starting with an "m=" field
and terminated by either the next "m=" field or the end of the
session description.
The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be
interpreted as described in BCP 14 when, and only when, they appear in all capitals, as
shown here.
Structure of This Document
This document adds a new Session Description Protocol (SDP) mechanism
that can attach identifiers to the RTP streams and attach identifiers
to the groupings they form. It is designed for use with WebRTC.
The section "Why a New Mechanism Is Needed" gives the background on why
a new mechanism is needed. The section "The MSID Mechanism" gives the
definition of the new mechanism. The section "Procedures" gives the
necessary semantic information and procedures for using the "msid"
attribute to signal the association of MediaStreamTracks to
MediaStreams in support of the WebRTC API.
Why a New Mechanism Is Needed
When media is carried by RTP, each RTP stream is distinguished inside an RTP
session by its Synchronization Source (SSRC); each RTP
session is distinguished from all other RTP sessions by being on a
different transport association (strictly speaking, two transport
associations, one used for RTP and one used for the RTP Control Protocol
(RTCP), unless RTP/RTCP multiplexing is used).
SDP gives a format for describing an SDP
session that can contain multiple media descriptions. According to the
model used in , each media
description describes exactly one media source. If multiple media
sources are carried in an RTP session, this is signaled using BUNDLE;
if BUNDLE is not used, each media source is carried in its own RTP session.
The SDP Grouping Framework can be used to
group media descriptions. However, for the use case of WebRTC, an
application needs to specify some application-level
information about the association between the media description and
the group. This is not possible using the SDP Grouping Framework.
The WebRTC MediaStream
The W3C WebRTC API specification specifies that communication between
WebRTC entities is done via MediaStreams, which contain
MediaStreamTracks. A MediaStreamTrack is generally carried using a
single SSRC in an RTP session, forming an RTP stream. The collision of
terminology is unfortunate. There may be additional SSRCs,
possibly within additional RTP sessions, in order to support
functionality like forward error correction or simulcast. These
additional SSRCs are not affected by this specification.
MediaStreamTracks are unidirectional; they carry media in one
direction only.
In the RTP specification, RTP streams are identified using the SSRC
field. Streams are grouped into RTP sessions and also carry a CNAME.
Neither CNAME nor RTP session corresponds to a MediaStream. Therefore,
the association of an RTP stream to MediaStreams needs to be explicitly
signaled.
WebRTC defines a mapping (documented in ) where one SDP media description is
used to describe each MediaStreamTrack, and the BUNDLE mechanism is
used to group
MediaStreamTracks into RTP sessions. Therefore, the need is to specify
the identifier (ID) of the MediaStreamTrack and its associated MediaStream for each
media description, which can be accomplished with a media-level SDP
attribute.
This usage is described in .
The MSID Mechanism
This document defines a new SDP media-level
"msid" attribute.
This new attribute allows endpoints to associate RTP
streams that are described in separate media descriptions with the
right MediaStreams, as defined in . It also allows endpoints to carry an identifier for
each MediaStreamTrack in its "appdata" field.
The value of the "msid" attribute consists of an identifier and an
optional "appdata" field.
The name of the attribute is "msid".
The value of the attribute is specified by the following ABNF grammar:
msid-value = msid-id [ SP msid-appdata ]
msid-id = 1*64token-char ; see RFC 4566
msid-appdata = 1*64token-char ; see RFC 4566
An example "msid" value for a group with the identifier "examplefoo"
and application data "examplebar" might look like this:
msid:examplefoo examplebar
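As an illustration only, the grammar above could be checked with a short Python sketch. The names parse_msid_value and Msid are hypothetical and chosen for this example; the token-char ranges are copied from the "token" definition in RFC 4566. Returning None for a nonconforming value is one way to model an attribute that is ignored.

```python
import re
from typing import NamedTuple, Optional

# token-char from RFC 4566: printable ASCII without separators such as
# space, double quote, parentheses, comma, slash, colon, and brackets.
_TOKEN_CHAR = r"[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]"

# msid-value = msid-id [ SP msid-appdata ], each field 1 to 64 token-chars.
_MSID_VALUE = re.compile(
    rf"(?P<id>{_TOKEN_CHAR}{{1,64}})(?: (?P<appdata>{_TOKEN_CHAR}{{1,64}}))?"
)


class Msid(NamedTuple):
    """Hypothetical container for a parsed msid-value."""
    msid_id: str
    appdata: Optional[str]


def parse_msid_value(value: str) -> Optional[Msid]:
    """Return the parsed value, or None if it does not match the grammar."""
    match = _MSID_VALUE.fullmatch(value)
    if match is None:
        return None  # assumption: callers treat None as "ignore this attribute"
    return Msid(match.group("id"), match.group("appdata"))
```

With this sketch, parse_msid_value("examplefoo examplebar") yields the identifier "examplefoo" and the application data "examplebar".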
The identifier is a string of ASCII characters that are legal in a
"token", consisting of between 1 and 64 characters.
Application data (msid-appdata) is carried on the same line as the
identifier, separated from the identifier by a space.
The identifier ("msid-id") uniquely identifies a group within the scope
of an SDP description.
There may be multiple "msid" attributes in a single media description.
This represents the case where a single MediaStreamTrack is present in
multiple MediaStreams; the value of "msid-appdata" MUST be identical for
all occurrences.
Multiple media descriptions with the same value for "msid-id" and
"msid-appdata" are not permitted.
Endpoints can update the associations between RTP streams as
expressed by "msid" attributes at any time.
The "msid" attributes depend on the association of RTP streams with
media descriptions but do not depend on the association of RTP
streams with RTP transports. Therefore, their Mux Category (as defined in ) is NORMAL; the
process of deciding on "msid" attributes doesn't have to take into
consideration whether or not the RTP streams are bundled.
Procedures
This section describes the procedures for associating media
descriptions representing MediaStreamTracks within MediaStreams, as
defined in .
In the JavaScript API described in that specification, each
MediaStream and MediaStreamTrack has an "id" attribute, which is a
DOMString.
The value of the "msid-id" field in the MSID consists of the "id"
attribute of a MediaStream, as defined in the MediaStream's WebIDL
specification. The special value "-" indicates "no MediaStream".
The value of the "msid-appdata" field in the MSID, if present,
consists of the
"id" attribute of a MediaStreamTrack, as defined in the
MediaStreamTrack's WebIDL specification.
When an SDP session description is updated, a specific "msid-id"
value continues to refer to the same MediaStream, and a specific
"msid-appdata" to the same MediaStreamTrack. There is no memory apart
from the currently valid SDP descriptions; if an MSID "identifier" value
disappears from the SDP and appears in a later negotiation, it will be
taken to refer to a new MediaStream.
If the "msid" attribute does not conform to the ABNF given here, it
SHOULD be ignored.
The following is a high-level description of the rules for handling
SDP updates. Detailed procedures are located in .
When a new MSID "identifier" value occurs in a session
description, and it is not "-", the recipient can signal to its
application that a new MediaStream has been added.
When a session description is updated to have media descriptions
with an MSID "identifier" value, with one or more different
"appdata" values, the recipient can signal to its application that
new MediaStreamTracks have been added and note to which MediaStream
they have
been added. This is done for each different MSID "identifier"
value, including the special value "-", which indicates that a
MediaStreamTrack has been added with no corresponding
MediaStream.
If an MSID "identifier" value with no "appdata" value appears,
it means that the sender did not inform the recipient of the desired
identifier of the MediaStreamTrack, and the recipient will assign
the "id" value of the created MediaStreamTrack on its own. All
MSIDs in a media section that do not have an "appdata" value are
assumed to refer to the same MediaStreamTrack.
When a session description is updated to no longer list any "msid"
attribute on a specific media description, the recipient can signal
to its application that the corresponding MediaStreamTrack has
ended.
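The high-level rules above can be sketched as a comparison between the previous and the updated session description. This is a minimal Python sketch under stated assumptions: each description is modeled as a dictionary from a hypothetical media section key to the set of (identifier, appdata) pairs found in that section, and the function name diff_descriptions is invented for this example. It does not model assignment of track IDs when "appdata" is absent.

```python
from typing import Dict, Optional, Set, Tuple

MsidPair = Tuple[str, Optional[str]]    # (identifier, appdata)
Description = Dict[str, Set[MsidPair]]  # hypothetical: section key -> msid values


def diff_descriptions(old: Description, new: Description):
    """Return (added stream ids, added (identifier, appdata) pairs, ended sections)."""
    old_pairs = set().union(*old.values())
    new_pairs = set().union(*new.values())

    # A new identifier that is not "-" means a new MediaStream.
    old_streams = {identifier for identifier, _ in old_pairs}
    new_streams = {identifier for identifier, _ in new_pairs}
    added_streams = new_streams - old_streams - {"-"}

    # New (identifier, appdata) pairs mean new tracks, including those under "-".
    added_tracks = new_pairs - old_pairs

    # A section that used to list "msid" attributes and no longer does
    # corresponds to a track that has ended.
    ended_sections = {key for key, pairs in old.items() if pairs and not new.get(key)}
    return added_streams, added_tracks, ended_sections
```

The return values are plain sets so that the caller decides how to signal each change to its application.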
In addition to signaling that the track is ended when its "msid"
attribute disappears from the SDP, the track will also be signaled as
being ended when all associated SSRCs have disappeared by the rules of
, Sections
(BYE packet received) and
(timeout), or when the corresponding media description is disabled by
setting the port number to zero. Changing the direction of the media
description (by setting "sendonly", "recvonly", or "inactive" attributes)
will not end the MediaStreamTrack.
The association between SSRCs and media descriptions is specified in .
Handling of Nonsignaled Tracks
Entities that do not use the mechanism described in this document will not send the
"msid" attribute and thus will not send information allowing the mapping of RTP packets to MediaStreams. This means that there will be some incoming RTP packets for which the recipient has no predefined MediaStream ID value.
Note that the handling described below is triggered by incoming RTP packets, not
SDP negotiation.
When communicating with entities that use the MSID mechanism, the only time incoming RTP packets
can be received without an associated MediaStream ID value is when, after the
initial negotiation, a negotiation is performed where the answerer
adds a MediaStreamTrack to an already established connection and
starts sending data before the answer is received by the offerer. For
initial negotiation, packets won't flow until the Interactive
Connectivity Establishment (ICE) candidates and
fingerprints have been exchanged, so this is not an issue.
The recipient of those packets will perform the following
steps:
When RTP packets are initially received, it will create an
appropriate MediaStreamTrack based on the type of the media
(carried in PayloadType) and use the MID RTP header extension
(if
present) to associate the RTP packets with a specific media
section.
If the connection is not in the RTCSignalingState "stable", it
will wait at this point.
When the connection is in the RTCSignalingState "stable", it
will assign ID values.
The following steps are performed to assign ID values:
If there is an "msid" attribute, it will use that attribute to
populate the "id" field of the MediaStreamTrack and associated
MediaStreams, as described above.
If there is no "msid" attribute, the identifier of the
MediaStreamTrack will be set to a randomly generated string, and
it will be signaled as being part of a MediaStream with the
WebIDL "label" attribute set to "Non-WebRTC stream".
After deciding on the "id" field to be applied to the
MediaStreamTrack, the track will be signaled to the user.
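The ID assignment steps above can be sketched in Python as follows. This is a sketch under assumptions: the function name assign_ids, the representation of "msid" attributes as (identifier, appdata) pairs, and the use of secrets.token_hex for the randomly generated string are all choices made for this example and are not taken from the specification.

```python
import secrets
from typing import List, Optional, Tuple

DEFAULT_STREAM_LABEL = "Non-WebRTC stream"  # label named in the text above


def assign_ids(
    signaling_state: str,
    msids: List[Tuple[str, Optional[str]]],
) -> Optional[Tuple[str, List[str]]]:
    """Return (track id, stream ids or labels), or None while not "stable"."""
    if signaling_state != "stable":
        return None  # wait; the caller keeps buffering or discards media

    if msids:
        # All "msid" attributes in one media section carry the same appdata.
        appdata = msids[0][1]
        track_id = appdata if appdata is not None else secrets.token_hex(16)
        stream_ids = [identifier for identifier, _ in msids if identifier != "-"]
        return track_id, stream_ids

    # No "msid" attribute: random track id, default MediaStream.
    return secrets.token_hex(16), [DEFAULT_STREAM_LABEL]
```

After the function returns a value, the track would be signaled to the user with the chosen ID.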
The process above may involve a considerable amount of buffering
before the "stable" state is entered. If the implementation wishes to
limit this buffering, it MUST signal to the user that media has been
discarded.
It follows from the above that MediaStreamTracks in the "default"
MediaStream cannot be closed by removing the "msid" attribute; the
application must instead signal these as closed when the SSRC
disappears, either according to the rules of Sections and of or by disabling the media
description by setting its port to zero.
Detailed Offer/Answer Procedures
These procedures are given in terms of sections recommended by
. They describe the actions to be taken in terms of
MediaStreams and MediaStreamTracks; they do not include event
signaling inside the application, which is described in the JavaScript
Session Establishment Protocol (JSEP).
Generating the Initial Offer
For each media description in the offer, if there is an
associated outgoing MediaStreamTrack, the offerer adds one "a=msid"
attribute to the section for each MediaStream with which the
MediaStreamTrack is associated. The "identifier" field of the
attribute is set to the WebIDL "id" attribute of the MediaStream.
If the sender wishes to signal identifiers for the MediaStreamTracks,
the "appdata" field is set to the WebIDL "id" attribute of the
MediaStreamTrack; otherwise, it is omitted.
Answerer Processing of the Offer
For each media description in the offer and each "a=msid"
attribute in the media description, the receiver of the offer will
perform the following steps:
Extract the "appdata" field of the "a=msid" attribute,
if present.
If the "appdata" field exists: Check if a MediaStreamTrack
with the same WebIDL "id"
attribute as the "appdata" field already exists and is not in
the "ended" state. If such a MediaStreamTrack is not found, create it.
If the "appdata" field does not exist, and a MediaStreamTrack is
not associated with this media section, create a MediaStreamTrack and associate
it with this media section for future use.
Extract the "identifier" field of the "a=msid" attribute.
Check if a MediaStream with the same WebIDL "id" attribute
already exists. If not, create it.
Add the MediaStreamTrack to the MediaStream.
Signal to the user that a new MediaStreamTrack is
available.
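The processing steps above can be sketched in Python as follows. The classes Track, Stream, and Registry and the function process_msid are hypothetical stand-ins for the WebRTC objects; they are not part of any real API. Generating a UUID for a track without "appdata" and skipping stream creation for the special identifier "-" are assumptions of this sketch.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Track:
    id: str
    ended: bool = False


@dataclass
class Stream:
    id: str
    tracks: List[Track] = field(default_factory=list)


@dataclass
class Registry:
    tracks: Dict[str, Track] = field(default_factory=dict)
    streams: Dict[str, Stream] = field(default_factory=dict)
    # Tracks created for media sections whose "a=msid" carries no appdata.
    section_tracks: Dict[str, Track] = field(default_factory=dict)


def process_msid(reg: Registry, section: str, identifier: str,
                 appdata: Optional[str]) -> Track:
    """Apply the steps to one "a=msid" attribute and return its track."""
    if appdata is not None:
        track = reg.tracks.get(appdata)
        if track is None or track.ended:
            track = Track(appdata)
            reg.tracks[appdata] = track
    else:
        track = reg.section_tracks.get(section)
        if track is None:
            track = Track(str(uuid.uuid4()))  # assumption: locally chosen id
            reg.section_tracks[section] = track
            reg.tracks[track.id] = track

    if identifier != "-":  # "-" indicates "no MediaStream"
        stream = reg.streams.setdefault(identifier, Stream(identifier))
        if track not in stream.tracks:
            stream.tracks.append(track)
    return track  # the caller signals the new track to the user
```

Calling process_msid once per "a=msid" attribute in each media description reproduces the loop described in the text.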
Generating the Answer
The answer is generated in exactly the same manner as the offer.
"a=msid" values in the offer do not influence the answer.
Offerer Processing of the Answer
The answer is processed in exactly the same manner as the
offer.
Modifying the Session
On subsequent exchanges, precisely the same procedure as for the
initial offer/answer is followed, but with one additional step in
the parsing of the offer and answer:
For each MediaStreamTrack that has been created as a result
of previous offer/answer exchanges, and is not in the "ended"
state, check to see if there is still an "a=msid" attribute in
the present SDP whose "appdata" field is the same as the WebIDL
"id" attribute of the track.
If no such attribute is found, stop the MediaStreamTrack.
This will set its state to "ended".
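The additional step above can be sketched in Python as follows. The function name end_unlisted_tracks and the dictionary that maps a track ID to its "ended" flag are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def end_unlisted_tracks(
    track_ended: Dict[str, bool],
    msids: Iterable[Tuple[str, Optional[str]]],
) -> List[str]:
    """Mark tracks as ended when no "a=msid" appdata names them anymore."""
    listed = {appdata for _, appdata in msids if appdata is not None}
    stopped = []
    for track_id, ended in track_ended.items():
        if not ended and track_id not in listed:
            track_ended[track_id] = True  # stopping sets the state to "ended"
            stopped.append(track_id)
    return stopped
```

The returned list names the tracks that were stopped by this update, so the caller can notify the application.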
Example SDP Description
The following SDP description shows the representation of a WebRTC
PeerConnection with two MediaStreams, each of which has one audio and
one video track. Only the parts relevant to the MSID are shown.
Line wrapping, empty lines, and comments are added for clarity. They
are not part of the SDP.
# First MediaStream - id is 4701...
m=audio 56500 UDP/TLS/RTP/SAVPF 96 0 8 97 98
a=msid:47017fee-b6c1-4162-929c-a25110252400
f83006c5-a0ff-4e0a-9ed9-d3e6747be7d9
m=video 56502 UDP/TLS/RTP/SAVPF 100 101
a=msid:47017fee-b6c1-4162-929c-a25110252400
b47bdb4a-5db8-49b5-bcdc-e0c9a23172e0
# Second MediaStream - id is 6131....
m=audio 56503 UDP/TLS/RTP/SAVPF 96 0 8 97 98
a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae
b94006c5-cade-4e0a-9ed9-d3e6747be7d9
m=video 56504 UDP/TLS/RTP/SAVPF 100 101
a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae
f30bdb4a-1497-49b5-3198-e0c9a23172e0
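To illustrate how a recipient might read the example above, the following Python sketch groups track identifiers by MediaStream identifier. The SDP text is the example with the added line wrapping and comments removed; the function name tracks_by_stream is hypothetical, and the sketch assumes every "a=msid" value already conforms to the grammar.

```python
from collections import defaultdict
from typing import Dict, List

EXAMPLE_SDP = "\r\n".join([
    "m=audio 56500 UDP/TLS/RTP/SAVPF 96 0 8 97 98",
    "a=msid:47017fee-b6c1-4162-929c-a25110252400 f83006c5-a0ff-4e0a-9ed9-d3e6747be7d9",
    "m=video 56502 UDP/TLS/RTP/SAVPF 100 101",
    "a=msid:47017fee-b6c1-4162-929c-a25110252400 b47bdb4a-5db8-49b5-bcdc-e0c9a23172e0",
    "m=audio 56503 UDP/TLS/RTP/SAVPF 96 0 8 97 98",
    "a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae b94006c5-cade-4e0a-9ed9-d3e6747be7d9",
    "m=video 56504 UDP/TLS/RTP/SAVPF 100 101",
    "a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae f30bdb4a-1497-49b5-3198-e0c9a23172e0",
])


def tracks_by_stream(sdp: str) -> Dict[str, List[str]]:
    """Map each MediaStream id to the track ids listed under it, in order."""
    streams: Dict[str, List[str]] = defaultdict(list)
    for line in sdp.splitlines():
        if not line.startswith("a=msid:"):
            continue
        parts = line[len("a=msid:"):].split(" ")
        identifier = parts[0]
        appdata = parts[1] if len(parts) > 1 else None
        if appdata is not None and appdata not in streams[identifier]:
            streams[identifier].append(appdata)
    return dict(streams)
```

For the example, the result contains two MediaStream identifiers, each mapped to two track identifiers (audio first, then video).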
IANA Considerations
Attribute Registration in Existing Registries
IANA has registered the "msid" attribute in the
"att-field" (media level only) registry within the "Session
Description Protocol (SDP) Parameters" registry, according to the
procedures of .
The "msid" registration information is as follows:
Contact name, email:
IETF, contacted via mmusic@ietf.org, or a
successor address designated by IESG
The attribute value contains only ASCII
characters and is therefore not subject to the charset
attribute.
Purpose:
The attribute can be used to signal the relationship
between a WebRTC MediaStream and a set of media descriptions.
O/A Procedures:
Described in RFC 8830
Appropriate values:
The details of appropriate values are given
in RFC 8830 (this document).
Mux Category:
NORMAL
The Mux Category is defined in .
Security Considerations
An adversary with the ability to modify SDP descriptions has the
ability to switch around tracks between MediaStreams. This is a special
case of the general security consideration that modification of SDP
descriptions needs to be confined to entities trusted by the
application.
If implementing buffering as mentioned in the section "Handling of Nonsignaled Tracks", the amount of buffering should be limited to
avoid memory exhaustion attacks.
Careless generation of identifiers can leak privacy-sensitive
information.
recommends that identifiers be generated using a Universally Unique
IDentifier (UUID) class 3 or 4 as a
basis, which avoids such leakage.
No other attacks have been identified that depend on this
mechanism.
References
Normative References
Key words for use in RFCs to Indicate Requirement Levels
RTP: A Transport Protocol for Real-Time Applications
SDP: Session Description Protocol
Augmented BNF for Syntax Specifications: ABNF
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
JavaScript Session Establishment Protocol (JSEP)
A Framework for Session Description Protocol (SDP) Attributes When Multiplexing
WebRTC 1.0: Real-time Communication Between Browsers, W3C Proposed Recommendation
Media Capture and Streams, W3C Candidate Recommendation
Informative References
An Offer/Answer Model with Session Description Protocol (SDP)
Multiplexing RTP Data and Control Packets on a Single Port
The Session Description Protocol (SDP) Grouping Framework
A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources
Overview: Real-Time Protocols for Browser-Based Applications
Negotiating Media Multiplexing Using the Session Description Protocol (SDP)
Web IDL, W3C Editor's Draft
Design Considerations, Rejected Alternatives
One suggested mechanism has been to use CNAME instead of a new
attribute. This was abandoned because CNAME identifies a synchronization
context; one can imagine both wanting to have tracks from the same
synchronization context in multiple MediaStreams and wanting to have
tracks from multiple synchronization contexts within one MediaStream
(but the latter is impossible, since a MediaStream is defined to impose
synchronization on its members).
Another suggestion has been to put the "msid" value within an attribute
of RTCP SR (sender report) packets. This doesn't offer the ability to
know that you have seen all the tracks currently configured for a
MediaStream.
A suggestion that survived for a number of drafts of this document was to define
MSID as a generic mechanism, where the particular semantics of this
usage of the mechanism would be defined by an "a=wms-semantic"
attribute. This was removed in April 2015.
Acknowledgements
This note is based on sketches from a number of contributors.
Special thanks to the reviewers of this document for their work,
which included many specific language suggestions.
Author's Address
Google
Kungsbron 2
Stockholm 11122
Sweden
harald@alvestrand.no