Using WebRTC Encoded Transforms
WebRTC Encoded Transforms provide a mechanism to inject a high-performance Streams API for modifying encoded video and audio frames into the incoming and outgoing WebRTC pipelines. This enables use cases such as end-to-end encryption of encoded frames by third-party code.
The API defines both main-thread and worker-side objects. The main-thread interface is an RTCRtpScriptTransform instance, which on construction specifies the Worker that is to implement the transformer code. The transform running in the worker is inserted into the incoming or outgoing WebRTC pipeline by assigning the RTCRtpScriptTransform to RTCRtpReceiver.transform or RTCRtpSender.transform, respectively.
A counterpart RTCRtpScriptTransformer object is created in the worker thread, which has a ReadableStream readable property, a WritableStream writable property, and an options object passed from the associated RTCRtpScriptTransform constructor. Encoded video frames (RTCEncodedVideoFrame) or audio frames (RTCEncodedAudioFrame) from the WebRTC pipeline are enqueued on readable for processing.
The RTCRtpScriptTransformer is made available to code as the transformer property of the rtctransform event, which is fired at the worker global scope when the corresponding RTCRtpScriptTransform is constructed. The event is not fired again for each frame; frames are delivered through transformer.readable.
The worker code must implement a handler for the event that reads encoded frames from transformer.readable, modifies them as needed, and writes them to transformer.writable in the same order and without any duplication.
While the interface doesn't place any other restrictions on the implementation, a natural way to transform the frames is to create a pipe chain that sends frames enqueued on the event.transformer.readable stream through a TransformStream to the event.transformer.writable stream.
We can use the event.transformer.options property to configure any transform code that depends on whether the transform is processing incoming frames from the packetizer or outgoing frames from a codec.
The RTCRtpScriptTransformer interface also provides methods that can be used when sending encoded video to get the codec to generate a "key" frame, and when receiving video to request that a new key frame be sent. These may be useful to allow a recipient to start viewing the video more quickly, if (for example) they join a conference call when delta frames are being sent.
The following sections provide more specific examples of how to use the framework with a TransformStream-based implementation.
Test if encoded transforms are supported
Test if encoded transforms are supported by checking for the existence of RTCRtpSender.transform (or RTCRtpReceiver.transform):
const supportsEncodedTransforms =
  window.RTCRtpSender && "transform" in RTCRtpSender.prototype;
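The check above only looks at the sender. A slightly stricter sketch, assuming you also want to confirm that RTCRtpReceiver.transform and the RTCRtpScriptTransform constructor are present, might look like this. The function names are illustrative, and the scope parameter exists only so that the check can be exercised against a stand-in object outside a browser.

```javascript
// Report which encoded-transform pieces a global scope exposes.
// `scope` defaults to globalThis; pass another object to test the logic.
function detectEncodedTransformSupport(scope = globalThis) {
  // True if scope[ctorName] is a constructor whose prototype has `transform`.
  const hasTransformProperty = (ctorName) =>
    typeof scope[ctorName] === "function" &&
    "transform" in scope[ctorName].prototype;

  return {
    sender: hasTransformProperty("RTCRtpSender"),
    receiver: hasTransformProperty("RTCRtpReceiver"),
    // The main-thread constructor that wraps the worker.
    scriptTransform: typeof scope.RTCRtpScriptTransform === "function",
  };
}

// Encoded transforms are usable only if all three pieces are present.
function hasEncodedTransformSupport(scope = globalThis) {
  const support = detectEncodedTransformSupport(scope);
  return support.sender && support.receiver && support.scriptTransform;
}
```

In a page you would typically call hasEncodedTransformSupport() with no argument and fall back to sending untransformed media if it returns false.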
Adding a transform for outgoing frames
A transform running in a worker is inserted into the outgoing WebRTC pipeline by assigning its corresponding RTCRtpScriptTransform to the RTCRtpSender.transform property for an outgoing track.
This example shows how you might stream video from a user's webcam over WebRTC, adding a WebRTC encoded transform to modify the outgoing streams.
The code assumes that there is an RTCPeerConnection called peerConnection that is already connected to a remote peer.
First we get a MediaStreamTrack, using getUserMedia() to get a video MediaStream from a media device, and then the MediaStream.getTracks() method to get the first MediaStreamTrack in the stream. The track is added to the peer connection using addTrack(), so that it is streamed to the remote peer once the connection has been renegotiated. The addTrack() method returns the RTCRtpSender that is being used to send the track.
// Get Video stream and MediaTrack
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getTracks();
const videoSender = peerConnection.addTrack(track, stream);
An RTCRtpScriptTransform is then constructed, taking a Worker whose script defines the transform, and an optional object that can be used to pass arbitrary data to the worker (in this case we've used a name property with value "senderTransform" to tell the worker that this transform will be added to the outbound stream). We add the transform to the outgoing pipeline by assigning it to the RTCRtpSender.transform property.
// Create a worker containing a TransformStream
const worker = new Worker("worker.js");
videoSender.transform = new RTCRtpScriptTransform(worker, {
  name: "senderTransform",
});
The Using separate sender and receiver transforms section below shows how the name might be used in a worker.
Note that you can add the transform at any time, but if you add it immediately after calling addTrack(), the transform will get the first encoded frame that is sent.
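If tracks were added before you were ready to attach a transform, the same assignment can be made afterwards for each existing sender, bearing in mind that frames sent before the assignment are not transformed. The sketch below shows one way to do that. attachSenderTransforms is a hypothetical helper built on RTCPeerConnection.getSenders(); it takes the transform constructor as a parameter (defaulting to RTCRtpScriptTransform) only so that it can be tried with a stand-in.

```javascript
// Hypothetical helper: give every video sender on a connection its own
// RTCRtpScriptTransform, all backed by the same worker.
function attachSenderTransforms(
  peerConnection,
  worker,
  TransformCtor = globalThis.RTCRtpScriptTransform,
) {
  const videoSenders = peerConnection
    .getSenders()
    // A sender's track can be null, so check it before reading `kind`.
    .filter((sender) => sender.track && sender.track.kind === "video");

  for (const sender of videoSenders) {
    // Same options object as in the example above.
    sender.transform = new TransformCtor(worker, { name: "senderTransform" });
  }
  return videoSenders;
}
```

In a page this would be called as attachSenderTransforms(peerConnection, new Worker("worker.js")). Each RTCRtpScriptTransform fires its own rtctransform event in the shared worker.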
Adding a transform for incoming frames
A transform running in a worker is inserted into the incoming WebRTC pipeline by assigning its corresponding RTCRtpScriptTransform to the RTCRtpReceiver.transform property for an incoming track.
This example shows how you add a transform to modify an incoming stream.
The code assumes that there is an RTCPeerConnection called peerConnection that is already connected to a remote peer.
First we add an RTCPeerConnection track event handler to catch the event when the peer starts receiving a new track. Within the handler we construct an RTCRtpScriptTransform and assign it to event.receiver.transform (event.receiver is an RTCRtpReceiver).
As in the previous section, the constructor takes an object with a name property, but here we use receiverTransform as the value to tell the worker that frames are incoming.
peerConnection.ontrack = (event) => {
  const worker = new Worker("worker.js");
  event.receiver.transform = new RTCRtpScriptTransform(worker, {
    name: "receiverTransform",
  });
  received_video.srcObject = event.streams[0];
};
Note again that you can add the transform stream at any time. However, adding it in the track event handler ensures that the transform stream will get the first encoded frame for the track.
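Creating a new Worker for every incoming track, as above, is simple but not required. The sketch below assumes you would rather share one worker and tell it what kind of track each transform is for. The kind property is our own addition to the options object (it is not defined by the API), createTrackHandler is a hypothetical helper, and the code that displays the stream is omitted. As before, the constructor is a parameter only so the helper can be tried with a stand-in.

```javascript
// Hypothetical factory for an ontrack handler that reuses one worker.
function createTrackHandler(
  worker,
  TransformCtor = globalThis.RTCRtpScriptTransform,
) {
  return (event) => {
    event.receiver.transform = new TransformCtor(worker, {
      name: "receiverTransform",
      // Application-defined option: "audio" or "video".
      kind: event.track.kind,
    });
  };
}
```

In a page you might write peerConnection.ontrack = createTrackHandler(new Worker("worker.js")), and in the worker read event.transformer.options.kind to choose an algorithm.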
Worker implementation
The worker script must implement a handler for the rtctransform event, typically creating a pipe chain that pipes the event.transformer.readable (ReadableStream) stream through a TransformStream to the event.transformer.writable (WritableStream) stream.
A worker might support transforming incoming or outgoing encoded frames, or both, and the transform might be hard-coded, or configured at runtime using information passed from the web application.
Basic WebRTC Encoded Transform
The example below shows a basic WebRTC Encoded transform, which negates all bits in queued frames. It does not use or need options passed in from the main thread because the same algorithm can be used in the sender pipeline to negate the bits and in the receiver pipeline to restore them.
The code implements an event handler for the rtctransform event. This constructs a TransformStream, then pipes through it using ReadableStream.pipeThrough(), and finally pipes to event.transformer.writable using ReadableStream.pipeTo().
addEventListener("rtctransform", (event) => {
  const transform = new TransformStream({
    start() {}, // Called on startup.
    flush() {}, // Called when the stream is about to be closed.
    transform(encodedFrame, controller) {
      // View the frame's encoded payload (an ArrayBuffer).
      const view = new DataView(encodedFrame.data);
      // Construct a new buffer of the same length for the modified data.
      const newData = new ArrayBuffer(encodedFrame.data.byteLength);
      const newView = new DataView(newData);
      // Negate all bits in the incoming frame.
      for (let i = 0; i < encodedFrame.data.byteLength; ++i) {
        newView.setInt8(i, ~view.getInt8(i));
      }
      // Replace the payload and pass the frame on.
      encodedFrame.data = newData;
      controller.enqueue(encodedFrame);
    },
  });
  event.transformer.readable
    .pipeThrough(transform)
    .pipeTo(event.transformer.writable);
});
The implementation of the WebRTC encoded transform is similar to a "generic" TransformStream, but with some important differences. Like the generic stream, its constructor takes an object that defines an optional start() method, which is called on construction, flush() method, which is called as the stream is about to be closed, and transform() method, which is called every time there is a chunk to be processed. Unlike the generic constructor, any writableStrategy or readableStrategy properties that are passed in the constructor object are ignored, and the queuing strategy is entirely managed by the user agent. The transform() method also differs in that it is passed either an RTCEncodedVideoFrame or RTCEncodedAudioFrame rather than a generic "chunk".
The code in the transform() method isn't notable, other than demonstrating how to get the frame's data into a form that you can modify, and then enqueue the modified frame on the stream.
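The same byte inversion can be written with a Uint8Array instead of two DataView objects. The sketch below (invertBytes and createInvertTransform are illustrative names) also shows why one transform serves both directions: inverting the bits twice returns the original bytes.

```javascript
// Return a new ArrayBuffer whose bytes are the bitwise NOT of `buffer`.
function invertBytes(buffer) {
  const input = new Uint8Array(buffer);
  const output = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // ~ yields a negative number; Uint8Array keeps only the low 8 bits.
    output[i] = ~input[i];
  }
  return output.buffer;
}

// A TransformStream that applies invertBytes() to each frame's payload.
function createInvertTransform() {
  return new TransformStream({
    transform(encodedFrame, controller) {
      encodedFrame.data = invertBytes(encodedFrame.data);
      controller.enqueue(encodedFrame);
    },
  });
}
```

In the worker, createInvertTransform() could take the place of the inline TransformStream in the rtctransform handler shown earlier.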
Using separate sender and receiver transforms
The previous example works if the transform function is the same when sending and receiving, but in many cases the algorithms will be different. You could use separate worker scripts for the sender and receiver, or handle both cases in one worker as shown below.
If the worker is used for both sender and receiver, it needs to know whether the current encoded frame is outgoing from a codec, or incoming from the packetizer.
This information can be specified using the second (options) argument of the RTCRtpScriptTransform constructor.
For example, we can define a separate RTCRtpScriptTransform for the sender and receiver, passing the same worker, and an options object with a name property that indicates whether the transform is used in the sender or receiver (as shown in the previous sections). The information is then available in the worker in event.transformer.options.
In this example we implement the onrtctransform event handler on the global dedicated worker scope object. The value of the name property is used to determine which TransformStream to construct (the functions that construct them are not shown).
// Code to instantiate transforms and attach them to the sender/receiver pipelines.
onrtctransform = (event) => {
  let transform;
  if (event.transformer.options.name === "senderTransform")
    transform = createSenderTransform(); // returns a TransformStream
  else if (event.transformer.options.name === "receiverTransform")
    transform = createReceiverTransform(); // returns a TransformStream
  else return;
  event.transformer.readable
    .pipeThrough(transform)
    .pipeTo(event.transformer.writable);
};
Note that the code to create the pipe chain is the same as in the previous example.
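The createSenderTransform() and createReceiverTransform() functions are not shown in the example. The sketch below is a hypothetical pair that illustrates an asymmetric algorithm: the sender appends a one-byte trailer to each payload and the receiver removes it. The trailer scheme is invented purely for illustration; a real application would typically encrypt and decrypt at this point instead.

```javascript
// Hypothetical trailer byte added by the sender (illustration only).
const TRAILER = 0xa5;

// Sender-side step: copy the payload and append the trailer byte.
function appendTrailer(buffer) {
  const input = new Uint8Array(buffer);
  const output = new Uint8Array(input.length + 1);
  output.set(input);
  output[input.length] = TRAILER;
  return output.buffer;
}

// Receiver-side step: drop the last byte.
// Assumes every payload was produced by appendTrailer() on the sender.
function stripTrailer(buffer) {
  return buffer.byteLength > 0 ? buffer.slice(0, -1) : buffer;
}

// Wrap a payload-mapping function in a TransformStream over encoded frames.
function frameTransform(mapPayload) {
  return new TransformStream({
    transform(encodedFrame, controller) {
      encodedFrame.data = mapPayload(encodedFrame.data);
      controller.enqueue(encodedFrame);
    },
  });
}

const createSenderTransform = () => frameTransform(appendTrailer);
const createReceiverTransform = () => frameTransform(stripTrailer);
```

Keeping the per-payload functions separate from the stream wrapper makes the sender and receiver algorithms easy to check against each other without a peer connection.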
Runtime communication with the transform
The RTCRtpScriptTransform constructor allows you to pass options and transfer objects to the worker.
In the previous example we passed static information, but sometimes you might want to modify the transform algorithm in the worker at runtime, or get information back from the worker.
For example, a WebRTC conference call that supports encryption might need to add a new key to the algorithm used by the transform.
While it is possible to share information between the worker running the transform code and the main thread using Worker.postMessage(), it is generally easier to share a MessageChannel port as an RTCRtpScriptTransform constructor option, because the port is then directly available in event.transformer.options, tied to that specific transform.
The code below creates a MessageChannel and transfers its second port to the worker. The main thread and transform can subsequently communicate using the first and second ports.
// Create a worker containing a TransformStream
const worker = new Worker("worker.js");
// Create a channel
// Pass channel.port2 to the transform as a constructor option
// and also transfer it to the worker
const channel = new MessageChannel();
const transform = new RTCRtpScriptTransform(
  worker,
  { purpose: "encrypt", port: channel.port2 },
  [channel.port2],
);
// Use port1 to send a string.
// (we can send and transfer basic types/objects).
channel.port1.postMessage("A message for the worker");
channel.port1.start();
In the worker the port is available as event.transformer.options.port. The code below shows how you might listen on the port's message event to get messages from the main thread. You can also use the port to send messages back to the main thread.
event.transformer.options.port.onmessage = (messageEvent) => {
  // The message payload is in 'messageEvent.data'.
  console.log(messageEvent.data);
};
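The message format is entirely up to the application. As an example, the sketch below assumes a hypothetical protocol in which the main thread sends { type: "setKey", key } objects and the worker replies over the same port, which covers the case of sending messages back to the main thread. listenForControlMessages and the message shapes are assumptions, not part of the API.

```javascript
// Hypothetical worker-side protocol:
//   { type: "setKey", key }  -> store the key, reply { type: "ack" }
//   anything else            -> reply { type: "error", reason }
function listenForControlMessages(port, state) {
  port.onmessage = (messageEvent) => {
    const message = messageEvent.data;
    if (message && message.type === "setKey") {
      // The frame transform (not shown) would read state.key.
      state.key = message.key;
      port.postMessage({ type: "ack" });
    } else {
      port.postMessage({ type: "error", reason: "unknown message" });
    }
  };
}
```

In the rtctransform handler you might call listenForControlMessages(event.transformer.options.port, state) with a state object shared with the frame transform. On the main thread, assigning channel.port1.onmessage receives the replies; setting onmessage also starts the port implicitly, so an explicit start() call is only needed when listening with addEventListener().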
Triggering a key frame
Raw video is rarely sent or stored because it consumes a lot of space and bandwidth to represent each frame as a complete image. Instead, codecs periodically generate a "key frame" that contains enough information to construct a full image, and between key frames send "delta frames" that just include the changes since the previous frame. While this is far more efficient than sending raw video, it means that in order to display the image associated with a particular delta frame, you need the most recent key frame and all the delta frames that follow it.
This can cause a delay for new users joining a WebRTC conference application, because they can't display video until they have received their first key frame. Similarly, if an encoded transform was used to encrypt frames, the recipient would not be able to display video until they get the first key frame encrypted with their key.
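To make that dependency concrete, here is a small sketch that, given a list of frame types such as those reported by RTCEncodedVideoFrame.type ("key" or "delta"), returns the indices of the frames a decoder needs in order to display a given frame. It assumes a simple chain in which each delta frame depends only on the frame before it; real codecs can use more complex reference structures. framesNeededToDecode is a hypothetical name.

```javascript
// Return the indices needed to display frame `index`: the most recent key
// frame at or before it, plus every delta frame after that key frame.
// Returns null when no key frame has been seen yet.
function framesNeededToDecode(types, index) {
  for (let k = index; k >= 0; k--) {
    if (types[k] === "key") {
      return Array.from({ length: index - k + 1 }, (_, i) => k + i);
    }
  }
  // Nothing can be displayed until a key frame arrives.
  return null;
}
```

A participant who joins just after a key frame would, under this model, have to wait for the next one before anything can be shown, which is the delay the key frame methods below help to shorten.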
In order to ensure that a new key frame can be sent as early as possible when needed, the RTCRtpScriptTransformer object in event.transformer has two methods: RTCRtpScriptTransformer.generateKeyFrame(), which causes the codec to generate a key frame, and RTCRtpScriptTransformer.sendKeyFrameRequest(), which a receiver can use to request a key frame from the sender.
The example below shows how the main thread might pass an encryption key to a sender transform, and trigger the codec to generate a key frame.
Note that the main thread doesn't have direct access to the RTCRtpScriptTransformer object, so it needs to pass the key and restriction identifier ("rid") to the worker (the "rid" is a stream id, which indicates the encoder that must generate the key frame). Here we do that with a MessageChannel, using the same pattern as in the previous section. The code assumes there is already a peer connection, and that videoSender is an RTCRtpSender.
const worker = new Worker("worker.js");
const channel = new MessageChannel();
videoSender.transform = new RTCRtpScriptTransform(
  worker,
  { name: "senderTransform", port: channel.port2 },
  [channel.port2],
);
// Post rid and new key to the sender
channel.port1.start();
channel.port1.postMessage({
  rid: "1",
  key: "93ae0927a4f8e527f1gce6d10bc6ab6c",
});
The rtctransform event handler in the worker gets the port and uses it to listen for message events from the main thread. If a message is received, the handler gets the rid and key, and then calls generateKeyFrame().
event.transformer.options.port.onmessage = (messageEvent) => {
  const { rid, key } = messageEvent.data;
  // key is used by the transformer to encrypt frames (not shown)
  // Get the codec to generate a new key frame for the given rid.
  // Here 'event' is the enclosing rtctransform event.
  event.transformer.generateKeyFrame(rid);
};
The code for a receiver to request a new key frame would be almost identical, except that "rid" isn't specified. Here is the code for just the port message handler:
event.transformer.options.port.onmessage = (messageEvent) => {
  const { key } = messageEvent.data;
  // key is used by the transformer to decrypt frames (not shown)
  // Request the sender to emit a key frame.
  // Here 'event' is the enclosing rtctransform event.
  event.transformer.sendKeyFrameRequest();
};
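If one worker serves both directions, the two handlers can be folded into one that picks the method based on the name option passed to the RTCRtpScriptTransform constructor. The sketch below assumes that convention ("senderTransform" or "receiverTransform") and a key field in each message; handleKeyUpdate is a hypothetical helper. Both generateKeyFrame() and sendKeyFrameRequest() return promises, so the helper returns the promise for the caller to handle.

```javascript
// Hypothetical worker helper: store a new key, then ask for a key frame
// using whichever method fits this transform's role.
// `role` is the `name` option passed from the main thread.
function handleKeyUpdate(transformer, role, state, data) {
  // Used by the encrypt/decrypt step (not shown).
  state.key = data.key;
  if (role === "senderTransform") {
    // Ask the local encoder for a key frame (rid may be undefined).
    return transformer.generateKeyFrame(data.rid);
  }
  if (role === "receiverTransform") {
    // Ask the remote sender for a key frame.
    return transformer.sendKeyFrameRequest();
  }
  // Unknown role: nothing to request.
  return Promise.resolve();
}
```

From the port's message handler you might call handleKeyUpdate(event.transformer, event.transformer.options.name, state, messageEvent.data) and attach a catch() to log any rejection.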