Silverlight, streaming media, Windows Media, VC-1, H.264

Microsoft at NAB 2009: IIS Smooth Streaming Released to Web

April 23rd, 2009 Posted in Internet Information Services, Silverlight, Smooth Streaming | No Comments »

Last day here at NAB in Las Vegas, so it’s a perfect time to take a look at what we’ve done at NAB this year. Ben Waggoner has put together an excellent summary on his blog:

http://on10.net/blogs/benwagg/NAB-Day-1-Smooth-Streaming-released-1080p-in-Silverlight-new-VC-1-and-more/

So the big news is: IIS Media Services 2.0 (featuring on-demand Smooth Streaming) has been officially released to Web, a mere 6 months after it was first announced at Digital Hollywood as a technology preview! We expect IIS Media Services 3.0 (featuring Live Smooth Streaming, currently in beta) to be released later this year.

Now we watch Smooth Streaming completely change the rules of media delivery on the Web.

Smooth Streaming White Paper

March 27th, 2009 Posted in Internet Information Services, Silverlight, Smooth Streaming | 6 Comments »

I know that many of you have been reading my blog for information about IIS Smooth Streaming. I’ve posted a series of posts on that topic, but sometimes blog posts can be difficult to aggregate. So if you’re looking for a single, one-stop-shop for information about Smooth Streaming, you can now download a white paper on Smooth Streaming that unifies all that information I’ve been posting in a single doc:

http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=03d22583-3ed6-44da-8464-b1b4b5ca7520

The white paper is available in Word, PDF and XPS formats.

MediaStreamSource Takes On a New Life

March 26th, 2009 Posted in Silverlight | 3 Comments »

When the Silverlight team originally designed the MediaStreamSource API, its main purpose was to allow asynchronous reading of compressed video/audio samples from formats other than ASF. We took full advantage of this API to implement Smooth Streaming support in Silverlight 2. In Silverlight 3, the team decided to extend the API to also allow reading of uncompressed samples - YV12, RGBA and PCM. The primary goal behind this was to allow developers to build their own codecs. If you could parse a format and decode it in .NET - you could now play it back in Silverlight.

But one of the other potential uses for MediaStreamSource that emerged during SL3 development was video and audio synthesis. After all, why limit A/V creation to just decode from existing content? You can create a sound waveform or a raster bitmap using .NET math functions and then present it to the Silverlight runtime to render like any other audio or video.
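To make the synthesis idea concrete, here is a minimal sketch of the waveform math only. It is written in Python rather than a .NET language and does not touch the MediaStreamSource API; the function name `sine_pcm16` and its parameters are my own placeholders. It simply fills a buffer with 16-bit PCM samples of a sine tone, which is the kind of uncompressed data a custom media source would hand to the runtime.

```python
import math
import struct

def sine_pcm16(frequency_hz, duration_s, sample_rate=44100, amplitude=0.5):
    """Return mono 16-bit little-endian PCM bytes for a sine tone."""
    n_samples = round(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # stay inside the signed 16-bit range
    samples = (
        int(peak * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)

pcm = sine_pcm16(440.0, 0.01)  # 10 ms of a 440 Hz tone
print(len(pcm))  # 882 bytes: 441 samples * 2 bytes each
```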

Well, I was extremely happy to find out today that developers are catching on to this fantastic new feature. Namely, Pete Brown, a Washington DC-based .NET developer and evangelist, has been using MediaStreamSource to synthesize video, audio and - my favorite - emulate a Commodore 64 computer!

Check out:

Creating Sound using MediaStreamSource in Silverlight 3 Beta

Silverlight 3 – Creating Video from Raw Bits using a MediaStreamSource

My MIX09 Silverlight 3 ShowOff Video – Commodore 64 Emulator

All awesome stuff. Way to go, Pete!

But what about custom codec development? If you’re a codec developer, I invite you to take a look at MediaStreamSource and consider writing a C# decoder for Silverlight. There are plenty of open-source codecs and formats out there that would make for fantastic Silverlight demos. Just to list a few:

Containers

  • Matroska
  • Ogg
  • Ogg Media

Video

  • Dirac
  • Theora
  • HuffYUV
  • Lagarith

Audio

  • Vorbis
  • FLAC
  • Monkey’s Audio
  • Shorten

These are just some of the codecs and formats out there with easily accessible source code that could be ported to C# or another .NET language. But of course, why stop there? There are also formats such as MPEG-2 TS, FLV, AVI, and codecs such as H.263, MPEG-4 ASP, MJPEG, MPEG Audio Layer II, and others that would be incredibly useful to have supported in Silverlight too.

Will you be the first to develop those?

In Case You Missed MIX…

March 20th, 2009 Posted in Expression Encoder, Internet Information Services, Silverlight, Smooth Streaming | 11 Comments »

If you were unable to attend MIX 2009, you’ll be pleased to know that the videos of sessions are already appearing online: https://content.visitmix.com/2009/sessions/.

Here are the links to the videos of sessions I had previously mentioned:

What’s New in Microsoft Silverlight 3
Speaker: Joe Stegman (Microsoft)

Microsoft Silverlight Media End-to-End
Speaker: Alex Zambelli (Microsoft)

Creating Media Content for Microsoft Silverlight Using Microsoft Expression Encoder
Speaker: James Clarke (Microsoft)

Delivering Media with Microsoft Internet Information Services 7 (IIS) Media Services and Microsoft Silverlight
Speakers: John Bocharov (Microsoft), John Bishop (Inlet)

Silverlight 3, IIS Media Services 3.0, Olympics 2010 - Wow, It Truly Is March Madness!

March 19th, 2009 Posted in H.264, Internet Information Services, Olympics, Silverlight, Smooth Streaming | 4 Comments »

Though blogging in Las Vegas might sound like a party foul, this has been an amazing week for Silverlight media - so much that I feel a sudden urge to report on it right now, right here.

Silverlight 3 Beta

A mere 5 months after releasing Silverlight 2 RTW, we have now made Silverlight 3 Beta available to the public. Check out http://silverlight.net/getstarted/silverlight3/default.aspx for the full list of new features and download links. As usual, Scott Guthrie offers some great insight on Silverlight 3 in his MIX Keynote and Channel 9 video.

The most interesting new media features in Silverlight 3 are:

  • Native H.264 video, AAC audio, and MP4 file playback support
    • Take your MP4-contained H.264/AAC encoded files, put them on a Web server and play them directly in Silverlight using progressive download!
  • Extensible media format support
    • Support for raw A/V bitstream playback allows codec and media developers to write custom decoders and format parsers using C#, VB or any other .NET language. Will you be the first to write an Ogg Vorbis or FLAC decoder for Silverlight?
  • GPU accelerated video scaling
    • Stretching the video to full screen can now be offloaded entirely to the video card, freeing up the CPU and enabling smooth video playback.
  • Advanced media logging
    • Log playback usage to Windows Media Services and IIS7 Media Services, like with the good old Windows Media Player.
  • Custom effects / Pixel shaders
    • Apply post-processing effects to your video by writing custom effects using the same HLSL pixel shader code that works in Direct3D and WPF today.
  • Perspective 3D transforms
    • Spin and rotate your video around all 3 axes - X, Y and Z. Video collage? How about a video cube?

I’ll be blogging in the near future in more depth about the details of our H.264/AAC/MP4 support in particular.

 

IIS Media Services 3.0 / Live Smooth Streaming

Just a short month after announcing the availability of Smooth Streaming for On-Demand Video beta, the IIS Media team announced the availability of IIS Media Services 3.0 beta - featuring Live Smooth Streaming. That’s right, with IIS7 you will soon be able to deliver Smooth Streaming video for both on-demand and live!

Inlet Technologies has simultaneously announced they will be the first to add Smooth Streaming support to their line of live and VoD encoding products. Besides Inlet, we are currently working with a number of encoding ISVs on enabling them to add Smooth Streaming support to their products.

Akamai Technologies announced the wide commercial availability of their AdaptiveEdge Streaming service based on IIS Smooth Streaming. Besides Akamai, we are currently working with all the major CDNs on enabling Smooth Streaming support in their networks. As with the encoding ISVs, our goal is to build a rich Smooth Streaming ecosystem to be available to customers by the time Silverlight 3 ships.

Besides the newly redesigned home page, the IIS Media team has also put up a great working example of how Smooth Streaming works.

 

NBC Winter Olympics 2010 - Vancouver

During the MIX 2009 Keynote, Perkins Miller, Senior VP of Digital Media for NBC Universal, announced that NBC Universal has chosen to deliver the NBC Winter Olympics 2010 using Microsoft Silverlight. Watch the MIX Keynote to see his announcement.

Here are the details I can share at this point:

  • All video content, both live and on-demand, will be delivered using Smooth Streaming
  • The live video player will feature DVR-like capabilities (pause, rewind, seek and slo-mo of live video)
  • Video quality will go up to true 720p HD

March Madness

CBS Sports has launched a Silverlight-based March Madness video player that lets you watch all NCAA Basketball Tournament games live. Visit http://mmod.ncaa.com/video to launch the March Madness video player. If you are using Internet Explorer on Windows, the default player will actually be an old-school WMP player, so you’ll need to click on the HQ Player button to launch the new Silverlight player.

The live video for the tournament is being streamed using Windows Media Services. Obviously, we couldn’t use Smooth Streaming because the server technology is still in beta and the encoders aren’t yet commercially available. But CBS did the next best thing! All live streams are available in 4 video quality levels:

Total Bitrate (kbps) | Video Bitrate (kbps) | Audio Bitrate (kbps) | Video Width | Video Height | Pixel Aspect Ratio
1500                 | 1450                 | 48                   | 784         | 432          | 1:1
1000                 | 950                  | 48                   | 512         | 384          | 4:3
650                  | 615                  | 32                   | 368         | 272          | 4:3
350                  | 315                  | 32                   | 240         | 176          | 4:3

Video codec used is VC-1 Advanced Profile. Audio codec used is WMA Professional at 44.1 kHz 16-bit stereo.

All March Madness games are being encoded by MLB.com’s encoding facilities using Inlet Spinnaker 7000 encoders. The Spinnakers were configured based on my own recommendations in order to provide maximum quality at all bitrates.

The March Madness Silverlight player uses preroll ad download statistics to estimate available client bandwidth and tries to make an appropriate first choice of bitrate level. Of course none of this would be necessary with Smooth Streaming, but we really tried to make the best of the Windows Media Streaming experience anyway. The player also has built-in heuristics to detect quality-of-service issues, such as frequent rebuffering or low frame rate rendering, at which point it can suggest that the user choose a lower bitrate. Users can manually switch between available bitrates using the “-” and “+” buttons at the bottom of the player UI.
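A rough sketch of how such a first choice could be made is shown below. This is not the actual player code: the function names and the 0.8 safety factor are assumptions for illustration, and only the four total bitrates come from the table above.

```python
# Total bitrates (kbps) of the four quality levels listed above.
LEVELS_KBPS = [350, 650, 1000, 1500]

def estimate_bandwidth_kbps(bytes_downloaded, seconds):
    """Average throughput of a completed download, in kilobits per second."""
    return bytes_downloaded * 8 / 1000 / seconds

def pick_initial_level(bandwidth_kbps, safety_factor=0.8):
    """Pick the highest level that fits within a fraction of the estimate.

    safety_factor is a hypothetical margin; falls back to the lowest level
    when even that one does not fit.
    """
    budget = bandwidth_kbps * safety_factor
    candidates = [level for level in LEVELS_KBPS if level <= budget]
    return max(candidates) if candidates else min(LEVELS_KBPS)

# Example: a 1,000,000-byte preroll ad that downloaded in 4 seconds.
estimate = estimate_bandwidth_kbps(1_000_000, 4.0)  # 2000.0 kbps
print(pick_initial_level(estimate))  # 1500
```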

Come Join Us at MIX 2009!

March 9th, 2009 Posted in Expression Encoder, H.264, Internet Information Services, Silverlight | 1 Comment »

MIX - Microsoft’s annual conference for Web 2.0 designers and developers - is upon us. This year it will take place March 18-20 at The Venetian Hotel in Las Vegas.

If you’re interested in the world of online video streaming (and I guess you are since you’re already reading this blog) and plan on visiting MIX 2009, I suggest you check out the following few sessions while there:

What’s New in Microsoft Silverlight 3
Speaker: Joe Stegman (Microsoft)
Date/Time: Wed, March 18, 11:30 AM - 12:15 PM
Location: Lando 4204

Microsoft Silverlight Media End-to-End
Speaker: Alex Zambelli (Microsoft)
Date/Time: Wed, March 18, 2:15 PM - 3:30 PM
Location: Delfino 4105

Creating Media Content for Microsoft Silverlight Using Microsoft Expression Encoder
Speaker: James Clarke (Microsoft)
Date/Time: Wed, March 18, 4:00 PM - 5:15 PM
Location: Delfino 4105

Delivering Media with Microsoft Internet Information Services 7 (IIS) Media Services and Microsoft Silverlight
Speakers: John Bocharov (Microsoft), John Bishop (Inlet)
Date/Time: Thu, March 19, 1:00 PM - 2:15 PM
Location: Delfino 4105

For a complete list of sessions, see https://content.visitmix.com/2009/sessions/.

To register for MIX 2009, visit http://2009.visitmix.com/Registration/.

Hope to see you there!

Smooth Streaming FAQ

February 25th, 2009 Posted in Expression Encoder, H.264, Internet Information Services, Silverlight, Smooth Streaming | 1 Comment »

The contents of this post have been moved to a new permanent location:

http://alexzambelli.com/blog/smooth-streaming-faq/

Smooth Streaming Beta Released!

February 25th, 2009 Posted in Expression Encoder, Internet Information Services, Silverlight, Smooth Streaming | No Comments »

I am pleased to announce the availability of Smooth Streaming beta! The hot new media extension for IIS7 can now be downloaded from http://www.iis.net/extensions/SmoothStreaming, in both x86 and x64 flavors. The Smooth Streaming beta is also available for install through the uniform Microsoft Web Platform Installer.

Smooth Streaming lead Program Manager John Bocharov has a detailed description of the IIS7 Smooth Streaming extension in his blog post. Other recommended reading: IIS.Net’s Getting Started, Exploring Bit Rate Changes, and Managing Your Presentations articles. 

This IIS7 extension is the final missing piece in the Smooth Streaming puzzle that we’ve been talking about since last October. Remember, it’s already possible to create on-demand Smooth Streaming content with Expression Encoder 2 SP1, and we are actively working on expanding this new format support to other 3rd party encoding products in the coming months. EE2 SP1 also features fully functional Silverlight player templates (with source code!) that enable Silverlight developers to easily add Smooth Streaming playback support to their Silverlight 2 apps. The IIS7 Smooth Streaming extension finally lets you also host Smooth Streaming content on your Windows Server 2008 servers!

Important note: IIS7 Smooth Streaming beta is only intended for testing purposes and doesn’t implicitly grant a “Go-Live” deployment license.

Finally: If you’re planning on attending MIX 2009 in Las Vegas this March, be sure to attend all Silverlight and IIS related sessions for other exciting announcements!

Smooth Streaming Architecture

February 10th, 2009 Posted in H.264, Internet Information Services, Silverlight, Smooth Streaming | 13 Comments »

As described in the previous two posts, Smooth Streaming is Microsoft’s implementation of HTTP-based adaptive streaming, which is a hybrid media delivery method. It acts like streaming, but is in fact based on HTTP progressive download. The HTTP downloads are performed in a series of small chunks, allowing the media to get easily and cheaply cached along the edge of the network, closer to the end users. Providing multiple encoded bitrates of the same media source also allows Silverlight clients to seamlessly and dynamically switch between bitrates depending on network conditions and CPU power. The resulting end user experience is one of reliable, consistent playback without stutter, buffering or “last mile” congestion. In one word: Smooth.

In this post we’ll take a closer look at how Smooth Streaming works: format, server, and client.

 

Smooth Streaming Format

Smooth Streaming is the first Microsoft media format in over a decade to use a file format other than ASF. It is based on the ISO/IEC 14496-12 ISO Base Media File Format specification, better known as the MP4 file specification. Why MP4 and not ASF? Well, there are several reasons:

  • MP4 is a lightweight container format with less overhead than ASF
  • MP4 is easier to parse in managed (.NET) code than ASF
  • MP4 is based on a widely used standard, making 3rd party adoption and support more straightforward
  • MP4 was architected with H.264 video codec support in mind, and we’re counting on H.264 support in Smooth Streaming and Silverlight 3 (ASF can also contain H.264 video, but it’s not as straightforward as with MP4)
  • MP4 was designed to natively support payload fragmentation within the file

There are actually 2 parts to the Smooth Streaming format: the wire format, and the disk file format. In Smooth Streaming a video is recorded in full length to the disk as a single file (one file per encoded bitrate), but it’s transferred to the client as a series of small file chunks. The wire format defines the structure of the chunks that get sent by IIS to the client, whereas the file format defines the structure of the contiguous file on disk. Fortunately, the MP4 specification allows MP4 to be internally organized as a series of fragments, which means that in Smooth Streaming the wire format is a direct subset of the file format.

What are these MP4 “fragments” that I speak of? The basic unit of an MP4 file is called a “box.” These boxes can contain both data and metadata. The MP4 specification allows for various ways to organize data and metadata boxes within a file. In most media scenarios it is considered useful to have the metadata written before the data so that a player client application can have more information about the video/audio it’s about to play before it plays it. However, in live streaming scenarios it is often not possible to write the metadata upfront about the whole data stream because it’s simply not fully known yet. Furthermore, less upfront metadata means less overhead, which can lead to shorter startup times. For these reasons the MP4 ISO Base Media File Format specification was designed to allow MP4 boxes to be organized in a fragmented manner, where the file can be written “as you go” as a series of short metadata/data box pairs, rather than one long metadata/data pair. The Smooth Streaming file format heavily leverages this aspect of the MP4 file specification, to the point where at Microsoft we often interchangeably refer to Smooth Streaming files as “Fragmented MP4 files” or “(f)MP4.”

Here is a high-level overview of what a Smooth Streaming file looks like on the inside:

[Figure: Smooth Streaming File Format]

In a nutshell, the file starts with file-level metadata (‘moov’) which generically describes the file, but the bulk of the payload is actually contained in the fragment boxes, which also carry more accurate fragment-level metadata (‘moof’) and media data (‘mdat’). (The diagram only shows 2 fragments, but a typical Smooth Streaming file has one fragment for every 2 seconds of video/audio.) Closing the file is an ‘mfra’ index box which allows easy and accurate seeking within the file.
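The box layout described above can be explored with a few lines of code. The sketch below walks the top-level boxes of an ISO Base Media stream using the standard box header (a 32-bit big-endian size followed by a four-character type). The helper names `iter_boxes` and `make_box` are mine, and the sample data is a synthetic byte string with empty placeholder payloads, not a real .ismv file.

```python
import io
import struct

def iter_boxes(stream):
    """Yield (type, offset, size) for each top-level box in the stream."""
    while True:
        offset = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream
        size, box_type = struct.unpack(">I4s", header)
        if size == 1:    # a 64-bit "largesize" follows the type field
            size = struct.unpack(">Q", stream.read(8))[0]
        elif size == 0:  # box extends to the end of the stream
            size = stream.seek(0, io.SEEK_END) - offset
        if size < 8:
            return  # malformed header; stop rather than loop forever
        yield box_type.decode("ascii"), offset, size
        stream.seek(offset + size)  # skip the payload

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (test data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

data = (make_box(b"moov", b"\x00" * 4) + make_box(b"moof")
        + make_box(b"mdat", b"\x01" * 16) + make_box(b"mfra"))
for name, offset, size in iter_boxes(io.BytesIO(data)):
    print(name, offset, size)
```

Because every box announces its own size, a parser can skip payloads it does not care about, which is part of what makes the format easy to handle in managed code.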

When a Silverlight client requests a video time slice from the IIS Smooth Streaming server, the server simply seeks to the appropriate starting fragment in the MP4 file and then lifts the fragment out of the file and sends it over the wire to the client. This is why we refer to the fragments as the “wire format.” This technique greatly enhances the efficiency of the IIS server because it requires no remuxing or rewriting overhead.

Here is what an MP4 fragment looks like in more detail:

[Figure: Smooth Streaming Wire Format]

We say that the Smooth Streaming format is based on the MP4 file format because even though we’re following the ISO specification, we specify our own box organization schema and some custom boxes. In order to differentiate Smooth Streaming files from “vanilla” MP4 files, we use new file extensions: *.ismv (video+audio) and *.isma (audio only). I keep forgetting to ask the IIS Media team what the acronyms exactly stand for, but my best guess would be “IIS Smooth Streaming Media Video (Audio)”.

 

Smooth Streaming Media Assets

A typical Smooth Streaming media asset therefore consists of the following files:

  • MP4 files containing video/audio
    • *.ismv - contains video and audio, or only video
      • 1 ISMV file per encoded video bitrate
    • *.isma - contains only audio
      • In videos with audio, the audio track can be muxed into an ISMV file instead of a separate ISMA file
  • Server manifest file
    • *.ism
    • Describes the relationships between media tracks, bitrates and files on disk
    • Only used by the IIS Smooth Streaming server - not by client
  • Client manifest file
    • *.ismc
    • Describes to the client the available streams, codecs used, bitrates encoded, video resolutions, markers, captions, etc.
    • It’s the first file delivered to the client

Both manifest file formats are based on XML. The server manifest file format is based specifically on the SMIL 2.0 XML format specification.

A folder containing a single Smooth Streaming media asset might look something like this:

[Figure: A typical folder containing a Smooth Streaming media asset]

In this particular case the audio track is contained in the NBA_3000000.ismv file.

 

Smooth Streaming Manifest Files

The Smooth Streaming Wire/File Format specification defines the manifest XML language as well as the MP4 box structure. Because the manifests are based on XML they are highly extensible. Among the features already included in the current Smooth Streaming format specification is support for:

  • VC-1, WMA, H.264 and AAC codecs
  • Text streams
  • Multi-language audio tracks
  • Alternate video and audio tracks (e.g. multiple camera angles, director’s commentary, etc.)
  • Multiple hardware profiles (e.g. same bitrates targeted at different playback devices)
  • Script commands, markers/chapters, captions
  • Client manifest Gzip compression
  • URL obfuscation
  • Live encoding and streaming

For an example of a Smooth Streaming On-Demand Server Manifest file, see here.

For an example of a Smooth Streaming Client Manifest file, see here.

 

Smooth Streaming Playback: Bringing It All Home

Microsoft’s adaptive streaming prototype (used for NBC Olympics 2008) relied on physically chopping up long video files into small file chunks. In order to retrieve the chunks from the web server, the player client simply needed to download files in a logical sequence: 00001.vid, 00002.vid, 00003.vid, etc.

As I’ve explained in this and previous posts, Smooth Streaming uses a more sophisticated file format and server design. The videos are no longer split up into thousands of file chunks, but are instead “virtually” split up into fragments (typically 1 fragment per video GOP) and stored within a single contiguous MP4 file. This implies two significant changes in server and client design too:

  1. The server must be able to translate URL requests into exact byte range offsets within the MP4 file, and
  2. The client can request chunks in a more developer-friendly manner, such as by timecode instead of by index number

The first thing a Silverlight client requests from the Smooth Streaming server is the *.ismc client manifest. The manifest tells it which codecs were used to compress the content (so that the Silverlight runtime can initialize the correct decoder and build the playback pipeline), which bitrates and resolutions are available, and a list of all the available chunks and either their start times or durations.

With IIS7 Smooth Streaming, a client is expected to request fragments in the form of RESTful URLs:

http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)
http://video.foo.com/NBA.ism/QualityLevels(64000)/Fragments(audio=631931065)

The values passed in the URL represent the encoded bitrate (e.g. 400000) and the fragment start offset (e.g. 610275114) expressed in an agreed-upon time unit (usually 100 ns). These values are known from the client manifest.
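As a small illustration of the URL pattern, the sketch below assembles fragment request URLs from a bitrate, a track name and a start offset. The helper names are placeholders of my own, and the 100 ns time unit is assumed as the default described above; a real client would take all of these values from the client manifest.

```python
TICKS_PER_SECOND = 10_000_000  # assumes the usual 100 ns time unit

def seconds_to_ticks(seconds):
    """Convert seconds to 100 ns ticks."""
    return round(seconds * TICKS_PER_SECOND)

def fragment_url(base_url, bitrate, track, start_ticks):
    """Build a fragment request URL in the pattern shown above."""
    return f"{base_url}/QualityLevels({bitrate})/Fragments({track}={start_ticks})"

print(fragment_url("http://video.foo.com/NBA.ism", 400000, "video", 610275114))
# http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114)
```

With a 100 ns unit, the offset 610275114 corresponds to roughly 61 seconds into the video.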

Upon receiving a request like this, the IIS7 Smooth Streaming component looks up the quality level (bitrate) in the corresponding *.ism server manifest and maps it to a physical *.ismv or *.isma file on disk. It then reads the appropriate MP4 file and, based on its ‘tfra’ index box (carried inside the ‘mfra’ box at the end of the file), figures out which fragment (‘moof’ + ‘mdat’) corresponds to the requested start time offset. It then extracts that fragment and sends it over the wire to the client as a standalone file. This is a particularly important part of the overall design because the sent fragment/file can now be automatically cached further down the network, potentially saving the origin server from sending the same fragment/file again to another client requesting the same RESTful URL.
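The lookup step can be pictured roughly as follows. This is not the IIS implementation: it is a sketch that assumes the index has already been read into a sorted in-memory list of (start time, byte offset, byte length) entries and that the server only answers exact start-time matches. The function name and sample values are hypothetical.

```python
import bisect

def find_fragment(index, start_ticks):
    """Return (byte_offset, byte_length) of the fragment starting at start_ticks.

    index is a list of (start_ticks, byte_offset, byte_length) tuples sorted
    by start time, standing in for the file's random access index.
    """
    starts = [entry[0] for entry in index]
    i = bisect.bisect_left(starts, start_ticks)
    if i == len(index) or starts[i] != start_ticks:
        raise KeyError(start_ticks)  # no fragment starts at this offset
    _, offset, length = index[i]
    return offset, length

# Hypothetical index: three 2-second fragments (times in 100 ns units).
index = [(0, 1000, 5000), (20_000_000, 6000, 5200), (40_000_000, 11200, 4800)]
print(find_fragment(index, 20_000_000))  # (6000, 5200)
```

The returned byte range is all the server needs in order to copy the fragment out of the file unchanged.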

As you can see, requesting chunks of video/audio from the server is easy. But what about dynamic bitrate switching that makes adaptive streaming so effective? This part of the Smooth Streaming experience is implemented entirely in client-side Silverlight application code - the server plays no part in the bitrate switching process. The client-side code looks at chunk download times, buffer fullness, rendered frame rates, and other factors - and based on them decides when to request higher or lower bitrates from the server. Remember, if during the encoding process we ensure that all bitrates of the same source are perfectly frame aligned (same length GOPs, no dropped frames), then switching between bitrates is completely seamless - and Smooth.
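To give a feel for what such client-side logic might look like, here is a deliberately simplified sketch. It is not the Silverlight heuristics code: the thresholds, the `headroom` factor and the function name are assumptions, and it only considers download throughput and buffer fullness, ignoring rendered frame rate.

```python
def next_bitrate(bitrates, current, chunk_bits, download_s, buffer_s,
                 low_buffer_s=2.0, high_buffer_s=8.0, headroom=1.5):
    """Decide which bitrate (bps) to request for the next chunk."""
    levels = sorted(bitrates)
    i = levels.index(current)
    throughput = chunk_bits / download_s  # bits per second for the last chunk
    # Step down if the buffer is draining or the network can't keep up.
    if (buffer_s < low_buffer_s or throughput < current) and i > 0:
        return levels[i - 1]
    # Step up only with a healthy buffer and comfortable headroom.
    if (buffer_s > high_buffer_s and i + 1 < len(levels)
            and throughput > levels[i + 1] * headroom):
        return levels[i + 1]
    return current

levels = [350_000, 650_000, 1_000_000, 1_500_000]
# A 2-second chunk at 650 kbps (1,300,000 bits) arrived in 0.5 s with 10 s buffered.
print(next_bitrate(levels, 650_000, 1_300_000, 0.5, 10.0))  # 1000000
```

Moving one level at a time is a design choice in this sketch; it avoids large visible jumps in quality.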

In my next blog post: Encoding For Smooth Streaming

The Birth of Smooth Streaming

February 4th, 2009 Posted in Expression Encoder, Internet Information Services, Olympics, Silverlight, Smooth Streaming | 14 Comments »

In my last post I talked about the history of multi-bitrate streaming and how we got from RTSP and HTTP streaming back to HTTP download as the primary media web distribution mechanism. In this post we’ll take a closer look at how adaptive streaming differs from traditional streaming (i.e. RTSP) and plain progressive download.

 

Traditional Streaming

Let’s start by first taking a look at RTSP as an example of a traditional streaming protocol. RTSP is defined as a stateful protocol. This means that from the first time a client connects to the streaming server until the time it disconnects from the streaming server, the server keeps track of a client’s state. The client communicates its state to the server by issuing it commands such as PLAY, PAUSE or TEARDOWN (the first two are obvious; the last one is used to disconnect from the server and close the streaming session).

Once a session between the client and the server has been established, the server begins sending down the media as a steady stream of small packets (the format of these packets is known as RTP). The size of a typical RTP packet is 1452 bytes, which means that in a video stream encoded at 1 Mbps each packet carries only about 11 msec of video. In RTSP the packets can be transmitted over either UDP or TCP transports - the latter is preferred in cases where firewalls or proxies block UDP packets, but can also lead to increased latency (TCP packets get re-sent until received).
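The packet duration figure follows directly from the packet size and the stream bitrate. A quick check of the arithmetic, treating the whole 1452 bytes as media payload for simplicity:

```python
PACKET_BYTES = 1452
BITRATE_BPS = 1_000_000  # 1 Mbps stream

# Time's worth of media carried by one packet, in milliseconds.
ms_per_packet = PACKET_BYTES * 8 / BITRATE_BPS * 1000
# Number of packets the server has to send every second.
packets_per_second = BITRATE_BPS / (PACKET_BYTES * 8)

print(round(ms_per_packet, 1))    # 11.6
print(round(packets_per_second))  # 86
```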

[Figure: RTSP is an example of a traditional streaming protocol]

 

HTTP, on the other hand, is known as a stateless protocol. If an HTTP client requests some data, the server will respond by sending down the data, but it won’t remember the client or its state. Every HTTP request is handled as a completely standalone one-time session.

Windows Media Services supports streaming over both RTSP and HTTP. Now, you may ask yourself, “But if HTTP is a stateless protocol, how can it be used for streaming?” WMS uses a modified version of HTTP known as MS-WMSP, which uses standard HTTP for transfer of data and messages but also maintains session states, thus effectively turning it into a streaming protocol like RTSP/TCP. Windows Media Services has also supported RTSP streaming since 2003 (9 Series), over both UDP and TCP. Its implementation of the protocol is publicly documented as MS-RTSP.

The important things to remember about traditional streaming protocols like RTSP and WMS-HTTP are that:

  1. The server sends the data packets to the client at a real-time rate only - that is, the bit rate at which the media is encoded (e.g. a 500 kbps encoded video is streamed to the client at approx. 500 kbps)
  2. The server only sends ahead enough data packets to fill the client buffer. The client buffer is typically between 1 and 10 seconds (WMP and Silverlight default buffer length is 5 seconds). This means that if you pause a streamed video and wait 10 minutes - still only ~5 seconds of video will have downloaded to your client in that time.

Progressive Download

The other most common form of media delivery on the Web today is progressive download. Progressive download is nothing more than just a plain, ordinary file download from an HTTP web server like IIS or Apache. It is supported by Silverlight, Flash, WMP, and nearly every other media player and platform under the sun. The term “progressive” likely stems from the fact that most player clients allow the media file to be played back while the download is still in progress - before the entire file has been fully written to disk (typically to the browser cache). Clients that support the HTTP 1.1 specification can also seek to positions in the media file that haven’t been downloaded yet by performing byte range requests to the web server (assuming it also supports HTTP 1.1).
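A byte range request is just an ordinary HTTP GET with a Range header. The sketch below builds such a request with Python's standard library; the URL is a made-up placeholder and the request is constructed but never sent.

```python
import urllib.request

def range_request(url, first_byte, last_byte):
    """Build a GET request for an inclusive byte range of a file."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return req

# Hypothetical URL; ask for the second megabyte of the file.
req = range_request("http://example.com/video.wmv", 1_000_000, 1_999_999)
print(req.get_header("Range"))  # bytes=1000000-1999999
```

A server that honors the range answers with status 206 (Partial Content) and only the requested bytes, which lets a player jump ahead without downloading everything in between.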

The most popular video sharing websites on the Web today almost exclusively use progressive download:  YouTube, Vimeo, MySpace, MSN Soapbox - and even the rather misnamed Silverlight Streaming Service. (See my previous blog posts for a list of reasons why HTTP download is becoming increasingly popular in the online delivery of media.)

Unlike streaming servers which rarely send more than 10 seconds of media data to the client at a time, HTTP web servers keep the data flowing until the download is complete. This leads to the user experience that we have by now grown very accustomed to thanks to YouTube - if you pause a YouTube video at the beginning of playback and wait, eventually the entire video will have downloaded to your browser cache, allowing you to smoothly play the whole video without any hiccups. There is a downside to this behavior as well - if 30 seconds into a fully downloaded 10 minute video you decide you don’t like it and quit the video, both you and your content provider have just wasted 9:30 minutes worth of bandwidth. In an effort to mitigate this problem, IIS7 Media Pack 1.0 provides a cool feature called Bit Rate Throttling which allows content providers to throttle the download bitrate in order to reduce costs. But that’s another story…

 

Adaptive Streaming

Speaking of misnomers… Here’s another one: Adaptive Streaming. Guess what? It’s not really streaming in the classic sense at all.

Adaptive streaming is really a hybrid delivery method. It acts like streaming but is in fact based on HTTP progressive download. A more technically accurate name for adaptive streaming might be “A Series of Progressive Downloads of Variable Sized Video Fragments,” but even the most determined marketing experts would have a hard time selling that one. :)

A very important thing to remember about adaptive streaming is that there doesn’t really exist a standard implementation for it today because it’s just an advanced download concept, rather than a new protocol. This is why we talk about both Microsoft Smooth Streaming and Move Networks Adaptive Stream as examples of adaptive streaming, even though they use mutually incompatible codecs, formats and encryption schemes. They both rely on HTTP as the transport protocol and perform the media download as a long series of very small progressive downloads, rather than one big progressive download.

In a prototypical adaptive streaming implementation, the video/audio source is cut up into many short segments (“chunks”) and encoded to the desired delivery format. Chunks are typically 2-4 seconds in length. On the video codec level this typically means that each chunk is cut along video GOP boundaries (each chunk starts with a key frame) and has no dependencies on past or future chunks/GOPs. This allows every chunk to later be decoded completely independently from other chunks.
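One simple way to picture the chunking rule is sketched below: given the timestamps of the key frames in an encoded stream, pick chunk start points so that every chunk begins on a key frame and lasts at least a target duration. The function name and the sample timestamps are hypothetical, and real encoders usually enforce fixed-length GOPs up front rather than selecting boundaries afterwards.

```python
def chunk_boundaries(keyframe_times_s, target_s=2.0):
    """Return chunk start times, each on a key frame, at least target_s apart."""
    starts = []
    for t in keyframe_times_s:
        if not starts or t - starts[-1] >= target_s:
            starts.append(t)
    return starts

# Hypothetical key frame timestamps in seconds.
print(chunk_boundaries([0.0, 1.0, 2.0, 3.0, 4.0, 5.5, 6.0]))
# [0.0, 2.0, 4.0, 6.0]
```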

The encoded chunks are then hosted on a regular HTTP web server. A client requests the chunks from the web server in a linear fashion and downloads them using plain HTTP progressive download. As the chunks are downloaded to the client, the client plays back the sequence of chunks in linear order. Because the chunks were carefully encoded without any gaps or overlaps between them, the chunks play back as a seamless video.

The “adaptive” part of the solution comes into play when the video/audio source is encoded at multiple bitrates, generating multiple chunks of various sizes for each 2-4 seconds of video. The client now has the option to choose between chunks of different sizes. Because web servers usually deliver data as fast as network bandwidth allows, the client can easily estimate user bandwidth from how long each chunk takes to download and decide ahead of time whether to request bigger or smaller chunks. The size of the playback/download buffer is fully customizable.
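
As a rough illustration of that decision, the Python sketch below estimates bandwidth from the last chunk download and picks the highest bitrate that fits. The function names, the bitrate ladder and the 0.8 safety factor are assumptions chosen for the example, not values from any shipping player.

```python
from typing import Sequence


def estimate_bandwidth_bps(chunk_bytes: int, download_seconds: float) -> float:
    """Estimate throughput from the size and download time of the last chunk."""
    return chunk_bytes * 8 / download_seconds


def choose_bitrate(available_bps: Sequence[int], measured_bps: float,
                   safety_factor: float = 0.8) -> int:
    """Pick the highest encoded bitrate that fits within a safety margin."""
    budget = measured_bps * safety_factor
    affordable = [b for b in sorted(available_bps) if b <= budget]
    # Fall back to the lowest bitrate when nothing fits the budget.
    return affordable[-1] if affordable else min(available_bps)


# Hypothetical ladder of encoded bitrates, in bits per second.
ladder = [350_000, 700_000, 1_500_000, 3_000_000]

# A 375,000-byte chunk that took 1.5 seconds implies about 2 Mbps.
measured = estimate_bandwidth_bps(375_000, 1.5)
next_bitrate = choose_bitrate(ladder, measured)
```

Here the client would request the next chunk from the 1.5 Mbps encoding. A real player would smooth the estimate over several chunks rather than react to a single measurement.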

Adaptive streaming is a hybrid media delivery method

 

Adaptive streaming, like other forms of HTTP delivery, offers the following advantages to the content provider:

  • It’s cheaper to deploy because adaptive streaming can utilize any generic HTTP caches/proxies and doesn’t require specialized servers at every node
  • It offers better scalability and reach, reducing “last mile” issues because it can dynamically adapt to inferior network conditions as it gets closer to the user’s home
  • It lets the audience adapt to the content, rather than requiring the content providers to guess which bitrates are most likely to be accessible to their audience

It also offers the following benefits for the end user:

  • Fast start-up and seek times because start-up/seeking can be initiated on the lowest bitrate before moving up to a higher bitrate
  • No buffering, no disconnects, no playback stutter (as long as the user meets the minimum bitrate requirement)
  • Seamless bitrate switching based on network conditions and CPU capabilities
  • A generally consistent, smooth playback experience

 

Microsoft Adaptive Streaming Prototype: NBC Olympics 2008

We first prototyped an implementation of HTTP-based adaptive streaming as part of the NBC Olympics 2008 project. In order to deliver the desired level of quality in a short period of time, we took the most basic adaptive streaming implementation approach. We had NBC’s Digital Rapids and Anystream encoders produce multiple WMV files of different bitrates/resolutions for each source. The encoders didn’t employ any new encoding tricks but merely followed strict encoding guidelines (closed GOP, fixed length GOP, VC-1 entry point headers) which ensured exact frame alignment across the various bitrates of the same video. Then we ran the WMV files through a post-processing tool which physically split each WMV file into thousands of 2-second chunks (files). The rest of the solution consisted of simply uploading the chunks to Limelight’s origin web servers (running Apache) and then building a Silverlight player that would download the chunks and play them in sequence. Simple!

The good news: Our implementation worked great for the end users. We were able to offer a better-than-WMS streaming experience while using just simple HTTP download!

The bad news: CDN operators lost many hours (days?) managing millions of tiny files in their systems. Imagine: if every 2 seconds of video is split into a separate file and this is repeated for 5 available bitrates, you end up with 150 files for every minute of video. That’s 13,500 files for a 90-minute soccer game!
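
The arithmetic behind those numbers is easy to check. The helper below is only an illustration; chunk_file_count is a made-up name.

```python
import math


def chunk_file_count(duration_minutes: float, chunk_seconds: float, bitrate_count: int) -> int:
    """Number of files produced when every chunk of every bitrate is its own file."""
    chunks_per_stream = math.ceil(duration_minutes * 60 / chunk_seconds)
    return chunks_per_stream * bitrate_count


files_per_minute = chunk_file_count(1, 2, 5)    # 30 chunks x 5 bitrates
files_per_match = chunk_file_count(90, 2, 5)    # 2,700 chunks x 5 bitrates
```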

So despite NBC Olympics being a huge success for Silverlight and HTTP-based adaptive streaming, it quickly became apparent we had to go back to the drawing board on this one.

 

At Last! Smooth Streaming!

The IIS Media team soon took charge of turning the NBC Olympics adaptive streaming solution into a real server product. Its official name - IIS7 Smooth Streaming, an extension for Internet Information Services 7.0.

The IIS Media team redesigned the content creation and delivery aspect of the prototype solution in order to fix the file management issues while still keeping all the advantages of the original solution. The new design eschewed the one-file-per-chunk approach in favor of a single contiguous file for each encoded bitrate. The file format of choice: MPEG-4.

Smooth Streaming server uses the MPEG-4 Part 14 (MP4) file format, which is built on the ISO Base Media File Format (ISO/IEC 14496-12), as its disk (storage) and wire (transport) format. Specifically, the Smooth Streaming specification defines each chunk/GOP as an MPEG-4 Movie Fragment (moof) and stores it within a contiguous MP4 file for easy random access. One MP4 file is expected for each bitrate. When the client requests a specific source time segment from the IIS server, the server dynamically finds the appropriate Movie Fragment box within the contiguous MP4 file and sends it over the wire as a standalone file, thus ensuring full cacheability downstream.

In other words, with Smooth Streaming file chunks are created virtually upon client request, but the actual video is stored on disk as a single full-length file per encoded bitrate.
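
The idea of a “virtual” chunk can be sketched in a few lines of Python. This is not how IIS is implemented; it only assumes that the server has some index mapping fragment start times to byte ranges in the contiguous file. FragmentEntry, read_fragment and the fake 4-byte fragments are hypothetical.

```python
import bisect
import io
from typing import BinaryIO, List, NamedTuple


class FragmentEntry(NamedTuple):
    start_time: float  # fragment start time in seconds
    offset: int        # byte offset of the fragment within the file
    size: int          # fragment length in bytes


def read_fragment(media: BinaryIO, index: List[FragmentEntry], time_seconds: float) -> bytes:
    """Return the bytes of the fragment that covers the requested time."""
    starts = [entry.start_time for entry in index]
    # Find the last fragment whose start time is <= the requested time.
    pos = bisect.bisect_right(starts, time_seconds) - 1
    if pos < 0:
        raise ValueError("requested time precedes the first fragment")
    entry = index[pos]
    media.seek(entry.offset)
    return media.read(entry.size)


# Stand-in for one bitrate's contiguous file: three fake 4-byte fragments.
media = io.BytesIO(b"AAAABBBBCCCC")
index = [FragmentEntry(0.0, 0, 4), FragmentEntry(2.0, 4, 4), FragmentEntry(4.0, 8, 4)]
fragment = read_fragment(media, index, 2.5)
```

A request for time 2.5 returns only the second fragment’s bytes, which can then be sent (and cached downstream) as if it were a standalone file, while the disk still holds one file per bitrate.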

 

Smooth Streaming Availability

Smooth Streaming server support will ship as part of the next edition of IIS7 Media Pack, a free download for Windows Server 2008. A technology preview of IIS7 Smooth Streaming Server is already available now through Akamai. They are calling this service Akamai AdaptiveEdge Streaming for Microsoft Silverlight. A demo of the service is available at http://www.smoothhd.com.

On the content creation end, creation of on-demand Smooth Streaming-compatible video is already possible with the latest Expression Encoder 2 SP1. Note that you’ll need to purchase the full version of Expression Encoder 2 in order to get Smooth Streaming encoding support - it’s not included in the “Express” trial version. As a helper tool for encoding to multiple-bitrate formats such as Smooth Streaming, I recommend my Smooth Streaming Calculator. (More about Smooth Streaming encoding with Expression Encoder to come soon.)

In addition, we are already working with a number of encoding ISVs on enabling support for the Smooth Streaming format in their professional encoding products.

 

Smooth Streaming Playback

You probably already know that Silverlight 2 supports playback of Smooth Streaming sources (if you don’t, go to http://www.smoothhd.com). But how does it do it?

Despite popular belief, Silverlight doesn’t actually feature native support for any particular adaptive streaming technology - Microsoft’s, Netflix’s or Move Networks’ for example. Smooth Streaming support in Silverlight is implemented via the MediaStreamSource API. This API allows developers to implement their own media transport methods (instead of relying on MediaElement’s native transport methods) while still leveraging Silverlight’s native decoders and renderers. In other words, Silverlight support for Smooth Streaming is provided entirely in .NET code: the parsing of the MPEG-4 file format, the HTTP download, the bitrate switching heuristics, etc. This allows developers to modify and fine-tune the client adaptive streaming code as needed, instead of waiting for the next Silverlight release and hoping it magically fixes every customer scenario.

The most challenging part of Smooth Streaming Silverlight client development is the heuristics module which determines when and how to switch bitrates. Elementary stream switching functionality requires the ability to swiftly adapt to changing network conditions while never falling too far behind, but that’s often not enough to deliver a great experience. One must also consider: What if the user has enough bandwidth but doesn’t have enough CPU power to consume the high bitrates/resolutions? What happens when the video is paused or hidden in the background (e.g. minimized browser window)? What if the resolution of the best available video stream is actually larger than the screen resolution, thus wasting bandwidth? How large should the download buffer window be? How does one ensure seamless rollover to new media assets such as ads? As any web application developer will tell you, there’s much more to building a good player than just setting a source URL for the media element.
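
To give a flavor of how a couple of these questions might turn into code, here is a deliberately simplified Python sketch (Python is used only for illustration; the actual client code is .NET). It covers just three inputs: bandwidth, display size and dropped frames as a stand-in for CPU load. The Stream type, pick_stream, the ladder and the 10% dropped-frame threshold are all assumptions, not the logic of any real Smooth Streaming client.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stream:
    bitrate_bps: int
    height: int  # encoded video height in pixels


def pick_stream(streams: List[Stream], current: Stream, bandwidth_bps: float,
                display_height: int, dropped_frame_ratio: float,
                max_dropped_ratio: float = 0.1) -> Stream:
    """Choose the next stream given network, display and CPU constraints."""
    ordered = sorted(streams, key=lambda s: s.bitrate_bps)
    # Network and display: skip streams we cannot afford or cannot show in full.
    candidates = [s for s in ordered
                  if s.bitrate_bps <= bandwidth_bps and s.height <= display_height]
    # CPU: if the decoder is dropping too many frames, only allow a step down.
    if dropped_frame_ratio > max_dropped_ratio:
        candidates = [s for s in candidates if s.bitrate_bps < current.bitrate_bps]
    # Fall back to the lowest bitrate when no stream satisfies every constraint.
    return candidates[-1] if candidates else ordered[0]


# Hypothetical bitrate/resolution ladder.
ladder = [Stream(350_000, 240), Stream(700_000, 360),
          Stream(1_500_000, 480), Stream(3_000_000, 720)]

# Plenty of bandwidth, but the player is only 480 pixels tall.
capped = pick_stream(ladder, ladder[2], 5_000_000, 480, dropped_frame_ratio=0.0)

# Same conditions, but a quarter of the frames are being dropped.
stepped_down = pick_stream(ladder, ladder[2], 5_000_000, 480, dropped_frame_ratio=0.25)
```

In the first call the 720-line stream is skipped because it exceeds the display, so the player stays at 480 lines; in the second it steps down to 360 lines to relieve the CPU. Buffer sizing, pause handling and ad rollover would each need their own logic on top of this.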

Fortunately for those who prefer not to write such code from scratch, there are already two options available for adding Smooth Streaming support to your Silverlight application:

  1. Expression Encoder 2 SP1 templates
    Every Silverlight 2 player template included with Expression Encoder 2 SP1 includes a ready Smooth Streaming module as well as complete source code (which can be modified and used freely). The Smooth Streaming object (named AdaptiveStreaming.dll) can be easily integrated into any Silverlight project. See James Clarke’s blog for additional Expression Encoder tips & tricks.
  2. Open Video Player (OVP)
    The Akamai-led Open Video Player Initiative is an open-source community project that strives to provide a best-of-breed video player platform for Silverlight and Flash. The Silverlight version of the Open Video Player provides integrated support for Smooth Streaming playback, and is in fact the video player used by Akamai on SmoothHD.com and many of their customer sites.

Both options provide great out-of-the-box Smooth Streaming experiences while also allowing developers to continue innovating and fine-tuning the client code.

 

In my next blog post:  The Smooth Streaming Format