Stage 1 - Upload
Stage 2 - Encode
Stage 3 - SmoothStreaming
Anatomically speaking, Encoding is the heart of SUESS. (I guess, to stretch out the metaphor, that makes Silverlight the head, SmoothStreaming the eyes and ears, and I guess Uploading the digestive system.) Encoding, implemented via Expression Encoder, is what makes the process of media uploading feasible to end users.
By "feasible" I really mean "usable" - people want online functionality to be really easy to understand. Usability 101 tells us that the more hoops people have to jump through, the less inclined they will be to use your site, regardless of how pretty the UI is. The hoops here are presented by the fact that Silverlight does not support every media format under the sun. We can't simply upload a random file and then watch it; Silverlight has to "know" how to play it.
Do you think if people could only upload Flash media to YouTube it would have been nearly as popular? Of course not. I'm surprised that abilities like uploading media and capturing live web cam feeds are exoteric enough for the general Internet-goer. But encoding a movie or image? Not only is it an extra hoop, but it's simply too much to expect from a user.
By automating the encoding process, we can enable people to provide IIS SmoothStreaming media to their website's audience with just as much effort as it would take to upload an image. SmoothStreaming itself is the other side of the coin: we need to encode anyway if we want to take advantage of this new media streaming technology! So by putting in all this effort to solve these intrinsic media usability problems (large files, wide arrays of formats, etc.) we are rewarded by having the otherwise heavily-guarded SmoothStreaming gate automatically unlocked for us! (See the final post on SUESS for more information about IIS SmoothStreaming; we still need to open the gate, of course!)
Those are the two reasons why we encode. The next question, which leads us into our architectural discussion, is of course: how? I went with Microsoft Expression Encoder 3.0 (4.0 was still in beta during development). Here are some of the determining factors that led me to this decision:
And I didn't feel like dealing with the Media Player SDK, buying and kludging a third party control, or reverting back to a "Web 1.0" paradigm of an upload screen that says something like "Thank you for uploading your media. Please try back later to see it when it's done because we couldn't figure out a more elegant way to implement our site's media administration."
I wanted this to be first class.
Let's start by first taking a look at the detailed architecture of the Encoder. Back to everyone's favorite Visio diagram:
Basically, there are three major sub components running on the Media Server tier of SUESS. (And boy am I ever excited to have built something complex enough to have "sub components!") The first one is the Media Service WCF endpoint. This is a standard WCF service that the Silverlight client calls, once it completes the Uploading phase of SUESS, to kick off the encoding process.
The interface for the Media Service is very simple, containing only two methods: Encode and Cancel. I tried to choose really intuitive names here; "Encode" performs the encoding, and "Cancel" cancels it. Encode wraps the Expression APIs and does all the work needed to get our media into the IIS SmoothStreaming format.
There is also a web.config file that goes along with the Media Service to store the application settings (along with all the WCF goo). These dictate the size of the thumbnail images, and the locations of both the temporary upload path for our raw media files and the destination for the IIS SmoothStreaming-formatted files. These settings, combined with the file name passed in from the Uploader, are all the Encoder needs to do its thing.
There is one more architectural consideration to point out before we jump into code. The Expression Encoder APIs are 32-bit only. This is generally not an issue if you are hanging out in 2007 or earlier, because starting with Windows Server 2008, 64-bit has become the de facto standard in making developers crazy. That's not to say that 32-bit applications won't work; Windows handles all of that "WOW" (WOW64) stuff for us.
However, when you try to have a 64-bit process (such as a Windows Service, or an ASP.NET app like SharePoint) call into the 32-bit Encoder DLLs, things die. If your first instinct (like mine) was to compile the Media Service using Visual Studio's "x86" or "Any CPU" configurations, you won't get there. The problem isn't that one piece of code can't talk to another; it's that the entire process needs to be in the same architecture.
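The "whole process, one architecture" rule can be stated in a few lines. This is a minimal Python sketch rather than anything from SUESS itself: `ENCODER_DLL_BITS` and `can_host` are hypothetical names, and the only fact taken from the text is that the Encoder DLLs are 32-bit.

```python
import struct

# Per the text, the Expression Encoder DLLs are 32-bit only.
ENCODER_DLL_BITS = 32


def process_bits() -> int:
    """Pointer width of the current process in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def can_host(library_bits: int, host_process_bits: int) -> bool:
    """A native library can only be loaded by a process of the same bitness."""
    return library_bits == host_process_bits


if __name__ == "__main__":
    bits = process_bits()
    verdict = "could" if can_host(ENCODER_DLL_BITS, bits) else "could not"
    print(f"This {bits}-bit process {verdict} host a {ENCODER_DLL_BITS}-bit library.")
```

The point of the check is that it is made once, for the hosting process, not per assembly: how the calling code was compiled doesn't matter if the process it lands in has the wrong bitness.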
So to circumvent this, we need to be running our entire stack in 32-bit mode. Whether it's a Windows Service set explicitly to compile to x86 or a 64-bit IIS application pool that has 32-bit support enabled, since Encoder requires us to be in 32-bit world, we need a 32-bit process. The clear choice here is to use IIS and flip one setting (App Pool -> Advanced Settings -> Enable 32-Bit Applications -> True) versus going through all the hell of building, deploying, and configuring a Windows Service.
Windows Services suck.
The second sub component is the Encoder API itself. Let's look at the Encode method first to get an idea of everything that's going on programmatically. Then, we're going to jump back to the WCF side of things and discuss the bi-directional communication (awesome).
The main workhorse of the Expression Encoder API is the Job object. I couldn't find any formal documentation on the API, so I sort of reverse engineered it by mapping class names in Visual Studio to menu options in the Encoder product. Whenever you encode something using the application, you create a new "Job" and set a bunch of options on it. This is pretty much the same way I learned SharePoint many years ago; the Encoder API, although undocumented, is more intuitive than SharePoint's, so I was able to hack my way through it a little easier.
So let's take a look. Line #5 is the WCF bi-directional stuff, which we'll look at in a bit. Line #9 instantiates our Job object. It's global in the service because, skipping down to Lines #75-77, where we hook the events, we need to refer back to properties on the Job object. Again, we'll look at the event handlers themselves when we move back from the Encoder to the WCF/Silverlight communication logic. Lines #10-14 round out the Job initialization code, calling helper methods that simply grab values from the web.config file.
Starting at Line #16, we get into the meat and potatoes of the media side of the Encoder API. Encoder allows you to control a vast array of characteristics of an encoding process. The first one here is the video complexity (which I covered in the Uploader post). Basically, this enumeration dictates an arbitrary ratio between video quality and encoding speed. While this particular property is a bit black-box-ish (I couldn't track it down in Reflector), the rest that we're going to look at really show how much control you have over your users' viewing experiences.
Basically, an Encoder Job needs three pieces of information to encode media: a VideoProfile, an AudioProfile, and a MediaItem. The VideoProfile holds the "target" video properties; same deal with the AudioProfile. The MediaItem combines the two profiles and also carries a myriad of properties pertaining to things such as thumbnail information, markers, general media settings, audio overlays, and on and on.
Let's pick back up with the code at Line #43. An AdvancedVC1VideoProfile is one of many classes that are derived from a deep inheritance tree originating with VideoProfile. All of these classes correspond to options available in Expression Encoder's UI. Without going into too much detail, the object initializer made up of Lines #45-47 basically tells the Encoder API to encode into the IIS SmoothStreaming format using the "Advanced" VC1 codec presets.
Each class derived from VideoProfile adds lots of different options that allow you to tweak your output video format according to your requirements. If we were supporting "straight" HD Silverlight video, we'd probably use the MainH264VideoProfile instead. Note that setting the "SmoothStreaming" property to true on any VideoProfile derivation simply "enables" the IIS SmoothStreaming output format; the files themselves will adhere to whichever video codec you assign them, based on the profile you select and properties you set. And if you enable SmoothStreaming, there are other properties that need to be set a certain way. I haven't dug too deeply on this, but don't worry: the exception details you'll get are descriptive enough to guide you through the programmatic configuration of your video profiles.
With the profile out of the way, we now need to tackle the SmoothStreaming aspect of the video. Check out Lines #50-54. This is a great example of what IIS SmoothStreaming is all about. If I had to describe this format in one sentence, it would be thus: The focus is on dynamically assessing the bandwidth of the connection between the Silverlight client and the IIS Media Services 3.0 server, and selecting the correct quality version of the requested media.
The aforementioned lines of code determine how many different options we are giving IIS Media Services to work with when it determines quality, and what each option, or "Stream," can deliver. First, Line #50 clears out the "default" stream that "comes with" the AdvancedVC1VideoProfile instance. This is a best practice. Next we need to define our own streams. As you can see from the code, a stream is made up of a bitrate and a size. The next three lines then each define streams of increasing sizes (and therefore qualities) while maintaining a native 4:3 aspect ratio.
So let's dig, taking the second parameter first. The "Size" portion of a stream is easy enough to understand: a movie that's 800x600 natively will look better in a standardly-sized player than one that's 400x300. But since more pixels are crammed into the same space, the file is denser, and therefore larger on disk. This requires more bandwidth to be pushed down in a timely manner that keeps up with the bitrate. The smaller file has fewer pixels and less quality, and requires less bandwidth to deliver.
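To make the "ladder of streams" idea concrete, here is a minimal Python sketch of three 4:3 streams and the kind of choice the server has to make for a given connection. It is not the Encoder or IIS Media Services API: the `Stream` class, the bitrates, the middle 640x480 size, and the `pick_stream` rule are all assumptions for illustration (only 400x300 and 800x600 come from the text), and the real selection logic is certainly more sophisticated.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Stream:
    """One quality level: a bitrate plus a frame size (hypothetical model)."""
    bitrate_kbps: int
    width: int
    height: int

    @property
    def aspect_ratio(self) -> Fraction:
        return Fraction(self.width, self.height)


# Illustrative values only; each step is bigger and needs more bandwidth.
LADDER = [
    Stream(350, 400, 300),
    Stream(700, 640, 480),
    Stream(1400, 800, 600),
]


def pick_stream(ladder, available_kbps):
    """Pick the richest stream the connection can sustain.

    Falls back to the smallest stream when nothing fits.
    """
    ordered = sorted(ladder, key=lambda s: s.bitrate_kbps)
    fitting = [s for s in ordered if s.bitrate_kbps <= available_kbps]
    return fitting[-1] if fitting else ordered[0]


if __name__ == "__main__":
    for kbps in (100, 1000, 5000):
        chosen = pick_stream(LADDER, kbps)
        print(f"{kbps} kbps available -> {chosen.width}x{chosen.height}")
```

Every entry keeps the same aspect ratio, so the player can switch between streams mid-playback without the picture changing shape.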
Cool. So what's a bitrate? Well, that's a "bit" more complicated. Sorry. There are two types in the Encoder API: VariableBitRate (with its few derivations) and ConstantBitRate. I'll let Wikipedia do the honors here, but at a high level, different segments of a media file have different complexities. If the movie starts out blank with no sound for a few seconds, those frames are very "small" compared to frames later in the movie that have high-speed motion or explosions. These portions of the file have a lot more detail.
A variable bitrate allows the file to sort of throttle itself, so that "easy" portions (like all black intro frames) can transmit over the wire at lower speeds, but more "complicated" segments can start sprinting and stream at higher rates. The drawbacks? Larger files and longer time needed to encode. By comparison, a constant bitrate "forces" every segment of the file to have the same bitrate. Of course, this is much simpler logic, and transfers at a constant speed making IIS Media Services' life easier, but those aforementioned "more complicated" portions of the film will be scaled back to meet the constant bitrate.
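A toy model helps show the difference. The Python sketch below is an assumption-laden simplification, not how Encoder allocates bits: each segment gets a made-up "complexity" score, the constant strategy gives every segment the same rate, and the variable strategy spreads the same overall budget in proportion to complexity. (It deliberately holds the average fixed, so it doesn't model the larger files mentioned above.)

```python
def constant_bitrate(complexities, target_kbps):
    """Every segment gets the same rate, however complex it is."""
    return [float(target_kbps) for _ in complexities]


def variable_bitrate(complexities, average_kbps):
    """Spread the same total budget in proportion to segment complexity."""
    total = sum(complexities)
    count = len(complexities)
    if total == 0:
        # Nothing to distinguish the segments; fall back to a flat rate.
        return [float(average_kbps)] * count
    return [average_kbps * count * c / total for c in complexities]


if __name__ == "__main__":
    # Hypothetical scores: two black intro segments, a scene, an explosion.
    segments = [1, 1, 4, 10]
    print("CBR:", constant_bitrate(segments, 500))
    print("VBR:", variable_bitrate(segments, 500))
```

With these numbers the explosion segment gets 1250 kbps under the variable strategy but is held to 500 kbps under the constant one, which is the "scaled back" effect described above.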
So when you choose the bitrate that works for your situation, you can choose from goodies such as: VariableUnconstrainedBitrate, VariableConstrainedBitrate, VariableQualityBitrate, and ConstantBitrate. You can discern among these by the parameter that each class takes. The code sample uses ConstantBitrate, because SUESS was first created for a SharePoint intranet, where we had complete control over network bandwidth. IIS SmoothStreaming was key in general because the client had offices all over the world, but beyond that, we pushed for speed over quality.
One thing to note about ConstantBitrate: even though it yields a leaner, faster format, it could take longer to encode! By default, Encoder will make TWO passes through the file, essentially doubling the time it takes to encode. The first pass determines the average bitrate across the file, and the second pass actually does the work with the correct average. That's what the false parameter in the ConstantBitrate constructor in Lines #51-53 controls; it tells Encoder to use the bitrate passed in and to not worry about being too fancy about it.
And that's just the video; audio is a separate story in the media world. When watching any modern media (Film, DVD, Web, etc.) you're actually watching a video track and listening to an audio track - separate from each other - that both started at the same time. In much the same way video was configured with bit rates and quality settings, you can customize the audio portion of SUESS to the same extent.
I won't go into nearly as much detail for audio, because it's really the same idea. In fact, the concepts of profiles, bitrates, and customizable quality settings all work exactly the same; there are just different types (and actually, the bitrates are identical). So you can play around with these settings the same way we played around with WAV files ripped from CDs back in Windows 95!
In this example (which again was for an intranet project), I sacrificed audio quality so I could push as much video through the pipes as possible. What's interesting is that the same audio settings are applied to each video stream; in other words, SmoothStreaming only considers video, and pushes the same audio profile irrespective of which quality it detects. Lines #55-61 configure the audio in the above code example.
Finally we have the MediaItem, which acts as a single encoding "task" for our Job object to perform. If you add multiple MediaItems to the Job (or even if our service gets a request while another encoding operation is in progress) they will queue up and finish in series, so that your processor doesn't have a heart attack.
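The "finish in series" behavior is the classic single-worker queue. As a rough Python sketch of that pattern (the `run_serially` helper and the fake `encode` callable are hypothetical, and nothing here claims to mirror how the Job object schedules work internally):

```python
import queue
import threading


def run_serially(items, encode):
    """Process items one at a time, in arrival order, on a single worker."""
    pending = queue.Queue()
    for item in items:
        pending.put(item)

    finished = []

    def worker():
        # One worker thread means only one encode runs at any moment.
        while True:
            try:
                item = pending.get_nowait()
            except queue.Empty:
                return
            finished.append(encode(item))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return finished


if __name__ == "__main__":
    # Stand-in for real encoding work: just tag each file name.
    print(run_serially(["intro.avi", "demo.mov"], lambda name: name + ".encoded"))
```

Throughput is lower than encoding in parallel, but CPU-heavy work like encoding usually benefits from not competing with itself.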
As I alluded to before, the MediaItem houses the video and audio profiles and collects the rest of the data that tells the Job exactly what to do. For my purposes, which were more about supporting pervasive formats and maximizing performance than about actual media processing (I'm a developer, not a DeeJay), I was able to configure everything I needed media-wise in my profiles. However, I again encourage you to experiment with the vast array of options available on MediaItem; the API is extremely powerful!
The only additional customization I needed to do on the MediaItem was to define how to pull a thumbnail from the uploaded media. I new up my object on Line #63, passing it the string path of the final upload destination for the file. Then, Lines #65-67 define how I want my thumbnail to be generated. In this example, it's a 4:3 JPEG of the first frame. The file will be saved in the same directory as the SmoothStream files.
And that's about it! Lines #68-72 pull my video and audio profiles together into an OutputFormat. I'm using WindowsMediaOutputFormat because that's the one directly compatible with Silverlight. There are a few others you can play with, primarily if you are doing Live encoding. This is a bit confusing, since we've been talking about SmoothStreaming this whole time, but now we're telling the Job to dump out a WMV? Think of it this way: we are encoding for Silverlight, so Windows Media is the best option. SmoothStreaming is a special type of format for WMV that IIS knows how to do all the fancy streaming with.
The code block ends with hooking the events, adding our MediaItem, and then calling Encode! We'll check in with our events in the next section. One quick thing to note is that some operations around MediaItem can take a while (seconds or longer) to execute, so plan your progress bars as necessary. Watch how the Expression Encoder product does its thing. When you select a new file to encode, it chugs for a bit as it pulls thumbnails, applies default profiles, etc. So if you don't start your progress bars until the first progress event fires, your users might think your app is hung while they wait for the Encoder API to warm up!
.NET RIA Services is cool. Web cam integration is fun. File drag & drop is neat. But as someone who pays for his drinks by building business applications, I have to say that support for the WCF bi-directional HTTP Polling Duplex binding is my favorite new Silverlight 4 toy. AJAX greatly improved UX by eliminating post backs and making it easy to implement "please wait" functionality while the page churned. Silverlight takes us further by allowing real time progress bars during long-running processes.
Finally, we can do serious work on the web.
So how does this bi-directional stuff work? First of all, it's still straight WCF: contracts on the server, proxies on the client, and tons of glue in the web.config file. The ServiceContract works the same way, except that you add a reference to a second ServiceContract that acts as the CallbackContract. What this means is that we have a normal ServiceContract that guarantees certain methods are available on the server. The second ServiceContract (which is a separate interface) defines certain events that the client can handle; in other words, the server can basically call methods on the client!
Well, not really: when the client makes the first call, it keeps a channel open with the server, through which it "polls" [hence the name] - asking the server constantly if something has occurred. When the server raises notification that something has indeed happened, the corresponding event is fired. The cool thing about this is how WCF abstracts away the hairy details, and lets us work in the comfortable event-driven programming model.
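Stripped of WCF, the mechanism is small enough to model in a few lines. The Python sketch below is only an in-memory analogy with hypothetical `Server` and `Client` classes: the real binding polls over HTTP and WCF generates the proxy events for you. What it does show is the trick itself: the server never truly calls the client; it queues a message, and the client's next poll picks it up and raises the matching event.

```python
import queue


class Server:
    """Holds callback messages until the client's next poll collects them."""

    def __init__(self):
        self._outbox = queue.Queue()

    def notify(self, event, payload):
        # The server-side "call on the client" is really just a queued message.
        self._outbox.put((event, payload))

    def poll(self):
        """Return everything queued since the last poll."""
        messages = []
        while True:
            try:
                messages.append(self._outbox.get_nowait())
            except queue.Empty:
                return messages


class Client:
    """Polls the server and turns messages into ordinary event callbacks."""

    def __init__(self, server):
        self._server = server
        self._handlers = {}

    def on(self, event, handler):
        self._handlers[event] = handler

    def poll_once(self):
        for event, payload in self._server.poll():
            handler = self._handlers.get(event)
            if handler is not None:
                handler(payload)


if __name__ == "__main__":
    server = Server()
    client = Client(server)
    client.on("progress", lambda percent: print(f"{percent}% done"))
    server.notify("progress", 25)
    server.notify("progress", 50)
    client.poll_once()
```

From the handler's point of view there is no polling at all, just events arriving, which is the abstraction the paragraph above is praising.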
Let's look at some code. First, here are the contracts:
Like I said, this is pretty much straight WCF. The only additions are in the attribution. The main "new" concept is the "CallbackContract" in Line #2. This basically tells the service what "methods" it can "call" on the client. You'll also notice a lot of attribution setting the IsOneWay switch to true. This enables the "event-driven" style we discussed earlier. Even if methods return void, if IsOneWay is false, the server is still expecting a reply message from the client. This behavior is more client/server than event-driven, and breaks the duplex model.
Next, let's look at how we define our "global variables." The first one in the code below is our Encoder Job object, which like I said is "global" so that I can more easily reference it in its events. The second one is our Encoder client interface (of type, well, IEncoderClient of course). Line #6 (which is an excerpt from the Encode method in the first code sample [Line #5 there]) shows how we can grab a proxy to our client using the OperationContext. This is a magical call that, when invoked inside a method corresponding to one of our service's operation contracts, "knows" about the client who called it.
Now that we have a solid reference to our client, we can call methods on the global _client interface, and the corresponding events will be raised and handled as normal WCF service events down on our Silverlight clients. Beautiful. Finally, let's look at one of these duplex method calls from both the server's and client's perspective. The example I'll use here is the Encoder API's "EncodeProgress" event. First, the handler for the event we hooked in Line #75 from the first code listing in this post:
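Reduced to its arithmetic, a handler like this does two things: it passes the percentage straight through, and it works out the time remaining as the total duration minus the current position. Here is a minimal Python stand-in for that logic; the function names and the `notify_client` callback are hypothetical, and this is not the Encoder API or its EncodeProgressEventArgs type.

```python
from datetime import timedelta


def time_remaining(total_duration, current_position):
    """Media time left to encode: total duration minus current position."""
    remaining = total_duration - current_position
    # Clamp at zero in case the reported position overshoots the duration.
    return max(remaining, timedelta(0))


def on_encode_progress(progress_percent, total_duration, current_position, notify_client):
    """Forward only what the client cares about: percent done and time left."""
    notify_client(progress_percent, time_remaining(total_duration, current_position))


if __name__ == "__main__":
    on_encode_progress(
        40.0,
        timedelta(minutes=10),
        timedelta(minutes=4),
        lambda percent, left: print(f"{percent}% done, {left} of media left"),
    )
```

Note that this measures how much of the media is left to process, not wall-clock time; turning it into an estimate in seconds would need the encoding speed as well.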
The Encoder API gives us a lot of information in the EncodeProgressEventArgs parameter. However, as I defined my IEncoderClient interface, I'm only interested in the percentage done and the time remaining. So Line #3 subtracts the current time position of the Job from the total duration of the first (and always only) MediaItem and that's the time remaining. I pass the progress straight through to Line #6, which actually fires this event on the client. Speaking of the client, let's see what's going on in the Silverlight side of things to bring this full circle.
As you can see, this is very standard WCF/Silverlight functionality. The UI is updated with values that the server sends down. And since this event is hooked the same exact way as any other WCF method (just get a reference to the service proxy object and wire up an event handler) we are guaranteed to be on the UI thread when we update controls. It's...and I'm getting emotional here...just beautiful...
The last thing we need to talk about is the configuration. Did you think we'd talk about anything WCF related and not have to dig into a web.config file? Actually, like everything else we've discussed on the Silverlight side of the Encoder stage, there's nothing new, per se; just additional. Since the Polling Duplex binding didn't ship with Silverlight 4 (and if your service is still in .NET 3.5.1), we need to manually reference it.
First, we need to get the correct WCF assembly. Once you install all the latest bits for the Silverlight 4 developer run time, go to this folder on your development box (assuming it's an x64 OS): C:\Program Files (x86)\Microsoft SDKs\Silverlight\v4.0\Libraries (and then either the \Client or \Server folder). Grab the appropriate DLL (System.ServiceModel.PollingDuplex.dll) and reference the "client" version in your Silverlight project and the "server" one in the WCF project.
Now, let's look at the server side web.config file:
In Lines #9-14, we reference the aforementioned assembly as a valid binding. Then, in Lines #14-18, we wire up the binding like another WCF service. Note that Visual Studio will complain, since the "pollingDuplexHttpBinding" isn't part of the out-of-the-box schema. On the Silverlight side, I programmatically create my binding objects so I don't have to worry about updating client-side configuration files. Here's the code for that (refer to the Uploader post for the details).
It is against the object that this method returns that you can hook the "polling" events to enable bi-directional communication.
That closes the book on the Encoding phase of SUESS. As you can see, there's a lot going on, but all of the pieces are very straightforward. It's taking all of the cutting edge functionality from Encoder, IIS, WCF, and Silverlight, and gluing them all together. The hardest thing I ran into when I built all of this was finding the right glue...and knowing when to use a lot and when to use a little.
Onto the final stage: SmoothStreaming!