AWS Live Streaming Explained: MediaConnect, MediaLive, MediaPackage & CloudFront
A volunteer stood next to me on stage at the AWS Summit Hamburg community track, looking a little unsure about the ball and the small goal we had set up under the conference lights. I asked the room to scan a QR code, and a few hundred phones lit up. Then the volunteer took a penalty, an iPhone caught the shot, and a second or two later that same shot played back on every phone in the audience.
That was the whole trick, and it was also the whole point. The camera was an iPhone. The production setup was OBS running on my laptop. Everything between those two ends ran through AWS. No broadcast truck parked outside, no rack of specialized hardware, no team watching over a row of encoders.
I like starting here because it turns a fuzzy question into a concrete one. When you watch a live match, a concert, or someone gaming at three in the morning, what actually happens in the gap between the glass of the camera lens and the glass of the screen in your hand? For a long time that gap belonged to a small circle of broadcast specialists. Most of it has quietly become a handful of cloud services you can wire together yourself.
Live streaming used to be a dark art
If you wanted to put a live signal in front of an audience, you bought hardware. Real, heavy, expensive hardware. Encoders that cost as much as a car, packagers that filled a rack, and a contract with someone who knew how to keep all of it running during a live event, because there is no second take once the whistle blows.
What has changed over the last decade has less to do with any single product and more to do with how the whole media supply chain gets built. The industry moved away from racks of purpose-built boxes and toward software-defined workflows. Encoding, packaging, and delivery increasingly happen as software running on general-purpose infrastructure, which means they can live in the cloud and scale up only when there is something to stream.
That shift is the reason the penalty demo works at all. I did not buy anything, and I did not own a single encoder. I wired together a few managed services, pointed OBS at them, and the broadcast machinery spun up behind the scenes. The dark art became something closer to assembling building blocks, where each block does one part of the job well and then hands off to the next.
The rest of this post is about those blocks. What each one does, why it earns its place as its own service, and how they fit together into a pipeline that carries a single frame from a camera lens to a screen.
The four steps every stream takes
Every live stream, whether it is a world title fight or a volunteer taking a penalty in a conference hall, makes the same journey. Once you can see that journey clearly, the individual AWS services stop looking like an intimidating menu and start looking like answers to four simple questions.
The first step is ingest. The signal has to get from wherever it is captured into the system that will work on it. A camera, a hardware encoder, or in my case OBS on a laptop, has to hand the video off reliably, even when the network between the venue and the cloud is far from perfect.
The second step is process. Raw video is enormous, and it arrives in one fixed shape. Before it can reach a phone on a moving train and a TV on fast home internet at the same time, it has to be compressed and prepared in several versions, so each viewer gets a quality that fits their connection and their device.
The third step is store, or more precisely, package and originate. The processed video has to be held somewhere and handed out in the exact format each device asks for. The same stream might leave as one format for an iPhone and another for a smart TV, assembled on demand.
The fourth step is deliver. The video has to reach a global audience without every viewer reaching all the way back to a single origin. That is the job of a content delivery network, which keeps copies close to the viewer so the last leg of the trip stays short.
Ingest, process, store, deliver. Four steps, and one service for each of them. This is what people in the industry mean when they talk about going “from glass to glass”: from the glass of the camera lens that first catches the light, all the way to the glass of the screen in someone’s hand. The next sections walk through the four steps in order, following our penalty across that whole distance.
Ingest: MediaConnect
Our penalty starts as light hitting the sensor of an iPhone. OBS turns that into a video stream on my laptop, and now comes the first real problem. That stream has to travel out of a conference hall, across whatever network happens to be available, and arrive in the cloud intact. Live video is unforgiving here. A web page can wait half a second for a slow packet and nobody notices. A live stream that loses packets stutters, freezes, or falls apart in front of the audience.
This is the problem AWS Elemental MediaConnect is built to solve. Think of it as the on-ramp for live video, designed first and foremost for reliability over networks you do not fully control, the public internet included.
MediaConnect organizes everything around a flow. A flow has a source, the signal coming in, and one or more outputs, the places that signal goes next. To keep the picture intact over an imperfect connection, it relies on transport protocols built for exactly this job, such as Zixi, RIST, RTP with forward error correction, and SRT, which add error correction and recovery on top of the raw stream. What comes out the other side is a contribution feed that holds together even when the underlying network gets shaky.
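To give a feel for what "error correction on top of the raw stream" means, here is a minimal sketch of the simplest possible scheme: one XOR parity packet per group of packets. This is an illustration only, not how MediaConnect or any of the protocols above is implemented. Real transports use more elaborate parity layouts, and several of them lean on retransmission as well. The function names and the packet contents are made up for the example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Sender side: fold a group of equal-length packets into one parity packet."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Receiver side: rebuild at most one lost packet (marked as None)."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one loss per group")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with every surviving packet leaves the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


# Hypothetical four-packet group; packet 1 is lost on the way to the cloud.
group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(group)
damaged = [group[0], None, group[2], group[3]]
restored = recover(damaged, parity)
```

The cost is one extra packet per group, and the receiver never has to ask the sender for anything, which is what makes this family of techniques attractive for live video.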
Two ideas make MediaConnect more than a simple pipe. The first is the split between contribution and distribution. Contribution is getting a signal into the cloud from the field, the upstream leg from camera to system. Distribution is sending a signal back out again, for example to a partner who needs to receive the feed. The second idea is entitlements, which let you grant someone access to one of your flows without ever handing over connection details or credentials. You share the stream, not the keys.
For the penalty demo, none of this has to be elaborate. OBS pushes a single contribution feed into the cloud, and that feed becomes the reliable starting point for everything that follows. The same service that carries one volunteer’s penalty is the one that carries feeds for events where a single dropped frame is not an option.
Process: MediaLive
Now the signal is in the cloud, and we hit a problem of sheer size. Raw, uncompressed video is enormous. A single uncompressed 4K stream at 60 frames per second runs at roughly 12 gigabits per second, while even a good home internet connection delivers only a small fraction of that. If we tried to ship the picture exactly as the camera sees it, nobody would ever watch it. Something has to make the video dramatically smaller without making it look bad.
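The 12 gigabit figure is easy to check on the back of an envelope. The sketch below assumes a 3840 × 2160 picture with 24 bits per pixel (8 bits for each of three color channels) and ignores audio and any transport overhead. The 100 Mbit/s home connection is an assumption for comparison, not a measurement.

```python
def uncompressed_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Raw video bitrate: every pixel of every frame, sent as-is."""
    return width * height * fps * bits_per_pixel


# Assumed: UHD "4K" resolution, 60 frames per second, 8-bit color in 3 channels.
uhd_bps = uncompressed_bitrate_bps(3840, 2160, 60, 24)
print(f"{uhd_bps / 1e9:.2f} Gbit/s")  # 11.94 Gbit/s

# Assumed home connection of 100 Mbit/s, purely for comparison.
home_bps = 100_000_000
print(f"{uhd_bps / home_bps:.0f}x more than the line can carry")
```

Higher bit depths or chroma subsampling move the number up or down, but not by enough to change the conclusion.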
I sometimes make this concrete with my own household. If I tried to push uncompressed video like that through my home network, I would not get far before I heard my kids yelling from the other room, because their Netflix, their Prime Video, and their Spotify had all ground to a halt at the same moment. One high-resolution stream would swallow the entire connection. Compression is what turns “technically impossible at home” into “works on a phone on a train.”
That something is AWS Elemental MediaLive, and the word for what it does is encoding. Encoding is compression. MediaLive takes the contribution feed and squeezes it down into sizes that can actually travel over real networks to real devices.
It does not produce just one version. A phone on a train and a TV on fast home internet have very different limits, so MediaLive produces several renditions of the same content at different resolutions and bitrates. The player on the viewer’s side then picks whichever rendition fits the connection it has right now, and switches on the fly as conditions change. This is adaptive bitrate streaming, and it is the reason a stream gets blurry for a moment instead of stopping dead when your signal drops.
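The player-side decision can be sketched in a few lines. The ladder below is a made-up example, not a MediaLive default, and the 80 percent safety margin is an assumption. Real players use more careful logic that also looks at buffer levels and recent throughput history.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bitrate_bps: int


# Hypothetical rendition ladder, highest quality first.
LADDER: List[Rendition] = [
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("720p", 1280, 720, 3_000_000),
    Rendition("480p", 854, 480, 1_200_000),
    Rendition("240p", 426, 240, 400_000),
]


def pick_rendition(ladder: List[Rendition], measured_bps: float, safety: float = 0.8) -> Rendition:
    """Pick the best rendition that fits inside a fraction of the measured bandwidth."""
    budget = measured_bps * safety
    affordable = [r for r in ladder if r.bitrate_bps <= budget]
    if affordable:
        return max(affordable, key=lambda r: r.bitrate_bps)
    # Nothing fits: fall back to the smallest rendition rather than stopping.
    return min(ladder, key=lambda r: r.bitrate_bps)


on_wifi = pick_rendition(LADDER, 10_000_000)   # fast home connection
on_train = pick_rendition(LADDER, 2_000_000)   # shaky mobile connection
```

Calling `pick_rendition` again before every segment is what produces the "blurry for a moment" behavior: the player steps down a rung instead of stalling.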
To make that work, MediaLive does not send one long file. It cuts the stream into small segments of a few seconds each and writes a manifest, a kind of playlist that tells the player which segments exist and where to find them. In the HLS world this manifest is the .m3u8 file, and a multi-variant playlist points to one playlist per rendition. The player reads those playlists and assembles a smooth experience out of small pieces.
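To show what those playlists look like, here is a small sketch that builds both kinds as plain text. The tags are standard HLS. The file names, bitrates, and the six-second segment length are assumptions for the example, and a production manifest carries more tags than these (codecs, frame rates, and so on).

```python
import math
from typing import List, Tuple


def multivariant_playlist(variants: List[Tuple[int, int, int, str]]) -> str:
    """Top-level playlist: one entry per rendition (bandwidth, width, height, uri)."""
    lines = ["#EXTM3U"]
    for bandwidth, width, height, uri in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


def media_playlist(first_sequence: int, segment_uris: List[str], segment_seconds: float) -> str:
    """Per-rendition playlist listing the segments currently available."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_seconds)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST: the stream is live, so more segments will follow.
    return "\n".join(lines) + "\n"


# Hypothetical renditions and segment names.
top = multivariant_playlist([
    (5_000_000, 1920, 1080, "1080p.m3u8"),
    (1_200_000, 854, 480, "480p.m3u8"),
])
live = media_playlist(101, ["seg_101.ts", "seg_102.ts", "seg_103.ts"], 6.0)
print(top)
print(live)
```

A live player fetches the media playlist again every few seconds. Each refresh drops the oldest segment, adds the newest, and bumps the media sequence number.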
MediaLive also does the production touches you rarely think about as a viewer. It can burn in graphics and overlays, carry timecode, and generate thumbnails. For our purposes the headline is simpler: this is the step that turns one impossibly large signal into a set of right-sized, segmented streams ready to be handed out.
Store: MediaPackage
At this point the penalty exists as neat, compressed segments. But there is still a gap between “the video is ready” and “this specific device can play it.” An older smart TV might ask for one streaming format, an iPhone for another, a browser for a third. You do not want to run a separate encode for every one of them.
AWS Elemental MediaPackage sits in this gap as the origin. It holds the processed video and hands it out in the exact format each device asks for, using just-in-time packaging. Instead of pre-building every possible format in advance, it keeps one set of segments and repackages them on demand, in the moment a player requests them. One source, many shapes, assembled when needed, whether that shape is HLS, DASH, or Microsoft Smooth Streaming.
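The idea of just-in-time packaging fits in a toy model: keep the segments once, and decide the output shape only when a request arrives. The sketch below is not how MediaPackage works internally. The segment names, the request paths, and the `handle_request` helper are all hypothetical, and the "packaging" is reduced to labeling the response rather than building a real manifest.

```python
from typing import Callable, Dict, List

# One stored copy of the segments (hypothetical names).
STORED_SEGMENTS: List[str] = ["seg_101", "seg_102", "seg_103"]


def package_hls(segments: List[str]) -> Dict[str, object]:
    """Describe the stored segments as an HLS response."""
    return {
        "format": "HLS",
        "content_type": "application/vnd.apple.mpegurl",
        "segments": list(segments),
    }


def package_dash(segments: List[str]) -> Dict[str, object]:
    """Describe the same stored segments as a DASH response."""
    return {
        "format": "DASH",
        "content_type": "application/dash+xml",
        "segments": list(segments),
    }


# The format is chosen from what the player asks for, at request time.
PACKAGERS: Dict[str, Callable[[List[str]], Dict[str, object]]] = {
    ".m3u8": package_hls,
    ".mpd": package_dash,
}


def handle_request(path: str) -> Dict[str, object]:
    """Pick a packager by manifest extension and run it on the shared segments."""
    for suffix, packager in PACKAGERS.items():
        if path.endswith(suffix):
            return packager(STORED_SEGMENTS)
    raise ValueError(f"unsupported manifest type: {path}")


for_iphone = handle_request("live/index.m3u8")
for_smart_tv = handle_request("live/index.mpd")
```

Adding a third format means adding one more entry to `PACKAGERS`. Nothing upstream, and nothing about what is stored, has to change.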
Because everything funnels through this origin, MediaPackage is also the natural place for a set of capabilities that would be painful to bolt on anywhere else. It can apply DRM and content protection, handle captions, mark the ad breaks that per-viewer ad insertion relies on, and offer time-shifting features like start-over and catch-up. It can even harvest a live stream into an on-demand asset, so the moment the event ends there is already a recording to watch.
We will not wire all of that up for a single penalty. The point is the shape of the thing: one well-defined origin that speaks every device’s language, and a place to add protection and monetization later without rebuilding the pipeline. I will come back to a couple of these features near the end.
Deliver: CloudFront
The final step is the one most people forget until it breaks. The packaged video is sitting at an origin in one place, and the audience is everywhere. If every viewer reached all the way back to that single origin for every segment, the origin would buckle and the people furthest away would get the worst experience.
Amazon CloudFront is the content delivery network that fixes this. It is a global network of edge locations, and its job is to keep copies of content close to viewers so the last leg of the journey stays short. When a viewer asks for a segment, they are served from a nearby edge rather than from the distant origin.
Live streaming makes this especially powerful, because thousands of viewers tend to want the very same segment at almost the same moment. CloudFront uses a caching hierarchy to take advantage of that. A request first hits a nearby edge location, which can fall back to a larger regional edge cache, which only then reaches back to the MediaPackage origin if nobody has fetched that segment yet. One trip to the origin can serve a whole crowd.
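A small simulation makes the effect of that hierarchy visible. This is a toy model under loud assumptions: three edge locations, one regional cache, 300 viewers arriving one after another, and no expiry. The class names are invented, and real CloudFront behavior, including how it handles requests that arrive at the same instant, is more involved.

```python
from typing import Dict, List


class Origin:
    """Stands in for the MediaPackage origin and counts how often it is hit."""

    def __init__(self) -> None:
        self.fetches = 0

    def get(self, key: str) -> str:
        self.fetches += 1
        return f"bytes-of-{key}"


class Cache:
    """A cache layer that asks its upstream only on a miss."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream
        self.store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.upstream.get(key)
        self.store[key] = value
        return value


origin = Origin()
regional = Cache(origin)
edges: List[Cache] = [Cache(regional) for _ in range(3)]

# 300 viewers, spread evenly over the edges, all asking for the same segment.
for viewer in range(300):
    edges[viewer % len(edges)].get("segment_42.ts")

print(f"origin fetches: {origin.fetches}")
print(f"regional hits/misses: {regional.hits}/{regional.misses}")
```

Each edge misses once and then serves its other 99 viewers locally. The regional cache absorbs two of the three edge misses, so the origin is asked exactly once.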
This is also where the old fallacies of distributed computing come back to bite you. The network is not reliable, latency is not zero, and bandwidth is not infinite. Delivery is the last mile where all of that becomes real, which is why it deserves the same attention as encoding, and why observability at this stage matters so much. For the penalty in the hall, the audience scanned a QR code and watched on their own phones a second or two later. That short delay is the whole pipeline, ending exactly where it should, at the glass of a screen.
What I’d tell someone starting today
When the volunteer’s penalty came back onto a few hundred phones in that hall, it had quietly traveled the whole pipeline. It was ingested through MediaConnect, compressed and segmented by MediaLive, packaged and originated by MediaPackage, and delivered by CloudFront to every screen in the room. From the glass of an iPhone lens to the glass of each phone in the audience, in a second or two. That is the whole trip, from glass to glass.
So what would I tell someone who wants to build their own live stream today? The most useful thing I learned is that the hard part is no longer the hardware. You do not need a truck, a rack, or a standing crew to put live video in front of an audience. You need to understand the four steps, and then pick the service that owns each one. Ingest, process, store, deliver. Once that shape is clear in your head, the long menu of media services stops being intimidating and starts being obvious.
The other thing I would say is to start exactly as small as my demo did. One contribution feed, one set of renditions, one origin, one distribution. The same four building blocks that carried a single penalty are the ones that carry the largest live events in the world. The shape does not change as you grow. You mostly just turn the dials.
There is an important caveat here, and it came back to me as feedback after I gave this talk at an AWS user group. These services are genuinely easy to start with, and they integrate cleanly with one another, but that ease can be deceptive. Behind the simple console screens sit decades of broadcast engineering. Understanding what is really happening (redundancy and failover, latency budgets, audio, color, compliance) still takes deep domain knowledge. For a hobby project you can lean on the defaults. For a professional broadcast environment, that expertise is not optional.
I do not read this as a threat to the people who hold that knowledge. If anything, it is the opposite. The mental model shifts, and the craft has to move with it. The broadcast engineers who used to keep rooms full of hardware alive are exactly the people this new world needs, now carrying their experience into a software-defined pipeline. The hardware got easier. The expertise that makes a stream genuinely good did not go away. It just changed where it lives.
There is also plenty I left out here, and most of it sits at the edges of this same pipeline. You can harvest a live stream into an on-demand asset the moment it ends, so the replay is ready before the audience has even left. You can insert ads for each viewer instead of for the whole broadcast. You can add DRM, captions, and low-latency modes without tearing anything down. Each of those is a small addition to a shape you already understand, and any one of them would make a good follow-up post.
If you take one thing from this, let it be the mental model rather than the service names. Watch the next live stream you happen to catch, a match, a concert, a game at three in the morning, and try to trace the frame backwards: delivered, stored, processed, ingested, all the way back to the lens that first caught the light. Once you can see the pipeline, you can build it.