Monday, 9 July 2012

Scraping the BBC Olympics schedule

You'd have thought that a downloadable CSV of the Olympics schedule would be available somewhere, but apparently not. Luckily, the BBC site has a grid-view version of the schedule that is fairly easy to scrape and parse - mind you, that link was very hard to find (it's nowhere to be seen on the BBC Olympics page - duh!).

A bit of lynx and bash scripting came up with the start and end times of each sport on each day, though remember that each sport is split into sub-events, which often overlap and hence will require multiple HD channels. This first attempt was solely to find the earliest start time (08:00 - Modern Pentathlon) and the latest end times (23:50 for Beach Volleyball and the "winner", Basketball, at 00:00 midnight) at which the sports are due to be broadcast, ignoring any potential overruns.
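For anyone wanting to repeat that first pass on already-scraped data, here is a minimal Python sketch (not the lynx/bash script itself). It assumes the scrape has been reduced to (sport, start, end) rows with "HH:MM" times; the function names are my own, and any time in the sample rows that isn't quoted above is made up for illustration. The one subtlety is that an end time of 00:00 has to count as midnight at the end of the day, not the start.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def broadcast_span(rows):
    """Return (row with earliest start, row with latest end).

    rows: iterable of (sport, start, end) tuples with 'HH:MM' times.
    An end time at or before the start is assumed to fall on the next day,
    so 00:00 sorts after 23:50.
    """
    rows = list(rows)

    def end_minutes(row):
        _, start, end = row
        s, e = to_minutes(start), to_minutes(end)
        return e + 24 * 60 if e <= s else e

    earliest = min(rows, key=lambda row: to_minutes(row[1]))
    latest = max(rows, key=end_minutes)
    return earliest, latest


# Illustrative rows: only 08:00, 23:50 and 00:00 come from the schedule;
# the other times are placeholders.
sample_rows = [
    ("Modern Pentathlon", "08:00", "18:00"),  # end time is a placeholder
    ("Beach Volleyball", "09:00", "23:50"),   # start time is a placeholder
    ("Basketball", "09:00", "00:00"),         # start time is a placeholder
]

earliest, latest = broadcast_span(sample_rows)
print("Earliest start:", earliest[0], earliest[1])
print("Latest end:", latest[0], latest[2])
```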

Surprisingly, the athletics never start before 09:00 (and that early only twice), and the earliest sessions are often at 10:00 or even later. The next enhancement of the script will be to dig down and find the start and end times of the sub-events, because we want to know how many sub-events will air simultaneously across all sports (i.e. track the maximum for each day) for, say, any 1-hour period in the day. This will give a simple set of daily figures for the maximum number of simultaneous recordings that will be needed. In theory, you'd expect it to reach 24 on at least a few days, but I wonder if this really is the case!
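The counting step could be sketched roughly like this in Python, assuming the sub-events for one day are already available as (start, end) pairs of "HH:MM" strings. The function names and the sample times are hypothetical, not real schedule data. It uses a simple sweep over start and end points; setting window_minutes to 60 counts every sub-event that touches any 1-hour period, while 0 gives the peak at a single instant.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def max_simultaneous(events, window_minutes=0):
    """Peak number of events overlapping any window of the given length.

    events: iterable of (start, end) 'HH:MM' pairs for a single day.
    An end at or before the start is assumed to run past midnight.
    """
    points = []
    for start, end in events:
        s, e = to_minutes(start), to_minutes(end)
        if e <= s:
            e += 24 * 60
        # An event overlaps the window [t, t + w) when s - w < t < e,
        # so widening the start by w turns the problem into a plain sweep.
        points.append((s - window_minutes, 1))
        points.append((e, -1))

    # Tuples sort by time, then delta, so at equal times an end (-1) is
    # processed before a start (+1): back-to-back events do not overlap.
    points.sort()

    current = peak = 0
    for _, delta in points:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up sub-event times for one day, purely for illustration.
sample_day = [
    ("09:00", "11:00"),
    ("10:00", "12:30"),
    ("10:30", "11:30"),
    ("11:30", "13:00"),
    ("22:00", "00:00"),
]

print(max_simultaneous(sample_day))      # peak at a single instant
print(max_simultaneous(sample_day, 60))  # peak over any 1-hour period
```

Running this once per day over the real sub-event times would give the daily figures, which can then be compared against the 24 available channels.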

What may happen is that the BBC will allocate a certain number of channels per sport. These blocks of channels may occasionally change during the Olympics, because some sports take place exclusively before or after others, so some blocks may as well be re-used in those cases. We'll know later this week, once 14-day EPGs start to trickle in for the 24 HD channels. I subscribed to recently as well because it looks like they're getting the 24 HD channels in their listings.
