After the deluge … dealing with changes to the gameday xml repository at MLB

I retrieve MLB’s Gameday XML files on a daily basis. Of course, at the start of the season, it’s a good idea to make sure things have worked as expected. So, when I checked after day 1 of the season, it was a bit of a shock to see that no data had been downloaded.

Turns out, MLB made a very slight change to the syntax of directory listings from:



The only difference? The slash at the end is missing, but that broke the construction of all URLs.

If you use anything based on the scripts in Baseball Hacks, you’ll need to change your Perl scripts.


Here are the changes I had to make:

$dayurl = "$baseurl/year_$year/month_$mon/day_$mday/";


$dayurl = "$baseurl/year_$year/month_$mon/day_$mday";


while($html =~ m/<a href=\"(gid_\w+\/)\"/g ) {


while($html =~ m/<a href=\"day\_[0-9]{1,2}\/(gid_\w+\/)\"/g ) {
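
To check the new pattern without hitting the server, here is a rough sketch that runs it against a made-up listing. The base URL, the date, the two game IDs and the boxscore.xml file name are all just assumptions for illustration, not copied from MLB's real pages, so treat this as a sanity check rather than the actual download script.

```perl
use strict;
use warnings;

# Hypothetical values for illustration only; not MLB's real server or data.
my $baseurl = "http://example.com/components/game/mlb";
my ($year, $mon, $mday) = ("2011", "03", "31");

# Day URL without the trailing slash, as in the changed line above.
my $dayurl = "$baseurl/year_$year/month_$mon/day_$mday";

# Made-up directory listing in the new style, where each href is
# prefixed with the day directory.
my $html = <<'HTML';
<li><a href="day_31/gid_2011_03_31_detmlb_nyamlb_1/">gid_2011_03_31_detmlb_nyamlb_1/</a></li>
<li><a href="day_31/gid_2011_03_31_atlmlb_wasmlb_1/">gid_2011_03_31_atlmlb_wasmlb_1/</a></li>
HTML

my @gameurls;
while ($html =~ m/<a href="day_[0-9]{1,2}\/(gid_\w+\/)"/g) {
    # $1 keeps its trailing slash, so a file name can be appended directly.
    push @gameurls, "$dayurl/$1";
}

# Assumed file name, appended without an extra slash.
print $_ . "boxscore.xml\n" for @gameurls;
```

Because the captured game directory already ends in a slash, the file name is appended with plain concatenation; adding another slash would produce a double slash in the URL.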


and then wherever you have:


change it to:

" . "filename.xml"



Pretty sure that’s it.  Goodness, I thought the world had ended for a bit, but it’s all good now …