It’s back to Hack28.
I’ve been working on this for some time. I wanted to be able to generate an events list throughout the season using the data from MLB’s Gameday XML files – instead of having to wait for Retrosheet at the end of the season.
I’ve managed to get results that are very close to agreeing with Retrosheet (gotta figure that they’re right and mine are wrong) but I was still off by 62 events out of a total of ~189,700. So I made a massive spreadsheet (over 7 million cells) with the two datasets side-by-side and looked for the rows where they disagreed by totalling up the playerIDs of the baserunners on each row and comparing the sums.
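For anyone who would rather not build the spreadsheet, here is a rough sketch of the same check in Python. It is not the script I actually used: the field names (`runner_1b`, `runner_2b`, `runner_3b`) are made up for illustration, and it assumes both event lists use the same numeric playerIDs and are sorted in the same order.

```python
# Hypothetical field names; both lists are assumed to be in the same order
# and to use the same numeric player IDs (None means the base is empty).
BASES = ("runner_1b", "runner_2b", "runner_3b")


def runner_checksum(event):
    """Sum the baserunner IDs for one event; an empty base counts as 0."""
    return sum(event.get(base) or 0 for base in BASES)


def first_mismatch(mine, retrosheet):
    """Return the index of the first row whose checksums differ, or None."""
    for i, (a, b) in enumerate(zip(mine, retrosheet)):
        if runner_checksum(a) != runner_checksum(b):
            return i
    # Same checksums all the way down, but one list is longer than the other.
    if len(mine) != len(retrosheet):
        return min(len(mine), len(retrosheet))
    return None


# Tiny made-up example: my list has one extra row that Retrosheet lacks.
mine = [
    {"runner_1b": None, "runner_2b": None, "runner_3b": None},
    {"runner_1b": 111, "runner_2b": None, "runner_3b": None},
    {"runner_1b": 111, "runner_2b": None, "runner_3b": None},  # extra event
    {"runner_1b": 222, "runner_2b": 111, "runner_3b": None},
]
retrosheet = [
    {"runner_1b": None, "runner_2b": None, "runner_3b": None},
    {"runner_1b": 111, "runner_2b": None, "runner_3b": None},
    {"runner_1b": 222, "runner_2b": 111, "runner_3b": None},
]

print(first_mismatch(mine, retrosheet))
```

One catch: an extra event usually leaves the baserunners unchanged, so the sums only start to disagree at or after the extra row. The index you get back is where to start looking, not necessarily the culprit itself.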
The way to make the script generate results similar to Retrosheet is to change the atbat number in the XML file. But this still left me with 62 additional events. Turns out that every time Pat Venditte changed his throwing hand, it counted as an event.
Now, I have to either edit my results or add a line to my script to treat this special case.
I did it the hard way first and I’m here to tell you that Pat Venditte switched his throwing hand 62 times and I’ve found and deleted every damn one of them! aarrgghh :-þ
But now I’ve got an event list that has the same total number of events as Retrosheet. … and all I have to do is add a couple of lines to the script for next season’s data collection.
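Those couple of lines could look something like the sketch below. It assumes each event is a dict with an `event` field and that the hand switches carry some recognizable label. The `"HAND_SWITCH"` value is only a placeholder, not the actual string in the Gameday files, so check the XML and pass in whatever it really uses.

```python
def drop_hand_switches(events, switch_label="HAND_SWITCH"):
    """Return (kept_events, number_dropped).

    switch_label is a placeholder; replace it with whatever label the
    source data actually uses for a pitcher changing his throwing hand.
    """
    kept = [e for e in events if e.get("event") != switch_label]
    return kept, len(events) - len(kept)


# Made-up sample rows, just to show the shape of the data.
sample = [
    {"event": "Strikeout"},
    {"event": "HAND_SWITCH"},
    {"event": "Single"},
]

kept, dropped = drop_hand_switches(sample)
print(len(kept), dropped)
```

Returning the count of dropped rows makes it easy to check the result against a known number, like the 62 I removed by hand.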