… especially when the source contains odd descriptions.
One thing my parser has to do is decipher the text descriptions of base-running plays. It's clunky, but it works, except when something like this appears:
<action b="0" s="1" o="0" des="With Welington Castillo batting, Michael Bourn advances to 3rd base on a caught stealing error by Tommy Joseph, assist to pitcher Adam Morgan to third baseman Maikel Franco to second baseman Cesar Hernandez to third baseman Maikel Franco. " des_es="Con Welington Castillo bateando, Michael Bourn atrapado robando, avanza a 3ra por error de Tommy Joseph, asistencia para lanzador Adam Morgan a tercera base Maikel Franco a segunda base Cesar Hernandez a tercera base Maikel Franco. " event="Caught Stealing 3B" event_es="Retirado en Intento de Robo 3B" tfs="003748" tfs_zulu="2016-06-18T00:37:48Z" player="456422" pitch="1" event_num="300" play_guid="3bc5d844-1d09-46b7-8d49-8365b8466403" home_team_runs="2" away_team_runs="3"/>
This is from the 5th inning of the PHI-ARI game on June 17 (the tfs_zulu timestamp reads June 18 only because it is in UTC). Despite what the description says, Bourn actually winds up at 2nd base when the ball is dropped there. Officially he is charged with a caught stealing, but he is not out and never reaches 3rd. So I have to go into the original XML file and edit out the offending passage to generate the correct state transition.
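To make the failure concrete, here is a minimal Python sketch of a description-based reader. It assumes a simple regex keyed on "advances to Nth base"; ADVANCE_RE, CORRECTIONS and runner_moves are hypothetical names, not the parser's actual code. Keying a hand-maintained override on play_guid is shown only as one possible alternative to editing the XML file by hand.

```python
import re
import xml.etree.ElementTree as ET

# The <action> element above, trimmed to the attributes this sketch reads.
ACTION_XML = (
    '<action event="Caught Stealing 3B" '
    'play_guid="3bc5d844-1d09-46b7-8d49-8365b8466403" '
    'des="With Welington Castillo batting, Michael Bourn advances to 3rd base '
    'on a caught stealing error by Tommy Joseph, assist to pitcher Adam Morgan '
    'to third baseman Maikel Franco to second baseman Cesar Hernandez to third '
    'baseman Maikel Franco. "/>'
)

# Hypothetical pattern: a capitalized name followed by "advances to Nth base".
ADVANCE_RE = re.compile(
    r"(?P<runner>[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)+) advances to (?P<base>1st|2nd|3rd) base"
)
BASES = {"1st": "1B", "2nd": "2B", "3rd": "3B"}

# Hypothetical hand-maintained overrides, keyed by play_guid:
# {play_guid: {runner name: corrected base}}.
CORRECTIONS = {
    "3bc5d844-1d09-46b7-8d49-8365b8466403": {"Michael Bourn": "2B"},
}


def runner_moves(action, apply_corrections=True):
    """Map each runner named in the action's des text to the base it claims."""
    moves = {
        m["runner"]: BASES[m["base"]]
        for m in ADVANCE_RE.finditer(action.get("des", ""))
    }
    if apply_corrections:
        # Known-bad descriptions are patched after parsing, by play_guid.
        moves.update(CORRECTIONS.get(action.get("play_guid"), {}))
    return moves


action = ET.fromstring(ACTION_XML)
print(runner_moves(action, apply_corrections=False))  # {'Michael Bourn': '3B'}
print(runner_moves(action))                           # {'Michael Bourn': '2B'}
```

Taken at face value, the description puts Bourn on 3rd; only an external correction gets him back to 2nd.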
Looks like I'm going to have to live with this sort of thing for now. It's also good motivation to rewrite the parser so that it ignores the text descriptions and uses only the information in the 'runner' child elements of each atbat, assuming that everything in them is correct.
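Working from the runner elements alone, the base state reduces to reading start and end attributes. This Python sketch assumes each <runner> inside an <atbat> carries start and end values of "1B", "2B", "3B" or "" (empty for the batter's start or a runner put out); that convention is an assumption here, and bases_after and the sample XML are hypothetical. It tracks occupied bases only, not outs or runs.

```python
import xml.etree.ElementTree as ET

BASES = ("1B", "2B", "3B")


def bases_after(atbat, before):
    """Occupied bases after applying the <runner> children of one <atbat>.

    Assumes 'start' and 'end' hold "1B", "2B", "3B" or ""; any end value
    not in BASES (an out, or a runner who scores) leaves the bases.
    """
    occupied = set(before)
    runners = atbat.findall("runner")
    # Vacate all starting bases first, so the order of the runner
    # elements does not matter (e.g. 1B->2B listed before 2B->3B).
    for r in runners:
        occupied.discard(r.get("start", ""))
    for r in runners:
        end = r.get("end", "")
        if end in BASES:
            occupied.add(end)
    return occupied


# Made-up example with invented ids, not the Gameday record for the Bourn
# play: a single with a runner on 1st, who goes first to third.
ATBAT_XML = (
    '<atbat num="1" event="Single">'
    '<runner id="111111" start="" end="1B" event="Single"/>'
    '<runner id="222222" start="1B" end="3B" event="Single"/>'
    '</atbat>'
)

print(sorted(bases_after(ET.fromstring(ATBAT_XML), {"1B"})))  # ['1B', '3B']
```

The two passes (vacate, then fill) avoid a subtle bug: processing 1B->2B and then 2B->3B in a single pass would erase the runner who just arrived at 2nd.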
The bright side is that it's only one event out of almost 200,000. By the way, I'm down to the last 20 differences.