Parsing is hard work …

… especially when the source contains odd descriptions.

One thing my parser has to do is decipher text descriptions of base-running plays.  It’s clunky but it works.  Except when this sort of thing appears:

<action b="0" s="1" o="0" des="With Welington Castillo batting, Michael Bourn advances to 3rd base on a caught stealing error by Tommy Joseph, assist to pitcher Adam Morgan to third baseman Maikel Franco to second baseman Cesar Hernandez to third baseman Maikel Franco." des_es="Con Welington Castillo bateando, Michael Bourn atrapado robando, avanza a 3ra por error de Tommy Joseph, asistencia para lanzador Adam Morgan a tercera base Maikel Franco a segunda base Cesar Hernandez a tercera base Maikel Franco." event="Caught Stealing 3B" event_es="Retirado en Intento de Robo 3B" tfs="003748" tfs_zulu="2016-06-18T00:37:48Z" player="456422" pitch="1" event_num="300" play_guid="3bc5d844-1d09-46b7-8d49-8365b8466403" home_team_runs="2" away_team_runs="3"/>

This is from the 5th inning of the PHI-ARI game on June 17.  Bourn actually winds up at 2nd base, where the ball is dropped.  Officially he is caught stealing, but he never reaches 3rd, whatever the description says.  So I have to go into the original XML file and edit out the offending passage in order to generate the correct state transition.
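
To show how a description like this misleads a text-based approach, here is a minimal sketch. It is not the actual parser: the regex and the name `runner_destinations` are made up for illustration, and it assumes descriptions follow the "Name advances to base" wording seen above.

```python
import re

# Hypothetical pattern: "<Capitalized Name> advances to <base>".
ADVANCE_RE = re.compile(
    r"(?P<runner>[A-Z]\w+(?: [A-Z]\w+)+) advances to (?P<base>1st|2nd|3rd|home)"
)


def runner_destinations(des):
    """Map each runner named in a description to the base the text claims."""
    return {m.group("runner"): m.group("base") for m in ADVANCE_RE.finditer(des)}


des = (
    "With Welington Castillo batting, Michael Bourn advances to 3rd base "
    "on a caught stealing error by Tommy Joseph"
)
claimed = runner_destinations(des)
print(claimed)
```

The text puts Bourn on 3rd, so any state built from it is wrong no matter how careful the pattern is; the error is in the data, not the matching.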

Looks like I’m going to have to live with this sort of thing for now.  And it’s good motivation to rewrite the parser so that it can ignore text descriptions and just use the information from the ‘runner’ child elements in atbats … assuming that everything else is correct.
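
A rough sketch of what that rewrite might look like, using Python's standard `xml.etree.ElementTree`. The assumptions here are loud ones: that each `runner` element carries `id`, `start`, and `end` attributes, and that an empty `end` means the runner is no longer on base. The sample XML, the ids in it, and the function names are all invented for illustration, and outs and inning changes are ignored.

```python
import xml.etree.ElementTree as ET

BASES = ("1B", "2B", "3B")

# Invented fragment for illustration only; not taken from a real game file.
SAMPLE = """
<inning>
  <atbat num="1" des="free text, ignored">
    <runner id="111" start="" end="1B"/>
  </atbat>
  <atbat num="2" des="free text, ignored">
    <runner id="111" start="1B" end="3B"/>
    <runner id="222" start="" end="1B"/>
  </atbat>
</inning>
"""


def apply_runners(location, atbat):
    """Return a new {runner id: base} map after one atbat's runner moves."""
    after = dict(location)
    for runner in atbat.iter("runner"):
        rid = runner.get("id")
        end = runner.get("end") or ""
        if end in BASES:
            after[rid] = end          # runner is now on this base
        else:
            after.pop(rid, None)      # scored or was put out (assumed)
    return after


def base_states(xml_text):
    """List the occupied bases after each atbat, in document order."""
    root = ET.fromstring(xml_text.strip())
    location = {}
    states = []
    for atbat in root.iter("atbat"):
        location = apply_runners(location, atbat)
        states.append(sorted(location.values()))
    return states


states = base_states(SAMPLE)
print(states)
```

Tracking runners by id rather than by base means a runner who moves twice in one atbat simply ends up at his last recorded base, and the `des` text is never consulted.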

The bright side is that it’s only one event out of almost 200,000 …. btw I’m down to the last 20 differences.

Published by

cap56cruncher
