I’m interested in simulating batting orders to compare the results with the actual orders used by major league teams. There are two reasons: first is the hope that testing will confirm that the simulator works reasonably well, second is to generate orders that can be used to resimulate the season after the fact as another way to test function and to make improvements.
I retrieve gameday files every day of the season, so I have an events file that lists the batter. I queried the database table to copy the list of events to excel and then used MATCH to find the first batter for each team in each game. From there, I looked for the first nine batters assuming that this was the batting order. Results were pooled and a list of all batters who played for each team during the season was generated. Some more excel logic tabulated the number of times each batter batted in each position (1-9) in the batting order. There are more players involved than you might think, and more variation from a set order as well. The result for each team is a table of about 20 players x the nine spots in the order.
The next step is to determine the consensus order for each team. First, players with only a few games can be eliminated. Then it’s a matter of finding the maximum sum path through the array. You can eyeball this, but to be sure it’s better to use the computer. The problem is the immense number of combinations that need to be tested. And there is no method on the web that does the job either … at least I couldn’t find one. So, I decided to figure it out.
A list of 12 players x nine spots in the order yields 12^9 possible (5,159,780,352) combinations. Even with eight processors, it’s going to take a very long time … and that’s just for one team. So more players need to be eliminated. I sorted each list and took the player who started the most games at each position … giving me a list of nine players per team. The 9 x 9 array still has 387,420,489 combinations but this can be reviewed in about 20 minutes.
That’s still too long though as that’s just for one team. Then I realized that there were a great many combinations that did not need to be tested, i.e. all the ones with duplicate positions. The problem was how to do that programmatically. Then it hit me: what I really wanted to test is the 362,880 (9!) permutations for 9 players. I realized that if I could generate a list of these permutations, I could use the numbers as coordinates to indicate which array element was needed.
So, what I needed was a list of permutations like this:
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 9 8
1 2 3 4 5 6 9 7 8
1 2 3 4 5 6 9 8 7
1 2 3 4 5 9 6 7 8
9 8 7 6 5 4 3 2 1
i.e. all the combinations of 1-9 where no number is repeated. The problem was: how do I generate this list. I tried a few things and then decided to poke around the web. What a struggle … nothing was straightforward, and nothing was going to give me the list I wanted … until I stumbled across TextMechanic. This site has a whole host of text manipulation tools and among them is a permutations generator. Now, I have my list.
All I had to do was revise the R script to use each row of the permutations file as a list of array elements. Now I can find the maximum sum path through a 9 x 9 array in about 7 seconds.
Here is the R script that finds the path … all it takes is about 30 lines.
team.csv is a list of batting order spots by the number of times each player hit in that spot.
output.csv is the list of permutations obtained by using the app at TextMechanic.com
The result for the Jays is:
3 7 4 9 6 5 2 8 1
Looks like this:
|name||position||spot in batting order|