A BIPEvIface object represents the history of the state of an in-play event as
a list of incidents that themselves represent an event in the game (such as a
goal being scored). It provides three operations*:
- get-state: which returns the current state
- add-incident i: which appends an incident `i` to the history
- set-state s: which appends incidents such the the current state becomes `s`
Each different "kind" of sport has its own BIPEvSport subclass, so we have a
class for goal-based, set-based, and innings-based sports etc. Each of these
has a unique implementation of the above operations with near-zero code reuse
between them which has caused the file to bloat to a whopping greater-than
ten-thousand lines!
Consider a three-set tennis match, we can model it as an FSM as follows:
2-0
1-0 < 2-1
0-0 < 1-1 <
0-1 < 1-2
0-2
This is equivalent to the following function, which returns a dict of the
incidents that cause transitions from state (s1, s2) and the state to
transition to:
>>> def tennis_set_transitions((s1, s2)):
... if s1 == 2 or s2 == 2:
... return {}
... else:
... return {1: (s1 + 1, s2), 2: (s1, s2 + 1)}
>>> tennis_set_transitions.initial = 0, 0
>>> tennis_set_transitions((0, 0))
{1: (1, 0), 2: (0, 1)}
1 and 2 represent player 1 and player 2 winning the set respectively.
We can build all of get-state, add-incident and set-state from this function
and its initial state:
>>> def get_state(history):
... state = tennis_set_transitions.initial
... for incident in history:
... state = tennis_set_transitions(state)[incident]
... return state
>>> get_state([])
(0, 0)
>>> get_state([1, 2])
(1, 1)
>>> def add_incident(history, incident):
... return history + [incident]
>>> add_incident([], 1)
[1]
set-state is a little less trivial, we start with the simple case of moving to
a state exactly one incident away:
>>> def set_state(history, target):
... current = get_state(history)
... transitions = tennis_set_transitions(current)
... for incident, next in transitions.iteritems():
... if next == target:
... return add_incident(history, incident)
... raise Exception("Unreachable")
>>> set_state([1], (2, 0))
[1, 1]
>>> set_state([], (0, 1))
[2]
>>> set_state([], (2, 0))
Traceback (most recent call last):
...
Exception: Unreachable
Moving through multiple states is simply a search of a tree (actually a
potentially cyclic graph in the general case):
>>> def set_state(history, target):
... open = {(get_state(history), ())}
... closed = set()
... while open:
... current, path = open.pop()
... if current == target:
... return history + list(path)
... else:
... closed.add(current)
... transitions = tennis_set_transitions(current)
... for incident, next in transitions.iteritems():
... if next not in closed:
... open.add((next, path + (incident,)))
... raise Exception("Unreachable")
>>> set_state([1], (2, 0))
[1, 1]
>>> set_state([], (2, 0))
[1, 1]
>>> set_state([], (1, 1))
[2, 1]
>>> set_state([], (3, 0))
Traceback (most recent call last):
...
Exception: Unreachable
This last result is iffy, there are two paths from (0, 0) to (1, 1): [1, 2] and
[2, 1] but we have arbitrarily picked one of them. This is more-or-less the
behavior of the current implementation, but I think it would be preferable to
return all the paths that lead to the target state so that we know that we
cannot rely on the order of the incidents.
We will represent the set of alternatives as a list of the possibilities, for
example:
>>> set_state([], (1, 0))
[[[1]]]
>>> set_state([], (1, 1))
[[[1, 2], [2, 1]]]
>>> set_state([[[1]]], (2, 1))
[[[1]], [[2, 1]]]
These histories represent "just 1", "either 1, 2 or 2, 1" and "1 followed by
2, 1"; the last history would ideally be written as [[[1, 2, 1]]]. This
change to the history structure requires us to rewrite all the functions so
far:
>>> def get_state(history):
... state = tennis_set_transitions.initial
... for group in history:
... # All alternatives must lead to the same state.
... for incident in group[0]:
... state = tennis_set_transitions(state)[incident]
... return state
>>> def add_incident(history, incident):
... return history + [[incident]]
>>> def set_state(history, target):
... open = {(get_state(history), ())}
... # Note that without closed we cannot deal with infinite transitions
... # such as in a game that reaches a deuce. We can solve this by
... # detecting cycles.
... successes = set()
... while open:
... current, path = open.pop()
... if current == target:
... successes.add(path)
... else:
... transitions = tennis_set_transitions(current)
... for incident, next in transitions.iteritems():
... open.add((next, path + (incident,)))
... if successes:
... return simplify(history + [map(list, successes)])
... else:
... raise Exception("Unreachable")
>>> def simplify(history):
... if len(history) < 2:
... return history
... elif len(history[0]) == 1 and len(history[1]) == 1:
... return simplify([[history[0][0] + history[1][0]]] + history[2:])
... else:
... return [history[0]] + simplify(history[1:])
>>> set_state([], (1, 0))
[[[1]]]
>>> set_state([], (1, 1))
[[[1, 2], [2, 1]]]
>>> set_state([[[1]]], (2, 1))
[[[1, 2, 1]]]
>>> set_state([], (2, 1))
[[[1, 2, 1], [2, 1, 1]]]
This iteration of set-state is still incomplete because it fails to terminate
if we have to consider a graph with cycles as would be the case in a game with
a deuce. Also we should prune the search tree using knowledge of how the
components of the state change as we transition, either increasing, decreasing,
or both; e.g. if our target is (0, 2) there's no need to look past (1, 0).
The notion of alternative paths is equivalent to BIPEvIface's concept of
incidents that are tentative in order, a property used in resulting that needs
to be explicitly set instead of being derived as it is here.
If we extend simplification to extract common subsequences we are able to
identify incidents that are not tentative in order, despite being inserted by
set-state.
>>> def simplify(history):
... if len(history[0]) > 1:
... subseq = subsequences(history[0])
... if subseq[0] != history[0]:
... return simplify(subseq + history[1:])
... if len(history) < 2:
... return history
... elif len(history[0]) == 1 and len(history[1]) == 1:
... return simplify([[history[0][0] + history[1][0]]] + history[2:])
... else:
... return [history[0]] + simplify(history[1:])
>>> def subsequences(group):
... iters = map(iter, group)
... common_streak = None
... sequences = []
... streak = []
... try:
... while True:
... values = map(next, iters)
... common = all(v == values[0] for v in values)
... if common_streak is None or common_streak == common:
... if common:
... streak.append(values[0])
... else:
... streak.append(values)
... else:
... if common_streak:
... sequences.append([streak])
... streak = [values]
... else:
... sequences.append(map(list, zip(*streak)))
... streak = [values[0]]
... common_streak = common
... except StopIteration:
... if common_streak:
... sequences.append([streak])
... else:
... sequences.append(map(list, zip(*streak)))
... remaining = map(list, iters)
... if any(map(len, remaining)):
... sequences.append(remaining)
... return sequences
>>> set_state([], (2, 1))
[[[1, 2], [2, 1]], [[1]]]
We can see that the last incident is not tentative because it is not part of an
alternation. Again this algorithm needs improvement when infinite sequences
are possible, e.g. we'd want to match the two 1 incidents at the end given we
are in a deuce that player 1 wins.
Time-based sports can also be viewed as state machines, but with a transition
every unit of time. Consider this model of football goals in minute-time:
>>> def football_goal_transitions((g1, g2, t)):
... if t == 5:
... return {}
... else:
... return {
... None: (g1, g2, t + 1),
... 1: (g1 + 1, g2, t + 1),
... 2: (g1, g2 + 1, t + 1),
... }
... football_goal_transitions.initial = 0, 0, 1
>>> set_state([], (1, 0, 4))
[[[None, 1, None], [1, None, None], [None, None, 1]]]
>>> set_state([], (1, 1, 4))
[[[2, None, 1], [1, None, 2], [None, 2, 1], [2, 1, None], [None, 1, 2], [1, 2, None]]]
We have had to restrict our model to the first five minutes because without the
optimization that prunes the search tree it will take a long time to probe all
the states. We can also see that this model of time causes the number of
alternatives to explode. To prevent this we can push the concept of time into
each alternation group and make `None` transitions implicit, e.g. the above
would become:
[((1, 4), [[1]])]
[((1, 4), [[1, 2], [2, 1]])]
As in "between minute 1 and 4 home scored" and "between minute 1 and 4 either
home scored, then away or away scored then home".
Finally we can represent an alternation group that is all permutations of [i0,
i1, ..., iN] as P[i0, i1, ..., iN]:
[((1, 4), P[1, 2])]
Some incidents have additional attributes that don't affect the transitions,
for example goal incidents may have player and method. We can include this
information in the history, but discard it for the purposes of get-state and
set-state.
So far we have considered the entire event to be a single state machine. This
works even when we consider "sub-scores" such as games in tennis because we can
expand the state from (s1, s2) to (s1, s2, g1, g2), but then we lose the
ability to prune the search tree because g1 and g2 will decrease when a set is
won. Instead we can treat these lesser scores as sub-state machines.
Each state either has transitions mapping incidents to the next state, or a
sub-machine and transitions mapping accepting states to the next state. For
example a 3-set, 3-game tennis match:
>>> def transition_game((g1, g2), i):
... g1n = g1 + (i == 1)
... g2n = g2 + (i == 2)
... accept = g1n == 2 or g2n == 2
... return (g1n, g2n), accept, None
... transition_game.initial = (0, 0), False, None
>>> def transition_set((s1, s2), (g1, g2)):
... s1n = s1 + (g1 > g2)
... s2n = s2 + (g2 > g1)
... accept = s1n == 2 or s2n == 2
... return (s1n, s2n), accept, not accept and transition_game
... transition_set.initial = (0, 0), False, transition_game
>>> def get_state(history):
... states, fsms = initial(transition_set)
... for group in history:
... for incident in group[0]:
... states, fsms = next_state(states, fsms, incident)
... return states
>>> def initial(fsm):
... state, _, sub = fsm.initial
... if sub:
... states, fsms = initial(sub)
... return [state] + states, [fsm] + fsms
... else:
... return [state], [fsm]
>>> def next_state(states, fsms, incident):
... if states and fsms:
... next, accept, sub = fsms[-1](states[-1], incident)
... assert not accept or not sub, (accept, sub)
... states = states[:-1] + [next]
... if accept:
... return next_state(states[:-1], fsms[:-1], next)
... elif sub:
... substates, subfsms = initial(sub)
... return states + substates, fsms + subfsms
... else:
... return states, fsms
... else:
... return [], []
>>> get_state([])
[(0, 0), (0, 0)]
>>> get_state([[[1]]])
[(0, 0), (1, 0)]
>>> get_state([[[1, 1]]])
[(1, 0), (0, 0)]
>>> get_state([[[1, 1, 1, 1]]])
[]
So far we have neglected to mention that BIPEvIface collects subtotals about
the state, e.g. for football it stores the scores in each period in addition to
the whole match. We are able to use our model of substates to extract all the
subtotals you could ever need:
>>> def get_state(history):
... states, fsms = initial(transition_set)
... subtotals = {}
... for alternative in history:
... groups = defaultdict(list)
... for g, group in enumerate(alternative):
... gstates, gfsms = states[:], fsms[:]
... for incident in group:
... for i in xrange(len(gstates)):
... groups[(g, tuple(gstates[:i]))].append(incident)
... gstates, gfsms = next_state(gstates, gfsms, incident)
... subtotals = combine(subtotals, groups)
... states, fsms = gstates, gfsms
... return states, subtotals
>>> def combine(subtotals, groups):
... by_state = defaultdict(list)
... for (g, state), incidents in groups.iteritems():
... by_state[state] += [[incidents]]
... new = {s: map(list, zip(*i)) for s, i in by_state.iteritems()}
... states = subtotals.viewkeys() | new.viewkeys()
... return {s: subtotals.get(s, []) + new.get(s, []) for s in states}
>>> state, subtotals = get_state([[[1, 1, 2, 2]]])
{((1, 0),): [[[2, 2]]], (): [[[1, 1, 2, 2]]], ((0, 0),): [[[1, 1]]]}
>>> state, subtotals = get_state([[[1, 2], [2, 1]], [[1]]])
>>> subtotals
{(): [[[1, 2], [2, 1]], [[1]]], ((0, 0),): [[[2, 1], [1, 2]], [[1]]]}
These subtotals list the incidents that occurred for each state in a format
similar to the history. We can see that both in the whole match and in the
first set we either had (1, 2, 1) or (2, 1, 1). If we request the winner of
the final game we would get 1, whereas the first game reports 1 or 2 because
the ordering was tentative. Despite the ordering being tentative the counts
of the incidents is known in this case, there has been two 1s and one 2:
>>> def nth(subtotal, n):
... lengths = [len(a[0]) for a in subtotal] + [0]
... i = 0
... while n >= lengths[i]:
... n -= lengths[i]
... i += 1
... return subtotal[i][n]
>>> def counts(subtotal):
... alternatives = [map(Counter, a) for a in subtotal]
... unified = [reduce(add, p) for p in product(*alternatives)]
... unique = []
... for count in unified:
... if count not in unique:
... unique.append(dict(count))
... return unique
>>> def is_tentative(value):
... return len(value) > 1
>>> nth(subtotals[((0, 0),)], 0)
[2, 1]
>>> is_tentative(nth(subtotals[((0, 0),)], 0))
True
>>> nth(subtotals[((0, 0),)], 2)
[1]
>>> is_tentative(nth(subtotals[((0, 0),)], 2))
False
>>> counts(subtotals[((0, 0),)])
[{1: 2, 2: 1}]
>>> is_tentative(counts(subtotals[((0, 0),)]))
False
Obviously counts is never tentative for this model of tennis, but once we
introduce points there is possibility for an infinite number of potential
counts for both players scores. As before we will have to revisit these
functions to support cyclic states.
Finally we will want to record the transitions that occurred as a result of
a sub-machine accepting for those states so that we can work with less granular
data. This will be useful when working with more than two layers of nesting,
in tennis we would record games won in a set this way, and points by counting
the incidents.
At this point we have the basis of an elegant, sport-agnostic replacement for
the heart of BIPEvIface.
* in addition to "get_resulting_state" and collecting subtotals (e.g. scores
for each period).