Introduction

In this chapter, I review how to extract information on a player's build strategy. Based on this review, I then define the build_parser module. This module contains functions that developers can use to parse players' build orders and to inventory their armies, bases and technology upgrades.

Exportable Members

Build Orders

A crucial part of how players execute a strategy is how they construct their bases and their armies. Currently, in the game, players can review the order in which the game participants build their first 40 elements once a match is over. These build-orders include buildings (built, upgraded, expanded), units (trained, mutated, warped, merged), and any tech researched.

Since this package is meant to build player profiles based on the players' performance indicators, I need a way to capture the overall composition of the players' bases and armies. Afterwards, I must be able to generalise this composition data to build the players' profiles.

My first instinct was to store the build orders as a time series: a sequence of evenly spaced snapshots recording when, and in which order, each element entered the game. However, a time series is too detailed and does not lend itself to practical generalisation. Hence, I opt to extract unit, building and research compositions at four game intervals: the whole, early, mid and late game. These intervals match the ones I use for the macroeconomic indicators in <<Chapter 3 - Parsing Macroeconomic Indicators>>.
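The following sketch makes the interval idea concrete. It assigns a list of event times (in seconds) to the whole, early, mid and late game. The cut-off points `early_end` and `mid_end` are hypothetical parameters. The real boundaries follow the definitions in <<Chapter 3 - Parsing Macroeconomic Indicators>>, which this sketch does not reproduce.

```python
from typing import Dict, List


def split_by_interval(times: List[float], early_end: float,
                      mid_end: float) -> Dict[str, List[float]]:
    """Group event times (seconds) into whole, early, mid and late game.

    early_end and mid_end are hypothetical cut-off points; the chapter on
    macroeconomic indicators defines the actual boundaries.
    """
    intervals = {'whole': list(times), 'early': [], 'mid': [], 'late': []}
    for t in times:
        if t < early_end:
            intervals['early'].append(t)
        elif t < mid_end:
            intervals['mid'].append(t)
        else:
            intervals['late'].append(t)
    return intervals


# Made-up times with cut-offs at 240 s and 600 s.
result = split_by_interval([30.0, 250.0, 610.0, 700.0], 240.0, 600.0)
print({k: len(v) for k, v in result.items()})
# {'whole': 4, 'early': 1, 'mid': 1, 'late': 2}
```

The whole game keeps every element, while early, mid and late are disjoint. This mirrors how the started counts later in this chapter add up across intervals.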

This section explores the factors I need to consider when defining a set of functions that users can call to extract the players' element compositions at the different game intervals, along with other related indicators.

Listing a player's elements

The first step in parsing any indicator related to the build order is to obtain a list of the player's elements.

In the following code, I use the Player object's units attribute to extract a list of all the elements the player owned during the match. I also use a list of all the units I want to include in my analysis (i.e. UNIT_NAMES) to filter out unwanted units. For example, I exclude Zerg larvae because the game generates them automatically, with little control from the player.
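The filtering step can be shown in isolation with a minimal sketch. Here a `namedtuple` stands in for sc2reader's unit objects, and `UNIT_NAMES` is a hypothetical four-name subset rather than the project's real list. The helper `filter_player_units` is likewise an illustrative name.

```python
from collections import namedtuple

# Minimal stand-in for sc2reader's Unit objects; only the name is used here.
Unit = namedtuple('Unit', ['name', 'started_at'])

# Hypothetical subset of UNIT_NAMES; the real list lives in the data folder.
UNIT_NAMES = ['hatchery', 'drone', 'overlord', 'spawningpool']


def filter_player_units(units, unit_names=UNIT_NAMES):
    """Keep only units whose lower-cased name appears in unit_names."""
    wanted = set(unit_names)  # set membership is O(1) per unit
    return [u for u in units if u.name.lower() in wanted]


units = [Unit('Hatchery', 0), Unit('Larva', 0), Unit('Drone', 0),
         Unit('SpawningPool', 1186)]
print([u.name for u in filter_player_units(units)])
# ['Hatchery', 'Drone', 'SpawningPool']
```

Larva is dropped simply because it is absent from the list. Converting the list to a set is a small optimisation when filtering hundreds of units per replay.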

Note: In this example, I exclude some units that change their state during the game (e.g. burrowed troops) to focus on the game's basic unit types. Later, I discuss some considerations related to these state changes that the module's composition_df helper function needs to account for to generate an accurate unit count.

Allow me to describe the code step by step to clarify the ideas behind this module. First, I define multiple constant values that I use throughout the module's development to filter various data characteristics. These constants store information from multiple data files saved in this project's data folder.

The constants include:

- UNIT_NAMES: list of names of all the player-controllable units (buildings or troops) in the game. The list contains only one name per unit and excludes the various states a unit can have.
- RACE_ARMIES: lists of controllable troops, separated by race; excludes workers and structures.
- RACE_BUILDINGS: lists of controllable structures, separated by race.
- RACE_UPGRADES: list of tech upgrades that players can research during a match. The list excludes any default upgrades that players do not directly trigger.
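This chapter does not show how the constants are read from the data folder. The sketch below illustrates one plausible approach under stated assumptions: a JSON file mapping each race to a list of names. The file name `race_armies.json`, its layout and the helpers `load_race_constants` and `merge_races` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_race_constants(path):
    """Load a {race: [unit names]} mapping from a JSON file.

    Hypothetical file layout; the project's data folder may differ.
    """
    with open(path, encoding='utf-8') as f:
        data = json.load(f)
    # Lower-case the names so they match checks such as u.name.lower().
    return {race: [name.lower() for name in names]
            for race, names in data.items()}


def merge_races(race_map):
    """Merge the per-race lists into one sorted list of unique names."""
    return sorted({name for names in race_map.values() for name in names})


# Usage with a throwaway file standing in for the data folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'race_armies.json'  # hypothetical file name
    sample.write_text(json.dumps({'Terran': ['Marine', 'SiegeTank'],
                                  'Zerg': ['Roach']}), encoding='utf-8')
    RACE_ARMIES = load_race_constants(sample)

print(RACE_ARMIES['Terran'])     # ['marine', 'siegetank']
print(merge_races(RACE_ARMIES))  # ['marine', 'roach', 'siegetank']
```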

from pathlib import Path

import sc_training.ingest.build_parser as bp
Path(bp.__file__)

Path('c:/Users/david/Documents/phdcode/sc_training/sc_training/ingest/build_parser.py')

After loading these data, I load multiple sample replays. In this case, I need to use a larger pool of test cases to ensure that I am considering all the different types of game units and build variations.

rps_path = Path("./test_replays")

# single_replay is the base case I use to develop the functions 
# in this module.
single_replay = sc2reader.load_replay(str(rps_path/"Jagannatha LE.SC2Replay"))

# The following replays have various compositions of races, armies, 
# structures, tech updates and other game elements that allow me to test 
# and debug the module's functions.
sing_zerg = sc2reader.load_replay(str(rps_path/"Oxide LE (14).SC2Replay"))
sing_protoss = sc2reader.load_replay(str(rps_path/"Oxide LE (13).SC2Replay"))
zustates = sc2reader.load_replay(str(rps_path/'zustates.SC2Replay'))
tustates = sc2reader.load_replay(str(rps_path/'tustates.SC2Replay'))
tfly = sc2reader.load_replay(str(rps_path/'terranfly.SC2Replay'))

# I store some basic variables out of the test case replay to make the  
# sample code more readable.
match_events = [event for event in single_replay.events]
rpl_duration = single_replay.length.seconds
rpl_rec_duration = match_events[-1].second
rpl_fps = single_replay.game_fps

With this setup in place, I can proceed to extract a list of all the units owned by a player during the course of the game.

Note: In this first example, I concentrate on player 2 of the sing_zerg replay because the initial unit list of Zerg players has some particularities that I want to illustrate straight away.

p2_units = [u for u in sing_zerg.player[2].units 
            if u.name.lower() in UNIT_NAMES]

# Extract and print a sample containing the first 20 units owned by the
# player during the game for examination.
sample_prints = [f'{ind+1:<3} Name: {u.name:<15}'
                 f'Start: {u.started_at:>5.0f} End: {str(u.died_at):>5}'
                 for ind, u in enumerate(p2_units[:20])]

for string in sample_prints:
    print(string)

1   Name: Lair           Start:     0 End:  None
2   Name: Drone          Start:     0 End:  5184
3   Name: Drone          Start:     0 End:  3530
4   Name: Drone          Start:     0 End:  2046
5   Name: Drone          Start:     0 End:  None
6   Name: Drone          Start:     0 End: 13011
7   Name: Drone          Start:     0 End:  2739
8   Name: Drone          Start:     0 End:  None
9   Name: Drone          Start:     0 End:  None
10  Name: Drone          Start:     0 End:  None
11  Name: Drone          Start:     0 End:  None
12  Name: Drone          Start:     0 End:  2226
13  Name: Drone          Start:     0 End:  4969
14  Name: Overlord       Start:     0 End:  None
15  Name: Overlord       Start:   594 End:  None
16  Name: Drone          Start:   675 End:  None
17  Name: Drone          Start:   678 End:  None
18  Name: Drone          Start:   976 End:  None
19  Name: Drone          Start:   987 End:  None
20  Name: SpawningPool   Start:  1186 End:  None

The example above shows that each player's unit list begins with thirteen or fourteen starting units. Players who play Protoss or Terran start with thirteen units (one main base plus twelve workers), whereas Zerg players begin with fourteen (one main base, twelve workers and one Overlord). Because the players do not build these elements, I do not count them as part of their build strategy.
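The positional rule can be wrapped in a small helper. The names `drop_starting_units` and `starts_look_initial` are hypothetical. Both functions work on any list, so the usage below passes plain start frames instead of sc2reader units. The second function is an optional check that the skipped elements really start at frame 0.

```python
# Starting units per race, as described above: one main base and twelve
# workers, plus one Overlord for Zerg.
STARTING_UNITS = {'Terran': 13, 'Protoss': 13, 'Zerg': 14}


def drop_starting_units(units, race):
    """Return the player's units without the race's starting units.

    Assumes, as in the examples, that the starting units are listed first.
    Raises KeyError for an unknown race instead of guessing a count.
    """
    return units[STARTING_UNITS[race]:]


def starts_look_initial(start_frames, race):
    """Sanity check: every skipped element should start at frame 0."""
    return all(frame == 0 for frame in start_frames[:STARTING_UNITS[race]])


# Start frames mimicking the Zerg list above: 14 starting units, then builds.
zerg_starts = [0] * 14 + [594, 675, 678]
print(drop_starting_units(zerg_starts, 'Zerg'))   # [594, 675, 678]
print(starts_look_initial(zerg_starts, 'Zerg'))   # True
```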

The following code extracts the unit lists for both players in the sample match, ignoring the starting units. I print the first ten elements of each list to show the difference from the previous list. Note that none of these elements has a starting time of 0.

p1_race = sing_zerg.player[1].play_race
p1_units = [u for u in sing_zerg.player[1].units
            if u.name.lower() in UNIT_NAMES]


# Extract a sub-list of all elements, excluding the starting elements
# according to the player's race.
p1_units_no_inits = p1_units[(13 if p1_race != 'Zerg' else 14):]


# Print the new list of elements to show they skip the starting elements.
p1_unit_list_print = [
        (f'{ind+1:<3} Name: {u.name:<15}'
        + f'Start: {u.started_at:>5.0f} End: {str(u.died_at):>5}')
        for ind, u in enumerate(p1_units_no_inits[:10])]

for string in p1_unit_list_print:
    print(string)

1   Name: SCV            Start:   305 End:  None
2   Name: SCV            Start:   576 End:  None
3   Name: SCV            Start:   847 End:  None
4   Name: SCV            Start:  1239 End:  None
5   Name: Refinery       Start:  1367 End:  None
6   Name: Refinery       Start:  1385 End:  None
7   Name: Barracks       Start:  1416 End:  None
8   Name: SCV            Start:  1580 End:  None
9   Name: SCV            Start:  1851 End:  None
10  Name: SCV            Start:  2122 End:  None

# This second player is playing with Zerg. Hence, they have more
# starting units.
p2_race = sing_zerg.player[2].play_race
p2_units_no_inits = p2_units[(13 if p2_race != 'Zerg' else 14):]

p2_unit_list_print = [
        (f'{ind+1:<3} Name: {u.name:<15}'
        + f'Start: {u.started_at:>5.0f} End: {str(u.died_at):>5}') 
        for ind, u in enumerate(p2_units_no_inits[:10])]

for string in p2_unit_list_print:
    print(string)

1   Name: Overlord       Start:   594 End:  None
2   Name: Drone          Start:   675 End:  None
3   Name: Drone          Start:   678 End:  None
4   Name: Drone          Start:   976 End:  None
5   Name: Drone          Start:   987 End:  None
6   Name: SpawningPool   Start:  1186 End:  None
7   Name: Drone          Start:  1223 End:  None
8   Name: Extractor      Start:  1566 End:  None
9   Name: Drone          Start:  1675 End:  None
10  Name: Drone          Start:  1878 End:  None

Alternative Implementation with `UnitTrackerEvents`

A different way to generate these lists is to use the Replay's UnitBornEvent, UnitInitEvent, UnitDoneEvent and UnitTypeChangeEvent instances. This second approach offers access to the unit's spawning time through the event's second attribute. However, it also means having to consolidate four discrete lists with overlapping data.
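The following sketch shows what consolidating the four event lists could look like. It merges them into one record per unit ID, ordered by frame. `Event` is a hypothetical simplified stand-in for sc2reader's tracker events, with a `kind` field replacing the event classes, and `consolidate` is an illustrative name rather than a module function.

```python
from collections import namedtuple

# Simplified, hypothetical stand-in for sc2reader tracker events.
Event = namedtuple('Event', ['kind', 'frame', 'unit_id', 'unit_name'])


def consolidate(events):
    """Merge born/init/done/change events into one record per unit id."""
    units = {}
    for e in sorted(events, key=lambda ev: ev.frame):
        rec = units.setdefault(e.unit_id, {'name': e.unit_name,
                                           'started': None,
                                           'finished': None,
                                           'states': []})
        if e.kind == 'born':
            # Born units enter the game complete: start equals finish.
            rec['started'] = rec['finished'] = e.frame
        elif e.kind == 'init':
            rec['started'] = e.frame
        elif e.kind == 'done':
            rec['finished'] = e.frame
        elif e.kind == 'change':
            rec['states'].append((e.frame, e.unit_name))
            rec['name'] = e.unit_name  # keep the latest state's name
    return units


events = [Event('init', 100, 1, 'Barracks'),
          Event('born', 50, 2, 'SCV'),
          Event('done', 500, 1, 'Barracks'),
          Event('change', 900, 3, 'SiegeTankSieged'),
          Event('born', 700, 3, 'SiegeTank')]
timeline = consolidate(events)
print(timeline[1])
# {'name': 'Barracks', 'started': 100, 'finished': 500, 'states': []}
```

Sorting by frame first guarantees that a unit's record exists before any state change is applied to it.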

In the code below, I collect the events that store information on player 1's unit spawning for the same sample replay used in the previous examples. I split this information into four lists according to the type of TrackerEvent. With these lists, I can review their differences and similarities, which shows the information developers could use to tell them apart if need be.

p1_uborn_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitBornEvent) 
            and event.control_pid == 1
            and event.unit.name.lower() in UNIT_NAMES]
p1_uinit_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitInitEvent) 
            and event.control_pid == 1
            and event.unit.name.lower() in UNIT_NAMES]
p1_udone_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitDoneEvent) 
            and event.unit.owner.pid == 1
            and event.unit.name.lower() in UNIT_NAMES]
p1_uchange_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitTypeChangeEvent) 
            and event.unit.owner.pid == 1
            and event.unit.name.lower() in UNIT_NAMES]

print(f'Units owned during the match: {len(p1_units)}')
print(f'UnitsBorn: {len(p1_uborn_e)} Init: {len(p1_uinit_e)} '
      f'Done: {len(p1_udone_e)} Change: {len(p1_uchange_e)}')

Units owned during the match: 138
UnitsBorn: 106 Init: 32 Done: 31 Change: 28

Next, I use various set operations to illustrate the relations between the lists. For example, the following code shows that, at least in this case, the union of the units linked to the replay's UnitBornEvents and UnitInitEvents is the same as the list of units linked directly to the player.

p1_u_ids = [u.id for u in p1_units]
p1_u_born = [e.unit.id for e in p1_uborn_e]
p1_u_init = [e.unit.id for e in p1_uinit_e]

set(p1_u_ids) == set(p1_u_born) | set(p1_u_init)

True

Meanwhile, looking at the intersections between the lists, I can see some overlap between them. See the code below.

u_init_done_intersection = ({e.unit.id for e in p1_udone_e}
                            & {e.unit.id for e in p1_uinit_e})

print(f'{len(u_init_done_intersection)} were initialised and completed.')

u_init_done_diff = ({e.unit.id for e in p1_uinit_e}
                    - {e.unit.id for e in p1_udone_e})

print(f'{len(u_init_done_diff)} was never completed.')

31 were initialised and completed.
1 was never completed.

u_born_change_intersection = set([e.unit.id for e in p1_uborn_e]) \
                             & set([e.unit.id for e in p1_uchange_e])

print(f'{len(u_born_change_intersection)} changed during the game.')

1 changed during the game.

u_init_change_intersect = set([e.unit.id for e in p1_uinit_e]) \
                             & set([e.unit.id for e in p1_uchange_e])
u_done_change_intersect = set([e.unit.id for e in p1_udone_e]) \
                             & set([e.unit.id for e in p1_uchange_e])

print(f'{len(u_init_change_intersect)} were initialised and changed.')
print(f'{len(u_done_change_intersect)} completed construction and changed.')

8 were initialised and changed.
8 completed construction and changed.

This overlap means that, although a player's unit list can be built from these events, doing so is impractical compared to the first approach. Still, these events show how to extract the times of each unit's life stages (construction start and completion, state change or death) during the match.

For instance, take the case of UnitBornEvents. The frame at which these events occur equals both the start (construction initiation) and finish (construction completion) frames recorded by the units linked to them.

Additionally, this data shows that the integer quotient of a unit's recorded birth frame (i.e. its finished_at attribute) and the replay's frames per second (i.e. Replay.game_fps) equals the UnitBornEvent's recorded execution time in seconds. I can convert this time into the real-time index using the calc_realtime_index function defined in <<Chapter 2 - Handling Tracker Events>>.

match_fps = sing_zerg.game_fps
[(f'UName:{e.unit.name:<7} e_rec_sec:{e.second:>7.0f}  '
    + f'U_time_quotient:{e.unit.finished_at//match_fps:>7.0f}')
for e in p1_uborn_e][15:20]

['UName:SCV     e_rec_sec:     52  U_time_quotient:     52',
 'UName:SCV     e_rec_sec:     77  U_time_quotient:     77',
 'UName:SCV     e_rec_sec:     98  U_time_quotient:     98',
 'UName:SCV     e_rec_sec:    115  U_time_quotient:    115',
 'UName:SCV     e_rec_sec:    132  U_time_quotient:    132']
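The check above reduces to a floor division of frames by the replay's frame rate. The sketch below restates that arithmetic and adds an approximate game-to-real-time conversion. The 16 fps value and the 1.4 speed factor for the "Faster" setting are stated assumptions, and both helper names are hypothetical. calc_realtime_index from <<Chapter 2 - Handling Tracker Events>> remains the reference conversion for this project.

```python
# Assumption: sc2reader reports 16 frames per game second (Replay.game_fps).
GAME_FPS = 16.0
# Assumption: the "Faster" game speed runs 1.4 game seconds per real second.
FASTER_SPEED_FACTOR = 1.4


def frame_to_game_second(frame, fps=GAME_FPS):
    """Integer game second of a frame, as in finished_at // game_fps."""
    return int(frame // fps)


def game_to_real_seconds(game_seconds, speed_factor=FASTER_SPEED_FACTOR):
    """Approximate real-time seconds for a game-time value."""
    return game_seconds / speed_factor


print(frame_to_game_second(832))           # 52
print(round(game_to_real_seconds(70), 2))  # 50.0
```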

Similarly, the UnitInitEvent's recorded execution frame is the same as the unit's recorded started_at frame.

[(f'UName: {e.unit.name:<15}'
  + f'e_rec_frame: {str(e.frame):>4} '
  + f'U_rec_start_frame: {str(e.unit.started_at):>5}') 
for e in p1_uinit_e][-5:]

['UName: SupplyDepot    e_rec_frame: 12816 U_rec_start_frame: 12816',
 'UName: SupplyDepot    e_rec_frame: 12835 U_rec_start_frame: 12835',
 'UName: SupplyDepot    e_rec_frame: 12863 U_rec_start_frame: 12863',
 'UName: SupplyDepot    e_rec_frame: 12966 U_rec_start_frame: 12966',
 'UName: Bunker         e_rec_frame: 13796 U_rec_start_frame: 13796']

Likewise, in the case of UnitDoneEvents, the event's recorded execution frame is the same as the unit's recorded finished_at frame.

Note: if a unit is killed before it finishes construction, it does not generate a UnitDoneEvent. Nevertheless, it still appears in the player's units list. For example, the last unit in the previous list (the Bunker) is absent from the next one.

[(f'UName: {e.unit.name:<15}' 
    + f'Event_rec_frame: {e.frame:>7.0f} '
    + f'U_rec_finish_frame: {e.unit.finished_at:>7.0f}')
for e in p1_udone_e][-5:]

['UName: SupplyDepot    Event_rec_frame:   13315 U_rec_finish_frame:   13315',
 'UName: Armory         Event_rec_frame:   13333 U_rec_finish_frame:   13333',
 'UName: SupplyDepot    Event_rec_frame:   13343 U_rec_finish_frame:   13343',
 'UName: Armory         Event_rec_frame:   13391 U_rec_finish_frame:   13391',
 'UName: SupplyDepot    Event_rec_frame:   13446 U_rec_finish_frame:   13446']
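These frame relations let a unit's life stages be read directly from its attributes. In the minimal sketch below, `Unit` is a `namedtuple` stand-in for an sc2reader unit, `life_stages` is a hypothetical helper, and the 16 fps default is an assumption. The frames in the usage are made up for illustration.

```python
from collections import namedtuple

# Stand-in for an sc2reader unit; a frame is None if that stage never happened.
Unit = namedtuple('Unit', ['name', 'started_at', 'finished_at', 'died_at'])


def life_stages(unit, fps=16.0):
    """Convert a unit's recorded frames into game seconds."""
    def to_seconds(frame):
        return None if frame is None else frame / fps

    return {
        'started': to_seconds(unit.started_at),
        'finished': to_seconds(unit.finished_at),
        'died': to_seconds(unit.died_at),
        # No finish frame means construction was never completed.
        'completed': unit.finished_at is not None,
    }


# Made-up frames: a completed building and one destroyed mid-construction.
print(life_stages(Unit('Barracks', 1600, 2400, None)))
print(life_stages(Unit('Bunker', 13796, None, 14080)))
```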

I can also demonstrate the relation between initiated and completed units by examining another match, which shows the links between UnitInitEvent, UnitDoneEvent and a player's units list.

For instance, the following code shows that three units are started but never completed.

uinit_e = [event for event in single_replay.events 
            if isinstance(event, sc2reader.events.tracker.UnitInitEvent) 
            and event.control_pid == 2
            and event.unit.name.lower() in UNIT_NAMES]

udone_e = [event for event in single_replay.events 
            if isinstance(event, sc2reader.events.tracker.UnitDoneEvent) 
            and event.unit.owner.pid == 2
            and event.unit.name.lower() in UNIT_NAMES]

print(f'UnitsInit: {len(uinit_e)} UnitsDone: {len(udone_e)}')

UnitsInit: 24 UnitsDone: 21

The next code shows that these units can be identified in the player's units list because their finished_at value is None.

incomplete_u = [
    (f'UName: {u.name:<18} unitId: {u.id:<10.0f}'
        + f'u_start_frame: {u.started_at:>8.0f}'
        + f' u_finish_frame: {str(u.finished_at):>8}')
    for u in single_replay.player[2].units
    if u.name.lower() in UNIT_NAMES
    and u.finished_at is None]

incomplete_u

['UName: CommandCenter      unitId: 94109700  u_start_frame:    13111 u_finish_frame:     None',
 'UName: Refinery           unitId: 92012546  u_start_frame:    13175 u_finish_frame:     None',
 'UName: Refinery           unitId: 94633985  u_start_frame:    13213 u_finish_frame:     None']

Interestingly, the UnitDiedEvent list includes the destruction of these incomplete units only if another player kills them, as the following code shows.

p1_incomplete_units_ids = [f'{u.name}, {u.id}'
        for u in single_replay.player[1].units
        if u.name.lower() in UNIT_NAMES
        and u.finished_at is None]

print("List of incomplete units on player 1's "
      f'units-list {p1_incomplete_units_ids}')


p1_udied_e = [f'{event.unit.name}, {event.unit.id}'
            for event in single_replay.events
            if isinstance(event, sc2reader.events.tracker.UnitDiedEvent)
            and event.unit.owner is not None
            and event.unit.owner.pid == 1
            and event.unit.name.lower() in UNIT_NAMES
            and event.unit.finished_at is None]
print(f'List of units in UnitDiedEvent for player 1 {p1_udied_e}')

print('------------------------------------------------')

p2_incomplete_units_ids = [f'{u.name}, {u.id}'
        for u in single_replay.player[2].units
        if u.name.lower() in UNIT_NAMES
        and u.finished_at is None]

print("List of incomplete units on player 2's "
      f'units-list {p2_incomplete_units_ids}')


p2_udied_e = [f'{event.unit.name}, {event.unit.id}'
            for event in single_replay.events
            if isinstance(event, sc2reader.events.tracker.UnitDiedEvent)
            and event.unit.owner is not None
            and event.unit.owner.pid == 2
            and event.unit.name.lower() in UNIT_NAMES
            and event.unit.finished_at is None]

print(f'List of units in UnitDiedEvent for player 2 {p2_udied_e}')

List of incomplete units on player 1's units-list ['PhotonCannon, 78118917']
List of units in UnitDiedEvent for player 1 ['PhotonCannon, 78118917']
------------------------------------------------
List of incomplete units on player 2's units-list ['CommandCenter, 94109700', 'Refinery, 92012546', 'Refinery, 94633985']
List of units in UnitDiedEvent for player 2 []

In the case above, player 2's incomplete units do not generate UnitDiedEvents because they were cancelled, not killed. This is why player 2's UnitDiedEvent list is empty.
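This observation suggests a simple way to tell killed incomplete units from cancelled ones: compare their IDs with the units in the UnitDiedEvent list. The sketch below is a hypothetical helper built on that assumption. It uses the unit IDs printed above.

```python
def classify_incomplete(incomplete_ids, died_event_ids):
    """Split incomplete units into killed and (presumably) cancelled ones.

    Follows the observation above: an incomplete unit that appears among
    the UnitDiedEvent units was killed; otherwise it is assumed cancelled.
    """
    died = set(died_event_ids)
    killed = [uid for uid in incomplete_ids if uid in died]
    cancelled = [uid for uid in incomplete_ids if uid not in died]
    return killed, cancelled


# Unit IDs taken from the outputs above (single_replay).
print(classify_incomplete([78118917], [78118917]))
# ([78118917], [])
print(classify_incomplete([94109700, 92012546, 94633985], []))
# ([], [94109700, 92012546, 94633985])
```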

Counting Units With Multiple States

In the examples above, for simplicity, I have only counted units that remain in their primary state throughout the match. However, in most games, some units change state. I must account for this because sc2reader records each unit under the name of the state in which it finished or exited the game.

For example, suppose I extract a list of all Zerg units' names in a match. In that case, I may notice that some Infestor units are counted as such, but others are counted as InfestorBurrowed.

Similarly, the same operation on a Terran player's units shows that units such as the SiegeTank have secondary states like SiegeTankSieged. Likewise, a Hellion can also appear as a BattleHellion, and a WidowMine as a WidowMineBurrowed.

from pprint import pprint

zerg_player_units = [u.name for u in zustates.player[1].units
                    if u.is_army]
print('Sample Zerg unit set in zustates replay.')
pprint(set(zerg_player_units))

# Print set of terran units in a match
terran_player_units = [u.name for u in tustates.player[1].units
                    if u.is_army]
print('\nSample Terran unit set in tustates replay.')
pprint(set(terran_player_units))

Sample Zerg unit set in zustates replay.
{'Baneling',
 'Hydralisk',
 'Infestor',
 'InfestorBurrowed',
 'Lurker',
 'Overlord',
 'Overseer',
 'Queen',
 'Ravager',
 'Roach',
 'Zergling'}

Sample Terran unit set in tustates replay.
{'BattleHellion',
 'Hellion',
 'Marauder',
 'Marine',
 'Medivac',
 'SiegeTank',
 'SiegeTankSieged',
 'Thor',
 'WidowMine',
 'WidowMineBurrowed'}

However, an army composition should count units of the same type in different states as one type. Thus, I need to account for how sc2reader stores each unit according to the state in which it finished or exited the game. In the following code, I use a list of unit types and several conditions to demonstrate how to filter the initial list. I also build a DataFrame with a Unit column that records the same name for units of the same type in different states, which normalises the unit classification.

Note that if I count the units based on the type recorded by sc2reader, the count splits each unit type across its states. By contrast, the normalised count adds up the units of the same type regardless of their state.

import pandas as pd

terran_player_units = [(uname, u, u.id) for u in tustates.player[1].units
                    for uname in RACE_ARMIES['Terran']
                    if uname in u.name.lower() # Use the naming convention
                                               # to get all units in
                                               # different states
                    and u.is_army]


tpunits_df = pd.DataFrame({
        'Unit': [uname for uname, u, uid in terran_player_units],
        'Uname': [u.name for uname, u, uid in terran_player_units],
        'UnitID': [uid for uname, u, uid in terran_player_units]})

# print(tpunits_df.groupby('Uname').size().to_markdown())
# print(tpunits_df.groupby('Unit').size().to_markdown())

This table shows the count based on the unit names recorded by sc2reader.

Uname	Count
BattleHellion	10
Hellion	6
Marauder	10
Marine	15
Medivac	3
SiegeTank	1
SiegeTankSieged	2
Thor	1
WidowMine	5
WidowMineBurrowed	7

This table shows the count based on the normalised names.

Unit	Count
hellion	16
marauder	10
marine	15
medivac	3
siegetank	3
thor	1
widowmine	12
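The normalisation can also be expressed without pandas. This sketch maps each state-specific name to a base type and tallies the result with `collections.Counter`. `TERRAN_ARMY` is a hypothetical subset of `RACE_ARMIES['Terran']`. The nested loop above emits one row for every base name a unit's name contains. This variant (my substitution) instead returns a single type per unit by picking the longest matching base name, which matters only when one base name is a substring of another.

```python
from collections import Counter

# Hypothetical subset of RACE_ARMIES['Terran'] (lower-case base names).
TERRAN_ARMY = ['hellion', 'marine', 'siegetank', 'widowmine']


def normalise(name, base_names=TERRAN_ARMY):
    """Map a state-specific name (e.g. SiegeTankSieged) to its base type.

    Picks the longest base name contained in the lower-cased name, so each
    unit gets exactly one type; returns None when nothing matches.
    """
    lowered = name.lower()
    matches = [base for base in base_names if base in lowered]
    return max(matches, key=len) if matches else None


names = ['Hellion', 'BattleHellion', 'SiegeTank', 'SiegeTankSieged',
         'WidowMineBurrowed', 'Marine']
counts = Counter(normalise(n) for n in names)
print(counts)
```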

The same rule applies to buildings. However, there are two caveats when counting Terran buildings. First, TechLab and Reactor instances do not follow the Unit/State naming convention of the other multi-state units; instead, they follow the inverse pattern, State/Unit (e.g. BarracksTechLab). Second, both TechLab and Reactor generate a double count: they appear once as themselves and once more as the production building they extend (i.e. barracks, starports and factories), under the same unit ID. Thus, when counting Terran buildings, I must filter the DataFrame again to account for these anomalies. The following code illustrates this issue.

terran_player_buildings = [(uname, u, u.id) for u in tustates.player[1].units
                    for uname in RACE_BUILDINGS['Terran']
                    if uname in u.name.lower() # Use the naming
                                               # convention to get all units
                                               # in different states
                    and u.is_building]

tbunits_df = pd.DataFrame({
        'Unit': [uname for uname, u, uid in terran_player_buildings],
        'Uname': [u.name for uname, u, uid in terran_player_buildings],
        'UnitID': [uid for uname, u, uid in terran_player_buildings]})

# print(tbunits_df[9:25].to_markdown())

Note: see rows 9, 10, 14, 15, 21 and 22 of the table that results from the code above.

	Unit	Uname	UnitID
9	barracks	BarracksTechLab	67895297
10	techlab	BarracksTechLab	67895297
11	factory	Factory	68681729
12	supplydepot	SupplyDepot	71041026
13	supplydepot	SupplyDepot	74186754
14	factory	FactoryTechLab	74711042
15	techlab	FactoryTechLab	74711042
16	sensortower	SensorTower	76021761
17	refinery	Refinery	76546050
18	factory	Factory	77856769
19	refinery	Refinery	78381057
20	supplydepot	SupplyDepot	83099649
21	factory	FactoryReactor	83361793
22	reactor	FactoryReactor	83361793
23	armory	Armory	83623937
24	supplydepot	SupplyDepot	83886081

# Keep a single row per unit ID.
tbunits_df.drop_duplicates(subset='UnitID', keep='last', inplace=True)

# Correct the mislabelling of reactors and tech labs.
tbunits_df.loc[tbunits_df['Uname'].str.contains('Reactor'), 'Unit'] = 'reactor'
tbunits_df.loc[tbunits_df['Uname'].str.contains('TechLab'), 'Unit'] = 'techlab'
# print(tbunits_df[8:25].to_markdown())

Note: in the following table, the duplicates in rows 9, 14 and 21 have been filtered out.

	Unit	Uname	UnitID
8	planetaryfortress	PlanetaryFortress	64749570
10	techlab	BarracksTechLab	67895297
11	factory	Factory	68681729
12	supplydepot	SupplyDepot	71041026
13	supplydepot	SupplyDepot	74186754
15	techlab	FactoryTechLab	74711042
16	sensortower	SensorTower	76021761
17	refinery	Refinery	76546050
18	factory	Factory	77856769
19	refinery	Refinery	78381057
20	supplydepot	SupplyDepot	83099649
22	reactor	FactoryReactor	83361793
23	armory	Armory	83623937
24	supplydepot	SupplyDepot	83886081
25	planetaryfortress	PlanetaryFortress	84672513
26	supplydepot	SupplyDepot	29097986
27	supplydepot	SupplyDepot	86507521
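The same clean-up can be sketched in plain Python as a hypothetical helper, `dedupe_buildings`. It keeps the last row seen for each unit ID, as `drop_duplicates(keep='last')` does, and relabels add-ons by their own type. The rows are copied from the tables above. Unlike pandas, the resulting order follows each ID's first appearance.

```python
def dedupe_buildings(rows):
    """Keep one row per unit id and label add-ons by their own type.

    rows: iterable of (unit_label, uname, unit_id) tuples, as produced by
    the name-matching filter above. The last row seen for an id wins.
    """
    by_id = {}
    for label, uname, uid in rows:
        lowered = uname.lower()
        if 'reactor' in lowered:
            label = 'reactor'
        elif 'techlab' in lowered:
            label = 'techlab'
        by_id[uid] = (label, uname, uid)  # later rows overwrite earlier ones
    return list(by_id.values())


rows = [('barracks', 'BarracksTechLab', 67895297),
        ('techlab', 'BarracksTechLab', 67895297),
        ('factory', 'Factory', 68681729),
        ('factory', 'FactoryReactor', 83361793),
        ('reactor', 'FactoryReactor', 83361793)]
for row in dedupe_buildings(rows):
    print(row)
```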

Functions

In this section, I develop the functions this module exports. These functions allow for the extraction of various performance indicators related to the units trained, buildings built and upgrades researched by players throughout a match.

As is the case for other modules in this package, the exportable functions use several helper functions that can be consulted in the module's development notebooks or the module's source code. However, these helper functions are not included in this documentation.

Composition functions

The following functions return dictionaries that describe a player's army or building composition (count_composition) and the number of units that started training or buildings that started construction (count_started) during the whole match and through the early, mid and late games.

In this case, I define composition as the number of active units of each type a player has in the game. This count goes up every time a unit is created and down when one is killed. Meanwhile, count_started refers to the player's intended army, i.e. the number of units of each type they try to create in each interval of the game.
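The two definitions can be restated as small counting functions. This is a hypothetical sketch, not the module's code: `Row`, `started_between` and `active_at` are illustrative names. The real composition_df and count_active_units helpers are not shown in this documentation and may differ, for instance in how they treat interval boundaries. The rows reuse the sample composition_df tail for the tfly replay in this section, with death times rounded.

```python
from collections import Counter, namedtuple

# Hypothetical row mirroring composition_df: unit type, time training or
# construction started, time it entered the game, and death time (None
# while alive); all in seconds.
Row = namedtuple('Row', ['unit', 'started', 'entered', 'died'])


def started_between(rows, start, end):
    """Units of each type whose training or construction began in [start, end)."""
    return Counter(r.unit for r in rows if start <= r.started < end)


def active_at(rows, at):
    """Units of each type that had entered the game and were alive at `at`."""
    return Counter(r.unit for r in rows
                   if r.entered <= at and (r.died is None or r.died > at))


rows = [Row('marine', 745.87, 745.87, None),
        Row('autoturret', 748.883, 748.883, 759.044),
        Row('marine', 763.585, 763.585, None),
        Row('marine', 763.81, 763.81, None),
        Row('autoturret', 783.998, 783.998, 794.294)]
print(started_between(rows, 700, 800))  # attempted units in the window
print(active_at(rows, 790))             # composition at 790 s
```

The started count only ever grows within a window, whereas the composition drops as soon as a unit dies. This is why the two indicators capture intended and actual armies respectively.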

The two functions extract their information from a pandas.DataFrame generated by the helper function composition_df. This DataFrame includes each unit's type, the time it started being built, the time it entered the game and its time of death.

The following table shows the tail of a sample DataFrame generated by calling the composition_df helper function on the tfly replay.

	Unit	started_building	enter_game_time	died_time
44	marine	745.87	745.87	NaT
45	autoturret	748.883	748.883	759.0440051020407
46	marine	763.585	763.585	NaT
47	marine	763.81	763.81	NaT
48	autoturret	783.998	783.998	794.2940051020408

Similarly, the functions use the count_active_units helper function together with composition_df's output to generate DataFrames that count a player's units in a specific period of time.

The following tables show the DataFrames that result from counting the units in the sample composition DataFrame.

Whole game:

Unit	started	born	died	total
marauder	5	5	nan	5
marine	27	27	nan	27
medivac	5	5	nan	5
raven	1	1	nan	1

Early game:

Unit	started	born	died	total
marine	1	1	nan	1

Mid-game:

Unit	started	born	died	total
marine	12	12	nan	12

Late game:

Unit	started	born	died	total
marauder	5	5	nan	5
marine	14	14	nan	14
medivac	5	5	nan	5
raven	1	1	nan	1

After calculating a player's army composition or unit started counts, I need to format the output of the functions so that I can process them with the results of other matches.

In this regard, I considered two options. On the one hand, I could store counts for all units of all races for each player in every match. Following this approach, I would have a single set of replays for each player that would, by averaging all unit counts, express the general building preferences of each player. On the other hand, I could segregate the results by game race. This second option implies that I would have to keep three separate sets of replays per player. I would also have to process three profiles per player that express their preferences when playing each game race.

Although I was initially inclined to opt for the first option, I decided on the second because it seems closer to the actual game experience. For example, in StarCraft II, players are classified separately in leagues when playing different races. Similarly, many of the game's achievements are repeated for each race. Thus, providing three profiles felt more akin to the game experience. This second approach also means that each match record contains fewer blank data points when processing the profiles, which saves storage and memory.
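The chosen per-race approach can be illustrated with a sketch that averages unit counts separately for each race a player used. The profile-building step itself is not defined in this chapter. `race_profiles` is a hypothetical helper, and the counts in the usage are made up.

```python
from collections import defaultdict


def race_profiles(matches):
    """Average unit counts per race across a player's matches.

    matches: iterable of (race, {unit: count}) pairs, one per match, where
    each dict already lists every unit of that race (zero-filled).
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_matches = defaultdict(int)
    for race, counts in matches:
        n_matches[race] += 1
        for unit, count in counts.items():
            totals[race][unit] += count
    # One averaged profile per race, never mixing units across races.
    return {race: {unit: total / n_matches[race]
                   for unit, total in units.items()}
            for race, units in totals.items()}


profiles = race_profiles([('Terran', {'marine': 27, 'medivac': 5}),
                          ('Terran', {'marine': 13, 'medivac': 1}),
                          ('Zerg', {'roach': 10})])
print(profiles['Terran'])  # {'marine': 20.0, 'medivac': 3.0}
```

Because each race keeps its own key set, a Terran match never adds zero columns for Zerg or Protoss units. This reflects the storage advantage described above.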

With this in mind, the last step of each of the module's functions is to complete its output so that it includes values for all the units or buildings of the player's race.
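The completion step amounts to zero-filling a dictionary against the race's unit list. The sketch below is hypothetical. `fill_race_units` is not the module's complete_count helper, whose code and exact signature are not shown here, and `terran_units` is a made-up subset of the Terran army list.

```python
def fill_race_units(counts, race_units):
    """Return a count for every unit of the race, using 0 for absent units.

    Units in `counts` that are not in `race_units` are dropped, so every
    record for a race ends up with the same keys in the same order.
    """
    return {unit: counts.get(unit, 0) for unit in sorted(race_units)}


# Hypothetical subset of the Terran army list.
terran_units = ['marine', 'marauder', 'medivac', 'thor']
print(fill_race_units({'marine': 27, 'marauder': 5, 'medivac': 5},
                      terran_units))
# {'marauder': 5, 'marine': 27, 'medivac': 5, 'thor': 0}
```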

The following code demonstrates the result of the complete_count helper function applied to player 2's army composition for the whole game in the sample match.

army_count_df_whole = count_active_units(army_df, start=0, end=700)
comp_test = complete_count([army_count_df_whole['total']], 'Terran', False)
df = pd.DataFrame(comp_test, index=['Player2_ArmyComp'])

df.iloc[0]

autoturret        0
banshee           0
battlecruiser     0
cyclone           0
ghost             0
hellion           0
marauder          5
marine           27
medivac           5
raven             1
reaper            0
siegetank         0
thor              0
viking            0
warhound          0
widowmine         0
Name: Player2_ArmyComp, dtype: int64

count_composition(rpl, pid, buildings=False)

Parameters:
- rpl (sc2reader.resources.Replay)
    Replay being processed.
- pid (int)
    In-game id of the player being analysed.
- buildings (bool)=False
    Flag indicating whether the function should count buildings (True)
    or troops (False).

Returns:
- dict
    Tally of a player's active units during a match.

test_army = count_composition(sing_zerg, 1)
army_comp_df = pd.DataFrame(test_army)
# print(army_comp_df.to_markdown())

	whole_comp	early_comp	mid_comp	late_comp
autoturret	0	0	0	0
banshee	0	0	0	0
battlecruiser	0	0	0	0
cyclone	10	0	2	10
ghost	0	0	0	0
hellion	14	2	7	14
marauder	0	0	0	0
marine	0	0	0	0
medivac	0	0	0	0
raven	1	0	0	1
reaper	0	0	0	0
siegetank	0	0	0	0
thor	0	0	0	0
viking	1	1	1	1
warhound	0	0	0	0
widowmine	4	0	4	4

test_buildings_comp = count_composition(sing_zerg, 1, buildings=True)
buildings_comp_df = pd.DataFrame(test_buildings_comp)
# print(buildings_comp_df.to_markdown())

	whole_comp	early_comp	mid_comp	late_comp
armory	3	0	1	3
barracks	1	1	1	1
bunker	0	0	0	0
commandcenter	1	0	0	1
engineeringbay	1	0	0	1
factory	6	2	6	6
fusioncore	0	0	0	0
ghostacademy	0	0	0	0
missileturret	0	0	0	0
orbitalcommand	2	1	2	2
planetaryfortress	1	0	0	1
reactor	0	1	0	0
refinery	6	2	4	6
sensortower	0	0	0	0
starport	2	1	1	2
supplydepot	13	3	8	13
techlab	6	1	6	6
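A composition tally like the ones above can be approximated from each unit's birth and death times. The sketch below assumes that "active" means alive at the closing second of an interval, which is one reading consistent with the whole_comp and late_comp columns matching in both tables. The Unit record and the stage boundaries are hypothetical and are not the thresholds defined in Chapter 3.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Unit:
    """Hypothetical stand-in for a unit: type name, second born, second died."""
    name: str
    born: float
    died: Optional[float] = None  # None means the unit survived the match


def count_alive_at(units: List[Unit], second: float) -> Dict[str, int]:
    """Count the units of each type that exist at `second`."""
    alive = Counter(
        u.name for u in units
        if u.born <= second and (u.died is None or u.died > second)
    )
    return dict(alive)


def composition_by_stage(units: List[Unit],
                         stage_ends: Dict[str, float]) -> Dict[str, Dict[str, int]]:
    """Take one snapshot at the closing second of each stage."""
    return {stage: count_alive_at(units, end) for stage, end in stage_ends.items()}


# Illustrative boundaries only (seconds); whole_comp closes at match end.
STAGE_ENDS = {"whole_comp": 900, "early_comp": 240, "mid_comp": 540, "late_comp": 900}

army = [
    Unit("hellion", 200), Unit("hellion", 230, died=300),
    Unit("cyclone", 400), Unit("widowmine", 500, died=800),
]
print(composition_by_stage(army, STAGE_ENDS))
```

Under this reading, a unit that dies before a stage ends disappears from that stage's snapshot, which would also explain a building appearing in an early column but not in later ones.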

Parameters
- rpl (sc2reader.resources.Replay)
    Replay being processed.
- pid (int)
    In-game id for the player being analysed.
- buildings (bool)=False
    Flag indicating if the function should count buildings (True)
    or troops (False).

army_training_count = count_started(sing_zerg, 2)
atc_df = pd.DataFrame(army_training_count)

# print(atc_df.to_markdown())

	whole_started	early_started	mid_started	late_started
baneling	0	0	0	0
broodling	0	0	0	0
broodlord	0	0	0	0
corruptor	0	0	0	0
hydralisk	3	0	0	3
infestedterran	0	0	0	0
infestor	0	0	0	0
infestorburrowed	0	0	0	0
locust	0	0	0	0
lurker	0	0	0	0
mutalisk	0	0	0	0
overlord	12	3	7	2
overseer	0	0	0	0
queen	7	1	4	2
ravager	14	1	13	0
roach	0	0	0	0
swarmhost	0	0	0	0
ultralisk	0	0	0	0
viper	0	0	0	0
zergling	10	0	0	10

buildings_started_count = count_started(sing_zerg, 2, buildings=True)
bsc_df = pd.DataFrame(buildings_started_count)

# print(bsc_df.to_markdown())

	whole_started	early_started	mid_started	late_started
banelingnest	0	0	0	0
creeptumor	2	1	1	0
evolutionchamber	0	0	0	0
extractor	4	2	0	2
greaterspire	0	0	0	0
hatchery	0	0	0	0
hive	0	0	0	0
hydraliskden	1	0	0	1
infestationpit	0	0	0	0
lair	1	1	0	0
lurkerden	0	0	0	0
nydusnetwork	0	0	0	0
nydusworm	0	0	0	0
roachwarren	1	1	0	0
spawningpool	1	1	0	0
spinecrawler	0	0	0	0
spire	0	0	0	0
sporecrawler	0	0	0	0
ultraliskcavern	0	0	0	0
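The started counts behave differently from the composition snapshots: in both tables above, each whole_started value equals the sum of the three stage columns (for example, 12 overlords = 3 + 7 + 2), so they look like counts of elements begun within each interval. A minimal sketch of that tally, assuming a hypothetical list of (name, start_second) pairs and illustrative, non-overlapping stage boundaries:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative half-open stage intervals in seconds; not the module's values.
STAGES = {
    "early_started": (0.0, 240.0),
    "mid_started": (240.0, 540.0),
    "late_started": (540.0, float("inf")),
}


def count_started_by_stage(starts: List[Tuple[str, float]]) -> Dict[str, Dict[str, int]]:
    """Count how many elements of each type were started in each stage.

    Because the stages are disjoint and cover the whole match, the
    'whole_started' tally equals the sum of the three stage tallies.
    """
    result = {"whole_started": dict(Counter(name for name, _ in starts))}
    for stage, (lo, hi) in STAGES.items():
        result[stage] = dict(Counter(name for name, t in starts if lo <= t < hi))
    return result


started = [("overlord", 20), ("overlord", 300), ("queen", 250), ("zergling", 600)]
print(count_started_by_stage(started))
```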

Base Expansion

To build their economy, players will, in most cases, establish more than one base. These expansions allow them to collect resources more quickly and efficiently, and prevent them from running out of resources. To be precise, I define an expansion as one of the main base structures of the player's race, built in a location that allows them to exploit additional resource deposits. These main structures are a Nexus for the Protoss, a Command Center for the Terrans, or a Hatchery, Lair or Hive for the Zerg.
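The race-to-structure rule in this definition can be written as a simple lookup. The sketch below is hypothetical: it uses lowercase names like those in this chapter's tables, checks only the structure type (the location condition would need map data), and follows the definition literally, so it names only the Command Center for the Terrans. Because a Lair or Hive is morphed from an existing Hatchery, it counts distinct base ids rather than structure names, assuming a record format where a morphed base keeps its id.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Main base structures per race, as listed in the definition above.
MAIN_BASES: Dict[str, FrozenSet[str]] = {
    "Protoss": frozenset({"nexus"}),
    "Terran": frozenset({"commandcenter"}),
    "Zerg": frozenset({"hatchery", "lair", "hive"}),
}


def is_main_base(name: str, race: str) -> bool:
    """True if `name` is one of the main base structures of `race`."""
    return name.lower() in MAIN_BASES[race]


def count_bases(buildings: Iterable[Tuple[int, str]], race: str) -> int:
    """Count distinct bases among hypothetical (unit_id, name) records.

    A Hatchery that later becomes a Lair keeps its unit id in this record
    format, so it is counted only once.
    """
    return len({uid for uid, name in buildings if is_main_base(name, race)})


zerg_buildings = [(1, "hatchery"), (1, "lair"), (7, "hatchery"), (9, "spawningpool")]
print(count_bases(zerg_buildings, "Zerg"))  # 2
```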

In this case, I use the speed at which players build their expansions, and the number of bases they maintain at each stage, as indicators of their economic development strategy.

In this regard, I define two exportable functions that extract two performance indicators:

- get_expan_times extracts the completion times of the first three expansions.
- get_expan_counts returns a dictionary containing the expansion counts for the different game stages.

Parameters
- rpl (sc2reader.resources.Replay)
    Replay containing the data of the match.
- pid (int)
    Player id during the match.

Returns
- dict[str, float]
    Dictionary containing the names and completion times of the
    player's first three expansions.

The code below shows how get_expan_times works.

print(get_expan_times(zustates, 1))

{'expan_1': 344.77611940298505, 'expan_2': 791.4626865671642, 'expan_3': 984.9850746268656}

import sc2reader

test_rpl = sc2reader.load_replay("./test_replays/TestProfilerBatch/2000 Atmospheres LE (14).SC2Replay")
print(get_expan_times(test_rpl, 2))

{'expan_1': 261.0943458686441, 'expan_2': 515.4802039194916}
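The output format above (keys expan_1 to expan_3, with fewer keys when the player expanded fewer times) can be reproduced from a list of base completion times. This is a minimal sketch, not the module's implementation: the helper name is hypothetical, it assumes the caller has already selected the player's main base structures, and it assumes the starting base is the one present at second 0.

```python
from typing import Dict, Iterable


def first_expansion_times(base_completion_seconds: Iterable[float],
                          n: int = 3) -> Dict[str, float]:
    """Map 'expan_1'..'expan_n' to the earliest expansion completion times.

    Bases completed at second 0 are treated as the starting base and skipped.
    """
    expansions = sorted(t for t in base_completion_seconds if t > 0)
    return {f"expan_{i}": t for i, t in enumerate(expansions[:n], start=1)}


print(first_expansion_times([0, 515.5, 261.1]))
# {'expan_1': 261.1, 'expan_2': 515.5}
```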

Parameters
- rpl (sc2reader.resources.Replay)
    Replay containing a match's information.
- pid (int)
    The match player ID for the player being considered in the
    analysis.

Returns
- dict[str, int]
    Dictionary containing the base count for the whole, early,
    mid and late stages of the game.

The following are two examples of the use of get_expan_counts.

exp_counts = get_expan_counts(zustates, 1)
exp_counts

{'total_expan': 3, 'earlyg_expan': 0, 'midg_expan': 1, 'lateg_expan': 2}

exp_counts = get_expan_counts(sing_protoss, 1)
exp_counts

{'total_expan': 2, 'earlyg_expan': 1, 'midg_expan': 0, 'lateg_expan': 1}
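In both examples, total_expan equals the sum of the three stage counts (3 = 0 + 1 + 2 and 2 = 1 + 0 + 1), which suggests that each stage count records the expansions completed during that stage. A sketch under that reading, with a hypothetical helper name and illustrative stage boundaries:

```python
from typing import Dict, Iterable

# Illustrative stage boundaries in seconds; not the module's thresholds.
EARLY_END, MID_END = 240.0, 540.0


def expansion_counts(expansion_seconds: Iterable[float]) -> Dict[str, int]:
    """Count expansions completed in the whole match and in each stage."""
    times = list(expansion_seconds)
    early = sum(1 for t in times if t < EARLY_END)
    mid = sum(1 for t in times if EARLY_END <= t < MID_END)
    late = sum(1 for t in times if t >= MID_END)
    return {"total_expan": len(times), "earlyg_expan": early,
            "midg_expan": mid, "lateg_expan": late}


print(expansion_counts([200.0, 400.0, 700.0]))
```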

Player Tech Update

Beyond constructing buildings and training units, the third way players can spend their resources is by researching tech upgrades.

However, unlike units and buildings, player objects do not store a list of tech upgrades, so I use the match's UpgradeCompleteEvents to build this list.

Another difference between tracking units and upgrades is that it makes no sense to count the occurrences of each upgrade, because players can only 'buy' each one once per match. For this reason, I record the second at which each upgrade completes instead. Based on this record, when building the player profiles, I can average the times at which a player researched each upgrade to get a rough measure of the game stage at which they prefer to use it. At that point, I can also count how many times they research each upgrade to see which upgrades they favour.
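The profile-level aggregation described here can be sketched over several per-match dictionaries shaped like the list_player_upgrades output, where 0 marks an upgrade that was not researched in that match. The helper name is hypothetical; the sketch only illustrates the averaging and counting idea.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarise_upgrades(matches: List[Dict[str, float]]) -> Dict[str, Tuple[int, float]]:
    """Return, per upgrade, (times researched, mean completion second).

    A value of 0 means the upgrade was not researched in that match, so it
    is excluded from both the count and the average. Upgrades that were
    never researched get (0, 0.0).
    """
    names = sorted({name for match in matches for name in match})
    summary = {}
    for name in names:
        times = [m[name] for m in matches if m.get(name, 0) > 0]
        summary[name] = (len(times), mean(times) if times else 0.0)
    return summary


games = [
    {"Stimpack": 300.0, "ShieldWall": 0},
    {"Stimpack": 400.0, "ShieldWall": 500.0},
]
print(summarise_upgrades(games))
# {'ShieldWall': (1, 500.0), 'Stimpack': (2, 350.0)}
```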

Below, I define the list_player_upgrades function, which returns a dictionary of all the upgrades available to the player's race and the second at which each was completed (0 if the player never researched it).
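As a rough sketch of the event-based approach (not the author's code), the function below filters upgrade-completion events by player and fills in zeros for the rest of the race's upgrades. It assumes a stand-in record with pid, upgrade_type_name and second attributes, modelled on sc2reader's UpgradeCompleteEvent (check the sc2reader documentation for your version); the TERRAN_UPGRADES tuple is a truncated, hypothetical list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class UpgradeEvent:
    """Stand-in for an upgrade-completion event (assumed attributes)."""
    pid: int
    upgrade_type_name: str
    second: float


# Truncated, hypothetical list of the upgrades tracked for one race.
TERRAN_UPGRADES = ("DrillClaws", "SmartServos", "Stimpack")


def upgrade_times(events: Iterable[UpgradeEvent], pid: int,
                  race_upgrades: Sequence[str]) -> Dict[str, float]:
    """Map every race upgrade to the second it completed (0 if never).

    Events from other players, or for names outside `race_upgrades`,
    are ignored.
    """
    times: Dict[str, float] = {name: 0 for name in race_upgrades}
    for event in events:
        if event.pid == pid and event.upgrade_type_name in times:
            times[event.upgrade_type_name] = event.second
    return times


log = [UpgradeEvent(1, "DrillClaws", 430.0), UpgradeEvent(2, "Stimpack", 300.0),
       UpgradeEvent(1, "SmartServos", 530.0)]
print(upgrade_times(log, 1, TERRAN_UPGRADES))
# {'DrillClaws': 430.0, 'SmartServos': 530.0, 'Stimpack': 0}
```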

The following table shows a sample result from applying list_player_upgrades to a replay.

player1_upgrades = list_player_upgrades(sing_zerg, 1)
# print(pd.DataFrame(player1_upgrades, index=['P_1 Upgrades']).T.to_markdown())

	P_1 Upgrades
BansheeCloak	0
BansheeSpeed	0
BattlecruiserEnableSpecializations	0
CycloneLockOnDamageUpgrade	546.304
DrillClaws	427.76
EnhancedShockwaves	0
HiSecAutoTracking	0
HighCapacityBarrels	347.778
LiberatorAGRangeUpgrade	0
MedivacIncreaseSpeedBoost	0
PersonalCloaking	0
PunisherGrenades	0
RavenCorvidReactor	0
ShieldWall	0
SmartServos	526.309
Stimpack	0
TerranBuildingArmor	0
TerranInfantryArmorsLevel1	0
TerranInfantryArmorsLevel2	0
TerranInfantryArmorsLevel3	0
TerranInfantryWeaponsLevel1	0
TerranInfantryWeaponsLevel2	0
TerranInfantryWeaponsLevel3	0
TerranShipWeaponsLevel1	0
TerranShipWeaponsLevel2	0
TerranShipWeaponsLevel3	0
TerranVehicleAndShipArmorsLevel1	0
TerranVehicleAndShipArmorsLevel2	0
TerranVehicleAndShipArmorsLevel3	0
TerranVehicleWeaponsLevel1	574.155
TerranVehicleWeaponsLevel2	0
TerranVehicleWeaponsLevel3	0

6 - Parsing Build Orders

Introduction

Exportable Members

Builds-orders

Listing a player's elements

Alternative Implementation with `UnitTrackerEvents`

Counting Units With Multiple States

Functions

Composition functions

`count_composition`[source]

`count_started`[source]

Base Expansion

`get_expan_times`[source]

`get_expan_counts`[source]

Player Tech Update

`list_player_upgrades`[source]
