Introduction

In this chapter, I review how to extract information on a player's build strategy. In accordance with this review I then define the build_parser module. This module contains functions that developers can use to parse the players' build orders, and to inventory their army, base and technology upgrades.

Exportable Members

Build orders

A crucial part of how players execute a strategy is how they construct their bases and their armies. Currently, in the game, players can review the order in which the game participants build their first 40 elements once a match is over. These build-orders include buildings (built, upgraded, expanded), units (trained, mutated, warped, merged), and any tech researched.

Since this package is meant to build player profiles based on the players' performance indicators, I need to find a way to capture the overall composition of the players' bases and armies. Afterwards, I must be able to generalise this composition data to build the players' profiles.

My first instinct was to store the build orders as a time series. I thought of recording a sequence of snapshots, evenly spaced over time, that marked the time and order in which each element entered the game. However, a time series is too detailed and is not suited to practical generalisation. Hence, I am opting to extract unit, building and research compositions at four different game intervals, i.e. whole, early, mid, and late game. These intervals match the measures that I take for macroeconomic indicators in <<Chapter 3 - Parsing Macroeconomic Indicators>>.

This section explores the factors I need to consider to define a set of functions that users can call to extract the player's elements compositions at the different game intervals and other related indicators.

Listing a player's elements

In any case, the first step to parse any indicators related to the build order is to obtain a list of the player's elements.

In the following code, I use the Player object's units attribute to extract a list of all these elements owned by the player during the match. I also use a list of all the units I want to include in my analysis (i.e. UNIT_NAMES) to filter unwanted units. For example, I exclude Zerg larvae from my analysis because they are generated automatically by the game with little control from the player.

Allow me to describe the code step by step to clarify the ideas behind this module. First, I will define multiple constant values that I will use through the module's development to filter various data characteristics. These constants store information from multiple data files saved in this project's data folder.

The constants include:

  • UNIT_NAMES: list of names for all the player-controllable units (buildings or troops) in the game. This list only contains one name per unit and excludes the various states a unit can have.
  • RACE_ARMIES: list of controllable troops separated by race; excludes workers and structures.
  • RACE_BUILDINGS: list of controllable structures separated by race.
  • RACE_UPGRADES: list of tech updates that players can research during a match. The list excludes any default upgrades that players do not directly trigger.

from pathlib import Path

import sc_training.ingest.build_parser as bp
Path(bp.__file__)
Path('c:/Users/david/Documents/phdcode/sc_training/sc_training/ingest/build_parser.py')
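
To make the role of these constants concrete, the following minimal sketch shows the kind of name filter they support. The UNIT_NAMES value here is only an assumed excerpt for illustration; the real list is loaded from the project's data files, and plain strings stand in for sc2reader unit objects.

```python
# Assumed excerpt for illustration only; the real UNIT_NAMES constant is
# loaded from the project's data folder and is much longer.
UNIT_NAMES = ['drone', 'hatchery', 'overlord', 'spawningpool']

# Stand-in unit names, including larvae that should be filtered out.
raw_names = ['Hatchery', 'Drone', 'Larva', 'Overlord', 'Larva', 'SpawningPool']

# Keep a unit only if its lower-cased name is in the reference list.
kept = [name for name in raw_names if name.lower() in UNIT_NAMES]
print(kept)
```

Because the comparison is done on lower-cased names, the reference list needs to store its entries in lower case as well.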

After loading these data, I load multiple sample replays. In this case, I need to use a larger pool of test cases to ensure that I am considering all the different types of game units and build variations.

rps_path = Path("./test_replays")

# single_replay is the base case I use to develop the functions 
# in this module.
single_replay = sc2reader.load_replay(str(rps_path/"Jagannatha LE.SC2Replay"))

# The following replays have various compositions of races, armies, 
# structures, tech updates and other game elements that allow me to test 
# and debug the module's functions.
sing_zerg = sc2reader.load_replay(str(rps_path/"Oxide LE (14).SC2Replay"))
sing_protoss= sc2reader.load_replay(str(rps_path/"Oxide LE (13).SC2Replay"))
zustates = sc2reader.load_replay(str(rps_path/'zustates.SC2Replay'))
tustates = sc2reader.load_replay(str(rps_path/'tustates.SC2Replay'))
tfly = sc2reader.load_replay(str(rps_path/'terranfly.SC2Replay'))

# I store some basic variables out of the test case replay to make the  
# sample code more readable.
match_events = [event for event in single_replay.events]
rpl_duration = single_replay.length.seconds
rpl_rec_duration = match_events[-1].second
rpl_fps = single_replay.game_fps

With this setup in place, I can proceed to extract a list of all the units owned by a player during the course of the game.

p2_units = [u for u in sing_zerg.player[2].units 
            if u.name.lower() in UNIT_NAMES]

# Extract and print a sample containing the first 20 units owned by the
# player during the game for examination.
sample_prints = [
        (f'{ind+1:<3} Name: {u.name:<15}'
        + f'Start: {u.started_at:>5.0f} End: {str(u.died_at):>5}')
        for ind, u in enumerate(p2_units[:20])]

for string in sample_prints:
    print(string)
1   Name: Lair           Start:     0 End:  None
2   Name: Drone          Start:     0 End:  5184
3   Name: Drone          Start:     0 End:  3530
4   Name: Drone          Start:     0 End:  2046
5   Name: Drone          Start:     0 End:  None
6   Name: Drone          Start:     0 End: 13011
7   Name: Drone          Start:     0 End:  2739
8   Name: Drone          Start:     0 End:  None
9   Name: Drone          Start:     0 End:  None
10  Name: Drone          Start:     0 End:  None
11  Name: Drone          Start:     0 End:  None
12  Name: Drone          Start:     0 End:  2226
13  Name: Drone          Start:     0 End:  4969
14  Name: Overlord       Start:     0 End:  None
15  Name: Overlord       Start:   594 End:  None
16  Name: Drone          Start:   675 End:  None
17  Name: Drone          Start:   678 End:  None
18  Name: Drone          Start:   976 End:  None
19  Name: Drone          Start:   987 End:  None
20  Name: SpawningPool   Start:  1186 End:  None

The example above shows how each player's unit list begins with thirteen or fourteen starting units. For instance, players who play with Protoss or Terran start with thirteen units (one main base plus twelve workers). However, if they play with Zerg, they begin with fourteen units (one main base, twelve workers, and one overlord). Because the players do not build these elements, I do not count them as part of their build strategy.

The following code extracts the unit lists for both players in the sample match, ignoring the starting units. In the examples, I print their first ten elements to show the difference from the previous list. Note that none of the elements has a starting time of 0.

p1_race = sing_zerg.player[1].play_race
p1_units = [u for u in sing_zerg.player[1].units 
            if u.name.lower() in UNIT_NAMES]

# Extract a sub-list of all elements, excluding the starting elements
# according to the player's race.
p1_units_no_inits = p1_units[(13 if p1_race != 'Zerg' else 14):]

# Print the new list of elements to show they skip the starting elements.
p1_unit_list_print = [
        (f'{ind+1:<3} Name: {u.name:<15}'
        + f'Start: {u.started_at:>5.0f} End: {str(u.died_at):>5}') 
        for ind, u in enumerate(p1_units_no_inits[:10])]

for string in p1_unit_list_print:
    print(string)
1   Name: SCV            Start:   305 End:  None
2   Name: SCV            Start:   576 End:  None
3   Name: SCV            Start:   847 End:  None
4   Name: SCV            Start:  1239 End:  None
5   Name: Refinery       Start:  1367 End:  None
6   Name: Refinery       Start:  1385 End:  None
7   Name: Barracks       Start:  1416 End:  None
8   Name: SCV            Start:  1580 End:  None
9   Name: SCV            Start:  1851 End:  None
10  Name: SCV            Start:  2122 End:  None
# This second player is playing with Zerg. Hence, they have more
# starting units.
p2_race = sing_zerg.player[2].play_race
p2_units_no_inits = p2_units[(13 if p2_race != 'Zerg' else 14):]

p2_unit_list_print = [
        (f'{ind+1:<3} Name: {u.name:<15}'
        + f'Start: {u.started_at:>5.0f} End: {str(u.died_at):>5}') 
        for ind, u in enumerate(p2_units_no_inits[:10])]

for string in p2_unit_list_print:
    print(string)
1   Name: Overlord       Start:   594 End:  None
2   Name: Drone          Start:   675 End:  None
3   Name: Drone          Start:   678 End:  None
4   Name: Drone          Start:   976 End:  None
5   Name: Drone          Start:   987 End:  None
6   Name: SpawningPool   Start:  1186 End:  None
7   Name: Drone          Start:  1223 End:  None
8   Name: Extractor      Start:  1566 End:  None
9   Name: Drone          Start:  1675 End:  None
10  Name: Drone          Start:  1878 End:  None
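
The race-dependent slice used above can be wrapped in a small helper. This is a minimal sketch and not part of build_parser: the names STARTING_UNITS and drop_starting_units are hypothetical, plain strings stand in for sc2reader unit objects, and the sketch assumes that the starting units always come first in the list.

```python
# Hypothetical helper. The counts come from the text above:
# 13 starting units for Protoss and Terran, 14 for Zerg.
STARTING_UNITS = {'Protoss': 13, 'Terran': 13, 'Zerg': 14}


def drop_starting_units(units, race):
    """Return the units a player built, skipping the starting ones."""
    return units[STARTING_UNITS[race]:]


# Stand-in data: one main base, twelve workers and one overlord,
# followed by two units that the player actually built.
zerg_units = ['Hatchery'] + ['Drone'] * 12 + ['Overlord', 'Overlord', 'Drone']
built = drop_starting_units(zerg_units, 'Zerg')
print(built)
```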

Alternative Implementation with UnitTrackerEvents

A different way to generate these lists is to use the Replay's UnitBornEvent, UnitInitEvent, UnitDoneEvent and UnitTypeChangeEvent instances. This second approach offers access to the unit's spawning time through the event's second attribute. However, it also means having to consolidate four discrete lists with overlapping data.

In the code below, I collect the events that store information on player 1's unit-spawning for the same sample Replay used in the previous examples. I split this information into four lists, according to the different types of TrackerEvents. With these lists, I can review some of their differences and similarities. This comparison shows the information developers could use to separate them if need be.

p1_uborn_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitBornEvent) 
            and event.control_pid == 1
            and event.unit.name.lower() in UNIT_NAMES]
p1_uinit_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitInitEvent) 
            and event.control_pid == 1
            and event.unit.name.lower() in UNIT_NAMES]
p1_udone_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitDoneEvent) 
            and event.unit.owner.pid == 1
            and event.unit.name.lower() in UNIT_NAMES]
p1_uchange_e = [event for event in sing_zerg.events 
            if isinstance(event, sc2reader.events.tracker.UnitTypeChangeEvent) 
            and event.unit.owner.pid == 1
            and event.unit.name.lower() in UNIT_NAMES]

print(f'Units owned during the match: {len(p1_units)}')
print(f'UnitsBorn: {len(p1_uborn_e)} Init: {len(p1_uinit_e)} \
Done: {len(p1_udone_e)} Change: {len(p1_uchange_e)}')
Units owned during the match: 138
UnitsBorn: 106 Init: 32 Done: 31 Change: 28

Next, I will use various set operations to illustrate the relation between the lists. For example, the following code shows that, at least in this case, the union of the units linked to the Replay's UnitBornEvent and UnitInitEvent instances is the same as the list of units linked directly to the player.

p1_u_ids = [u.id for u in p1_units]
p1_u_born = [e.unit.id for e in p1_uborn_e]
p1_u_init = [e.unit.id for e in p1_uinit_e]

set(p1_u_ids) == (set(p1_u_born).union(set(p1_u_init)))
True

Meanwhile, looking at the intersections between the lists, I can see some overlap between them. See the code below.

u_init_done_intersection = {e.unit.id for e in p1_udone_e} \
                           & {e.unit.id for e in p1_uinit_e}

print(f'{len(u_init_done_intersection)} were initialised and done.')

u_init_done_diff = {e.unit.id for e in p1_uinit_e} \
                   - {e.unit.id for e in p1_udone_e}

print(f'{len(u_init_done_diff)} was never completed.')
31 were initialised and done.
1 was never completed.

u_born_change_intersection = {e.unit.id for e in p1_uborn_e} \
                             & {e.unit.id for e in p1_uchange_e}

print(f'{len(u_born_change_intersection)} changed during the game.')
1 changed during the game.

u_init_change_intersect = {e.unit.id for e in p1_uinit_e} \
                          & {e.unit.id for e in p1_uchange_e}
u_done_change_intersect = {e.unit.id for e in p1_udone_e} \
                          & {e.unit.id for e in p1_uchange_e}

print(f'{len(u_init_change_intersect)} were initialised and changed')
print(f'{len(u_done_change_intersect)} completed their building and changed')
8 were initialised and changed
8 completed their building and changed

This overlap means that, while building a player's unit list from these events may be possible, it may be impractical compared to the first possibility. Still, I can learn how to extract the times for each unit's life stages (building initiation and completion, state change or death) during the match from the information contained in these events.

For instance, take the case of UnitBornEvents. The frame at which these events are executed equals both the started_at (building initiation) and finished_at (building completion) frames recorded by the units linked to them.

Additionally, this data shows that the quotient of the unit's recorded birth frame (i.e. its finished_at attribute) and the replay's registered frames-per-second (i.e. Replay.game_fps) is equal to the UnitBornEvent's recorded execution time in seconds. I can convert this time into the real-time index using the calc_realtime_index function defined in <<Chapter 2 - Handling Tracker Events>>.

match_fps = sing_zerg.game_fps
[(f'UName:{e.unit.name:<7} e_rec_sec:{e.second:>7.0f}  '
    + f'U_time_quotient:{e.unit.finished_at//match_fps:>7.0f}')
for e in p1_uborn_e][15:20]
['UName:SCV     e_rec_sec:     52  U_time_quotient:     52',
 'UName:SCV     e_rec_sec:     77  U_time_quotient:     77',
 'UName:SCV     e_rec_sec:     98  U_time_quotient:     98',
 'UName:SCV     e_rec_sec:    115  U_time_quotient:    115',
 'UName:SCV     e_rec_sec:    132  U_time_quotient:    132']
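
The relation between frames and seconds described above can be reduced to a one-line conversion. The following is a minimal sketch: frame_to_second is a hypothetical name, and the frame value and the rate of 16.0 frames per second are illustrative assumptions, not values read from the sample replay.

```python
def frame_to_second(frame, fps):
    """Convert a frame index into whole game seconds (floor division)."""
    return int(frame // fps)


# Illustrative values only; 16.0 is an assumed frames-per-second rate.
print(frame_to_second(832, 16.0))
```

Floor division mirrors the `//` operator used in the comparison above, so any frame inside the same second maps to the same value.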

Similarly, the UnitInitEvent's recorded execution frame is the same as the unit's recorded started_at frame.

[(f'UName: {e.unit.name:<15}'
  + f'e_rec_frame: {str(e.frame):>4} '
  + f'U_rec_start_frame: {str(e.unit.started_at):>5}') 
for e in p1_uinit_e][-5:]
['UName: SupplyDepot    e_rec_frame: 12816 U_rec_start_frame: 12816',
 'UName: SupplyDepot    e_rec_frame: 12835 U_rec_start_frame: 12835',
 'UName: SupplyDepot    e_rec_frame: 12863 U_rec_start_frame: 12863',
 'UName: SupplyDepot    e_rec_frame: 12966 U_rec_start_frame: 12966',
 'UName: Bunker         e_rec_frame: 13796 U_rec_start_frame: 13796']

Meanwhile, in the case of the UnitDoneEvents, the event's recorded execution frame is the same as the unit's recorded finished_at frame.

[(f'UName: {e.unit.name:<15}' 
    + f'Event_rec_frame: {e.frame:>7.0f} '
    + f'U_rec_finish_frame: {e.unit.finished_at:>7.0f}')
for e in p1_udone_e][-5:]
['UName: SupplyDepot    Event_rec_frame:   13315 U_rec_finish_frame:   13315',
 'UName: Armory         Event_rec_frame:   13333 U_rec_finish_frame:   13333',
 'UName: SupplyDepot    Event_rec_frame:   13343 U_rec_finish_frame:   13343',
 'UName: Armory         Event_rec_frame:   13391 U_rec_finish_frame:   13391',
 'UName: SupplyDepot    Event_rec_frame:   13446 U_rec_finish_frame:   13446']
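
Taken together, the three relations above suggest how a unit's life-stage frames could be rebuilt from the events alone. The following is a minimal sketch under stated assumptions: the Event tuple is a stand-in for sc2reader's tracker events, life_stage_frames is a hypothetical helper, and the frame values are illustrative.

```python
from collections import namedtuple

# Stand-in for sc2reader tracker events; only the fields used here.
Event = namedtuple('Event', ['kind', 'unit_id', 'frame'])


def life_stage_frames(events):
    """Map each unit id to its started and finished frames."""
    stages = {}
    for e in events:
        unit = stages.setdefault(
            e.unit_id, {'started': None, 'finished': None})
        if e.kind == 'born':    # born units start and finish together
            unit['started'] = unit['finished'] = e.frame
        elif e.kind == 'init':  # construction begins
            unit['started'] = e.frame
        elif e.kind == 'done':  # construction completes
            unit['finished'] = e.frame
    return stages


sample = [Event('born', 1, 832), Event('init', 2, 12966),
          Event('done', 2, 13446), Event('init', 3, 13796)]
stages = life_stage_frames(sample)
print(stages)
```

In this sketch, a unit with an 'init' event but no 'done' event keeps a finished value of None.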

I can also examine another match to show how UnitInitEvent, UnitDoneEvent and a player's units list relate when units are initiated but never completed.

For instance, the following code shows that three units get started but are never completed.

uinit_e = [event for event in single_replay.events 
            if isinstance(event, sc2reader.events.tracker.UnitInitEvent) 
            and event.control_pid == 2
            and event.unit.name.lower() in UNIT_NAMES]

udone_e = [event for event in single_replay.events 
            if isinstance(event, sc2reader.events.tracker.UnitDoneEvent) 
            and event.unit.owner.pid == 2
            and event.unit.name.lower() in UNIT_NAMES]

print(f'UnitsInit: {len(uinit_e)} UnitsDone: {len(udone_e)}')
UnitsInit: 24 UnitsDone: 21

The next example shows that these units can be identified in the player's units list because they have a started_at value but their finished_at value is None.

incomplete_u = [
    (f'UName: {u.name:<18} unitId: {u.id:<10.0f}' 
        + f'u_start_frame: {u.started_at:>8.0f}' 
        + f' u_finish_frame: {str(u.finished_at):>8}')
    for u in single_replay.player[2].units 
    if u.name.lower() in UNIT_NAMES
    and u.finished_at is None]

incomplete_u
['UName: CommandCenter      unitId: 94109700  u_start_frame:    13111 u_finish_frame:     None',
 'UName: Refinery           unitId: 92012546  u_start_frame:    13175 u_finish_frame:     None',
 'UName: Refinery           unitId: 94633985  u_start_frame:    13213 u_finish_frame:     None']

Interestingly, the UnitDiedEvent list only includes these incomplete units' destruction if they are killed by another player, as shown by the following code.

p1_incomplete_units_ids = [f'{u.name}, {u.id}'
        for u in single_replay.player[1].units 
        if u.name.lower() in UNIT_NAMES
        and u.finished_at is None]

print("List of incomplete units on player 1's "
      f'units-list {p1_incomplete_units_ids}')

p1_udied_e = [f'{event.unit.name}, {event.unit.id}' 
            for event in single_replay.events 
            if isinstance(event, sc2reader.events.tracker.UnitDiedEvent)
            and event.unit.owner is not None
            and event.unit.owner.pid == 1
            and event.unit.name.lower() in UNIT_NAMES
            and event.unit.finished_at is None]
print(f'List of units in UnitDiedEvent for player 1 {p1_udied_e}')

print('------------------------------------------------')

p2_incomplete_units_ids = [f'{u.name}, {u.id}'
        for u in single_replay.player[2].units 
        if u.name.lower() in UNIT_NAMES
        and u.finished_at is None]

print("List of incomplete units on player 2's "
      f'units-list {p2_incomplete_units_ids}')

p2_udied_e = [f'{event.unit.name}, {event.unit.id}' 
            for event in single_replay.events 
            if isinstance(event, sc2reader.events.tracker.UnitDiedEvent)
            and event.unit.owner is not None
            and event.unit.owner.pid == 2
            and event.unit.name.lower() in UNIT_NAMES
            and event.unit.finished_at is None]

print(f'List of units in UnitDiedEvent for player 2 {p2_udied_e}')
List of incomplete units on player 1's units-list ['PhotonCannon, 78118917']
List of units in UnitDiedEvent for player 1 ['PhotonCannon, 78118917']
------------------------------------------------
List of incomplete units on player 2's units-list ['CommandCenter, 94109700', 'Refinery, 92012546', 'Refinery, 94633985']
List of units in UnitDiedEvent for player 2 []

In the case above, player 2's incomplete units do not generate UnitDiedEvents because they were cancelled, not killed. This is why player 2's UnitDiedEvent list is empty.
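
The distinction between killed and cancelled units can be sketched as a simple lookup. This is an illustrative example, not the module's code: classify_incomplete is a hypothetical helper, and (unit_id, finished_at) tuples stand in for sc2reader units. A building still under construction when the replay ends would also lack both a finish frame and a death event, so the second label is deliberately cautious.

```python
def classify_incomplete(units, died_unit_ids):
    """Label unfinished units by whether a death event exists for them."""
    labels = {}
    for unit_id, finished_at in units:
        if finished_at is None:
            labels[unit_id] = ('killed' if unit_id in died_unit_ids
                               else 'cancelled_or_unfinished')
    return labels


# Stand-in data: (unit_id, finished_at) pairs and the ids seen in
# death events. The ids echo the sample output above.
units = [(78118917, None), (94109700, None), (68681729, 13315)]
died_unit_ids = {78118917}
labels = classify_incomplete(units, died_unit_ids)
print(labels)
```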

Counting Units With Multiple States

In the examples above, I have only counted units that remain in their primary state throughout the match for simplicity. However, in most games, some units will change states. I must consider this factor because of how sc2reader keeps track of the units.

For example, suppose I extract a list of all Zerg units' names in a match. In that case, I may notice that some Infestor units are counted as such, but others are counted as InfestorBurrowed.

Meanwhile, a similar operation around a Terran player's units shows units such as a SiegeTank have secondary stages like SiegeTankSieged. Similarly, a Hellion can also appear as a BattleHellion and a WidowMine as a WidowMineBurrowed.

zerg_player_units = [u.name for u in zustates.player[1].units
                    if u.is_army]
print('Sample Zerg unit set in zustates replay.')
pprint(set(zerg_player_units))

# Print set of terran units in a match
terran_player_units = [u.name for u in tustates.player[1].units
                    if u.is_army]
print('\nSample Terran unit set in tustates replay.')
pprint(set(terran_player_units))
Sample Zerg unit set in zustates replay.
{'Baneling',
 'Hydralisk',
 'Infestor',
 'InfestorBurrowed',
 'Lurker',
 'Overlord',
 'Overseer',
 'Queen',
 'Ravager',
 'Roach',
 'Zergling'}

Sample Terran unit set in tustates replay.
{'BattleHellion',
 'Hellion',
 'Marauder',
 'Marine',
 'Medivac',
 'SiegeTank',
 'SiegeTankSieged',
 'Thor',
 'WidowMine',
 'WidowMineBurrowed'}

However, an army composition should count these units in different stages as the same. Thus I need to account for how sc2reader stores the unit according to the state in which they finished or exited the game. In the following code, I use a unit types list and several conditions to demonstrate how one can filter the initial list. I also build a DataFrame with a Unit column that records the same name for units of the same type in multiple states to normalise the unit classification.

Note that, if I count the units based on the type recorded by sc2reader, the count includes the different unit states. Meanwhile, the normalised count adds the units of the same type that are in different states.

terran_player_units = [(uname, u, u.id) for u in tustates.player[1].units 
                    for uname in RACE_ARMIES['Terran']
                    if uname in u.name.lower() # Use the naming convention 
                                               # to get all units in 
                                               # different states
                    and u.is_army]


tpunits_df = pd.DataFrame({
        'Unit': [uname for uname, u, uid in terran_player_units],
        'Uname': [u.name for uname, u, uid in terran_player_units],
        'UnitID': [uid for uname, u, uid in terran_player_units]})

# print(tpunits_df.groupby('Uname').size().to_markdown())
# print(tpunits_df.groupby('Unit').size().to_markdown())

This table shows the count based on the unit types registered by sc2reader.

Uname              Count
BattleHellion         10
Hellion                6
Marauder              10
Marine                15
Medivac                3
SiegeTank              1
SiegeTankSieged        2
Thor                   1
WidowMine              5
WidowMineBurrowed      7

This table shows the count based on the normalised names.

Unit       Count
hellion       16
marauder      10
marine        15
medivac        3
siegetank      3
thor           1
widowmine     12
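
The normalisation can also be reproduced without pandas. The following minimal sketch uses the raw counts from the first table; BASE_NAMES is an assumed subset of RACE_ARMIES['Terran'], and normalise is a hypothetical helper that relies on the same substring naming convention.

```python
from collections import Counter

# Assumed subset of RACE_ARMIES['Terran'], limited to the units above.
BASE_NAMES = ['hellion', 'marauder', 'marine', 'medivac',
              'siegetank', 'thor', 'widowmine']


def normalise(name):
    """Return the base unit type whose name appears inside `name`."""
    lowered = name.lower()
    for base in BASE_NAMES:
        if base in lowered:
            return base
    return None


# Counts by sc2reader type, copied from the first table.
raw_counts = {'BattleHellion': 10, 'Hellion': 6, 'Marauder': 10,
              'Marine': 15, 'Medivac': 3, 'SiegeTank': 1,
              'SiegeTankSieged': 2, 'Thor': 1, 'WidowMine': 5,
              'WidowMineBurrowed': 7}

# Add up the counts of every state under the base unit type.
normalised = Counter()
for name, count in raw_counts.items():
    normalised[normalise(name)] += count

print(dict(normalised))
```

The result matches the normalised table: the two hellion states add up to 16, the siege tank states to 3 and the widow mine states to 12.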

This same rule applies to buildings. However, there are two caveats when counting Terran buildings. First, TechLab and Reactor instances do not follow the same Unit/State naming convention as the other multi-state units. Instead, they follow the inverse pattern, State/Unit. Second, both TechLab and Reactor generate a double count: they appear once as themselves and once more as the production building they expand (i.e. barracks, starports, and factories), and both entries share the same hash-id. Thus, when counting Terran buildings, I must re-filter the DataFrame to account for these anomalies. The following code illustrates this issue.

terran_player_buildings = [(uname, u, u.id) for u in tustates.player[1].units 
                    for uname in RACE_BUILDINGS['Terran']
                    if uname in u.name.lower() # Use the naming 
                                               # convention to get all units 
                                               # in different states
                    and u.is_building]

tbunits_df = pd.DataFrame({
        'Unit': [uname for uname, u, uid in terran_player_buildings],
        'Uname': [u.name for uname, u, uid in terran_player_buildings],
        'UnitID': [uid for uname, u, uid in terran_player_buildings]})

# print(tbunits_df[9:25].to_markdown())

    Unit         Uname            UnitID
 9  barracks     BarracksTechLab  67895297
10  techlab      BarracksTechLab  67895297
11  factory      Factory          68681729
12  supplydepot  SupplyDepot      71041026
13  supplydepot  SupplyDepot      74186754
14  factory      FactoryTechLab   74711042
15  techlab      FactoryTechLab   74711042
16  sensortower  SensorTower      76021761
17  refinery     Refinery         76546050
18  factory      Factory          77856769
19  refinery     Refinery         78381057
20  supplydepot  SupplyDepot      83099649
21  factory      FactoryReactor   83361793
22  reactor      FactoryReactor   83361793
23  armory       Armory           83623937
24  supplydepot  SupplyDepot      83886081

tbunits_df.drop_duplicates(subset='UnitID', keep='last', inplace=True) 

# Correct mislabelling of reactors and tech labs
tbunits_df.loc[tbunits_df['Uname'].str.contains('Reactor'), 'Unit'] = 'reactor'
tbunits_df.loc[tbunits_df['Uname'].str.contains('TechLab'), 'Unit'] = 'techlab'
# print(tbunits_df[8:25].to_markdown())

    Unit               Uname              UnitID
 8  planetaryfortress  PlanetaryFortress  64749570
10  techlab            BarracksTechLab    67895297
11  factory            Factory            68681729
12  supplydepot        SupplyDepot        71041026
13  supplydepot        SupplyDepot        74186754
15  techlab            FactoryTechLab     74711042
16  sensortower        SensorTower        76021761
17  refinery           Refinery           76546050
18  factory            Factory            77856769
19  refinery           Refinery           78381057
20  supplydepot        SupplyDepot        83099649
22  reactor            FactoryReactor     83361793
23  armory             Armory             83623937
24  supplydepot        SupplyDepot        83886081
25  planetaryfortress  PlanetaryFortress  84672513
26  supplydepot        SupplyDepot        29097986
27  supplydepot        SupplyDepot        86507521
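
The same correction can be written without pandas, which makes the rule explicit: keep one record per UnitID and label add-ons by their own type. This is a minimal sketch with stand-in rows copied from the table above; dedupe_addons is a hypothetical helper, not part of the module.

```python
def dedupe_addons(rows):
    """Keep one (unit, uname, unit_id) record per unit id.

    Add-ons are relabelled by their own type, since their names follow
    the inverse State/Unit pattern (e.g. 'BarracksTechLab').
    """
    by_id = {}
    for unit, uname, unit_id in rows:
        if 'TechLab' in uname:
            unit = 'techlab'
        elif 'Reactor' in uname:
            unit = 'reactor'
        # Later rows with the same id overwrite earlier ones.
        by_id[unit_id] = (unit, uname, unit_id)
    return list(by_id.values())


rows = [('barracks', 'BarracksTechLab', 67895297),
        ('techlab', 'BarracksTechLab', 67895297),
        ('factory', 'Factory', 68681729),
        ('factory', 'FactoryReactor', 83361793),
        ('reactor', 'FactoryReactor', 83361793)]
clean_rows = dedupe_addons(rows)
print(clean_rows)
```

Unlike the drop_duplicates call, this version does not depend on the add-on label being the last duplicate in the list.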

Functions

In this section, I develop the functions this module exports. These functions allow for the extraction of various performance indicators relative to the units trained, buildings built and upgrades researched by players through a match.

As is the case for other modules in this package, the exportable functions use several helper functions that can be consulted in the module's development notebooks or the module's source code. However, these helper functions are not included in this documentation.

Composition functions

The following functions generate lists of dictionaries that describe a player's army or buildings composition (count_composition) and the number of units that started training or buildings that started construction (count_started) during the whole match and through the early, mid and late games.

In this case, I define composition as the number of active units of different types a player has in the game. This count goes up every time a unit is created and down if they are killed. Meanwhile, count_started refers to the player's intended army, i.e. the number of units of different types they try to create at each interval of the game.
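
The two definitions above can be sketched with plain Python. This is only one possible reading of them, not the module's implementation: started_between and active_at are hypothetical names, and each record is a stand-in tuple of (unit, started, born, died) with times in seconds.

```python
from collections import Counter


def started_between(records, start, end):
    """Count units whose construction or training began in [start, end)."""
    return Counter(unit for unit, started, born, died in records
                   if start <= started < end)


def active_at(records, at_time):
    """Count units that are in the game and alive at `at_time`."""
    return Counter(unit for unit, started, born, died in records
                   if born is not None and born <= at_time
                   and (died is None or died > at_time))


# Stand-in records: (unit, started, born, died); None means still alive.
records = [('marine', 745.87, 745.87, None),
           ('autoturret', 748.883, 748.883, 759.044),
           ('marine', 763.585, 763.585, None),
           ('marine', 763.81, 763.81, None),
           ('autoturret', 783.998, 783.998, 794.294)]

print(started_between(records, 740, 800))
print(active_at(records, 800))
```

In this sketch the started count keeps the two auto-turrets, while the active count at second 800 drops them because both had already died.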

The two functions extract their information from a pandas.DataFrame generated by the helper function composition_df. This DataFrame includes each unit's type, the time they entered the game and their time of death. I illustrate this DataFrame's composition with a portion of the players' units during a sample match in the following table.

The following table shows the tail of a sample DataFrame generated by calling the composition_df helper function on the tfly replay.

    Unit        started_building  enter_game_time  died_time
44  marine      745.87            745.87           NaT
45  autoturret  748.883           748.883          759.0440051020407
46  marine      763.585           763.585          NaT
47  marine      763.81            763.81           NaT
48  autoturret  783.998           783.998          794.2940051020408

Similarly, the functions use the count_active_units helper function in conjunction with composition_df's output to generate DataFrames that count a player's units in a specific period of time.

The following tables show the DataFrames that result from counting the units in the sample composition DataFrame.

Whole game:

Unit      started  born  died  total
marauder        5     5   nan      5
marine         27    27   nan     27
medivac         5     5   nan      5
raven           1     1   nan      1

Early game:

Unit      started  born  died  total
marine          1     1   nan      1

Mid-game:

Unit      started  born  died  total
marine         12    12   nan     12

Late game:

Unit      started  born  died  total
marauder        5     5   nan      5
marine         14    14   nan     14
medivac         5     5   nan      5
raven           1     1   nan      1

After calculating a player's army composition or unit started counts, I need to format the output of the functions so that I can process them with the results of other matches.

In this regard, I considered two options. On the one hand, I could store counts for all units of all races for each player in every match. Following this approach, I would have a single set of replays for each player that would, by averaging all unit counts, express the general building preferences of each player. On the other hand, I could segregate the results by game race. This second option implies that I would have to keep three separate sets of replays per player. I would also have to process three profiles per player that express their preferences when playing each game race.

Although I was initially inclined towards the first option, I decided on the second because it seems closer to the actual game experience. For example, in StarCraft II, players are classified separately in leagues when playing with different game races. Similarly, many of the game's achievements are repeated for each race. Thus, it felt more akin to the game experience to provide three profiles. This second approach also means that each match's record will contain fewer blank data points when processing the profiles, which saves storage and processing memory.
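
As a rough illustration of the per-race option, the following sketch groups match tallies by race before averaging them. It is an assumption about how such profiles could be computed, not the package's method: average_by_race is a hypothetical helper and the counts are made-up stand-in data.

```python
from collections import defaultdict


def average_by_race(matches):
    """Average unit counts per race.

    `matches` is a list of (race, {unit: count}) pairs, one per match.
    """
    grouped = defaultdict(list)
    for race, counts in matches:
        grouped[race].append(counts)

    profiles = {}
    for race, results in grouped.items():
        units = sorted({unit for result in results for unit in result})
        profiles[race] = {
            unit: sum(r.get(unit, 0) for r in results) / len(results)
            for unit in units}
    return profiles


# Made-up tallies for one player across three matches.
matches = [('Terran', {'marine': 27, 'marauder': 5}),
           ('Terran', {'marine': 13}),
           ('Zerg', {'zergling': 10})]
profiles = average_by_race(matches)
print(profiles)
```

Each race profile only contains that race's units, which is why this layout leaves fewer blank data points than a single profile covering all races.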

With this in mind, the last step of each module's functions is to complete their outcomes to include values for all the units or buildings of each player's race.

The following code demonstrates the result of the complete_count helper function as applied to player 2's army composition for the whole game in the sample match.

army_count_df_whole = count_active_units(army_df, start = 0, end=700)
comp_test = complete_count([army_count_df_whole['total']], 'Terran', False)
df = pd.DataFrame(comp_test, index=['Player2_ArmyComp'])

df.iloc[0]
autoturret        0
banshee           0
battlecruiser     0
cyclone           0
ghost             0
hellion           0
marauder          5
marine           27
medivac           5
raven             1
reaper            0
siegetank         0
thor              0
viking            0
warhound          0
widowmine         0
Name: Player2_ArmyComp, dtype: int64
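
The completion step amounts to filling in zeros for every unit type the player did not use. The sketch below is not the complete_count helper itself: fill_missing is a hypothetical function, and TERRAN_ARMY simply copies the unit names from the output above.

```python
# Unit names copied from the output above; in the module they would
# come from the race constants.
TERRAN_ARMY = ['autoturret', 'banshee', 'battlecruiser', 'cyclone',
               'ghost', 'hellion', 'marauder', 'marine', 'medivac',
               'raven', 'reaper', 'siegetank', 'thor', 'viking',
               'warhound', 'widowmine']


def fill_missing(counts, unit_names):
    """Return a count for every unit name, defaulting to zero."""
    return {name: counts.get(name, 0) for name in unit_names}


partial = {'marauder': 5, 'marine': 27, 'medivac': 5, 'raven': 1}
completed = fill_missing(partial, TERRAN_ARMY)
print(completed)
```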

count_composition

count_composition(rpl:Replay, pid:int, buildings:bool=False)

Generate a tally of all of a player's active units at different stages of the match.

The function returns a dictionary with four keys ('whole_comp', 'early_comp', 'mid_comp', 'late_comp'), each of which refers to a dictionary that stores pairs of 'unit_type': 'active_unit_type_count'. There are values for all unit types of the player's race, even if the player has no active units of some types.

Args

- rpl (sc2reader.resources.Replay)
    Replay being processed
- pid (int)
    In-game id for the player being analysed.
- buildings (bool)=False
    Flag indicating if the function should count buildings (True)
    or troops (False)

Returns

- dict
    Tally of a player's active units during a match

test_army = count_composition(sing_zerg, 1)
army_comp_df = pd.DataFrame(test_army)
# print(army_comp_df.to_markdown())

               whole_comp  early_comp  mid_comp  late_comp
autoturret              0           0         0          0
banshee                 0           0         0          0
battlecruiser           0           0         0          0
cyclone                10           0         2         10
ghost                   0           0         0          0
hellion                14           2         7         14
marauder                0           0         0          0
marine                  0           0         0          0
medivac                 0           0         0          0
raven                   1           0         0          1
reaper                  0           0         0          0
siegetank               0           0         0          0
thor                    0           0         0          0
viking                  1           1         1          1
warhound                0           0         0          0
widowmine               4           0         4          4
test_buildings_comp = count_composition(sing_zerg, 1, buildings=True)
buildings_comp_df = pd.DataFrame(test_buildings_comp)
# print(buildings_comp_df.to_markdown())
whole_comp early_comp mid_comp late_comp
armory 3 0 1 3
barracks 1 1 1 1
bunker 0 0 0 0
commandcenter 1 0 0 1
engineeringbay 1 0 0 1
factory 6 2 6 6
fusioncore 0 0 0 0
ghostacademy 0 0 0 0
missileturret 0 0 0 0
orbitalcommand 2 1 2 2
planetaryfortress 1 0 0 1
reactor 0 1 0 0
refinery 6 2 4 6
sensortower 0 0 0 0
starport 2 1 1 2
supplydepot 13 3 8 13
techlab 6 1 6 6
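
The shape of the returned dictionary can be illustrated with a minimal sketch. It is not the module's implementation: the Unit record, the UNIT_TYPES list and the stage cut-offs (in seconds) are hypothetical stand-ins, and treating "active" as "alive at the end of the stage" is only one possible reading of the docstring.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins: none of these names come from build_parser.
UNIT_TYPES = ["hellion", "cyclone", "viking"]
STAGE_ENDS = {"early_comp": 300.0, "mid_comp": 600.0}  # assumed cut-offs (s)


@dataclass
class Unit:
    name: str
    finished_at: float               # second the unit entered play
    died_at: Optional[float] = None  # None means it survived the match


def tally_active(units, at_second):
    """Count units alive at `at_second`, zero-filling absent types."""
    tally = {unit_type: 0 for unit_type in UNIT_TYPES}
    for unit in units:
        alive = unit.finished_at <= at_second and (
            unit.died_at is None or unit.died_at > at_second
        )
        if alive and unit.name in tally:
            tally[unit.name] += 1
    return tally


def sketch_composition(units, match_length):
    """Build a four-key dictionary shaped like the one described above."""
    comp = {key: tally_active(units, end) for key, end in STAGE_ENDS.items()}
    comp["late_comp"] = tally_active(units, match_length)
    # Assumption: the whole-game tally mirrors the end-of-match snapshot.
    comp["whole_comp"] = dict(comp["late_comp"])
    return comp


units = [
    Unit("hellion", 120.0),
    Unit("hellion", 200.0, died_at=450.0),
    Unit("cyclone", 500.0),
    Unit("larva", 10.0),  # ignored: not listed in UNIT_TYPES
]
comp = sketch_composition(units, match_length=900.0)
```

Because every tally starts from a zero-filled dictionary, unit types the player never fielded (viking here) still appear with a count of 0, which keeps the rows aligned when several results are loaded into a DataFrame.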

count_started

count_started(rpl:Replay, pid:int, buildings:bool=False)

Generate a tally of all of a player's started units at different stages of the match.

The function returns a dictionary with four keys ('whole_started', 'early_started', 'mid_started', 'late_started'), each of which refers to a dictionary that stores pairs of 'unit_type': 'started_unit_type_count'. Every unit type of the player's race has an entry, even if the player has started no units of that type.

Args

- rpl (sc2reader.resources.Replay)
    Replay being processed
- pid (int)
    In-game id for the player being analysed.
- buildings (bool)=False
    Flag indicating if the function should count buildings (True)
    or troops (False)

Returns

- dict
    Tally of a player's started units during a match

army_training_count = count_started(sing_zerg, 2)
atc_df = pd.DataFrame(army_training_count)

# print(atc_df.to_markdown())

                  whole_started  early_started  mid_started  late_started
baneling                      0              0            0             0
broodling                     0              0            0             0
broodlord                     0              0            0             0
corruptor                     0              0            0             0
hydralisk                     3              0            0             3
infestedterran                0              0            0             0
infestor                      0              0            0             0
infestorburrowed              0              0            0             0
locust                        0              0            0             0
lurker                        0              0            0             0
mutalisk                      0              0            0             0
overlord                     12              3            7             2
overseer                      0              0            0             0
queen                         7              1            4             2
ravager                      14              1           13             0
roach                         0              0            0             0
swarmhost                     0              0            0             0
ultralisk                     0              0            0             0
viper                         0              0            0             0
zergling                     10              0            0            10

buildings_started_count = count_started(sing_zerg, 2, buildings=True)
bsc_df = pd.DataFrame(buildings_started_count)

# print(bsc_df.to_markdown())

                  whole_started  early_started  mid_started  late_started
banelingnest                  0              0            0             0
creeptumor                    2              1            1             0
evolutionchamber              0              0            0             0
extractor                     4              2            0             2
greaterspire                  0              0            0             0
hatchery                      0              0            0             0
hive                          0              0            0             0
hydraliskden                  1              0            0             1
infestationpit                0              0            0             0
lair                          1              1            0             0
lurkerden                     0              0            0             0
nydusnetwork                  0              0            0             0
nydusworm                     0              0            0             0
roachwarren                   1              1            0             0
spawningpool                  1              1            0             0
spinecrawler                  0              0            0             0
spire                         0              0            0             0
sporecrawler                  0              0            0             0
ultraliskcavern               0              0            0             0
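
In the sample output above, the early, mid and late counts of every row add up to the whole-game count. This suggests that count_started assigns each unit to the stage in which it was started, although that is an inference from the sample and not documented behaviour. The following sketch checks the property on the non-zero army rows copied from the table; stage_mismatches is a hypothetical helper, not part of the module.

```python
# Non-zero rows copied from the sample count_started army output above.
army_started = {
    "whole_started": {"hydralisk": 3, "overlord": 12, "queen": 7,
                      "ravager": 14, "zergling": 10},
    "early_started": {"hydralisk": 0, "overlord": 3, "queen": 1,
                      "ravager": 1, "zergling": 0},
    "mid_started": {"hydralisk": 0, "overlord": 7, "queen": 4,
                    "ravager": 13, "zergling": 0},
    "late_started": {"hydralisk": 3, "overlord": 2, "queen": 2,
                     "ravager": 0, "zergling": 10},
}

STAGES = ("early_started", "mid_started", "late_started")


def stage_mismatches(started):
    """Return {unit: (whole, stage_sum)} for rows that do not add up."""
    mismatches = {}
    for unit, whole in started["whole_started"].items():
        stage_sum = sum(started[stage][unit] for stage in STAGES)
        if whole != stage_sum:
            mismatches[unit] = (whole, stage_sum)
    return mismatches


mismatches = stage_mismatches(army_started)
```

An empty result means the stage columns partition the whole-game column for these rows. A check like this can be useful as a sanity test when the stage boundaries change.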

Base Expansion

To build their economy, players will, in most cases, establish more than one base. These expansions let them collect resources faster and more efficiently, and keep them from running out of resources. To be precise, I define an expansion as a main base structure for the player's race, built in a location that gives access to additional resource deposits. These main structures are a Nexus for Protoss, a Command Center for Terrans, or a Hatchery, Lair or Hive for the Zerg.

In this case, I use the speed with which players build their expansions and the number of expansions they maintain at each stage as indicators of their economic development strategy.

In this regard, I define two exportable functions that extract two performance indicators:

  • get_expan_times extracts the time of the first three expansions
  • get_expan_counts exports a dictionary containing the expansion counts for the different game stages.

get_expan_times

get_expan_times(rpl:Replay, pid:int)

Gets a dictionary with the finished_at times for a player's first three expansions.

The function searches a player's list of buildings and extracts the times (in seconds) when the first three base buildings are finished. These times are indexed as expan_1, expan_2 and expan_3.

If the player had fewer than three expansions during the game, the missing values are filled with np.nan. If they had more than three expansions, the rest are ignored.

Args

- rpl (sc2reader.resources.Replay)
        Replay containing the data of the match.
- pid (int)
        Player id during the match.

Returns

- dict[str, float]
        Dictionary containing the names and completion times of the
        player's first three expansions.

The code below shows how get_expan_times works.

print(get_expan_times(zustates, 1))
{'expan_1': 344.77611940298505, 'expan_2': 791.4626865671642, 'expan_3': 984.9850746268656}
test = test_rpl = sc2reader.load_replay("./test_replays/TestProfilerBatch/2000 Atmospheres LE (14).SC2Replay")
print(get_expan_times(test, 2))
{'expan_1': 261.0943458686441, 'expan_2': 515.4802039194916}
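
The selection and padding rule stated in the docstring can be sketched as follows. This is an illustration under stated assumptions, not the module's code: first_expansion_times is a hypothetical helper that takes a plain list of completion times in seconds, and math.nan stands in for np.nan so that the example needs only the standard library.

```python
import math


def first_expansion_times(finish_times, limit=3):
    """Label the earliest `limit` completion times, padding gaps with NaN."""
    ordered = sorted(finish_times)[:limit]
    padded = ordered + [math.nan] * (limit - len(ordered))
    return {f"expan_{i}": t for i, t in enumerate(padded, start=1)}


# Four expansions: the latest one is ignored.
three_plus = first_expansion_times([791.5, 344.8, 1200.0, 985.0])

# Two expansions: the third slot is filled with NaN.
only_two = first_expansion_times([515.5, 261.1])
```

Padding to a fixed set of keys keeps every result the same shape, so results from many replays can be stacked into one table without special-casing short matches.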

get_expan_counts

get_expan_counts(rpl:Replay, pid:int)

The function counts the number of base structures a player built at each game stage.

Args

- rpl (sc2reader.resources.Replay)
    Replay containing a match's information.
- pid (int)
    The match player ID for the player being considered in the
    analysis.

Returns

- dict[str, int]
    Dictionary containing the base count for the whole, early,
    mid and late stages of the game.

The following are two examples of the use of get_expan_counts.

exp_counts = get_expan_counts(zustates, 1)
exp_counts
{'total_expan': 3, 'earlyg_expan': 0, 'midg_expan': 1, 'lateg_expan': 2}
exp_counts = get_expan_counts(sing_protoss, 1)
exp_counts
{'total_expan': 2, 'earlyg_expan': 1, 'midg_expan': 0, 'lateg_expan': 1}
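
One way to produce counts of this shape is to bucket each expansion's completion time into a game stage. The sketch below assumes stage boundaries of 300 and 600 seconds purely for illustration; the module's real boundaries are not stated in this section, and expansion_counts is a hypothetical helper.

```python
from bisect import bisect_right

STAGE_CUTOFFS = [300.0, 600.0]  # assumed stage boundaries in seconds
STAGE_KEYS = ["earlyg_expan", "midg_expan", "lateg_expan"]


def expansion_counts(finish_times):
    """Count expansions per stage from their completion times."""
    counts = {"total_expan": len(finish_times)}
    counts.update({key: 0 for key in STAGE_KEYS})
    for second in finish_times:
        # bisect_right returns how many cut-offs lie at or before `second`,
        # which is exactly the index of the stage the time falls into.
        counts[STAGE_KEYS[bisect_right(STAGE_CUTOFFS, second)]] += 1
    return counts


# Rounded times from the first get_expan_times example in this chapter.
counts = expansion_counts([344.8, 791.5, 985.0])
```

With these assumed cut-offs, the rounded times happen to give the same counts as the first get_expan_counts example above. That agreement does not confirm the actual boundaries used by the module.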

Player Tech Update

Beyond constructing buildings and training units, the third way players can spend their resources is by researching tech upgrades.

However, unlike units and buildings, player objects do not store a list of tech upgrades. Thus, I need to use the match's UpgradeCompleteEvents to build this list.
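
Filtering events into a per-player upgrade record might look like the sketch below. It runs on stand-in objects instead of a real replay: StubEvent is hypothetical, and the attribute names name, pid, second and upgrade_type_name are my assumptions about what sc2reader's tracker events expose, so they should be checked against the library before reuse.

```python
from dataclasses import dataclass


@dataclass
class StubEvent:
    """Stand-in for an sc2reader tracker event (attribute names assumed)."""
    name: str
    pid: int
    second: int
    upgrade_type_name: str = ""


def collect_upgrades(events, pid):
    """Map each upgrade completed by player `pid` to its completion second."""
    upgrades = {}
    for event in events:
        if event.name == "UpgradeCompleteEvent" and event.pid == pid:
            # Keep the first completion time if an upgrade is reported twice.
            upgrades.setdefault(event.upgrade_type_name, event.second)
    return upgrades


events = [
    StubEvent("UnitBornEvent", 1, 0),
    StubEvent("UpgradeCompleteEvent", 1, 348, "HighCapacityBarrels"),
    StubEvent("UpgradeCompleteEvent", 2, 400, "zerglingmovementspeed"),
    StubEvent("UpgradeCompleteEvent", 1, 428, "DrillClaws"),
]
p1_upgrades = collect_upgrades(events, pid=1)
```

With a real replay, the event list would come from the replay object instead of being built by hand.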

Another difference between tracking units and upgrades is that it makes no sense to count the occurrences of each upgrade within a match, because players can only 'buy' each one once per match. For this reason, I record the second at which the upgrade is completed instead. Based on this record, when building the player profiles, I can average the times at which a player researched each upgrade to get a rough measure of the game stage at which they prefer to use it. At that point, I can also count the number of matches in which they researched each upgrade to see which upgrades they favour.

Below, I define the list_player_upgrades function, which returns a dictionary of all the player's race upgrades and when they were completed.

list_player_upgrades

list_player_upgrades(rpl:Replay, pid:int)

Lists the times at which the player completed their upgrades.

The following table shows a sample result from applying list_player_upgrades to a replay.

player1_upgrades = list_player_upgrades(sing_zerg, 1)
# print(pd.DataFrame(player1_upgrades, index=['P_1 Upgrades']).T.to_markdown())

                                    P_1 Upgrades
BansheeCloak                                   0
BansheeSpeed                                   0
BattlecruiserEnableSpecializations             0
CycloneLockOnDamageUpgrade               546.304
DrillClaws                                427.76
EnhancedShockwaves                             0
HiSecAutoTracking                              0
HighCapacityBarrels                      347.778
LiberatorAGRangeUpgrade                        0
MedivacIncreaseSpeedBoost                      0
PersonalCloaking                               0
PunisherGrenades                               0
RavenCorvidReactor                             0
ShieldWall                                     0
SmartServos                              526.309
Stimpack                                       0
TerranBuildingArmor                            0
TerranInfantryArmorsLevel1                     0
TerranInfantryArmorsLevel2                     0
TerranInfantryArmorsLevel3                     0
TerranInfantryWeaponsLevel1                    0
TerranInfantryWeaponsLevel2                    0
TerranInfantryWeaponsLevel3                    0
TerranShipWeaponsLevel1                        0
TerranShipWeaponsLevel2                        0
TerranShipWeaponsLevel3                        0
TerranVehicleAndShipArmorsLevel1               0
TerranVehicleAndShipArmorsLevel2               0
TerranVehicleAndShipArmorsLevel3               0
TerranVehicleWeaponsLevel1               574.155
TerranVehicleWeaponsLevel2                     0
TerranVehicleWeaponsLevel3                     0
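
The profile-building step mentioned earlier in this section, averaging research times and counting how often each upgrade is researched, can be sketched over several per-match dictionaries. The match data and the summarise_upgrades helper below are hypothetical. The sketch also assumes, based on the sample table, that a value of 0 marks an upgrade that was never researched, so zeros are excluded from the average.

```python
from statistics import mean

# Hypothetical per-match results; 0 marks an upgrade that was not researched.
matches = [
    {"DrillClaws": 427.76, "SmartServos": 526.309, "Stimpack": 0},
    {"DrillClaws": 450.0, "SmartServos": 0, "Stimpack": 0},
    {"DrillClaws": 0, "SmartServos": 500.0, "Stimpack": 0},
]


def summarise_upgrades(matches):
    """Summarise research frequency and mean completion time per upgrade."""
    summary = {}
    # Assumes every match lists the same upgrades (same race).
    for name in matches[0]:
        times = [m[name] for m in matches if m.get(name, 0) > 0]
        summary[name] = {
            "times_researched": len(times),
            # None when the upgrade was never researched in any match.
            "mean_second": mean(times) if times else None,
        }
    return summary


profile = summarise_upgrades(matches)
```

Leaving the zeros in the average would pull every mean towards the start of the game, which is why the unresearched entries are filtered out before averaging.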