Introduction

In this section, I use the database built in Section 1.7 to define a player_profiler function. The function takes the data on each player's performances and processes it into three profiles per player, one for each playable race in StarCraft 2. The section compiles this function into the profiler module.

Exportable Members

Querying the database

Once the ingestion process is done, the next step is to turn the replay data in the database into player profiles. To build these profiles, I need to separate the replays by player and then by race. To accomplish the former, I need to extract a list of usernames from the database.

The following code shows how to extract a list of player usernames by looping through the replays collection created in the ingest process.

import re

import pandas as pd

# Load database
working_db = set_up_db()

# Define username patterns to ignore
ai_pat = re.compile(r'^A\.I\. [\d] [(][\w\s]*[)]$')
barcode_pat = re.compile(r'^l+$')

# Iterate through the records in the `replays` collection to get all valid
# user names.
players_match_count = dict()
for rec in working_db['replays'].find():
    for player in rec['players']:
        if not (ai_pat.findall(player['username']) 
                or barcode_pat.findall(player['username'])
                or player['username'] == 'Player 2'):
            players_match_count.setdefault(player['username'], 0)
            players_match_count[player['username']] += 1
            
# I will ignore players that only have one record in the database.
{name: count for name, count in players_match_count.items() if count >= 2}
{'HDEspino': 149,
 'DaveyC': 2,
 'Xnorms': 2,
 'Shah': 3,
 'Razer': 2,
 'gae': 2,
 'SenorCat': 2,
 'Worawit': 2,
 'aria': 2,
 'xiiaoyao': 2}
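
The filtering logic above can also be tried without a database connection. The following is a minimal sketch, assuming records shaped like the documents in the replays collection. The helper names (is_valid_username, count_matches) and the sample records are hypothetical and only serve to illustrate the idea.

```python
import re

# Same patterns as in the section: built-in A.I. opponents and "barcode" names.
AI_PAT = re.compile(r'^A\.I\. \d \([\w\s]*\)$')
BARCODE_PAT = re.compile(r'^l+$')
PLACEHOLDER_NAMES = {'Player 2'}


def is_valid_username(username):
    """Return True when the name does not match any ignored pattern."""
    return not (AI_PAT.match(username)
                or BARCODE_PAT.match(username)
                or username in PLACEHOLDER_NAMES)


def count_matches(records, min_count=2):
    """Count replays per valid username and keep names seen min_count times or more."""
    counts = {}
    for rec in records:
        for player in rec['players']:
            name = player['username']
            if is_valid_username(name):
                counts[name] = counts.get(name, 0) + 1
    return {name: n for name, n in counts.items() if n >= min_count}


# Hypothetical records, not taken from the test database.
sample = [
    {'players': [{'username': 'HDEspino'}, {'username': 'A.I. 1 (Very Easy)'}]},
    {'players': [{'username': 'llllllll'}, {'username': 'HDEspino'}]},
    {'players': [{'username': 'Shah'}, {'username': 'Player 2'}]},
]
print(count_matches(sample))  # {'HDEspino': 2}
```

Keeping the name check in its own function makes it easy to add further patterns later without touching the counting loop.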

Of these players, I will focus only on HDEspino, given that this player has a substantial number of replays in the test database.

In any case, once I have a list of usernames in a database, I can extract all the replays related to a given player with simple queries to the database.

For example, the following queries count the replays where HDEspino played as Protoss, first as player one and then as player two.

print(len([rpl for rpl 
           in working_db['replays'].find({'players.0.username':'HDEspino',
                                          'players.0.race':'Protoss'})]))
print(len([rpl for rpl 
           in working_db['replays'].find({'players.1.username':'HDEspino',
                                          'players.1.race':'Protoss'})]))
91
39
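
The two filters differ only in the slot index, so they can be generated rather than typed out. The sketch below builds the same filter documents and also combines them with MongoDB's $or operator. The function names slot_filter and either_slot_filter are hypothetical, and the block only constructs dictionaries, so it runs without a database.

```python
def slot_filter(username, race, slot):
    """Build a MongoDB filter for a player in a given slot (0 or 1) of `players`."""
    if slot not in (0, 1):
        raise ValueError('slot must be 0 (player one) or 1 (player two)')
    prefix = f'players.{slot}'
    return {f'{prefix}.username': username, f'{prefix}.race': race}


def either_slot_filter(username, race):
    """Combine both slot filters with $or so that one query covers both cases."""
    return {'$or': [slot_filter(username, race, slot) for slot in (0, 1)]}


query = either_slot_filter('HDEspino', 'Protoss')
print(query)

# With a live connection, the filter could be passed to PyMongo, for example:
# working_db['replays'].count_documents(query)
```

Provided no replay lists the same player in both slots, the combined count should equal the sum of the two counts printed above.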

Building the profile

Based on these queries, I will build the Protoss profile for this player to illustrate what the process entails.

First, I will query the system to identify the replays where the user was one of the players and was playing as Protoss. Then, I use that information to build a DataFrame containing all of the indicators for the player's performances in these replays.

# Query `replays` and build a list of replays the user played as
# Protoss and Player 1. 
player_1_protoss = [rpl['replay_name'] for rpl 
                   in working_db['replays'].
                      find({'players.0.username':'HDEspino', 
                            'players.0.race':'Protoss'},
                            {'replay_name':1, 'players':1})]

# Based on the list query `indicators` to get the performance scores of 
# Player 1 in each replay of the previous list.
working_repls = {}
for rpl in player_1_protoss:
    for cur in working_db['indicators'].find({'replay_name':rpl, 
                                              'player_id': 1}, 
                                             {'_id':0, 'replay_name':0,
                                              'player_username':0,
                                              'player_id': 0}):
        working_repls[rpl] = cur
        
len(working_repls)
91
# Repeat the process above but focused on the replays where the player
# played as Player 2.

player_2_protoss = [rpl['replay_name'] for rpl 
                   in working_db['replays'].
                      find({'players.1.username':'HDEspino', 
                            'players.1.race':'Protoss'},
                            {'replay_name':1, 'players':1})]

for rpl in player_2_protoss:
    for cur in working_db['indicators'].find({'replay_name':rpl, 
                                              'player_id': 2}, 
                                             {'_id':0, 'replay_name':0,
                                              'player_username':0,
                                              'player_id': 0}):
        working_repls[rpl] = cur
   
working_df = (pd.DataFrame(working_repls.values(), 
                           index=working_repls.keys()).reset_index()
                                                      .drop('index', axis=1))
working_df.info(memory_usage=False, show_counts=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Columns: 389 entries, unspent_minerals_avg_whole to late_started_zealot
dtypes: float64(97), int64(284), object(8)

After extracting all replays for a given player and race, I group them into a DataFrame. In the sample case, the DataFrame has 130 entries and 389 columns. These columns represent the indicators that inventory_replays stores in the indicators collection.

More importantly, the columns store three types of data: 97 store decimals (float64), 284 store integers (int64), and 8 store other value types. In this case, the other values are categorical strings that record the player's first and second preferred special abilities, as the code below shows.

categorical_columns = working_df.dtypes[working_df.dtypes == object]

cat_features = working_df[[x for x in categorical_columns.index]]

# I only include 4 of the 8 columns for space.
pref_abil_df = working_df[['first_whole_pref_sab',
 'second_whole_pref_sab',
 'first_mid_pref_sab',
 'second_mid_pref_sab']]

# print(pref_abil_df.tail(5).to_markdown())
     first_whole_pref_sab   second_whole_pref_sab  first_mid_pref_sab     second_mid_pref_sab
125  ChronoBoostEnergyCost  None                   None                   None
126  ChronoBoostEnergyCost  UnloadTargetWarpPrism  ChronoBoostEnergyCost  UnloadTargetWarpPrism
127  ForceField             ChronoBoostEnergyCost  ForceField             GuardianShield
128  ChronoBoostEnergyCost  ForceField             ChronoBoostEnergyCost  ForceField
129  ChronoBoostEnergyCost  None                   None                   None

I can process these categories using the value_counts function to get the most common preferred ability. Next, I define get_top_of_category to extract the most used attribute in a column.
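
The body of get_top_of_category is not shown here. As a rough stand-in, assuming the helper simply returns the most frequent value in a column, the idea can be sketched with collections.Counter from the standard library instead of value_counts. The name top_of_category_sketch is hypothetical, and the sample values mirror the last five rows of first_whole_pref_sab shown above.

```python
from collections import Counter


def top_of_category_sketch(values):
    """Return the most frequent value in an iterable, or None if it is empty."""
    counts = Counter(values)
    if not counts:
        return None
    value, _ = counts.most_common(1)[0]
    return value


first_pref = ['ChronoBoostEnergyCost', 'ChronoBoostEnergyCost', 'ForceField',
              'ChronoBoostEnergyCost', 'ChronoBoostEnergyCost']
print(top_of_category_sketch(first_pref))  # ChronoBoostEnergyCost
```

Note that Counter counts None like any other value, so a column that is mostly empty yields None as its top category. Whether the project's helper behaves the same way is an assumption.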

get_top_of_category(pref_abil_df.first_whole_pref_sab)
'ChronoBoostEnergyCost'
cate_profile = cat_features.apply(get_top_of_category, axis=0)
cate_profile
first_whole_pref_sab     ChronoBoostEnergyCost
second_whole_pref_sab           GuardianShield
first_early_pref_sab     ChronoBoostEnergyCost
second_early_pref_sab                     None
first_mid_pref_sab       ChronoBoostEnergyCost
second_mid_pref_sab                       None
first_late_pref_sab                       None
second_late_pref_sab                      None
dtype: object

Meanwhile, I will simply average all other columns to get a single value per indicator for the player's profile.

non_cat_columns = working_df.dtypes[working_df.dtypes != object]

non_cat_features = working_df[[x for x in non_cat_columns.index]]
non_cat_features.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Columns: 381 entries, unspent_minerals_avg_whole to late_started_zealot
dtypes: float64(97), int64(284)
memory usage: 387.1 KB
non_cate_profile = non_cat_features.mean()
non_cate_profile
unspent_minerals_avg_whole    1068.323130
unspent_minerals_avg_early     184.596550
unspent_minerals_avg_mid       651.947283
unspent_minerals_avg_late     2307.746755
unspent_vespene_avg_whole      531.042321
                                 ...     
late_started_stalker             5.123077
late_started_tempest             0.584615
late_started_voidray             5.069231
late_started_warpprism           0.123077
late_started_zealot              3.215385
Length: 381, dtype: float64

Once these two sets of values are defined, I can join them in a single profile.

profile_name = 'player_profile'
left = pd.DataFrame(non_cate_profile.to_dict(), index=[0])
left.insert(0, profile_name, 'HDEspino_protoss')
right = pd.DataFrame(cate_profile.to_dict(), index=[0])
right.insert(0, profile_name, 'HDEspino_protoss')

full_profile =  left.merge(right, how='inner', on=profile_name)
full_profile.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Columns: 390 entries, player_profile to second_late_pref_sab
dtypes: float64(381), object(9)
memory usage: 3.1+ KB

The following table shows the first ten and last ten indicators in the profile along with their values.

Indicator                    Value
player_profile               HDEspino_protoss
unspent_minerals_avg_whole   1068.32313026423
unspent_minerals_avg_early   184.59654999017428
unspent_minerals_avg_mid     651.9472831545554
unspent_minerals_avg_late    2307.7467547558495
unspent_vespene_avg_whole    531.0423213983343
unspent_vespene_avg_early    109.9563382651504
unspent_vespene_avg_mid      504.7169891278007
unspent_vespene_avg_late     1075.1499164552638
unspent_resources_avg_whole  1599.365451662564
late_started_warpprism       0.12307692307692308
late_started_zealot          3.2153846153846155
first_whole_pref_sab         ChronoBoostEnergyCost
second_whole_pref_sab        GuardianShield
first_early_pref_sab         ChronoBoostEnergyCost
second_early_pref_sab        None
first_mid_pref_sab           ChronoBoostEnergyCost
second_mid_pref_sab          None
first_late_pref_sab          None
second_late_pref_sab         None
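
The profile-building steps in this section (average the numeric indicators, take the most frequent value of the categorical ones, and join both under a profile name) can be condensed into a plain Python sketch. This is not the project's implementation: build_profile_sketch is a hypothetical name, the sample indicator values are made up, and the function assumes every replay dictionary has the same keys.

```python
from collections import Counter
from statistics import mean


def build_profile_sketch(profile_id, replay_indicators, name_key='player_profile'):
    """Collapse per-replay indicator dicts into a single profile dict.

    Numeric indicators are averaged; all others are reduced to their most
    frequent value.
    """
    if not replay_indicators:
        raise ValueError('at least one replay is needed to build a profile')
    profile = {name_key: profile_id}
    for key in replay_indicators[0]:
        column = [rec[key] for rec in replay_indicators]
        is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                         for v in column)
        if is_numeric:
            profile[key] = mean(column)
        else:
            profile[key] = Counter(column).most_common(1)[0][0]
    return profile


# Made-up indicator values for three replays.
sample = [
    {'unspent_minerals_avg_whole': 900.0, 'late_started_zealot': 2,
     'first_whole_pref_sab': 'ChronoBoostEnergyCost'},
    {'unspent_minerals_avg_whole': 1100.0, 'late_started_zealot': 4,
     'first_whole_pref_sab': 'ChronoBoostEnergyCost'},
    {'unspent_minerals_avg_whole': 1000.0, 'late_started_zealot': 3,
     'first_whole_pref_sab': 'ForceField'},
]
profile = build_profile_sketch('HDEspino_protoss', sample)
print(profile)
```

The resulting dictionary has one entry per indicator plus the profile name, which is the same shape as the single-row full_profile DataFrame.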

Exportable function

Here, I define build_player_race_profiles as a function that converts all replays in a database into a set of player profiles. The function uses four helper functions:

build_player_race_profiles

build_player_race_profiles()

Converts all replays in the project's database, defined in the project's config.json file, into a set of player profiles stored in that same database in the 'Protoss_Profiles', 'Terran_Profiles', and 'Zerg_Profiles' collections.
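
The body of build_player_race_profiles is not reproduced here. To picture the bookkeeping such a function needs, the sketch below groups replay records by username and race and maps each race to the collection names mentioned above. The helper names and sample records are hypothetical, and the sketch assumes that the player in slot 0 has player_id 1, as in the indicators queries shown earlier.

```python
from collections import defaultdict

RACES = ('Protoss', 'Terran', 'Zerg')


def group_replays_by_player_race(replays):
    """Map (username, race) to a list of (replay_name, player_id) pairs."""
    groups = defaultdict(list)
    for rec in replays:
        for slot, player in enumerate(rec['players']):
            # Slot 0 corresponds to player_id 1, slot 1 to player_id 2.
            groups[(player['username'], player['race'])].append(
                (rec['replay_name'], slot + 1))
    return dict(groups)


def profile_collection(race):
    """Return the name of the collection that stores profiles for a race."""
    if race not in RACES:
        raise ValueError(f'unknown race: {race}')
    return f'{race}_Profiles'


# Hypothetical replay records.
sample = [
    {'replay_name': 'game_a', 'players': [
        {'username': 'HDEspino', 'race': 'Protoss'},
        {'username': 'Shah', 'race': 'Zerg'}]},
    {'replay_name': 'game_b', 'players': [
        {'username': 'Shah', 'race': 'Zerg'},
        {'username': 'HDEspino', 'race': 'Protoss'}]},
]
groups = group_replays_by_player_race(sample)
for (username, race), replays in groups.items():
    print(username, profile_collection(race), replays)
```

Each group then holds exactly the replay names and player ids needed to query the indicators collection for one player and race.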

Once I run the function, there is one record in each of the profile collections: the profile of HDEspino for each race.

build_player_race_profiles()
Accessing: TEST_library
1 users found in database
Generating Player Profiles
Created the following profiles
Protoss: 1
Zerg: 1
Terran: 1
print(working_db['Protoss_Profiles'].estimated_document_count())
print(working_db['Terran_Profiles'].estimated_document_count())
print(working_db['Zerg_Profiles'].estimated_document_count())
1
1
1