Introduction
In this section, I use the database built in Section 1.7 to define a player_profiler function. The function takes each player's performance data and compiles three profiles per player, one for each playable race in StarCraft 2. The section compiles this function into the profiler module.
Exportable Members
Querying the database
Once the ingestion process is done, the next step is to turn the replay data in the database into player profiles. To build these profiles, I need to separate the replays by player and then by race. To accomplish the former, I need to extract a list of usernames from the database.
The following code shows how to extract a list of player usernames by looping through the replays collection created in the ingest process.
import re

# Load database
working_db = set_up_db()

# Define username patterns to ignore
ai_pat = re.compile(r'^A\.I\. [\d] [(][\w\s]*[)]$')
barcode_pat = re.compile(r'^l+$')

# Iterate through the records in the `replays` collection to get all valid
# user names.
players_match_count = dict()
for rec in working_db['replays'].find():
    for player in rec['players']:
        if not (ai_pat.findall(player['username'])
                or barcode_pat.findall(player['username'])
                or player['username'] == 'Player 2'):
            players_match_count.setdefault(player['username'], 0)
            players_match_count[player['username']] += 1

# I will ignore players that only have one record in the database.
{name: count for name, count in players_match_count.items() if count >= 2}
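The filtering logic above can be tried without a database connection. The following is a minimal sketch under that assumption: `is_valid_username`, `count_player_matches`, and `sample_replays` are hypothetical names introduced for illustration, and the sample records only imitate the shape of the documents in the `replays` collection.

```python
import re
from collections import Counter

# Same patterns as above: built-in A.I. opponents and "barcode" names.
AI_PAT = re.compile(r'^A\.I\. [\d] [(][\w\s]*[)]$')
BARCODE_PAT = re.compile(r'^l+$')


def is_valid_username(username):
    """Return True unless the name is an A.I., a barcode, or 'Player 2'."""
    return not (AI_PAT.match(username)
                or BARCODE_PAT.match(username)
                or username == 'Player 2')


def count_player_matches(replays, min_matches=2):
    """Count appearances per valid username, keeping those seen at least min_matches times."""
    counts = Counter(
        player['username']
        for rec in replays
        for player in rec['players']
        if is_valid_username(player['username'])
    )
    return {name: n for name, n in counts.items() if n >= min_matches}


# Hypothetical records standing in for documents of the `replays` collection.
sample_replays = [
    {'players': [{'username': 'HDEspino'}, {'username': 'A.I. 1 (Very Easy)'}]},
    {'players': [{'username': 'HDEspino'}, {'username': 'llllllll'}]},
    {'players': [{'username': 'Player 2'}, {'username': 'OneTimer'}]},
]
print(count_player_matches(sample_replays))  # {'HDEspino': 2}
```

Using `Counter` replaces the `setdefault` bookkeeping, and keeping the filter in its own function makes the ignore rules easy to test in isolation.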
Of these players, I will focus only on HDEspino, given that this player has a substantial number of replays in the test database.
In any case, once I have a list of usernames from the database, I can extract all the replays related to a given player with simple queries to the database.
For example, the following queries count the replays where HDEspino was playing Protoss as either player one or player two.
print(len([rpl for rpl
           in working_db['replays'].find({'players.0.username':'HDEspino',
                                          'players.0.race':'Protoss'})]))
print(len([rpl for rpl
           in working_db['replays'].find({'players.1.username':'HDEspino',
                                          'players.1.race':'Protoss'})]))
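The two positional queries can also be expressed as a single filter. The sketch below assumes that `players` is stored as an array of sub-documents, as the loop over `rec['players']` suggests, and uses MongoDB's `$elemMatch` operator to match the player in any slot. `build_player_race_query` is a hypothetical helper, not part of the module.

```python
def build_player_race_query(username, race):
    """Build a filter for replays where `username` played `race` in any slot."""
    return {'players': {'$elemMatch': {'username': username, 'race': race}}}


query = build_player_race_query('HDEspino', 'Protoss')
print(query)

# With a live connection, the filter could be passed to a call such as
# working_db['replays'].count_documents(query)
```

Note that this filter does not reveal which slot the player occupied, so the positional queries remain necessary when the player number is needed to look up the indicators.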
Building the profile
Based on this list, I will build the Protoss profile for this player to illustrate what this process would entail.
First, I will query the system to identify the replays where the user was one of the players and was playing as Protoss. Then, I use that information to build a DataFrame containing all of the indicators for the player's performances in these replays.
# Query `replays` and build a list of replays the user played as
# Protoss and Player 1.
player_1_protoss = [rpl['replay_name'] for rpl
                    in working_db['replays'].
                    find({'players.0.username':'HDEspino',
                          'players.0.race':'Protoss'},
                         {'replay_name':1, 'players':1})]

# Based on the list query `indicators` to get the performance scores of
# Player 1 in each replay of the previous list.
working_repls = {}
for rpl in player_1_protoss:
    for cur in working_db['indicators'].find({'replay_name':rpl,
                                              'player_id': 1},
                                             {'_id':0, 'replay_name':0,
                                              'player_username':0,
                                              'player_id': 0}):
        working_repls[rpl] = cur

len(working_repls)
# Repeat the process above but focused on the replays where the player
# played as Player 2.
player_2_protoss = [rpl['replay_name'] for rpl
                    in working_db['replays'].
                    find({'players.1.username':'HDEspino',
                          'players.1.race':'Protoss'},
                         {'replay_name':1, 'players':1})]

for rpl in player_2_protoss:
    for cur in working_db['indicators'].find({'replay_name':rpl,
                                              'player_id': 2},
                                             {'_id':0, 'replay_name':0,
                                              'player_username':0,
                                              'player_id': 0}):
        working_repls[rpl] = cur

working_df = (pd.DataFrame(working_repls.values(),
                           index=working_repls.keys()).reset_index()
              .drop('index', axis=1))
working_df.info(memory_usage=False, show_counts=False)
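The Player 1 and Player 2 passes above differ only in the slot they inspect. As a rough sketch of how both could be handled in one pass, the hypothetical helper below walks replay records in memory and returns each matching replay together with the player number. It assumes, as the queries above imply, that `player_id` is the 1-based position of the player in the `players` list. The sample records are made up for illustration.

```python
def find_player_slots(replays, username, race):
    """Return (replay_name, player_id) pairs where `username` played `race`.

    Assumes player_id is the 1-based position in the `players` list.
    """
    slots = []
    for rec in replays:
        for position, player in enumerate(rec['players'], start=1):
            if player['username'] == username and player['race'] == race:
                slots.append((rec['replay_name'], position))
    return slots


# Hypothetical records imitating documents of the `replays` collection.
sample = [
    {'replay_name': 'game_a',
     'players': [{'username': 'HDEspino', 'race': 'Protoss'},
                 {'username': 'Rival', 'race': 'Zerg'}]},
    {'replay_name': 'game_b',
     'players': [{'username': 'Rival', 'race': 'Terran'},
                 {'username': 'HDEspino', 'race': 'Protoss'}]},
    {'replay_name': 'game_c',
     'players': [{'username': 'HDEspino', 'race': 'Zerg'},
                 {'username': 'Rival', 'race': 'Zerg'}]},
]
print(find_player_slots(sample, 'HDEspino', 'Protoss'))
# [('game_a', 1), ('game_b', 2)]
```

Each pair carries exactly the two values needed to filter the `indicators` collection, so a single loop over the pairs could replace the duplicated lookup code.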
After extracting all replays relative to a player and race, I group them into a DataFrame. In the sample case, the DataFrame has 130 entries and 385 columns. These columns represent the indicators stored by inventory_replays into the indicators collection.
More importantly, the columns store three types of data: 97 store decimals (type float64), 284 store integers (type int64), and 8 store other value types. In this case, the other value types are categorical values in the form of strings, which store the players' first and second preferred special abilities, as I show in the code below.
categorical_columns = working_df.dtypes[working_df.dtypes == object]
cat_features = working_df[[x for x in categorical_columns.index]]

# I only include 4 of the 8 columns for space.
pref_abil_df = working_df[['first_whole_pref_sab',
                           'second_whole_pref_sab',
                           'first_mid_pref_sab',
                           'second_mid_pref_sab']]
# print(pref_abil_df.tail(5).to_markdown())
| | first_whole_pref_sab | second_whole_pref_sab | first_mid_pref_sab | second_mid_pref_sab |
---|---|---|---|---|
125 | ChronoBoostEnergyCost | None | None | None |
126 | ChronoBoostEnergyCost | UnloadTargetWarpPrism | ChronoBoostEnergyCost | UnloadTargetWarpPrism |
127 | ForceField | ChronoBoostEnergyCost | ForceField | GuardianShield |
128 | ChronoBoostEnergyCost | ForceField | ChronoBoostEnergyCost | ForceField |
129 | ChronoBoostEnergyCost | None | None | None |
I can process these categories using the value_counts function to get the most common preferred ability. Next, I define get_top_of_category to extract the most used attribute in a column.
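A minimal sketch of what such a helper could look like follows. It assumes the column is a pandas Series and that missing values should be ignored, so the original definition may differ, for instance in how it treats `None` entries or ties.

```python
import pandas as pd


def get_top_of_category(column):
    """Return the most frequent non-null value in `column`, or None if there is none."""
    # value_counts sorts by frequency in descending order.
    counts = column.dropna().value_counts()
    return counts.index[0] if not counts.empty else None


# Hypothetical column of preferred abilities.
abilities = pd.Series(['ChronoBoostEnergyCost', 'ForceField',
                       'ChronoBoostEnergyCost', None])
print(get_top_of_category(abilities))  # ChronoBoostEnergyCost
```

Because the function takes a single Series and returns a scalar, it can be passed directly to `DataFrame.apply` with `axis=0` to obtain one value per categorical column.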
get_top_of_category(pref_abil_df.first_whole_pref_sab)
cate_profile = cat_features.apply(get_top_of_category, axis=0)
cate_profile
Meanwhile, I simply average all other columns to get a single value per indicator for the player's profile.
non_cat_columns = working_df.dtypes[working_df.dtypes != object]
non_cat_features = working_df[[x for x in non_cat_columns.index]]
non_cat_features.info()
non_cate_profile = non_cat_features.mean()
non_cate_profile
Once these two sets of values are defined, I can join them in a single profile.
profile_name = 'player_profile'
left = pd.DataFrame(non_cate_profile.to_dict(), index=[0])
left.insert(0, profile_name, 'HDEspino_protoss')
right = pd.DataFrame(cate_profile.to_dict(), index=[0])
right.insert(0, profile_name, 'HDEspino_protoss')
full_profile = left.merge(right, how='inner', on=profile_name)
full_profile.info()
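The steps above (average the numeric columns, take the most common value of the categorical ones, and join both under a profile label) can be condensed into one function. The sketch below is illustrative only: `build_profile` and the toy `games` DataFrame are hypothetical, and the function builds the one-row result from a dictionary instead of merging two DataFrames.

```python
import pandas as pd


def build_profile(df, profile_label, profile_name='player_profile'):
    """Collapse per-replay indicators into a one-row profile DataFrame."""
    profile = {profile_name: profile_label}
    # Numeric indicators: one mean per column.
    profile.update(df.select_dtypes(include='number').mean().to_dict())
    # Everything else is treated as categorical: keep the most frequent value.
    for name, col in df.select_dtypes(exclude='number').items():
        counts = col.value_counts()
        profile[name] = counts.index[0] if not counts.empty else None
    return pd.DataFrame([profile])


# Toy data standing in for the indicators of three replays.
games = pd.DataFrame({
    'unspent_minerals_avg_whole': [1000.0, 1200.0, 800.0],
    'late_started_zealot': [2, 4, 3],
    'first_whole_pref_sab': ['ChronoBoostEnergyCost', 'ForceField',
                             'ChronoBoostEnergyCost'],
})
profile = build_profile(games, 'HDEspino_protoss')
print(profile.iloc[0].to_dict())
```

The resulting row keeps the profile label first, followed by the averaged indicators and then the categorical ones, which mirrors the column order of the merged profile.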
The following table shows the first ten and last ten indicators in the profile and their values.
Indicator | Value |
---|---|
player_profile | HDEspino_protoss |
unspent_minerals_avg_whole | 1068.32313026423 |
unspent_minerals_avg_early | 184.59654999017428 |
unspent_minerals_avg_mid | 651.9472831545554 |
unspent_minerals_avg_late | 2307.7467547558495 |
unspent_vespene_avg_whole | 531.0423213983343 |
unspent_vespene_avg_early | 109.9563382651504 |
unspent_vespene_avg_mid | 504.7169891278007 |
unspent_vespene_avg_late | 1075.1499164552638 |
unspent_resources_avg_whole | 1599.365451662564 |
late_started_warpprism | 0.12307692307692308 |
late_started_zealot | 3.2153846153846155 |
first_whole_pref_sab | ChronoBoostEnergyCost |
second_whole_pref_sab | GuardianShield |
first_early_pref_sab | ChronoBoostEnergyCost |
second_early_pref_sab | None |
first_mid_pref_sab | ChronoBoostEnergyCost |
second_mid_pref_sab | None |
first_late_pref_sab | None |
second_late_pref_sab | None |
Exportable function
Here, I define build_player_race_profiles as a function that converts all replays in a database into a set of player profiles. The function uses four helper functions:
Once I run the function, there is one record in each of the profile collections: the profile of HDEspino for each race.
build_player_race_profiles()
print(working_db['Protoss_Profiles'].estimated_document_count())
print(working_db['Terran_Profiles'].estimated_document_count())
print(working_db['Zerg_Profiles'].estimated_document_count())
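To illustrate why each collection ends up with one document per player, the sketch below groups replay records by username and race in memory. It is not the implementation of build_player_race_profiles: `group_replays_by_player_and_race` and the sample records are hypothetical and only show the grouping step that precedes building each profile.

```python
from collections import defaultdict


def group_replays_by_player_and_race(replays):
    """Map each (username, race) pair to the names of the replays it appears in."""
    groups = defaultdict(list)
    for rec in replays:
        for player in rec['players']:
            key = (player['username'], player['race'])
            groups[key].append(rec['replay_name'])
    return dict(groups)


# Hypothetical records imitating documents of the `replays` collection.
records = [
    {'replay_name': 'game_a',
     'players': [{'username': 'HDEspino', 'race': 'Protoss'},
                 {'username': 'Rival', 'race': 'Zerg'}]},
    {'replay_name': 'game_b',
     'players': [{'username': 'HDEspino', 'race': 'Terran'},
                 {'username': 'Rival', 'race': 'Zerg'}]},
]
for key, names in group_replays_by_player_and_race(records).items():
    print(key, names)
```

Each key in the result corresponds to one candidate profile, so a player who has used all three races yields three groups, one per race-specific collection.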