Introduction
This module defines the inventory_replays function following the design proposed in the Ingestion Sequence Diagram (see <<2 - Data Ingestion and Clustering Process>>). It exports this function, together with the set_up_db helper, into the ingest module of the ingest sub-package.
Exportable Members
Exportable Helper functions
Inventorying and Storing the replays collection
To inventory the replays in a replay batch, I can use the get_replay_info function defined in the summarise_rpl module (see Section 1.1).
In the following code, I load a batch of replays and print the summary of the first two replays using the get_replay_info function.
replay_batch = sc2reader.load_replays(str(test_batch_path))
for i, rpl in enumerate(replay_batch):
    print(get_replay_info(rpl))
    print(type(get_replay_info(rpl)))
    # Stop after the second replay (indices 0 and 1).
    if i == 1:
        break
Storing inventory in Database
I can store the Replay_data objects returned by get_replay_info in a document-based database. Storing this inventory in a database means that other processes can access it to navigate the information built by the ingest process.
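As a minimal sketch of that idea, the snippet below converts a summary object into a plain dictionary, the shape a document database stores. It assumes Replay_data is a dataclass, as the use of asdict later in this section suggests. ReplaySummary, its fields other than replay_name, and to_document are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class ReplaySummary:
    """Hypothetical stand-in for Replay_data; the real fields may differ."""
    replay_name: str
    ranked: bool
    players: list


def to_document(summary):
    """Convert a dataclass instance into a plain dict, ready to be
    stored as a document."""
    return asdict(summary)


doc = to_document(
    ReplaySummary('match_01.SC2Replay', True, ['player_a', 'player_b']))
print(doc)
```

The resulting dictionary can be passed as-is to a document store's insert call.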
sc_training uses a config.json file, which users should create and store in their working project's data folder, to set up a local MongoDB client. Using this client and the information in the config file, the solution also creates the database. The following code loads this file and defines the loading procedure that runs when this module is imported.
I will divide this loading process into three steps.
1 - Handle Config file
I locate and load the config file. This location process allows users to create a custom config.json in a data directory in their projects. That file can be used to customise the name of the database, the port address and number for the MongoDB client, and the location of the replays the user wants to process. Once the file is located, I also define a procedure to ensure it contains the information needed to connect to the MongoDB client and access the appropriate database.
Apart from this option, the code also defines how the module falls back to a default config file stored in the library's data folder if the user does not provide one. This default file lets the system attempt to run under the assumption that the user is on a Windows computer with a standard installation of StarCraft II and MongoDB. This default set-up is not expected to work in all cases, but it provides a fallback and a sample of the config file.
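The locate-then-fall-back behaviour described above could be sketched as follows. This is only an illustration under assumptions: locate_config and read_config are hypothetical names, and the library's own helpers may search other locations or behave differently.

```python
import json
from pathlib import Path


def locate_config(project_dir, default_path):
    """Return the project's data/config.json when it exists,
    otherwise fall back to the default config path."""
    candidate = Path(project_dir) / 'data' / 'config.json'
    return candidate if candidate.is_file() else Path(default_path)


def read_config(path):
    """Parse a JSON config file into a dictionary."""
    with open(path, encoding='utf-8') as file:
        return json.load(file)
```

Keeping the lookup separate from the parsing makes the fallback rule easy to check without touching a database.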
The load_configurations function uses various internal helper functions to locate, open, verify and load the information from a project's config.json file. It stores this information as a Config_settings object.
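The verify-and-load step can be pictured with the sketch below. The attribute names (port_address, port_number, db_name, replay_path) are the ones used on the settings object in this section, but ConfigSketch and build_settings are hypothetical, and treating the JSON keys as identical to the attribute names is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for Config_settings."""
    port_address: str
    port_number: int
    db_name: str
    replay_path: str


def build_settings(raw):
    """Check that a parsed config dict has every required key,
    then wrap the required values in a ConfigSketch."""
    required = [field.name for field in fields(ConfigSketch)]
    missing = [key for key in required if key not in raw]
    if missing:
        raise KeyError(f'config.json is missing: {", ".join(missing)}')
    # Extra keys in the file are ignored rather than rejected.
    return ConfigSketch(**{key: raw[key] for key in required})
```

Failing early with the list of missing keys gives users a clearer message than a connection error raised later.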
The following sample code shows the use of load_configurations to set up a test database.
db_settings = load_configurations()
mongo_client = pymongo.MongoClient(db_settings.port_address,
                                   db_settings.port_number)
worcking_bd = mongo_client[db_settings.db_name]
print('config.json content:')
print(db_settings)
print('Local Mongo DB Client')
print(mongo_client, end='\n\n')
print('Database: ')
print(worcking_bd)
2 - Index replay batch data in the "replays" collection
Once the database is set up, I store the main descriptive data of each replay in the replays collection of the database. This information is crucial for iterating through the batch and for keeping an index of which players played which matches. It can also be used to review other information about a match, for instance, whether the match was ranked or which race each player played.
To extract this data, I use the get_replay_info function from the summarise_rpl module (see <<3 - Summarising Replays>>).
To collect this data, I loop through the replays in a batch, inserting the return values of the functions from the ingest sub-package's different modules into collections within the database.
The following code sets up sample_db, a database that I will use in this notebook for illustration purposes.
sample_db = mongo_client['sample_db']
collection = sample_db['replays']
replay_batch = sc2reader.load_replays(db_settings.replay_path)
count_add = 0
count_existed = 0
for rpl in replay_batch:
    # Insert the replay only if no document with the same name exists.
    if not collection.count_documents({'replay_name': rpl.filename},
                                      limit=1):
        # print(f'Adding {Path(rpl.filename).name} to replays collection.')
        collection.insert_one(asdict(get_replay_info(rpl)))
        count_add += 1
    else:
        count_existed += 1
        # print(rpl.filename, "already exists in the replay_info collection.")
print(f"{count_add} added to replays")
print(f"{count_existed} already existed in replays")
Once the loop finishes, the collection has been created in the database. The following snippet shows that this collection is now composed of several documents containing the indexing data for each replay. The loop also includes a conditional statement that checks whether the collection already contains a replay. In that case, the loop counts the replay as already existing instead of inserting a repeated document into the database.
print('The collection now contains:',
      collection.estimated_document_count(), 'documents')
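The insert-if-absent check used in the loop above can be isolated into a small function, as in this sketch. InMemoryCollection is a hypothetical stand-in that mimics only the two pymongo Collection methods the loop relies on (count_documents and insert_one), so the example runs without a MongoDB server. insert_if_absent and inventory are also hypothetical names.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a pymongo Collection (demo only)."""

    def __init__(self):
        self.docs = []

    def count_documents(self, query, limit=0):
        # Count documents whose fields equal every value in the query.
        matches = [doc for doc in self.docs
                   if all(doc.get(k) == v for k, v in query.items())]
        return len(matches[:limit]) if limit else len(matches)

    def insert_one(self, doc):
        self.docs.append(dict(doc))


def insert_if_absent(collection, doc, key='replay_name'):
    """Insert doc unless a document with the same key value exists.
    Return True when the document was inserted."""
    if collection.count_documents({key: doc[key]}, limit=1):
        return False
    collection.insert_one(doc)
    return True


def inventory(collection, docs):
    """Return (added, already_existed) counts for a batch of documents."""
    added = sum(insert_if_absent(collection, doc) for doc in docs)
    return added, len(docs) - added
```

Passing limit=1 lets the count stop at the first match, which is all the existence check needs.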
3 - Building the indicators collection
Once a replay's descriptive data is stored in the replays collection, I also need to extract the performance indicators for each player in the match and store them in the indicators collection.
In the following code, I illustrate how I can use all the functions defined in the ingest sub-package to build the indicators of two players in a sample replay. Each player's indicators are stored in a single flat dictionary, i.e. a dictionary that has no nested data structures as values.
# First I locate the sample file. The raw string keeps the backslash in the
# Windows-style path from being read as an escape sequence.
sample_replay = [sc2reader.load_replay(r'test_replays\Jagannatha LE.SC2Replay')]
# I create two lists of the functions from the ingest sub-package that I use
# to collect the players' performance indicators.
# I use two lists because I need to make sure that all functions in a
# list have the same call signature.
simple_functions = [get_player_macro_econ_stats,
                    get_expan_times,
                    get_expan_counts,
                    calc_attack_ratio,
                    calc_ctrlg_ratio,
                    count_max_active_groups,
                    calc_get_ctrl_grp_ratio,
                    calc_select_ratio,
                    list_player_upgrades,
                    calc_spe_abil_ratios]
double_functions = [count_composition,
                    count_started]
# I loop through the replays in sample_replay and through the players
# in each replay, storing each player's indicators as a dictionary in
# indicators_list.
indicators_list = []
for rpl in sample_replay:
    print(f'Processing: {rpl.filename}')
    for pid, player in rpl.player.items():
        print(f'Processing player: pid:{pid} {player}')
        # Declare the dict that will contain all of the player's performance
        # indicators.
        rpl_indicators = {}
        # Run through all functions that need the caller arguments rpl
        # (for the replay being analysed) and pid (for the player id of
        # the player being parsed) and that return a flat dict.
        for func in simple_functions:
            rpl_indicators.update(func(rpl, pid))
        # Run through all functions that need the caller arguments rpl
        # (for the replay being analysed), pid (for the player id of
        # the player being parsed) and a flag to focus on extracting data
        # from the player's buildings or troops. Their output is flattened
        # before it is merged.
        for func in double_functions:
            for flag in [True, False]:
                rpl_indicators.update(flatten_indicators(func(rpl, pid, flag)))
        # I run this last function apart, because I need to flatten the
        # output so that the resulting dictionary has no nested levels.
        v = get_prefered_spec_abil(rpl, pid)
        rpl_indicators.update(flatten_indicators(v))
        indicators_list.append(rpl_indicators)
As a result of the code above, indicators_list now contains two dictionaries, one per player, each storing that player's indicators. In the inventory_replays function, I store this data in the database's indicators collection instead of a list. Here, I used a list only to illustrate the results the process returns:
print('Number of players evaluated: ', len(indicators_list))
print('First set of indicators belongs to:',
      indicators_list[0]['player_username'])
print('First set of indicators contains: ',
      len(indicators_list[0]), 'indicators.')
print('Second set of indicators belongs to:',
      indicators_list[1]['player_username'])
print('Second set of indicators contains: ',
      len(indicators_list[1]), 'indicators.')
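To make the idea of a flat dictionary concrete, here is a minimal sketch of one way nested indicators could be flattened. It is not the library's flatten_indicators: flatten_dict_sketch, the underscore key-joining convention and the sample keys are all assumptions chosen for illustration.

```python
def flatten_dict_sketch(nested, parent_key='', sep='_'):
    """Return a copy of nested with no dict values; nested keys are
    joined to their parent keys with sep."""
    flat = {}
    for key, value in nested.items():
        new_key = f'{parent_key}{sep}{key}' if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the joined key.
            flat.update(flatten_dict_sketch(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


nested = {'preferred': {'ability': 'blink', 'count': 12}, 'apm': 180}
print(flatten_dict_sketch(nested))
# prints {'preferred_ability': 'blink', 'preferred_count': 12, 'apm': 180}
```

A flat structure like this maps directly onto one document per player, with one field per indicator.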
Exportable functions
This section defines inventory_replays. Users can pass a directory containing multiple .SC2Replay files to this function, and it will extract all data from these files and store it in the replays and indicators collections of the project's database (as defined in the config.json file), following the logic explained above.
Internally, the function uses three helper functions:
set_up_db: connects to the MongoDB client and loads the working database.
verify_replays_path: makes sure that the path passed by the user is valid.
build_indicators: runs through the indicator extraction loop and stores the results in the indicators collection.
Of these, I export set_up_db, given that it can be useful in other modules (see, for example, <<10 - Player Profiler>>).
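As an illustration of the kind of check a helper like verify_replays_path might perform, the sketch below validates a replay directory. What counts as a valid path in the library is not specified here, so both conditions (the path is a directory, and it contains at least one .SC2Replay file) are assumptions, and check_replays_dir is a hypothetical name.

```python
from pathlib import Path


def check_replays_dir(path):
    """Return path as a Path if it is a directory that holds at least
    one .SC2Replay file; raise an error otherwise."""
    replay_dir = Path(path)
    if not replay_dir.is_dir():
        raise NotADirectoryError(f'{replay_dir} is not a directory')
    if not any(replay_dir.glob('*.SC2Replay')):
        raise FileNotFoundError(f'No .SC2Replay files found in {replay_dir}')
    return replay_dir
```

Raising distinct errors for the two failure cases lets the caller tell a mistyped path from an empty folder.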
The following is a sample run of the set_up_db function to illustrate how it can be used to connect to a MongoDB database.
worcking_bd = set_up_db()
print(type(worcking_bd))
After running the inventory_replays function once, I get the following sample results:
mongo_client.drop_database('TEST_library')
inventory_replays()
collections = [col for col in worcking_bd.list_collection_names()]
collections
for col in collections:
    print(f'{col} has {worcking_bd[col].estimated_document_count()} records.')