Introduction

This module defines the inventory_replays function, following the design proposed in the Ingestion Sequence Diagram (see <<2 - Data Ingestion and Clustering Process>>). It exports this function, together with the set_up_db helper, into the ingest module of the ingest sub-package.

Exportable Members

Exportable Helper functions

Inventorying and Storing the replays collection

To inventory the replays in a replay batch, I can use the get_replay_info function defined in the summarise_rpl module (see Section 1.1).

In the following code, I load a batch of replays and I print the summary of the first two replays using the get_replay_info function.

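import sc2reader
from sc_training.ingest.summarise_rpl import get_replay_info

# test_batch_path points to the folder that holds the test replays.
replay_batch = sc2reader.load_replays(str(test_batch_path))

for i, rpl in enumerate(replay_batch):
    # Build the summary once and reuse it for both prints.
    rpl_info = get_replay_info(rpl)
    print(rpl_info)
    print(type(rpl_info))
    if i == 1:
        break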
File path:                   c:\Users\david\Documents\phdcode\sc_training\test_replays\TestProfilerBatch\16-Bit LE (2).SC2Replay 
File name:                   16-Bit LE (2).SC2Replay 
Date (datetime.datetime):    2021-06-04 03:49:10 
Duration (seconds):          707 
Game type:                   1v1 
Game release:                5.0.7.84643 
Map:                         16-Bit LE 
Game category:               Private 
winner:                      1 
players:                     [(1, 'HDEspino', 'Terran', 'Win'), (2, 'A.I. 1 (Harder)', 'Zerg', 'Loss')] 

<class 'sc_training.ingest.summarise_rpl.Replay_data'>
File path:                   c:\Users\david\Documents\phdcode\sc_training\test_replays\TestProfilerBatch\16-Bit LE (3).SC2Replay 
File name:                   16-Bit LE (3).SC2Replay 
Date (datetime.datetime):    2021-06-08 02:04:37 
Duration (seconds):          384 
Game type:                   1v1 
Game release:                5.0.7.84643 
Map:                         16-Bit LE 
Game category:               Private 
winner:                      1 
players:                     [(1, 'HDEspino', 'Protoss', 'Win'), (2, 'A.I. 1 (Easy)', 'Protoss', 'Loss')] 

<class 'sc_training.ingest.summarise_rpl.Replay_data'>

Storing inventory in Database

I can store the Replay_data objects returned by get_replay_info in a document-based database. Storing this inventory in a database means that other processes could access it to navigate the information built by the ingest process.
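As a minimal sketch of why a document-based database suits this data: a dataclass instance can be turned into a plain dictionary with dataclasses.asdict, and that dictionary is the document to insert. ReplaySummarySketch and its field names are hypothetical stand-ins for Replay_data (only replay_name is taken from the query used later in this page), and the values mirror the first summary printed above.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class ReplaySummarySketch:
    """Hypothetical stand-in for Replay_data; the field names are assumptions."""
    replay_name: str
    date_time: datetime
    game_length: int
    players: list


def to_document(summary: ReplaySummarySketch) -> dict:
    """Convert the dataclass into a plain dict that can be stored as a document."""
    return asdict(summary)


summary = ReplaySummarySketch(
    replay_name="16-Bit LE (2).SC2Replay",
    date_time=datetime(2021, 6, 4, 3, 49, 10),
    game_length=707,
    players=[(1, "HDEspino", "Terran", "Win")],
)
document = to_document(summary)
print(document["replay_name"], document["game_length"])
```

The resulting dictionary is what a call such as collection.insert_one(document) would receive.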

sc_training uses a config.json file to set up a local MongoDB client. Users should create this file and store it in the data folder of their working project. Using this client and the information in the config file, the solution also creates the database. The following code loads this file and defines the loading procedure that runs when this module is imported.

I will divide this loading process into three steps.

1 - Handle Config file

I locate and load the config file. This location process allows users to create a custom config.json in a data directory in their projects. That file can be used to customise the name of the database, the port address and number for the MongoDB client, and the location of the replays the user wants to process. Once the file is located, I also define a procedure to ensure it contains the necessary information to connect with the MongoDB client and access the appropriate database.

Apart from this option, the code also defines how the module can default to a config file stored in the library's data folder if the user fails to provide this file. This default file allows the system to attempt to function based on the assumption that the users are using a Windows computer and have a traditional installation of StarCraft II and MongoDB. Of course, this default set-up would not be expected to work in all cases, but it provides an option and a sample of the config file.
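The lookup order described above (project file first, library default second) could be sketched as follows. This is only an illustration under assumptions: locate_config_sketch and read_config_sketch are hypothetical names, not the module's internal helpers, and the temporary folders stand in for a real project and for the library's data folder.

```python
import json
import tempfile
from pathlib import Path


def locate_config_sketch(project_dir: Path, default_config: Path) -> Path:
    """Return the project's data/config.json if it exists, else the default file."""
    user_config = project_dir / "data" / "config.json"
    if user_config.is_file():
        return user_config
    if default_config.is_file():
        return default_config
    raise FileNotFoundError(
        f"No config.json in {user_config.parent} and no default at {default_config}"
    )


def read_config_sketch(config_path: Path) -> dict:
    """Load the JSON content of the chosen config file."""
    with config_path.open(encoding="utf-8") as config_file:
        return json.load(config_file)


# Demonstration with temporary folders (assumed layout, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "data").mkdir(parents=True)
    default = Path(tmp) / "default_config.json"
    default.write_text(json.dumps({"db_name": "TEST_library"}), encoding="utf-8")

    # Without a project file, the default is chosen.
    fallback_used = locate_config_sketch(project, default) == default

    # Once the user adds a config.json, it takes precedence.
    (project / "data" / "config.json").write_text(
        json.dumps({"db_name": "my_db"}), encoding="utf-8"
    )
    user_db_name = read_config_sketch(locate_config_sketch(project, default))["db_name"]

print(fallback_used, user_db_name)
```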

The load_configurations function uses various internal helper functions to locate, open, verify and load the information from a project's config.json file. It stores this information as a Config_settings object.

class Config_settings[source]

Config_settings(port_address:str, port_number:int, db_name:str, replay_path:str)

This type of object stores the data extracted from the config file.

Attributes

- port_address: str
    Address of the MongoDB Client that the program will connect to.
- port_number: int
    Port number of the client located in the address above.
- db_name: str
    Name of the project's database
- replay_path: str
    Path to the replays that must be analysed and stored in the
    database

load_configurations[source]

load_configurations()

Loads the project's configuration information.

This function locates, verifies and extracts the project's configuration data. This data tells sc_training where to find the replays it needs to inventory and process, how to connect to the MongoDB client it will use to store this data in a database, and the name of the database it should use.

Args

- None

Returns

- Config_settings

Errors

- FileNotFoundError
    If there is no valid config file
- jsonschema.exceptions.ValidationError
    If the config file does not contain the necessary data or does
    not conform to the proper schema necessary to work.
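To make the verification step more concrete, here is a simplified sketch that checks a loaded config dictionary before building a settings object. It deliberately swaps the jsonschema validation mentioned above for a hand-written check that uses only the standard library, so it raises ValueError rather than jsonschema.exceptions.ValidationError. The key names are assumed to match the Config_settings attributes, and ConfigSettingsSketch and build_settings_sketch are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ConfigSettingsSketch:
    """Hypothetical stand-in for Config_settings."""
    port_address: str
    port_number: int
    db_name: str
    replay_path: str


# Assumed config keys and their expected types.
REQUIRED_FIELDS = {
    "port_address": str,
    "port_number": int,
    "db_name": str,
    "replay_path": str,
}


def build_settings_sketch(raw: dict) -> ConfigSettingsSketch:
    """Check that every required key is present and well typed, then build the object."""
    for field_name, expected_type in REQUIRED_FIELDS.items():
        if field_name not in raw:
            raise ValueError(f"config is missing '{field_name}'")
        if not isinstance(raw[field_name], expected_type):
            raise ValueError(
                f"'{field_name}' should be of type {expected_type.__name__}"
            )
    return ConfigSettingsSketch(**{name: raw[name] for name in REQUIRED_FIELDS})


settings = build_settings_sketch(
    {
        "port_address": "localhost",
        "port_number": 27017,
        "db_name": "TEST_library",
        "replay_path": "./test_replays/TestProfilerBatch",
    }
)
print(settings.db_name, settings.port_number)
```

A schema-based check has the advantage of keeping the rules in one declarative place; the manual loop above only illustrates the kind of conditions such a schema would encode.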

The following sample code shows the use of load_configurations to set up a test database.

db_settings = load_configurations()
mongo_client = pymongo.MongoClient(db_settings.port_address, 
                                   db_settings.port_number)
worcking_bd = mongo_client[db_settings.db_name]
print('config.json content:')
print(db_settings)
print('Local Mongo DB Client')
print(mongo_client, end='\n\n')
print('Database: ')
print(worcking_bd)
config.json content:
Port Address:                                 localhost
Port Number:                                      27017
DB Name:                                   TEST_library
Replays file:          .\test_replays\TestProfilerBatch

Local Mongo DB Client
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

Database: 
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'TEST_library')

2 - Index replay batch data in the "replays" collection

Once the database is set up, I store the main descriptive data of each replay in the replays collection of the database. This information is crucial for iterating through the batch and for keeping an index of which players played which matches. It can also be used to review other information about a match, for instance whether it was a ranked match or which race each player played.

To extract this data I use the get_replay_info function from the summarise_rpl module (see <<3 - Summarising Replays>>).

To collect this data, I loop through the replays in a directory and insert the return values of the functions from the ingest sub-package's modules into collections within the database.

sample_db = mongo_client['sample_db']
collection = sample_db['replays']

replay_batch = sc2reader.load_replays(db_settings.replay_path)

count_add = 0
count_existed = 0
for rpl in replay_batch:
    if not collection.count_documents({'replay_name': rpl.filename}, 
                                      limit = 1):
        # print(f'Adding {Path(rpl.filename).name} to replays collection.')
        collection.insert_one(asdict(get_replay_info(rpl)))
        count_add += 1
    else:
        count_existed += 1
        # print(rpl.filename, "already exists in the replay_info collection.")

print(f"{count_add} added to replays")
print(f"{count_existed} already existed in replays")
0 added to replays
153 already existed in replays

Once the loop finishes, the collection exists in the database. The following snippet shows that it now holds one document per replay, each containing that replay's indexing data. The conditional statement within the loop checks whether the collection already contains a replay. In that case, the loop skips the insertion and counts the replay as already existing, which avoids inserting repeated documents into the database.

print('The collection now contains:', 
      collection.estimated_document_count(), 'documents')
The collection now contains: 153 documents
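The check-before-insert pattern used in the loop can be isolated in a small helper, sketched below. Because a running MongoDB server cannot be assumed here, the example uses InMemoryCollection, a hypothetical stand-in that mimics only the two collection methods the loop relies on (count_documents and insert_one). insert_if_absent is also a hypothetical name, and the file names are made up for illustration.

```python
class InMemoryCollection:
    """Tiny stand-in that mimics only count_documents and insert_one."""

    def __init__(self):
        self.documents = []

    def count_documents(self, query, limit=0):
        # Count the documents whose fields equal every key/value pair in the query.
        matches = [
            doc for doc in self.documents
            if all(doc.get(key) == value for key, value in query.items())
        ]
        return len(matches[:limit]) if limit else len(matches)

    def insert_one(self, document):
        self.documents.append(dict(document))


def insert_if_absent(collection, document, key="replay_name"):
    """Insert the document unless one with the same key value already exists."""
    if collection.count_documents({key: document[key]}, limit=1):
        return False
    collection.insert_one(document)
    return True


replays = InMemoryCollection()
batch = [
    {"replay_name": "a.SC2Replay"},
    {"replay_name": "b.SC2Replay"},
    {"replay_name": "a.SC2Replay"},  # duplicate of the first entry
]
count_add = sum(insert_if_absent(replays, doc) for doc in batch)
count_existed = len(batch) - count_add
print(f"{count_add} added to replays")
print(f"{count_existed} already existed in replays")
```

Since the helper only calls count_documents and insert_one, the same function should also accept a real pymongo collection; that part is untested in this sketch.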

3 - Building the indicators collection

Once a replay's descriptive data is stored in the replays collection, I also need to extract the performance indicators of each player in the match and store them in the indicators collection.

In the following code, I illustrate how I can use all the functions defined in the ingest sub-package to build the indicators of two players in a sample replay. Each player's indicators are stored in a single flat dictionary, i.e. a dictionary that has no nested data structures as values.

# First I load the sample file. The raw string keeps the backslash in the
# Windows path from being read as an escape sequence.
sample_replay = [sc2reader.load_replay(r'test_replays\Jagannatha LE.SC2Replay')]

# I create two lists with the functions from the ingest sub-package that I
# use to collect the players' performance indicators.
# I use two lists because all the functions in a list must share the same
# call signature.
simple_functions = [get_player_macro_econ_stats,
                    get_expan_times,
                    get_expan_counts,
                    calc_attack_ratio,
                    calc_ctrlg_ratio,
                    count_max_active_groups,
                    calc_get_ctrl_grp_ratio,
                    calc_select_ratio,
                    list_player_upgrades,
                    calc_spe_abil_ratios]

double_functions = [count_composition,
                    count_started]

# I loop through the replays in sample_replay and through the players
# in each replay, storing the indicators of each player as a
# dictionary in indicators_list.
indicators_list = []
for rpl in sample_replay:
    print(f'Processing: {rpl.filename}')
    for pid, player in rpl.player.items():
        print(f'Processing player: pid:{pid} {player}')
        
        # Declare the dict that will contain all of the player's performance
        # indicators
        rpl_indicators = {}
        
        # Run through all functions that need the caller arguments rpl
        # (for the replay being analysed) and pid (for the player id of
        # the player being parsed) and that return a flat dict. 
        for func in simple_functions:
            rpl_indicators.update(func(rpl, pid))

        # Run through all functions that need the caller arguments rpl
        # (for the replay being analysed), pid (for the player id of
        # the player being parsed) and a flag to focus on extracting data 
        # from the player's building or troops, and that return a flat dict.
        for func in double_functions:
            for flag in [True, False]:
                rpl_indicators.update(flatten_indicators(func(rpl, pid, flag)))

        # I run this last function apart because I need to flatten its
        # output so that the resulting dictionary has no nested levels.
        v = get_prefered_spec_abil(rpl, pid)
        rpl_indicators.update(flatten_indicators(v))
                
        indicators_list.append(rpl_indicators)
Processing: test_replays\Jagannatha LE.SC2Replay
Processing player: pid:1 Player 1 - HDEspino (Protoss)
Processing player: pid:2 Player 2 - MxChrisxM (Terran)

As a result of the code above, indicators_list now contains two dictionaries, one per player, that store each player's indicators. I used a list here only to illustrate the results shown below; in the inventory_replays function, this data is stored in the database's indicators collection instead.

print('Number of players evaluated: ',len(indicators_list))
print('First set of indicators belongs to:', 
      indicators_list[0]['player_username'])
print('First set of indicators contains: ',
        len(indicators_list[0]), 'indicators.')
print('Second set of indicators belongs to:', 
      indicators_list[1]['player_username'])
print('Second set of indicators contains: ',
        len(indicators_list[1]), 'indicators.')      
Number of players evaluated:  2
First set of indicators belongs to: HDEspino
First set of indicators contains:  388 indicators.
Second set of indicators belongs to: MxChrisxM
Second set of indicators contains:  377 indicators.
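The flattening step used in the loop above turns nested dictionaries into a single-level dictionary. The sketch below shows one way this could work; flatten_dict_sketch is a hypothetical illustration and is not the flatten_indicators function from the ingest sub-package, whose key-naming scheme may differ. The nested keys and values are made up for the example.

```python
def flatten_dict_sketch(nested: dict, separator: str = "_") -> dict:
    """Return a copy of nested in which sub-dictionaries are merged into one level.

    Keys of nested dictionaries are prefixed with their parent key.
    """
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse so that deeper levels are flattened as well.
            for sub_key, sub_value in flatten_dict_sketch(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Illustrative input only; these are not real indicator names or values.
nested_example = {
    "player_username": "HDEspino",
    "unit_count": {"probe": 40, "zealot": 6},
}
flat_example = flatten_dict_sketch(nested_example)
print(flat_example)
# → {'player_username': 'HDEspino', 'unit_count_probe': 40, 'unit_count_zealot': 6}
```

Keeping every player's indicators flat means that each one becomes a top-level field of the stored document, which makes them easy to load as columns of a table later on.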

Exportable functions

This section defines inventory_replays. Users can pass a directory containing multiple .SC2Replay files to this function, and it will extract all the data from these files and store it in the replays and indicators collections of the project's database (as defined in the config.json file), following the logic explained above.

Internally, the function uses three helper functions:

  • set_up_db: connects to the MongoDB client and loads the working database.
  • verify_replays_path: makes sure that the path passed by the user is valid.
  • build_indicators: runs through the indicator extraction loop and stores the results in the indicators collection.

Of these, I export set_up_db, given that it can be useful in other modules (see for example <<10 - Player Profiler>>).
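As an illustration of the kind of check a helper like verify_replays_path might perform, the sketch below validates that a path is an existing directory containing at least one .SC2Replay file. The function name verify_replays_path_sketch, the exact conditions, and the exceptions raised are assumptions; the real helper may behave differently.

```python
import tempfile
from pathlib import Path


def verify_replays_path_sketch(replays_dir) -> Path:
    """Return replays_dir as a Path if it is a directory holding replay files."""
    path = Path(replays_dir)
    if not path.exists():
        raise FileNotFoundError(f"{path} does not exist")
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    if not any(path.glob("*.SC2Replay")):
        raise FileNotFoundError(f"{path} contains no .SC2Replay files")
    return path


# Demonstration with a temporary folder standing in for a replay directory.
with tempfile.TemporaryDirectory() as tmp:
    try:
        verify_replays_path_sketch(tmp)
        empty_dir_rejected = False
    except FileNotFoundError:
        empty_dir_rejected = True

    # After adding a (dummy, empty) replay file the directory is accepted.
    (Path(tmp) / "sample.SC2Replay").touch()
    accepted = verify_replays_path_sketch(tmp) == Path(tmp)

print(empty_dir_rejected, accepted)
```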

set_up_db[source]

set_up_db()

Loads the database specified in the project's config.json file.

Returns

- pymongo.database.Database
    Python object that allows the user to interact with the
    database specified in the project's config file.

The following sample run of the set_up_db function illustrates how it can be used to connect to a MongoDB database.

worcking_bd = set_up_db()
print(type(worcking_bd))
<class 'pymongo.database.Database'>

inventory_replays[source]

inventory_replays()

This function builds two collections within the database specified in the config.json file.

The replay information will be stored in the database specified in cwd/data/config.json in the following collections:

  • replays: stores the metadata of the replays; it can be used for indexing and for finding the other replays.
  • indicators: stores the indicators for each performance of every player.

Args:

- replay_batch
    Directory address where the replays to process are located.

Return:

- None

After running the function once, I get the following sample results:

mongo_client.drop_database('TEST_library')
inventory_replays()
Inventorying replays at: test_replays\TestProfilerBatch in database TEST_library
Load complete.
153 files processed
149 files loaded
4 files ignored
0 files alredy existed
collections = [col for col in worcking_bd.list_collection_names()]
collections
['replays', 'indicators']
for col in collections:
    print(f'{col} has {worcking_bd[col].estimated_document_count()} records.')
replays has 149 records.
indicators has 298 records.