Section 1 outlines the ingest and clustering processes as parts of SC2 Training Grounds' computer model.

Note: this sections’ notebooks and modules use sequence diagrams and annotated code to illustrate the development of one of the primary processes that make up SC2 Training Grounds’ computer model. I chose to develop this part of the document in the most specific level of detail because the base concept of SC2 Training Grounds is to use player experience modelling to generate recommendations. Player experience modelling requires defining a way to abstract and transform the players’ experience into quantifiable data, which I do in this module (see dissertation chapter 5.2).

The following sequence diagram shows the proposed flow of orders and data between modules that would compose the application's ingest and clustering process. In particular, it shows how the library's main module (i.e. sc_training) can orchestrate the following steps:

Load the replay files generated by StarCraft II.
Call upon the ingest sub-package to inventory the replays and store their relevant indicators into various collections within a document-based database using a local MongoDB client.
Use the profiler sub-package to build the player profiles from the replay data collections.
Process the player profiles by game race to build each race classification clusters using the cluster sub-package.

Based on this proposed flow of orders, I first need to implement the ingest process for StarCraft II replay files. This ingestion process is essential to structure the input for the clustering process. Furthermore, it also determines the input of the application's classification modelling process, as shown in SC2 Training Grounds activity diagram (see Section 1's introduction).

In the sequence diagram, I show that the ingestion process is, on a top-level, composed of two main functions, inventory_replays and get_replay_indicators. inventory_replays registers the replays so that a user can use their basic meta-data to reference specific replays and identify their players. Meanwhile, get_replay_indicators compiles all performance indicators of each player in each replay. Thus, the system can loop through various replays and extract and store the performance indicators for each player in each game.

Having understood how the functions are supposed to interact, I implement them in the following modules. The most important outcome of this implementation is the structure of the players' performance indicators and how this data is indexed. In other words, this definition determines the information vectors that will become the inputs for the clustering and classification modelling process afterwards.

Note: To review the three-step process that resulted in the final version of the vectors, see the dissertation document Chapter 5.

This implementation implies defining multiple functions that extract, organise and return different bits of information contained in a SC2Replay file. Since different pieces relate to different categories of information, I group the functions in separate modules according to those categories.

The following sequence diagram shows the interaction between these modules in the ingest process.

Instion Sequence Diagram — Ingestion Sequence Diagram

2 - Data Ingestion and Clustering Process