May 19, 2021
We process more and more data every day. Some processing tasks take a few minutes, while others take more than a few hours. Processing time depends on the amount of data, the speed of the hardware, and how we process the data. The first two factors are usually fixed for a given scenario, so we will focus on the third: how we build a TurboIntegrator process in Planning Analytics. In this article, we will use the RunProcess command to shorten data-processing time by converting a single-threaded process into a multi-threaded one.
A thread is created to handle the workload when a user runs a TurboIntegrator process; the screenshot below shows an example. A modern CPU (Central Processing Unit) is a multi-core processor, which means it has several cores that can run multiple threads at the same time. If we have an 8-core processor and eight users each running a single-threaded process, we fully utilize its processing power. However, if we have an 8-core processor but use only one thread on one core to process data, the other seven cores sit idle, and that unused processing power could be used to speed up the work.
The thread ID 13552 was created when a user ran a process.
Although not all tasks can be run in parallel, many certainly can. The first step in creating a process that runs in multiple threads is to determine whether its workload can be split into smaller workloads. If we can divide a task into smaller independent tasks, we can process them in parallel by using one thread per task. A task like loading 12 months of data fits this description as long as the data in each month does not rely on data in other months. There are many commands and techniques that we can use for parallel processing. One of them is the RunProcess command.
The RunProcess command allows a single TurboIntegrator process to run another process in a new thread. The command is asynchronous, which means that the new thread runs independently and the main thread does not wait for it to complete before executing the next line of code. If we have 7 lines of code that call RunProcess, the main process will create 7 more threads.
The screenshot shows that the main process (ID 59176) runs 7 other processes in parallel.
Sample code for a process that runs 7 other processes in parallel.
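A minimal sketch of what the Prolog of such a main process might contain. The child process names ('Sub.Process.1' through 'Sub.Process.7') are hypothetical placeholders, and the children are assumed to take no parameters.

```turbointegrator
# Prolog of the main process (sketch).
# 'Sub.Process.1' ... 'Sub.Process.7' are hypothetical process names.
# Each RunProcess call starts the child in its own thread and returns
# immediately, so the main process does not wait for any child.
RunProcess('Sub.Process.1');
RunProcess('Sub.Process.2');
RunProcess('Sub.Process.3');
RunProcess('Sub.Process.4');
RunProcess('Sub.Process.5');
RunProcess('Sub.Process.6');
RunProcess('Sub.Process.7');
```

Because the calls return immediately, the main process can finish while its children are still running.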
Using the RunProcess command is beneficial when we want to run multiple processes in parallel to shorten the total execution time. If we were to run the processes one by one with the ExecuteProcess command, each would have to finish before the next could start, and we would have to wait 35 seconds in the previous example. Sometimes it is not obvious that a process could use RunProcess. Therefore, we should ask ourselves whether we can change the design of a process to leverage the RunProcess command and speed it up.
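For comparison, here is a sketch of the sequential approach with ExecuteProcess, using hypothetical child process names ('Sub.Process.1' through 'Sub.Process.7'). If each child took about 5 seconds (an assumption that is consistent with the 35-second total, not a figure stated in the example), the sequential run would take roughly 7 × 5 = 35 seconds, whereas a parallel run would take about as long as the slowest child, provided enough cores are free.

```turbointegrator
# Prolog of the main process, sequential variant (sketch).
# ExecuteProcess blocks until the called process has finished,
# so the seven hypothetical children run one after another.
i = 1;
WHILE(i <= 7);
  ExecuteProcess('Sub.Process.' | NumberToString(i));
  i = i + 1;
END;
```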
One real-world example is a process that loads 12 months of data from one cube to another, where the data in each month does not rely on the data in any other month. We can use the RunProcess command discussed earlier to load all 12 months at the same time instead of one month at a time. We repeat the RunProcess command 12 times, and each call loads the data of a different month. In the sample process below, the pYear parameter of the main process specifies the year for which we want to load data, and the pPeriod parameter of the sub process specifies the period for which it will load data. The sample process creates 12 threads to load 12 months of data simultaneously.
Sample code for the main process, which loads 12 months of data from one cube to another.
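A minimal sketch of how such a main process could be written, assuming a hypothetical sub process named 'Load.Month'. The period names 'Jan' through 'Dec' and the choice to pass pYear on to the sub process are also assumptions made for illustration; only the pYear and pPeriod parameter names come from the description above.

```turbointegrator
# Prolog of the main process (sketch).
# pYear is a string parameter of this process, for example '2021'.
# 'Load.Month' is a hypothetical sub process that loads one period.
# Each call returns immediately, so the 12 loads run in parallel.
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Jan');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Feb');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Mar');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Apr');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'May');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Jun');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Jul');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Aug');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Sep');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Oct');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Nov');
RunProcess('Load.Month', 'pYear', pYear, 'pPeriod', 'Dec');
```

The sub process would restrict its data source to the year and period it receives, so that no two threads work on the same month.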
The thread ID 59176 created 12 additional threads and each thread was responsible for loading 1 month of data.
We have discussed earlier that the RunProcess command is for asynchronous processing. In some cases, we want all threads to be completed before we proceed to the next process or next command. The RunProcess command alone will not be enough to complete this task because of the asynchronous nature of the command.
For example, if we want to allocate the expense lines of multiple accounts to different cost centers, we may not know how many expense lines each account has. An efficient way to handle this situation is to use all available processor cores to process one account and then move on to the next. To do that, we need to know when the processing of the first account is done before we start processing the second.
We can use the command called “Synchronized” or temporary files to keep track of the running threads, so we know when to execute the next step. The Synchronized command allows a process to acquire a lock; other processes that use the same lock have to wait until the process holding the lock is complete. The details of using the Synchronized command and temporary files to monitor threads are beyond the scope of this article. However, I encourage you to reach out to our experts at QueBIT if you have questions about the Synchronized command or need help with parallel processing to speed up your processes.
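As a small illustration of the locking behaviour described above, a process can request a named lock at the start of its Prolog. The lock name 'AllocationLock' is a hypothetical choice. This sketch only shows how a lock is taken; it is not a complete thread-monitoring design.

```turbointegrator
# Prolog of a hypothetical process (sketch).
# Any other process that calls Synchronized with the same lock name
# waits here until the process currently holding the lock finishes.
Synchronized('AllocationLock');

# Statements placed after this point run only once the lock is held.
```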
We have explored the concept of parallel processing to shorten the data-processing time of a TurboIntegrator process. The RunProcess command is one of the commands that we can use to create concurrent threads. We can divide parallel processing into two types. The first type has no subsequent steps, so we do not need to monitor the completion of the threads. The second type requires monitoring the threads to detect when all of them have completed, so that the next process or command can be executed. This type is more complex, but we can use the Synchronized command or temporary files to monitor the threads. Parallel processing can significantly speed up data processing, so we should consider building a multi-threaded process whenever possible.