IBM Planning Analytics – Selecting, Copying and Loading Performance

Selecting, Copying and Loading

Moving data into, around, and out of a Planning Analytics model is a common operation and an essential practice.

Loading usually involves executing database commands to select information from external data sources (perhaps a transactional general ledger system, a data warehouse, or a data lake), transforming that information in some way, and then loading it into a model. Copying within a Planning Analytics model is also common practice: typically, an existing version or scenario’s data is copied to a “fresh” version to start a new round of planning, or current forecasted amounts are copied from a source cube into a shared cube for reporting, to name just a couple of examples. Finally, you will also find views in a model that are routinely used to export slices of information for consumption by external applications and consumers.

What is the Cost?

Each of the above-mentioned operations has a cost to perform. The higher the cost (the more resources the operation consumes), the bigger the potential hit to your model’s overall performance. Many factors influence operational cost and are worth understanding.

Loading data (from external systems) and extracting data (from a Planning Analytics model) may involve executing complex database commands that can cause poor performance. Other factors, such as how a Planning Analytics process establishes connections to external systems, also have the potential to slow your model.
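
One such factor is where a process establishes its connection. As a minimal, hedged sketch (the connection name, credentials and table names below are purely illustrative), a TurboIntegrator process that needs to issue additional SQL statements can open its ODBC connection once in the Prolog and close it once in the Epilog, rather than re-connecting for every record it processes:

    # Prolog: open the ODBC connection once (connection name and credentials are placeholders)
    ODBCOpen ( 'MyDataWarehouse', 'etl_user', 'etl_password' );

    # Data: handle each record; any extra statements reuse the already-open connection, e.g.
    # ODBCOutput ( 'MyDataWarehouse', 'UPDATE gl_staging SET processed = 1 WHERE batch_id = 42' );

    # Epilog: close the connection once, after all records have been processed
    ODBCClose ( 'MyDataWarehouse' );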

When copies are “in progress” they can, at minimum, slow performance within a model. Depending upon how they have been implemented, they may even lock the model so that no work can be performed by any user and no other task or “thread” can execute until the lock is released, since copies typically establish views and subsets to identify source and target data intersection points.

Finally, you may find that processes (both loading and copying) modify metadata “on the fly” (for example, adding new GL Accounts), which can establish locks on objects within the model and affect model performance.
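
As a minimal sketch of how to keep such metadata changes to only what is genuinely needed (the dimension name ‘GL Account’ and the variable vAccount are assumptions, not part of any particular model), a load process can test whether a GL account already exists before attempting to insert it:

    # Metadata tab: only touch the dimension when the account is genuinely new
    # ('GL Account' and vAccount are illustrative names)
    IF ( DIMIX ( 'GL Account', vAccount ) = 0 );
       DimensionElementInsert ( 'GL Account', '', vAccount, 'N' );
    ENDIF;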

What to Look for

The following is not a comprehensive list, nor does it guarantee that you will notice a significant improvement in your model’s performance, but based upon years of experience, these items are typically simple to implement and will improve your model’s overall health:

Poor Selectivity and Volumes

Whether you are identifying data in a relational database query or establishing a cube source view, it is important to be as explicit as possible so that you “handle” only the data that is absolutely necessary. Make sure you read only the data you need to meet your objective, as opposed to “over-reading,” which means reading a larger amount of data and filtering or skipping unwanted records as you process them. Consider including a WHERE clause in a SELECT statement, or filtering a subset insert, when querying data. Additionally, make sure you “shorten” processing volumes wherever possible by excluding “stale” data, for example by selecting and processing only current-period records, since historical periods may not be necessary as part of a daily update.
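
As a hedged illustration of that kind of selectivity (the connection, table and column names are invented), a TurboIntegrator Prolog might restrict a relational extract to the current period rather than pulling the full history; the same principle applies to a cube source view, where the period subset would contain only the current month:

    # Prolog: read only current-period records from the source table
    DatasourceType          = 'ODBC';
    DatasourceNameForServer = 'MyDataWarehouse';
    DatasourceQuery = 'SELECT account, cost_centre, amount FROM gl_actuals WHERE fiscal_year = 2024 AND fiscal_month = 6';

In practice the period values would normally be supplied as process parameters rather than hard-coded as they are in this sketch.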

Persistent Objects

TurboIntegrator processes use “Views” to establish what data to process. View creation consumes resources and can cause locking. Thankfully, you can use the ViewCreate function to create a temporary view which persists only for the duration of the TurboIntegrator process or chore in which the view is created (when the process completes, all temporary objects are cleaned up). Also, there is no locking associated with a temporary view, as a temporary view is never saved. This can result in improved performance, because there is no need for TurboIntegrator to wait for locks to be released before operating upon a temporary view. Converting your processes from persistent views to temporary views is accomplished simply by setting a parameter value.
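
As a minimal, hedged sketch (the cube, view, subset and element names below are assumptions), the change amounts to supplying the optional final argument on ViewCreate and SubsetCreate, which marks the objects as temporary:

    # Prolog: build the source view and its subset as temporary objects (final argument = 1)
    cCube = 'Sales';
    cView = 'zCopy_Source';
    ViewCreate ( cCube, cView, 1 );

    SubsetCreate ( 'Version', 'zCopy_Version', 1 );
    SubsetElementInsert ( 'Version', 'zCopy_Version', 'Working Forecast', 1 );
    ViewSubsetAssign ( cCube, cView, 'Version', 'zCopy_Version' );

    # Read only stored data: skip zeros, consolidations and rule-derived values
    ViewExtractSkipZeroesSet ( cCube, cView, 1 );
    ViewExtractSkipCalcsSet ( cCube, cView, 1 );
    ViewExtractSkipRuleValuesSet ( cCube, cView, 1 );

    # No ViewDestroy or SubsetDestroy is needed in the Epilog;
    # temporary objects disappear when the process or chore completes.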

SQL Inefficiencies

If you are executing Structured Query Language (SQL) commands from within a TurboIntegrator process, inefficiently written commands can lead to performance issues for both the database and your model. Some of the most common mistakes I have come across include:

  • Using Select * (over fetching) – a better option is to explicitly select only the columns that are needed.
  • Overusing wildcards – to improve performance when searching for patterns in text fields, it is better to use an index if there is one or, if you must use a wildcard search, trailing (rather than leading) wildcards are generally more efficient.
  • Using an ORDER BY with enormous amounts of data. If you need to sort a potentially large result set, try introducing a LIMIT clause (along with the ORDER BY) so that the database only sorts and returns the necessary rows.
  • Using subqueries (queries that appear inside another query), which can be slow. Consider rewriting the SQL using a JOIN, which is typically faster to execute.
  • Using OR in a WHERE clause. Your DBA will tell you that the OR operator can cause a full table scan, which is very inefficient.

You do not have to be a SQL guru to perform most of the above optimizations, but if you are unsure, it is always worth a conversation with your database team.
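
As a hedged illustration of how a query might look after applying a few of these points (the table, column and connection names are invented, and using Char(39) to embed a quoted string literal is just one convenient approach), a TurboIntegrator Prolog might build its query with an explicit column list, an IN list instead of chained ORs, and a trailing rather than leading wildcard:

    # Prolog: illustrative query applying several of the points above
    sQuote = Char ( 39 );
    DatasourceType          = 'ODBC';
    DatasourceNameForServer = 'MyDataWarehouse';
    sWhere = 'WHERE region_code IN (10, 20, 30) AND account_name LIKE ' | sQuote | 'Travel%' | sQuote;
    DatasourceQuery = 'SELECT account, cost_centre, amount FROM gl_actuals ' | sWhere;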

User-imposed Object Locking

Another area which can potentially impact your model’s performance is user privilege. For example, do multiple users have the ability to initiate copy, load, or other processes, either by clicking an action button in Planning Analytics Workspace or by running them directly through Planning Analytics for Excel or Architect? Granting users “on demand,” “self-service” access to processes seems like a good idea; however, running (and perhaps re-running) a process may restrict or throttle other users within the model, depending upon what the process or processes are doing. Additionally, those privileged users may be unaware of any scheduled chores in the model, increasing the risk of lock contention.

Another area of risk is granting Lock privileges to users, allowing them to actually set and release locks on objects. Although not typically a mainstream practice, user object locking can cause unexpected behaviors and impact performance.

Establishing and publishing a daily chore schedule, restricting the number of privileged users (including those who can lock objects), and educating those users on what the processes do and their potential effect on performance are all highly recommended for maintaining your model’s good health.

Frequency and Scheduling

One of the many exciting things that Planning Analytics offers is real-time calculation, consolidation, and reporting. The perceived “need for speed” often drives the design and operation of a model, and can result in higher levels of resource consumption, slower response at times, and a higher probability of locking.

To keep your model healthy and performing well, review your design and consider the following:

  • How often do you need to load external data to your model?
  • Can any data be incrementally loaded (ingesting only the source data that became available or has changed since the previous load)?
  • Can business logic and/or the processing of data be compartmentalized in “real-time cubes” keeping results or transformed data in “finalized” or “reporting” cubes?
  • Can access be restricted to cubes and other objects that have higher levels of real-time calculations or other processing? For example, provide simple “input cubes” to users and periodically “sweep” inputs and edits to a reporting cube (a sketch of such a sweep follows this list).
  • Avoid “trickle processing” – meaning consider performing updates as part of a scheduled batch, rather than in a “constant loop.”
  • Make scheduled copies of cubes that have higher levels of real-time calculations and use the copy for “reporting only” users.
  • Consider offering a read-only copy of a Planning Analytics server for reporting consumption.
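
As a rough, hedged sketch of the “sweep” and scheduling ideas above (every cube, dimension, view and variable name here is an assumption), a nightly chore might run a process along these lines to move stored inputs from a simple input cube into a reporting cube in one scheduled batch:

    # Prolog: temporary source view over the input cube, reading stored values only
    cSource = 'Forecast Input';
    cTarget = 'Forecast Reporting';
    cView   = 'zSweep_Source';
    ViewCreate ( cSource, cView, 1 );
    ViewExtractSkipZeroesSet ( cSource, cView, 1 );
    ViewExtractSkipCalcsSet ( cSource, cView, 1 );
    ViewExtractSkipRuleValuesSet ( cSource, cView, 1 );
    DatasourceType          = 'VIEW';
    DatasourceNameForServer = cSource;
    DatasourceCubeview      = cView;

    # Clear the target before reloading it (a temporary view scoped to just the
    # copied slice would normally be built here; the whole cube is cleared for brevity)
    ViewCreate ( cTarget, cView, 1 );
    ViewZeroOut ( cTarget, cView );

    # Data tab: write each stored source value into the reporting cube
    # (vVersion, vPeriod, vAccount, vValue are assumed data source variables,
    #  listed in the same order as the target cube's dimensions)
    CellPutN ( vValue, cTarget, vVersion, vPeriod, vAccount );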

Conclusion

Hopefully, the information provided here was helpful and makes you want to learn more. If so, QueBIT’s Knowledge Base is a go-to resource for expert insights on data management, planning analytics, and more.