Squeezing more performance out of a large, concatenated fact table
by Mike Steedle, on Aug 21, 2016 11:00:20 AM
DID YOU KNOW?
- Several years ago, I discovered that joining to a resident table in QlikView causes the resulting table to be ordered by the key field, in that field's original load order. See screenshot below. This would be like if you executed an ORDER BY and had an option for ascending original load order as a sequence.
- I also discovered that, similar to indexing a table in a database, ordering a table by a field commonly used in calculations, filters, and dimensions can improve the general calculation performance of an application. I found this to be especially true when ordering a table containing a dimension that was referenced in an Aggr() function. In general, though, for large applications with various point-in-time analyses, trend graphs, and no problematic Aggr usage, the central date field is a great candidate.
Preview of data model after joining Customer to Sales fact table on CustomerKey:
Recently I got around to doing some more testing involving both variables and codifying this in a subroutine so it would be easy to leverage more widely to improve application performance.
To the naked eye, the three tested scenarios have identical data models, anchored with a 39M-row concatenated fact table: the only difference is in how that fact table was sorted in the final application.
- "Field Grouping" is grouping like field values together, as would result from a join, not necessarily in a natural sequence for analysis. See #1 from above. The first scenario is what I believe the default would be if you simply built a concatenated fact table with no consideration given to table sorting at all. The second scenario is left joining the distinct Date values back to the fact table in order to group all rows for the same Date together in the final application.
- "Field Ordering" is the result of doing a true ORDER BY at the end of the reload. See #2 above. In the third scenario, the like Date values are not just grouped together but are also in sequential numeric order. I thought this may have made a difference because it would make all of the days in the same month, year, etc. adjacent in the application's final data model.
|Size on disk (compressed)||267MB||263MB||282MB|
|Size in RAM||2,473MB||2,474MB||2,474MB|
|Initial page calc time (avg)||5.9s||3.9s||3.9s|
|Peak RAM consumption during subroutine||-||3,738MB||4,645MB|
|Subroutine run time||-||29s||2m40s|
As you can see, in my test, applying logic to the fact table to group like Date values together improved performance by a full third, lowering calculation time from 5.9 to 3.9 seconds. Not bad for so little additional work and reload time.
As you can also see, I did not observe better performance from ordering the dates sequentially (scenario 3) compared to just grouping the like values together (scenario 2). Because that script's execution took significantly longer and used a lot more RAM (resident loading the entire concatenated fact table again, with an ORDER BY), I would stick with the lower overhead field grouping approach.
Here's the subroutine script I used for testing, ready to be re-used. You just have to plug in the table name and the field whose values you want to group.
Sub Ax.IndexResidentTable( v_TableName, v_GroupFieldName )
LET v_StartSubTimestamp = Now();
TRACE [MESSAGE]: Grouping like '$(v_GroupFieldName)' field values in table '$(v_TableName)' to improve calculation performance;
LEFT JOIN ([$(v_TableName)])
LOAD FieldValue('$(v_GroupFieldName)',RecNo()) as [$(v_GroupFieldName)]
LET v_SubDuration = Interval(Now() - v_StartSubTimestamp, 'h:mm:ss');
TRACE [MESSAGE]: Finished grouping like '$(v_GroupFieldName)' field values in table '$(v_TableName)' in $(v_SubDuration);
LET v_TableName = ;
LET v_GroupFieldName = ;
LET v_StartSubTimestamp = ;
LET v_SubDuration = ;
Here is what you see when you call the sub:
That said, if you are already normalizing the data model to reduce hops between tables by combining tables, this subroutine would not even be needed. Based on what we have learned about joins, we need only to join the calendar table to the fact table last in the overall sequence of joins to group the like date values together in the final data model.