Overview
A Slowly Changing Dimension (SCD) is a dimension in the data warehouse where current and historical data that changes over time is stored. This is considered as one of the most critical tasks in an Extract, Transform, and Load (ETL).
The backbone of any data analytics is a robust, stable, and scalable dimensional data model. Often data models are faced with challenges when it comes to transitional dimensions.
We will discuss and find a solution on how a SCD Type-2 can be maneuvered in Qlik.
What is Slowly Changing Dimension Type-2
Before proceeding, it is crucial to clearly understand what a SCD Type-2 entails. Effectively managing this slowly changing dimension is pivotal to ensuring accurate reporting. To illustrate, let us consider the case of an employee who undergoes frequent role changes within an organization. Whenever an employee assumes a new role, a record in the employee dimension includes relevant information such as their Employee Number, new Title, Start Date, End Date, and Active Flag. Typically, the Start Date for the new role will be one day after the End Date of the previous role, while the End Date for the new role is either left null or set as a future default date, such as 31-Dec-9999. In this context, we want to obtain the Point-in-time Title of an employee who attended a committee meeting on a specific Committee Date.
Such a scenario can be easily handled in SQL using the between operator, where the query will look something like the below to fetch the current role of an Employee-
Select EmployeeNo, FullName, Title From dbo.DimEmployee where EmployeeNo = 12345 and getdate() between EffectiveDate and [End Date];
When scripting in Qlik, between function is not available and that is when we use IntervalMatch().
The IntervalMatch() function is used to match a particular value in one table – it could be a numerical number or a date, to a range in another table.
Let us now look at the script.
- We first load the fact table.
- Loading the DimEmployee table which contains the Title which is a SCD.
- In the script below, we are using the extended syntax for IntervalMatch() where there are two parameters passed:
- CommitteeDate, the match field, is the field, we are comparing against the date period when the Employee’s Title was effective.
- Cmtt_EmpNo, the Key field. Since, in this scenario, we are looking at the Title effective period for an Employee, we must pass the EmpNo as well for join. Where this is used is something that will be shown in scrips to follow.
At this point the table, bridge table – – IntervalMatch has the data for Employee Title Effective date and End date whenever the employee attended a committee meeting.
- Finally, we join the bridge table with the fact, where the EmployeeNo and the period field will involve in the join to get the expected data.
Wrapping Up
In conclusion, the proper handling of a Slowly Changing Dimension (SCD) is critical to accurate and reliable data modeling. Without proper management of SCDs, stakeholders may encounter inaccuracies and inconsistencies when trying to analyze data, leading to suboptimal decision-making. According to a survey conducted by Experian Data Quality, 77% of companies believe that inaccurate data is a key challenge to their business operations. In addition, a study by Gartner found that poor data quality costs organizations an average of $15 million per year in losses. Therefore, it is essential to implement best practices for managing SCDs, such as tracking changes over time, properly categorizing different types of changes, and establishing clear protocols for updating and maintaining data. By prioritizing the correct handling of SCDs, organizations can ensure that their data is reliable, accurate, and actionable, ultimately leading to improved business outcomes.
Let our expert Qlik Consulting Services guide you through implementing Slowly Changing Dimension. Discover how to turn your data into a strategic asset with our innovative solutions.