Understanding Dimensions
Dimensions give context to measures. (measures are the facts
or measureable values in the fact tables).
They give specific meaning or context to the facts.
The term dimension refers to the tables.
For example: a fact table contain numbers of “product sold”,
a DateDimension Table contain the following Dimensions of date pertaining to
the number of product sold
1.
Data and time(12/4/14
11:12:20)
2.
Quarter(4)
3.
DayofYear(321)
4.
Weekday(Thursday)
5.
Week(44)
Each individual column in a dimension Table is and
attribute. Attribute usually compress or expand data details. Data can be ‘discretized’
into smaller, summarized group.
By Discretized we mean that if we have a daysOfYear
attribure and we were needed to draw a graph based on this data it would be
difficult, so we summarized this data into number of weeks of number of month
so our graph would decrease from 365 to 12 graphs.
Data can also be ‘drilled into’ for more detailed
information like hours of the day, minutes of the hour, seconds etc.
Dimension Table Types;
A Dimension table usually stores more than just attributes,
it store data that is not in fact table.
A dimension table can have at least 5 column types
·
Attribute column
o
It gives context to the
measures
o
Used by tools to create
pivot tables, drill down etc.
o
E.g. day of the week, hour of the day,
·
Name column
o
Used to make the report
data easier to read
o
Provide human readable name
to the entities like costumer, product, order etc.
·
Key column
o
Used to uniquely identify
entities and establish relationships
·
Member Property column
o
Data included to descriptive
use on reports, etc.
o
Like addresses, phone no,
descriptions etc.
·
Lineage column
o
Used to store auditing,
source info
Slowly Changing Dimensions
Dimension data may change from time to time. For example we
have a customer last name, for female , and she got married so her last name
got changed. Or a costumer who is living in A city and has been buying products
from us for two years and now she went somewhere else, B city for example. With
that simple change our data about a costumer can completely change, now we will
say that the costumer buy products from B city because we have made changes to
the table and this can cause problem for us.
Now for Slowly Changing Dimensions (SCD), we have to
solutions.
1.
Type 1
2.
Type 2
Type 1
We use type 1 when we don’t want the past data to be used or
accessed. We simply override the data in dimension table. That is the most
simplest solution for SCD. It depend upon the business you are dealing with.
Type 2
In this type we use some extra columns in order to separate the
two rows in the columns. That is, when we have similar keys for a costumer and
we want to maintain them we use a Surrogate key also known as Data Warehouse
key in the table. We use when we need to maintain the past data and current
data in the data warehouse.
No comments:
Post a Comment