Share via


What purpose does a staging database serve relative to the data warehouse?

Question

Friday, August 23, 2013 1:59 AM

As Im getting my feet wet with my first exposure to building and populating a data warehouse, I am seeing a lot of references to the use of a "staging" database.  Where would this be beneficial? 

All replies (4)

Friday, August 23, 2013 2:07 AM ✅Answered | 1 vote

A staging database or area is used to load data from the sources, modify & cleansing them before you final load them into the DWH; mostly this is easier then to do this within one complex ETL process.

Olaf Helper

[ Blog] [ Xing] [ MVP]


Monday, August 26, 2013 6:49 AM ✅Answered | 2 votes

Hi,

A staging database is used as a "working area" for your ETL.  Olaf has a good definition: A staging database or area is used to load data from the sources, modify & cleansing them before you final load them into the DWH; mostly this is easier then to do this within one complex ETL process.

One of the objectives of the staging area is to facilitate restartability and minimise the impact the extraction has on your source system.  You extract data from your source system only once and store a copy of it in your staging database.  Should your ETL fail further down the line, you do not need to impact your source system by extracting the data for a second time.  If you store the results of each logical step of your transformation in staging tables, you can restart you ETL from the last successful staging step.

Hope this was useful :)

Regards,

Marius  (Microsoft BI Solutions Architect)
Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.


Friday, August 23, 2013 5:55 AM | 1 vote

Olaf is completely correct.

In most Data Warehouse situations data can come from all manner of sources.  In one instance I work with we receive numerous csv files which have excess columns, records we're not interested in etc.  Therefore you would load these files in full into a staging database.

Once in the staging database we then run data cleansing, remove the records and fields that we are not interested in, correct some formatting, and add IDs relevant to the company...  then the resulting dataset is loaded into the data warehouse.

Hopefully that makes sense.


Friday, August 23, 2013 10:51 AM

Okay, kind of what I had suspected.  However, its probably not a real show stopper if you don't have one, at least until the data flow and processing times are getting large.  In the case of a smaller company, using an SSIS package scheduled as a job to do the work of pulling the data in directly from the sources, doing a few transformations, then loading the DW should be fine.