Why You Probably Need a Data Warehouse in Your Database

Ian Kiprono
Stackademic
Published in
2 min readApr 29, 2024

--

An Image showing a tall building and a picture of relational database in white color. Image Credits:https://twitter.com/deisbel/status/1726636951548096816

Big companies, generating millions of data records daily, have to be creative and mend traditional rules. The relational database paradigm, once upon a time the champion, is no longer an option. For instance, Amazon creating a report about the best seller products for each month in the last two years.

Or a Bank creating a report showing the average monthly balance for their clients. For how long do you think these queries will execute? There is a possibility of them exploring millions or billions of records each time. Then imagine doing the same calculations but then grouping them by cities, states, or races.

The most viable solution would be instead of using the original DB for the expensive calculations, we can create a new summarized database. We will then have a SalesByMonths table instead of having a Sales table.

During end month, we will need to extract the summary of sales for each product and insert them into the other database. If need arises, we might need to create a report SalesByCityAndMonths, we will also create a new table. At the end of the month, we will extract its data from sales and populate SalesByCityAndMonths.

But we still have one problem, the sales table can and will grow forever. Then, once we have the summarized data, we can start cleaning the older data from the original table sales. Let’s say that after 2 years, for example, we no longer need to keep the original records, so we can safely remove them from sales.

In the worst case scenario, you can create a backup of the deleted data, but never on the production copy of the database. As shown, big data producers must break the normalization and data redundancy rules. They created a new database paradigm where the speed of reading data and for reporting services is most important. This is the fundamental reason why Data warehouse were invented and their core idea. More information can be found here.

Thank you for stopping by and see you in the next one.

Stackademic 🎓

Thank you for reading until the end. Before you go:

--

--

Java Software Engineer and a Problem solver @ pinterest. Putting my random thoughts out there. The world is beautiful and I am on a mission to make it better.