We will look at a traditional data warehouse and illustrate advantages and disadvantages of such a technology. We will then turn to the ETL process and explain,, why ETL-based data warehousing projects became infamous.
The Traditional Data Warehouse and ETL
In a typical IT environment, traditional data warehouses ingest, model, and store data through an Extract, Transform, and Load process (ETL). These ETL jobs are used to move large amounts of data in a batch-oriented manner and are most commonly scheduled to run daily. Running these jobs daily means that, at best, the warehoused data is a few hours old, but more typically, it is a day or older. Because ETL jobs consume significant CPU, memory, disk space, and network bandwidth, it is difficult to justify running these jobs more than once per day. In a time when Application Programming Interfaces (APIs) were not as prevalent as they are now, ETL tools were the go-to solution for operational use cases. With APIs now in the picture—and the sheer variety of data they represent—the ETL method is becoming impractical.
However, even before the era of APIs and big data, ETL tools posed significant challenges, mainly because they require comprehensive knowledge of each operational database or application. Interconnectivity is complicated and requires thorough knowledge of each data source all the way down to the field level. The greater the number of interconnected systems that are to be included in the data warehouse, the more complicated the effort is.
In this digital era, new requirements arise faster than ever before, and previous requirements change just as quickly, making development agility and responsiveness necessary factors for success.
Weak Points of the ETL Data Warehouse Approach
As such, ETL-based data warehousing projects became infamous for appallingly high failure rates. When these projects don’t fail outright they are frequently plagued with cost overruns, and delayed implementations. Great care is needed to conceptualize the database and thoroughly deﬁne requirements to avoid having to re-work complicated and brittle connections, since tightly-coupled interdependencies often trigger unpredictable and far-reaching impacts, even when small changes are made.
Another shortcoming of the ETL data warehouse approach is that the business staff rarely gets an opportunity to see the results until after several months of development work has been completed. By this point, requirements have probably changed, errors have typically been discovered and the overall objective of the project may even have shifted. Any of these variables might force IT back to the drawing board to collect new requirements, and, in all likelihood, months of development effort will be scrapped. In fact, Gartner estimated that between 70 and 80 percent of corporate business intelligence projects failed to deliver the expected outcomes2. /Source http://www.computerweekly.com/news/1280094776/ Poor-communication-to-blame-for-business-intelligence-failure-says-Gartner
Data warehouses were originally built for operational reporting, rather than for interactive data analysis, so using a traditional data warehouse for analytic queries requires carefully building just the right structure and performing extensive and speciﬁc performance optimization. If you later decide to use the data differently, you must change the data structure and re-optimize, which is a very cumbersome and costly process.
Problems of the Traditional ETL Approach
The inherent problems of the traditional ETL approach are compounded by the sheer number of data sources available and the myriad ways to access data, such as the proliferation of APIs that rely on importing and exporting data, each of which has its own access protocol. While it’s technically possible to implement this sort of connectivity through ETL, the actual implementation would be overly complex, difficult to maintain, and costly to extend, problems that are exacerbated if the APIs do not use data exchange standards such as ODBC or JDBC.
In this digital era, new requirements arise faster than ever before and previous requirements change just as quickly, making development agility and responsiveness necessary factors for success. Because of issues like these, traditional data warehouses simply can’t cope with the needs of today’s businesses and related overall digital transformation trends.
Because of the shortcomings of the traditional data warehouse approach, new approaches to data processing emerged, including multi-dimensional OLAP methodology.
Advantages and Disadvantages of a Traditional Data Warehouse
You want to know more about the latest data integration approaches?
“Beyond the Data Lake” is your personal companion and provides you with new insights about the modern data integration. Get this ebook for free and increase your efficiency.