The Traditional Data Warehouse And The ETL Approach

This blog post will provide more information about a traditional data warehouse and illustrates advantages and disadvantages of such a technology. Furthermore, a deeper explanation of an ETL process is given and explained, why ETL-based data warehousing projects became infamous.

In a typical IT environment, traditional data warehouses ingest, model, and store data through an extract, transform, and load process (ETL). These ETL jobs are used to move large amounts of data in a batch-oriented manner and are most commonly scheduled to run daily. Running these jobs daily means that, at best, the warehoused data is a few hours old, but it is typically a day or more old. Because ETL jobs consume significant CPU, memory, disk space, and network bandwidth, it is difficult to justify running these jobs more than once daily. In a time when APIs were not as prevalent as they are now, ETL tools were the go-to solution for operational use cases. With APIs now in the picture—and the sheer variety of data they represent—the ETL method is becoming impractical.

However, even before the era of APIs and big data, ETL tools posed significant challenges, mainly because they require comprehensive knowledge of each operational database or application. Interconnectivity is complicated and requires thorough knowledge of each data source all the way down to the field level. The greater the number of interconnected systems that are to be included in the data warehouse, the more complicated the effort is.

In this digital era, new requirements arise faster than ever before and previous requirements change as quickly making development agility and responsiveness necessary factors for success.

Weak Points of the ETL Data Warehouse Approach

As such, ETL-based data warehousing projects became infamous for appallingly high failure rates. When these projects don’t fail outright they are frequently plagued with cost overruns, and delayed implementations. Great care is needed to conceptualize the database and thoroughly define requirements to avoid having to re-work complicated and brittle connections, since tightly coupled interdependencies often trigger unpredictable and far-reaching impacts even when slight changes are made.

Another shortcoming of the ETL data warehouse approach is that the business staff rarely gets an opportunity to see the results until after several months of development work has been completed. By this point it is common that requirements have changed, errors have been discovered, or the objective of the project has shifted. Any of these variables might force IT back to the drawing board to collect new requirements, and in all likelihood months of development effort will be scrapped. In fact, Gartner estimated that between 70 and 80 percent of corporate business intelligence projects failed to deliver the expected outcomes.

Data warehouses were originally built for operational reporting rather than for interactive data analysis and using a traditional data warehouse for analytic queries requires carefully building just the right structure and performing extensive and specific performance optimization. If you later decide to use the data differently, you must change the data structure and re-optimize, which is a very cumbersome and costly process.

Problems of the Traditional ETL Approach

The inherent problems of the traditional ETL approach is compounded by the sheer number of data sources available and the myriad ways to access data such as the proliferation of APIs that rely on importing and exporting data, each of which has its own access protocol. While it’s technically possible to implement this sort of connectivity through ETL, the actual implementation would be overly complex, difficult to maintain, and costly to extend, problems that are made worse if the APIs do not use data exchange standards such as ODBC or JDBC.

In this digital era, new requirements arise faster than ever before and previous requirements change as quickly making development agility and responsiveness necessary factors for success. Because of issues like these, traditional data warehouses simply can’t cope with the needs of today’s businesses and related overall digital transformation trends. Because of the shortcomings of the traditional data warehouse approach, new approaches to data processing emerged and what came next was multi-dimensional OLAP methodology.

Advantages and Disadvantages of a Traditional Data Warehouse

2016-10-18-btdl-blog-post-2

Learn more about how to evolve from classical ETL and a traditional data warehouse to a logical data warehouse. Get your free eBook now.