We will look at a traditional data warehouse and illustrate advantages and disadvantages of this technology. We will then turn to the ETL process and explain why ETL-based data warehousing projects became infamous.
The Traditional Data Warehouse and ETL
In a typical IT environment, traditional data warehouses ingest, model, and store data through an Extract, Transform, and Load process (ETL). These ETL jobs are used to move large amounts of data in a batch-oriented manner and are most commonly scheduled to run daily. Running these jobs daily means that, at best, the warehoused data is a few hours old, but more typically, it is a day or older. Because ETL jobs consume signiﬁcant CPU, memory, disk space, and network bandwidth, it is diﬃcult to justify running these jobs more than once per day. In a time when Application Programming Interfaces (APIs) were not as prevalent as they are now, ETL tools were the go-to solution for operational use cases. With APIs now in the picture—and the sheer variety of data they represent—the ETL method is becoming impractical.
However, even before the era of APIs and big data, ETL tools posed signiﬁcant challenges, mainly because they require comprehensive knowledge of each operational database or application. Interconnectivity is complicated and requires thorough knowledge of each data source all the way down to the ﬁeld level. The greater the number of interconnected systems that are to be included in the data warehouse, the more complicated the eﬀort is.
In this digital era, new requirements arise faster than ever before, and previous requirements change just as quickly, making development agility and responsiveness necessary factors for success.
Integrate all your data sources and connect to any BI tool with Data Virtuality
Weak Points of the ETL Data Warehouse Approach
As such, ETL-based data warehousing projects became infamous for appallingly high failure rates. When these projects don’t fail outright they are frequently plagued with cost overruns, and delayed implementations. Great care is needed to conceptualize the database and thoroughly deﬁne requirements to avoid having to re-work complicated and brittle connections, since tightly-coupled interdependencies often trigger unpredictable and far-reaching impacts, even when small changes are made.
Another shortcoming of the ETL data warehouse approach is that the business staﬀ rarely gets an opportunity to see the results until after several months of development work has been completed. By this point, requirements have probably changed, errors have typically been discovered and the overall objective of the project may even have shifted. Any of these variables might force IT back to the drawing board to collect new requirements, and, in all likelihood, months of development eﬀort will be scrapped. In fact, Gartner estimated that between 70 and 80 percent of corporate business intelligence projects failed to deliver the expected outcomes.
Data warehouses were originally built for operational reporting, rather than for interactive data analysis, so using a traditional data warehouse for analytic queries requires carefully building just the right structure and performing extensive and speciﬁc performance optimization. If you later decide to use the data diﬀerently, you must change the data structure and re-optimize, which is a very cumbersome and costly process.
Problems of the Traditional ETL Approach
The inherent problems of the traditional ETL approach are compounded by the sheer number of data sources available and the myriad ways to access data, such as the proliferation of APIs that rely on importing and exporting data, each of which has its own access protocol. While it’s technically possible to implement this sort of connectivity through ETL, the actual implementation would be overly complex, diﬃcult to maintain, and costly to extend, problems that are exacerbated if the APIs do not use data exchange standards such as ODBC or JDBC.
In this digital era, new requirements arise faster than ever before and previous requirements change just as quickly, making development agility and responsiveness necessary factors for success. Because of issues like these, traditional data warehouses simply can’t cope with the needs of today’s businesses and related overall digital transformation trends. Because of the shortcomings of the traditional data warehouse approach, new approaches to data processing emerged, including the multi-dimensional OLAP methodology.
Advantages and Disadvantages of a Traditional Data Warehouse
You want to know more about the latest data integration approaches? Our free ebook is your personal companion and provides you with new insights about modern data integration. Increase your business’s efficiency today!