The need for big data systems has led to the introduction of numerous new technologies for storing data, including storage platforms, such as Amazon S3, Apache Hadoop, and Microsoft Azure Data Lake, as well as modern SQL systems, such as Amazon Athena, Google BigQuery, and Snowflake. For data scientists, the data storage landscape keeps changing.
New regulations for data privacy and protection, such as the GDPR, define and limit what organizations are allowed to do with their data: what types of data can be stored, how long they can be kept, and for what purposes they may be used. This complicates and restricts the analytical work of data scientists.
In this whitepaper, author Rick van der Lans explains how a modern data architecture can help data scientists work faster and more efficiently. More specifically, he describes a flexible data architecture, called the logical data lake, in which data virtualization acts as the general entry point through which data scientists access data.