Azure Data Warehouse services are offered via a collection of resources in Azure: Data Lake, Synapse Analytics, HDInsight, and Databricks. All of these services cover common data warehouse scenarios. Here, "data warehouse" refers broadly to big data analytics, artificial intelligence, cloud storage, and other applications related to big data. For example, HDInsight works with Hadoop, Spark, Hive, LLAP, Kafka, Storm, and more for big data storage and processing. In this tutorial, we’ll look at each of the data warehouse components in Azure and understand what each resource offers.
What is Big Data?
Before we look at some of the services in Azure for data warehouses, let’s talk about Big Data. What is it? Big Data is a term used to describe volumes of structured or unstructured data so large that they are hard to process with traditional database and software techniques. We therefore need special services just to handle the sheer size of the data. The following tools help us with that.
Azure Data Lake
Data Lake enables big data analytics and artificial intelligence while offering cloud storage that is less expensive than relational database cloud storage. Data Lake can store data from various business systems and data warehouses, as well as device and sensor data. It handles large volumes of structured and unstructured data from many different sources. In summary, Data Lake is a place to store large volumes of data inexpensively, even if the data is not all of the same type.
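To make the idea of storing mixed data types side by side more concrete, here is a minimal sketch in Python. It does not use Azure at all: a temporary local folder stands in for the lake, and the folder layout, file names, and the helper functions `land_raw_files`, `read_records`, and `scan_lake` are all hypothetical names chosen for illustration. The point it tries to show is that files are stored as-is and a structure is applied only when they are read.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw_files(lake_root: Path) -> None:
    """Write raw files unchanged, the way source systems might drop them into a lake."""
    sales_dir = lake_root / "raw" / "sales"
    sensor_dir = lake_root / "raw" / "sensors"
    sales_dir.mkdir(parents=True)
    sensor_dir.mkdir(parents=True)
    # Structured data: a CSV export from a (hypothetical) business system.
    (sales_dir / "2021-01.csv").write_text("order_id,amount\n1,19.99\n2,5.00\n")
    # Semi-structured data: JSON lines from a (hypothetical) device.
    (sensor_dir / "device-a.jsonl").write_text(
        '{"device": "a", "temp_c": 21.5}\n{"device": "a", "temp_c": 22.0}\n'
    )


def read_records(path: Path) -> list:
    """Pick a parser per file type; structure is applied at read time only."""
    if path.suffix == ".csv":
        with path.open(newline="") as handle:
            return list(csv.DictReader(handle))
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in path.read_text().splitlines() if line]
    raise ValueError(f"unsupported file type: {path.suffix}")


def scan_lake(lake_root: Path) -> dict:
    """Map each file's relative path to the records parsed from it."""
    return {
        p.relative_to(lake_root).as_posix(): read_records(p)
        for p in sorted(lake_root.rglob("*"))
        if p.is_file()
    }


# A temporary local directory stands in for the lake (an assumption, not Azure).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    land_raw_files(root)
    lake_contents = scan_lake(root)

for name, records in lake_contents.items():
    print(name, records)
```

Note that the CSV values come back as strings while the JSON values keep their types, which is typical of the cleanup work left to whatever reads from the lake.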
Azure Synapse Analytics
Azure Synapse Analytics is an integrated analytics service that accelerates time to insight across data warehouses and big data systems. Azure Synapse Analytics was formerly known as Azure SQL Data Warehouse and is the core solution when thinking about Data Warehousing.
- Formerly known as Azure SQL Data Warehouse
- Enterprise Data Warehousing and Big Data Analytics
- Used to run SQL queries against large databases for use cases like reporting
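As a rough illustration of the reporting use case, the sketch below runs an aggregate SQL query of the kind a report might need. It uses Python's built-in `sqlite3` module with an in-memory database purely as a local stand-in; it does not connect to Synapse, and the `sales` table, its columns, and the sample rows are made up for the example. Date functions such as `strftime` are SQLite-specific and would be written differently in other SQL dialects.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, order_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', '2021-01-05', 100.0),
        ('East', '2021-01-20', 50.0),
        ('West', '2021-01-11', 75.0),
        ('West', '2021-02-02', 25.0);
    """
)

# A typical reporting query: total sales and order count per region and month.
report = conn.execute(
    """
    SELECT region,
           strftime('%Y-%m', order_date) AS month,  -- SQLite-specific date function
           SUM(amount) AS total,
           COUNT(*) AS orders
    FROM sales
    GROUP BY region, month
    ORDER BY region, month
    """
).fetchall()
conn.close()

for row in report:
    print(row)
```

The shape of the query (group, aggregate, sort) is the part that carries over; a real warehouse would run it over far larger tables.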
Azure HDInsight
HDInsight is used for processing huge amounts of data. This is done using cloud distributions of Hadoop components. HDInsight supports popular open-source frameworks like Hadoop, Spark, Hive, LLAP, Kafka, Storm, and R.
- Is named after Hadoop
- For use with open source analytics software
- Spark, Kafka, Hadoop, etc…
Azure Databricks
Databricks offers two environments for developing data-intensive applications. The first is Azure Databricks SQL Analytics and the second is Azure Databricks Workspace.
- Made by the creators of Spark
- Apache Spark-based analytics platform
- Optimized for the Azure Cloud Platform
- Third-party Databricks services supported in Azure
- Can serve as a data source for machine learning algorithms.
What Is a Data Warehouse For?
A data warehouse service is appropriate when there is a need to turn huge amounts of data from existing systems into an easier-to-digest format. Strict data structures are not a requirement in a data warehouse solution. Columns can be renamed and the schema reshaped to simplify relationships and consolidate multiple tables into one. This approach puts understanding of the data within reach of administrators who are not data developers by trade.
- Data mining tools can find hidden patterns in the data using automation.
- Able to store historical data from multiple sources, representing a single source of truth.
- Allows the transactional system to focus on handling writes, while the data warehouse satisfies the majority of read requests.
- Data quality can be improved by cleaning up data as it is imported into the data warehouse.
- Data warehouses make it easier to create business intelligence solutions.
- A data warehouse can consolidate data from different software.
- Makes it easier to provide secure access to authorized users, while restricting access to others.
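The ideas of consolidating multiple tables into one and cleaning data as it is imported can be sketched in a few lines of plain Python. This is only an illustration under assumed inputs: the `customers` and `orders` rows, the field names, and the helpers `clean_customer` and `build_order_facts` are hypothetical, and a real warehouse load would use a dedicated pipeline rather than in-memory lists.

```python
# Hypothetical rows as they might arrive from two separate source systems.
customers = [
    {"customer_id": 1, "name": "  Ada Lovelace ", "country": "uk"},
    {"customer_id": 2, "name": "Grace Hopper", "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": "19.99"},
    {"order_id": 11, "customer_id": 2, "amount": "5"},
    {"order_id": 12, "customer_id": 3, "amount": "n/a"},  # unknown customer, bad amount
]


def clean_customer(row: dict) -> dict:
    """Normalize whitespace and country codes while importing."""
    return {
        "customer_id": row["customer_id"],
        "name": row["name"].strip(),
        "country": row["country"].upper(),
    }


def build_order_facts(customer_rows: list, order_rows: list):
    """Join orders to customers into one flat table; set aside rows that fail checks."""
    by_id = {c["customer_id"]: clean_customer(c) for c in customer_rows}
    facts, rejected = [], []
    for order in order_rows:
        customer = by_id.get(order["customer_id"])
        try:
            amount = float(order["amount"])
        except ValueError:
            amount = None
        if customer is None or amount is None:
            rejected.append(order)
            continue
        facts.append(
            {
                "order_id": order["order_id"],
                "customer_name": customer["name"],
                "country": customer["country"],
                "amount": amount,
            }
        )
    return facts, rejected


facts, rejected = build_order_facts(customers, orders)
for row in facts:
    print(row)
print("rejected:", rejected)
```

The resulting flat rows need no joins to read, which is the property that makes this kind of table easier for reporting tools and non-developers to work with.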