This week I configured an Azure HDInsight Hadoop cluster. I tested it with a raw flight-delay data set and processed the data on the cluster.
Hadoop is used for batch queries and analysis of stored data. A Hadoop cluster in HDInsight is made up of two head nodes and one or more data nodes, each with a default VM size.
Requirements and services for configuring a Hadoop cluster:
Cluster tier: HDInsight service tiers
Microsoft Azure offers its big data cloud service in two tiers, Standard and Premium. For this test I am using the Standard tier with the Linux operating system.
The default node configuration and virtual machine sizes for clusters:
Head: default VM size: D3 v2; recommended VM sizes: D3 v2, D4 v2, D12 v2
Worker: default VM size, recommended VM sizes: D3 v2, D4 v2, D12 v2
Supported HDInsight versions: Highly available clusters with two head nodes and one worker are deployed by default for HDInsight 2.1 and above. The latest available version is HDInsight 3.6, with Hortonworks Data Platform 2.6 and Apache Hive & HCatalog 1.2.1.
Default version: HDInsight 3.5, Hortonworks Data Platform: 2.5, Apache Hive & HCatalog: 1.2.1
To check the versions of the current Hadoop components on a Windows-based cluster, log in to the cluster remotely and look in the “C:\apps\dist” directory.
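As a small sketch of that check, the script below just lists the sub-directories of a given folder. The function name `list_component_dirs` is my own, and I am assuming that the component folders under “C:\apps\dist” carry the version in their names; I have not confirmed that layout, so treat this as an illustration rather than the official way to read versions.

```python
from pathlib import Path


def list_component_dirs(dist_dir):
    """Return the sorted names of the sub-directories under dist_dir.

    Returns an empty list when dist_dir does not exist, for example when
    the script is not run on the cluster node itself.
    """
    root = Path(dist_dir)
    if not root.is_dir():
        return []
    # Keep directories only; plain files are ignored.
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    # Path taken from the post; only meaningful on a Windows-based cluster node.
    for name in list_component_dirs(r"C:\apps\dist"):
        print(name)
```

Run it after logging in to the cluster remotely; anywhere else it simply prints nothing.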
default container: prj702test-2017-06-29t01-14-39-221z
Hadoop Cluster default type: Two head nodes and four data nodes
For testing purposes and to minimise cost, I am using one head node and one data node.
Note: The cost of HDInsight is calculated from the number of nodes and the VM size of each node type. You are billed for node usage for as long as the cluster exists: billing starts when the cluster is created and stops when it is deleted. A cluster can’t be de-allocated or put on hold.
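To see how that adds up, here is a rough sketch of the arithmetic: nodes times hourly rate times the hours the cluster exists. The hourly rates below are made-up placeholders, not real Azure prices, and `cluster_cost` is a hypothetical helper of mine; actual pricing depends on the VM size and region.

```python
def cluster_cost(node_groups, hours):
    """Estimate cost as the sum of (node count * hourly rate) times hours.

    node_groups: list of (node_count, hourly_rate_per_node) tuples.
    hours: time between cluster creation and deletion.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return sum(count * rate for count, rate in node_groups) * hours


# HYPOTHETICAL hourly rates per node, used only for illustration.
HEAD_RATE = 0.50
WORKER_RATE = 0.50

# Default Hadoop cluster type: two head nodes and four data nodes.
default_cluster = [(2, HEAD_RATE), (4, WORKER_RATE)]
# Reduced test setup: one head node and one data node.
test_cluster = [(1, HEAD_RATE), (1, WORKER_RATE)]

if __name__ == "__main__":
    hours = 24  # the cluster is billed for every hour it exists
    print("default cluster:", cluster_cost(default_cluster, hours))
    print("test cluster:   ", cluster_cost(test_cluster, hours))
```

Because billing only stops on deletion, the `hours` value is the whole lifetime of the cluster, not just the time it spends running jobs.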
Note: A and D1-4 Series VMs: General-purpose Linux VM sizes
D11-14 Series VMs: Memory-optimized Linux VM sizes
Summary of configuration
Thank you 🙂