Week Eight Activities: Configure Azure HDInsight with Hadoop cluster

This week I configured Azure HDInsight with a Hadoop cluster. I tested it with a raw data set of flight delays and processed the data on the Hadoop cluster.

Hadoop provides batch querying and analysis of stored data. A Hadoop cluster in HDInsight has two head nodes and one or more data nodes, each created with a default VM size.
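To illustrate what a batch query over flight delay data looks like, here is a minimal map/reduce-style sketch in plain Python. It is not the job I ran on the cluster. The two-column layout (`carrier,delay_minutes`), the sample rows, and the function names are assumptions made up for this example.

```python
from collections import defaultdict


def map_delays(lines):
    """Map step: emit (carrier, delay_minutes) pairs from CSV lines.

    Assumes a hypothetical two-column layout: carrier,delay_minutes.
    """
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 2:
            continue  # skip malformed rows
        carrier, delay = fields
        try:
            yield carrier, float(delay)
        except ValueError:
            continue  # skip the header row and non-numeric delays


def reduce_delays(pairs):
    """Reduce step: average the delay per carrier."""
    totals = defaultdict(lambda: [0.0, 0])  # carrier -> [sum, count]
    for carrier, delay in pairs:
        totals[carrier][0] += delay
        totals[carrier][1] += 1
    return {carrier: total / count for carrier, (total, count) in totals.items()}


# Made-up sample rows, only to show the data flow.
sample = ["carrier,delay_minutes", "AA,10", "AA,20", "DL,5", "bad row"]
average_delay = reduce_delays(map_delays(sample))
print(average_delay)
```

On a real cluster the same grouping would normally be written as a Hive query or a MapReduce job, and the data would be read from the cluster's storage container rather than from a list in memory.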


Requirements and services for configuring a Hadoop cluster:

Cluster tier: HDInsight service tiers

Microsoft Azure offers its big data cloud service in two service tiers, Standard and Premium. For this test I used the Standard tier with the Linux operating system.

The default node configuration and virtual machine sizes for clusters:

Head: default VM size: D3 v2; recommended VM sizes: D3 v2, D4 v2, D12 v2

Worker: default VM size, recommended VM sizes: D3 v2, D4 v2, D12 v2

Supported HDInsight versions: Highly available clusters with two head nodes and one worker are deployed by default for HDInsight 2.1 and above. The latest available version is HDInsight 3.6, with Hortonworks Data Platform 2.6 and Apache Hive & HCatalog 1.2.1.

Default version: HDInsight 3.5, Hortonworks Data Platform: 2.5, Apache Hive & HCatalog: 1.2.1

To check the versions of the Hadoop components on a Windows-based cluster, log in to the cluster remotely and look in the “C:\apps\dist” directory.

Configuration snapshots:

2.jpg

2-3.jpg

3.jpg

4.jpg

default container: prj702test-2017-06-29t01-14-39-221z

5.jpg

6.jpg

Hadoop Cluster default type: Two head nodes and four data nodes

7.jpg

For testing purposes, and to minimise cost, I used one head node and one data node.

Note: The cost of HDInsight is calculated from the number of nodes and the VM size of each node type. Nodes are billed for as long as the cluster exists: billing starts when a cluster is created and stops when it is deleted. A cluster can’t be de-allocated or put on hold.
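The billing rule in the note can be sketched as a small calculation. This is only a rough estimate under stated assumptions: the function name and the hourly rates are hypothetical placeholders, not real Azure prices, and real bills can include other charges such as storage.

```python
def estimate_cluster_cost(head_nodes, worker_nodes, head_rate, worker_rate, hours):
    """Estimate node cost for the lifetime of a cluster.

    head_rate and worker_rate are hypothetical per-node hourly prices
    for the chosen VM sizes; hours is the time between cluster creation
    and deletion, because the cluster cannot be paused.
    """
    if min(head_nodes, worker_nodes, hours) < 0:
        raise ValueError("node counts and hours must not be negative")
    hourly_total = head_nodes * head_rate + worker_nodes * worker_rate
    return hourly_total * hours


# One head node and one data node kept alive for 24 hours,
# using made-up rates of 0.5 per node per hour.
test_cluster_cost = estimate_cluster_cost(1, 1, 0.5, 0.5, 24)
print(test_cluster_cost)
```

Because the charge depends on how long the cluster exists rather than on how much work it does, deleting the cluster as soon as a test finishes is the main way to keep the cost down.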

11.jpg

12.jpg

13.jpg

Note: A and D1-4 series VMs are general-purpose Linux VM sizes.

D11-14 series VMs are memory-optimized Linux VM sizes.

14.jpg Summary of configuration

15.jpg

16.jpg


17.jpg

18.jpg

19.jpg

20.jpg

Thank you 🙂

 

 
