Week Four Activities: Project Proposal Defence Experience

My Project Proposal Defence Seminar was held on 26 May 2017. As you know, my project title is “Implementing an end-to-end scalable analytical suite to simplify the business decision-making for classified Big data sets using Public Cloud Services”. It was a PowerPoint presentation.

In this presentation, I shared my ideas for implementing the project with my classmates and my project supervisor. I covered the following agenda items:

  • Project Title
  • Project Goals
  • Background of Project
  • Implementation Plan
  • Anticipated Outcome

After I completed my presentation, my classmates gave me some useful suggestions that I had not considered before. I also received some questions from them that will help me write my project report.

Question from my classmate Karan:

What is your role in the project, and what is your contribution?

Question from Ravi:

What technology is used for stream data analysis?

Question from Bhawana:

What would you use for your dataset, and who would benefit from this project?

Apart from that, I also received the most important suggestions from my supervisor. He guided me on how to start my project and on what I should add to my presentation when the audience is not familiar with the topic. This feedback will help me when I do my final project defence seminar and poster presentation.

From week five, I am going to start Phase Two: report writing and implementation of the project.

Thank you 🙂

Week Four Activities: Preparation of Proposal Defence

In week three, I submitted my project proposal. This week (21-26 May 2017), I am getting ready for my project proposal defence seminar, and I am also waiting for the result of the project committee’s review of my proposal.

This week, I have been studying big data analytics from various sources to prepare myself to defend my project topic.

My reference areas are big data analytics in cloud computing and the different technology domains of cloud services. I am sharing some sources here that I found useful:

https://www.ibm.com/blogs/cloud-computing/2014/02/cloud-computing-and-big-data-an-ideal-combination/

https://www.computer.org/csdl/mags/co/2015/03/mco2015030020.html

http://theinstitute.ieee.org/ns/quarterly_issues/tisep14.pdf

Besides studying the topic, I am also preparing my PowerPoint presentation, which will be held on 26 May 2017.

 

Thank you 🙂

 

 

Lab 2: Introduction to R Vectors

In this lab, I will take you on a trip to the statistician’s paradise, where we will learn how to keep track of betting progress and how to analyse past actions. We will also learn some data-analytical skills to improve our performance at the tables and kick off a career as a professional gambler!

Note: R is case sensitive!

The task list for Lab 2:

1. Create a vector
2. Naming a vector
3. Calculating the vector (Arithmetic)
4. Comparing the vector (logical)

Steps for the task: Create a vector

Let’s go to Vegas! Declare a Vegas variable whose value is “GO!”.

Step 1: Vectors are one-dimensional arrays that can hold numeric, character or logical data. In other words, a vector is a simple tool for storing data.

In this lab, I will use a

Create a vector with the combine function c() in R, placing the vector elements between the parentheses, separated by commas.

R command for vectors:

numeric_vector <- c(1, 2, 3)   # c(1:3) also holds the values 1, 2, 3

character_vector <- c("a", "b", "c")

After creating vectors, we can use them to do calculations.

Step 2: Declare a boolean_vector that contains the three elements TRUE, FALSE and TRUE, in this order.

R command: boolean_vector <- c(TRUE, FALSE, TRUE)
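The three kinds of vector described so far can be sketched together as follows. This is only an illustrative snippet: the values come from the commands above, the character vector is spelled character_vector here, and the class() and length() checks are my own addition to confirm what each vector holds.

```r
# Vectors of the three basic kinds mentioned in the lab text
numeric_vector   <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
boolean_vector   <- c(TRUE, FALSE, TRUE)

# class() reports the kind of data each vector holds
print(class(numeric_vector))    # "numeric"
print(class(character_vector))  # "character"
print(class(boolean_vector))    # "logical"

# length() returns the number of elements in a vector
print(length(boolean_vector))   # 3
```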


Step 3: In this step, we will use our data-analytical superpowers on last week’s winnings and losses from poker and roulette in Las Vegas.

Firstly, we need to collect all the winnings and losses for the last week.

For poker_vector:

  • Monday, won $140
  • Tuesday, lost $50
  • Wednesday, won $20
  • Thursday, lost $120
  • Friday, won $240

For roulette_vector:

  • Monday, lost $24
  • Tuesday, lost $50
  • Wednesday, won $100
  • Thursday, lost $350
  • Friday, won $10

Secondly, we need to create the variables poker_vector and roulette_vector for the poker and roulette winnings/losses so that we can use the data in R.
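A minimal sketch of this step, using the amounts listed above, could look like the following. Recording losses as negative numbers is the convention I am assuming here, so that winnings and losses can be added up later.

```r
# Winnings are positive and losses are negative (assumed convention),
# in the order Monday to Friday
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

print(poker_vector)
print(roulette_vector)
```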


Steps for the task: Naming a vector

As a data analyst, it is very important to understand the data that you are using and what each element refers to.

In the previous task, we created a vector. In this task, we will name its elements. You can give a name to the elements of a vector with the names() function.

Step 1: The vector itself should show which day each value belongs to. Use the names() function to assign the days as the names of poker_vector and roulette_vector.

An exercise with the names() function: suppose you created a vector some_vector with the values “John Doe” and “Poker Player”. You then use the names() function to label the vector elements, so that the first element is named Name and the second element is named Profession.
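The exercise can be sketched like this. The lookup by name on the last line is my own addition, to show why naming elements is useful.

```r
# A character vector with two elements
some_vector <- c("John Doe", "Poker Player")

# Label the first element "Name" and the second element "Profession"
names(some_vector) <- c("Name", "Profession")

print(some_vector)

# Once named, an element can be looked up by its label
print(some_vector["Profession"])
```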


Step 2: In this step, we will see a more efficient way to set the names, rather than typing and re-typing the same information again and again.

Firstly, we will assign the days of the week to a variable days_vector, which will hold the days of the week for the poker and roulette games.

Secondly, we will use days_vector to set the names of poker_vector and roulette_vector.
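A short sketch of this approach is shown here. It repeats the vector definitions so that it runs on its own, and it assumes that losses are stored as negative numbers.

```r
# Winnings (positive) and losses (negative) from Monday to Friday
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Store the day names once and reuse them for both vectors
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector)    <- days_vector
names(roulette_vector) <- days_vector

print(poker_vector)
print(roulette_vector)
```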

Steps for the task: Calculating the vector (Arithmetic)

In this task, we will do some data-analytical magic on the poker and roulette results. We will calculate the total winnings of the games, because it is time to get those Ferraris in our garage, right?

Step 1: Arithmetic calculations on vectors

In this step, we will learn how to sum two vectors in R. R adds vectors element by element.

For instance, c(1, 2, 3) + c(4, 5, 6)

is the same as c(1 + 4, 2 + 5, 3 + 6)

which gives c(5, 7, 9)

We can also do calculations with variables that represent vectors:

a <- c(1, 2, 3)

b <- c(4, 5, 6)

c <- a + b
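The element-wise behaviour can be checked with a small sketch. The text stores the result in a variable named c; here I use the name vector_sum instead (my own choice) to avoid confusion with the c() function. The last two lines are extra examples showing that other arithmetic operators work element by element as well.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Addition is applied element by element
vector_sum <- a + b
print(vector_sum)   # 5 7 9

# Other arithmetic operators behave the same way
print(b - a)        # 3 3 3
print(a * 2)        # 2 4 6
```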

Step 2: Calculating the total

Firstly, we need to understand what our profit or loss was on each day of the week.

Secondly, the total daily profit is the sum of the profit/loss that we realised on poker and roulette per day.

In R, we can calculate this simply by adding the two vectors.

Assign the variable total_daily = poker_vector + roulette_vector. The result shows how much we won or lost on each day across poker and roulette combined.
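As a sketch, the daily totals can be computed like this. The vectors are redefined so that the snippet runs on its own, with losses assumed to be stored as negative numbers.

```r
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
names(poker_vector)    <- days_vector
names(roulette_vector) <- days_vector

# Element-wise sum: profit or loss per day across both games
total_daily <- poker_vector + roulette_vector
print(total_daily)
# Monday 116, Tuesday -100, Wednesday 120, Thursday -470, Friday 250
```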


Step 3: In the previous step, we saw that the data was a mix of good and bad days.

In this step, we will calculate the total amount won or lost over the week for poker and roulette, because there may be a very tiny chance that we have lost money over the week in total.

total_week = sum of all gains and losses of the week

We will use a function called sum() to analyse the total gains and losses for the week. sum() calculates the sum of all elements of a vector.

Total amount of money we won/lost with poker/roulette:

total_poker <- sum(poker_vector)

total_roulette <- sum(roulette_vector)

Instructions:

  • Calculate the total amount of money that we won/lost with roulette and poker
  • Calculate the total week for poker and roulette
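The two instructions above can be sketched as follows, again assuming that losses are stored as negative numbers.

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# sum() adds up all elements of a vector
total_poker    <- sum(poker_vector)
total_roulette <- sum(roulette_vector)

# Overall result for the week across both games
total_week <- total_poker + total_roulette

print(total_poker)     # 230
print(total_roulette)  # -314
print(total_week)      # -84
```

With these figures, the week ends with an overall loss, even though poker on its own was profitable.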


Steps for the task: Comparing the vector (logical)

In this task, we will change our data analysis strategy and do a deeper analysis aimed at more winnings.

Step 1: In this step, we will compare the two games.

Comparison operators in R: <, >, <=, >=, == (equal to), != (not equal to)

Instructions for comparing:

  • Calculate total_poker and total_roulette using the sum() function
  • Check whether the total gains in poker are higher than in roulette by using a comparison
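A sketch of the comparison is shown here. The variable poker_better and the element-wise check on the last lines are my own additions for illustration, and losses are assumed to be stored as negative numbers.

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

total_poker    <- sum(poker_vector)
total_roulette <- sum(roulette_vector)

# A comparison returns a logical value
poker_better <- total_poker > total_roulette
print(poker_better)   # TRUE

# Comparisons also work element by element on whole vectors
winning_days <- poker_vector > 0
print(winning_days)   # TRUE FALSE TRUE FALSE TRUE
```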


 

Thank you 🙂

Lab 1: Introduction to R Mathematical Operations and Variable Assignment

In this lab, I will discuss how to get started with an R script, general mathematical operations and variables.

The task list for Lab 1:

  • General Mathematical Operations
  • Variable assignment
  • Basic data types in R

Steps for the task: General Mathematical Operations

  • +, plus (addition)
  • -, minus (subtraction)
  • *, times (multiplication)
  • /, division
  • ^, exponentiation [a^n involves two numbers, the base a and the exponent n, and is called the n-th power of a]
  • %%, modulo (the remainder after division)
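Each operator in the list can be tried out with a short sketch like this one. The numbers and variable names are arbitrary examples of my own. Note that R writes multiplication as * and division as /.

```r
addition       <- 5 + 5        # 10
subtraction    <- 5 - 3        # 2
multiplication <- 3 * 5        # 15
division       <- (5 + 5) / 2  # 5
exponentiation <- 2 ^ 5        # 32
remainder      <- 28 %% 6      # 4, because 28 = 4 * 6 + 4

print(c(addition, subtraction, multiplication,
        division, exponentiation, remainder))
```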


Steps for the task: Variable assignment

A variable is a basic concept in statistical programming. It allows you to store a value or an object, such as a function, in R. You can later use the variable’s name to easily access the value or object stored in that variable.

Step 1: Assign a value to a variable with the command

x = 5

R command: x <- 5

Note that R does not print the value of a variable to the console when you assign it. R assumes that you will need the variable in the future; otherwise, you would not have stored the value in a variable in the first place.
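A small sketch of assignment and printing follows. The variable y is my own example, added to show that the = form also works at the top level, although <- is the more common style in R.

```r
# Assignment stores the value and prints nothing
x <- 5

# Typing the variable name (or calling print) shows the stored value
x
print(x)

# The = operator also assigns at the top level
y = 5
print(x == y)   # TRUE
```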

Step 2: Suppose you are a data analyst and you have an egg basket with 10 eggs. Now you need to store the number of eggs in a variable named Number_of_eggs.

R command: Number_of_eggs <- 10

Step 3: Suppose your egg basket is a combination of chicken eggs and bird eggs. As a data analyst, you need to assign a named variable for each kind, store the value in each variable, and then calculate how many eggs you have in total in the basket.

R command:

Chicken_eggs <- 7

Bird_eggs

Total_eggs <- Chicken_eggs + Bird_eggs


Note that the advantage of doing calculations with variables is re-usability.

For example, if you change the value of the Chicken_eggs variable to 15 instead of 7 and re-run the calculation, Total_eggs will update as well.
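One caveat is worth checking for yourself: in a plain R session, Total_eggs is computed once at assignment time, so the calculation has to be run again after Chicken_eggs changes. The sketch below illustrates this. The value 3 for Bird_eggs is hypothetical, chosen only so that the basket starts with 10 eggs.

```r
Chicken_eggs <- 7
Bird_eggs    <- 3   # hypothetical value, not given in the lab text
Total_eggs   <- Chicken_eggs + Bird_eggs
print(Total_eggs)   # 10

# Changing an input does not change the stored total by itself
Chicken_eggs <- 15
print(Total_eggs)   # still 10

# Re-running the same calculation picks up the new value
Total_eggs <- Chicken_eggs + Bird_eggs
print(Total_eggs)   # 18
```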

Remember that mathematical operations work only on numeric variables in R. If you assign a text value to a variable and then try to add it to a numeric variable, R will show an error. The addition of a numeric and a character variable is not possible.

R command: Chicken_eggs

Bird_eggs <- 'eight'

Total_eggs <- Chicken_eggs + Bird_eggs

Error in Chicken_eggs + Bird_eggs : non-numeric argument to binary operator
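If you want to see the error without stopping your script, one option is to wrap the addition in tryCatch(). This is only a sketch: the value 7 for Chicken_eggs and the variable name result are my own assumptions, and the message text shown in the comment is what R reports in an English locale.

```r
Chicken_eggs <- 7         # assumed numeric value
Bird_eggs    <- "eight"   # a character value, as in the example above

# tryCatch() captures the error instead of halting the script
result <- tryCatch(
  Chicken_eggs + Bird_eggs,
  error = function(e) conditionMessage(e)
)

print(result)   # "non-numeric argument to binary operator"
```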


Steps for the task: Basic data types in R

R works with the following data types:

  • Decimal values, for example 4.5, are called numeric
  • Natural numbers, for example 4, are called integers; integers are also numeric
  • Boolean values (TRUE, FALSE) are called logical
  • Text or string values are called characters

Exercise: To avoid a mismatch in data types, you can check the data type of a variable before an operation with the class() function.

R command: class(my_numeric); it will show the data type of my_numeric
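A quick sketch of class() on each basic type is given below. Only my_numeric appears in the lab text; the other variable names and all the values are my own examples. One detail to be aware of: a plain 4 is stored as numeric, and the L suffix is needed to create a true integer.

```r
my_numeric   <- 4.5
my_integer   <- 4L        # the L suffix makes this an integer
my_logical   <- TRUE
my_character <- "some text"

print(class(my_numeric))    # "numeric"
print(class(my_integer))    # "integer"
print(class(my_logical))    # "logical"
print(class(my_character))  # "character"

# Without the L suffix, a whole number is still stored as numeric
print(class(4))             # "numeric"
```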


Thank you 🙂 

Week Three Phase-6 Activities

19 May 2017 was the final day for my project proposal submission.

In the morning, I met my supervisor with my draft proposal. After a long discussion with him, I altered my project title.

Title: “Implementing an end-to-end scalable analytical suite to simplify the business decision making for classified Big data sets using Public Cloud Services”

He also gave me some suggestions to add a few ideas to my final proposal and to remove some topics from my draft proposal. I modified my draft proposal based on our discussion before submission.

In the evening, I finally submitted my proposal. Now I am waiting for the final review from the Project Committee. Wishing for the best! Let’s see what happens.

I would like to share my fundamental reference architecture here because it will give you an idea of what I am going to do.

[Image: reference architecture, 1222.png]

Thank you 🙂 

Week Three Phase-5 Activities

On day 1 of week three, 18 May 2017, I am studying “Big Data” projects on different platforms. As I chose “Big Data analysis in the cloud platform” for my project, it is a completely challenging topic for me. Therefore, I need to learn more about it and enrich my knowledge in order to implement it.

First of all, I am researching which platform is the best fit for this project and what the main problems are with the traditional platform.

Secondly, I am looking for the tools and methods that are used on the cloud platform for big data analysis.

Thirdly, I am trying to virtualize a realistic enterprise system where I will apply my project for the demo.

Finally, I am studying data warehousing and business intelligence.

Based on all my study and research, I am preparing my proposal, which I have to submit tomorrow, 19 May 2017.

Reference links:

https://aws.amazon.com/big-data/what-is-big-data/

https://cloud.google.com/solutions/big-data/

https://azure.microsoft.com/en-us/solutions/big-data/

https://www-01.ibm.com/software/data/bigdata/

Thank you 🙂

Week Two Phase-4 Activities

As I mentioned before, I was going to implement the “Database Migration Project” that I selected last week, but I suddenly found something interesting and exciting while I was watching a “Microsoft Azure webinar on Big data”.

On 12 May 2017, I discussed my idea with my supervisor during a one-to-one meeting, and he said that, between the two ideas, the latter is better. Therefore, I changed my topic, and I am going to implement a project on Big data analysis.

Title: “Implementing End to End classified Big data set analysis using Microsoft Azure cloud Services”

Now I am reading about and researching big data analysis projects to get ideas for the implementation as well as to finalise my proposal.

Reference links:

https://azure.microsoft.com/en-us/solutions/big-data/

https://aws.amazon.com/events/anz/on-demand/auckland-summit/?sc_channel

Thank you 🙂