Wednesday, 25 January 2023

 Introduction to Data Analytics

1.1 Variable, Measurement and Data

Variable – a characteristic of any entity being studied that is capable of taking on different values

Measurement – the use of a standard process to assign numbers to particular attributes or characteristics of a variable

Data – recorded measurements
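Here is a minimal Python sketch of the three terms. The customer survey, the variable `satisfaction` and the 1 to 3 scale are hypothetical and exist only for illustration.

```python
# Hypothetical example: the entity is a customer, the variable is "satisfaction",
# which can take the values "low", "medium" or "high".
SATISFACTION_SCALE = {"low": 1, "medium": 2, "high": 3}  # the standard measurement process


def measure(response: str) -> int:
    """Assign a number to a satisfaction response using the fixed scale."""
    return SATISFACTION_SCALE[response.strip().lower()]


# Made-up survey responses from three customers
responses = ["High", "low", "medium"]

# Data: the recorded measurements
data = [measure(r) for r in responses]
print(data)  # [3, 1, 2]
```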

1.2 What is generating so much data?

Data can be generated by
–Humans,
–Machines, or
–Human-machine combinations
Data can be generated wherever information is produced and stored, in structured or unstructured formats.

1.3 How does data add value to business?


Image source: https://datajobs.com/

1.4 Why is data important?

  • Data helps one make better decisions.
  • Data helps one solve problems by finding the reasons for underperformance.
  • Data helps one evaluate performance.
  • Data helps one improve processes.
  • Data helps one understand consumers and the market.

2.1 Define data analytics

  • Analytics is defined as “the scientific process of transforming data into insights for making better decisions”
  • Analytics is the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions – James Evans
  • Is analysis the same as analytics?
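To make "transforming data into insights for making better decisions" concrete, here is a minimal Python sketch. The store names and sales figures are hypothetical; the point is the path from raw data, to an insight (average sales), to a decision aid (which stores to review).

```python
from statistics import mean

# Hypothetical monthly sales (in thousands) for three stores
sales = {
    "Store A": [120, 135, 128],
    "Store B": [80, 76, 71],
    "Store C": [110, 118, 125],
}

# Insight: average monthly sales per store
averages = {store: mean(values) for store, values in sales.items()}

# Decision support: flag stores whose average is below the overall average
overall = mean(averages.values())
underperforming = [store for store, avg in averages.items() if avg < overall]

print(averages)
print("Review:", underperforming)  # Review: ['Store B']
```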

2.2 Why is analytics important?

Opportunities abound for the use of analytics and big data, for example:

  1. Determining credit risk
  2. Developing new medicines
  3. Finding more efficient ways to deliver products and services
  4. Preventing fraud
  5. Uncovering cyber threats
  6. Retaining the most valuable customers

Tuesday, 20 December 2022

 Introduction to Machine Learning

A machine learning model is a file that has been trained to recognize certain types of patterns. You train a model over a set of data, providing it an algorithm that it can use to reason over and learn from those data.

A machine learning model is defined as a mathematical representation of the output of the training process.  

Machine Learning Algorithm Vs Model 

An algorithm in machine learning is a procedure that is run on data to create a machine learning "model".

Machine learning algorithms perform "pattern recognition". Algorithms "learn" from data, or are "fit" on a data set.

There are many machine learning algorithms.

A model in machine learning is the output of a machine learning algorithm run on a data set.

A model represents what was learned by a machine learning algorithm.  
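The distinction can be sketched in a few lines of Python. Here ordinary least squares stands in as the algorithm, and the model is nothing more than the two numbers it produces; the data set and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # The model is just the learned parameters
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Use the learned model on a new input."""
    return model["slope"] * x + model["intercept"]


# Toy data set (hypothetical): the points lie exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)   # run the algorithm on the data
print(model)               # {'slope': 2.0, 'intercept': 1.0}
print(predict(model, 5))   # 11.0
```

Because the model is plain data, it can be written to a file (for example with the `json` module) and loaded later, which is the sense in which a trained model is "a file".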

                                                Image Source: www.cogitotech.com

 Dataset 

An ML dataset is a collection of data that is used to train the model.

A dataset provides the examples that teach a machine learning algorithm how to make predictions.

Types of ML/AI datasets 

1. Training data (60%): the data used to train the model.

2. Validation data (20%): a subset held out from the training data, used to check the accuracy of the model while it is being tuned.

3. Testing data (20%): a separate set of data, distinct from the validation and training datasets, used to evaluate the final performance of the model.
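A 60/20/20 split can be sketched with the standard library alone. The function name and the fixed seed are assumptions for illustration; libraries such as scikit-learn offer their own splitting helpers.

```python
import random


def split_dataset(rows, train=0.6, val=0.2, seed=42):
    """Shuffle a copy of rows and split it into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed makes the split reproducible
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    train_set = rows[:n_train]
    val_set = rows[n_train:n_train + n_val]
    test_set = rows[n_train + n_val:]   # the remainder (about 20%) is held out for testing
    return train_set, val_set, test_set


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

Shuffling first matters: if the rows are ordered (say, by date or by label), slicing without shuffling would give the three sets different distributions.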


Hypothesis

It is defined as the approximate function that best describes the target in supervised machine learning.

It depends primarily on the data, as well as on the bias and restrictions applied to the data. In statistics the term is used differently: a hypothesis test involves two types of hypothesis, the null hypothesis and the alternative hypothesis.

Hypothesis Space

It is the set of all possible legal hypotheses. This is the set from which the ML algorithm determines the single best hypothesis, the one that best describes the target function or the outputs.
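As an illustration, here is a minimal Python sketch of a finite hypothesis space. The one-dimensional training data and the family of threshold rules h_t(x) = 1 if x >= t are hypothetical choices; the learner simply picks the hypothesis with the fewest training errors.

```python
# Hypothetical 1-D training data: (x, label)
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]

# Hypothesis space: every threshold rule h_t(x) = 1 if x >= t else 0, for t = 0..10
hypothesis_space = list(range(11))


def h(t, x):
    return 1 if x >= t else 0


def training_error(t):
    """Number of training examples that hypothesis h_t misclassifies."""
    return sum(h(t, x) != y for x, y in train)


# The learner searches the space for the hypothesis with the lowest error
best_t = min(hypothesis_space, key=training_error)
print(best_t, training_error(best_t))  # 4 0
```

Thresholds 4, 5 and 6 all reach zero error here; `min()` keeps the first one it finds, which is how a single hypothesis ends up being selected.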

 Inductive Bias

The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered.

In ML, one aims to construct algorithms that are able to learn to predict a certain target output.
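To see why these assumptions matter, the sketch below trains two hypothetical learners on the same four points and asks both about an input neither has seen. The linear learner assumes a straight-line relationship; the nearest-neighbour learner assumes an unseen input behaves like the closest seen one. Both learners and the data are illustrative.

```python
# Hypothetical training data: inputs and outputs seen during training
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]


def linear_learner(xs, ys):
    """Bias: assumes y is a straight-line function of x (least squares fit)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def nearest_neighbour_learner(xs, ys):
    """Bias: assumes an unseen input behaves like the closest seen input."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


linear = linear_learner(xs, ys)
nearest = nearest_neighbour_learner(xs, ys)

# x = 10 was never seen in training; the two biases give different answers
print(linear(10))   # 20.0
print(nearest(10))  # 8
```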


                                         Image source: https://miro.medium.com


Monday, 12 December 2022

Exploratory data analysis on the banking data set with Pandas Part-II

The banking data set contains the details of the bank's clients. By carefully reading and observing the data set, write the code for the following questions.

First, we load the data set into a variable so that it is easy to perform operations on it.
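A minimal pandas sketch of the loading step. The post does not give the file name, so a tiny in-memory CSV with made-up rows stands in for it; with the real file you would pass its path instead (the name "bank.csv" in the comment is an assumption). The empty first header is how an exported row index typically appears, and pandas names such a column "Unnamed: 0".

```python
import io

import pandas as pd

# Stand-in for the real file: made-up rows, empty first header for the saved index.
# With the actual data set: df = pd.read_csv("bank.csv")  (file name assumed)
csv_text = """,age,job,balance,deposit
0,35,admin.,1200,yes
1,52,technician,300,no
2,41,services,150,no
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.shape)  # (3, 5)
```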

1. How many missing values are there in the data set? (Answer: 3)
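A minimal sketch of counting missing values with pandas. The small DataFrame below is made-up data standing in for the banking data set.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the banking data
df = pd.DataFrame({
    "age": [35, np.nan, 41, 29],
    "job": ["admin.", "technician", None, "services"],
    "balance": [1200, 300, 150, np.nan],
})

# isnull() marks each missing cell; the first sum() counts per column,
# the second sum() adds those counts into one total
print(df.isnull().sum())
print(df.isnull().sum().sum())  # 3
```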

 

2. How many duplicate rows are present in the data set? (Answer: 2)

3. What is the shape of the data after dropping the feature “Unnamed: 0”, the missing values and the duplicated rows? (Answer: (5578, 17))
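For questions 2 and 3, a minimal pandas sketch on made-up data. It assumes "Unnamed: 0" holds a unique row index; since such a column makes every row distinct, duplicates only show up once it is dropped, so the sketch drops it first.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the banking data
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "age": [35, 41, 41, np.nan, 29],
    "job": ["admin.", "technician", "technician", "services", "admin."],
    "deposit": ["yes", "no", "no", "no", "yes"],
})

# Drop the index-like column, then count rows that repeat an earlier row
features = df.drop(columns="Unnamed: 0")
n_duplicates = features.duplicated().sum()
print(n_duplicates)  # 1

# Remove rows with missing values, then duplicated rows, and check the shape
clean = features.dropna().drop_duplicates()
print(clean.shape)  # (3, 3)
```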

4. What is the average age of the clients who have not subscribed to a deposit? (Answer: 41)

First, I will extract the required columns (age and deposit) from the data set.

Then, I will filter the data to keep only the clients who have not subscribed.

Finally, I will call describe() on the filtered data; the mean in its output is the average age.
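The three steps can be sketched in pandas as follows. The DataFrame is made-up data, and the assumption that non-subscribers are stored as "no" in the `deposit` column is labeled in the code.

```python
import pandas as pd

# Made-up rows standing in for the banking data
df = pd.DataFrame({
    "age": [30, 45, 50, 38, 61],
    "deposit": ["no", "yes", "no", "no", "yes"],
    "balance": [500, 1200, 80, 300, 2500],
})

# Step 1: extract the required columns
subset = df[["age", "deposit"]]
# Step 2: keep only clients who have not subscribed (assumes the value is "no")
not_subscribed = subset[subset["deposit"] == "no"]
# Step 3: describe() summarises the numeric column; the "mean" row is the average age
summary = not_subscribed.describe()
print(summary)
print(summary.loc["mean", "age"])
```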

5. What is the maximum number of contacts performed during the campaign for the clients who have subscribed to a deposit? (Answer: 32)

First, I will extract the required columns (the number of campaign contacts and deposit) from the data set.

Then, I will filter the data to keep only the clients who have subscribed.

Finally, I will call describe() on the filtered data; the max in its output is the maximum number of contacts.
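The same pattern in pandas, on made-up data. The column name `campaign` for the number of contacts and the value "yes" for subscribers are assumptions about how the data set is encoded.

```python
import pandas as pd

# Made-up rows; "campaign" (number of contacts) is an assumed column name
df = pd.DataFrame({
    "campaign": [1, 3, 7, 2, 5],
    "deposit": ["yes", "no", "yes", "yes", "no"],
})

subset = df[["campaign", "deposit"]]                    # required columns
subscribed = subset[subset["deposit"] == "yes"]        # assumes subscribers are "yes"
summary = subscribed.describe()
print(summary.loc["max", "campaign"])  # 7.0

# Equivalent shortcut without describe()
print(subscribed["campaign"].max())    # 7
```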

6. What is the count of unique education levels in the data, and how many clients have completed secondary education?

Count of unique education levels in the data: (Answer: 4)

Number of clients who have completed secondary education: (Answer: 2721)
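A minimal pandas sketch for both parts of question 6. The `education` column name and its category labels are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows; the education labels are assumed, not taken from the real data
df = pd.DataFrame({
    "education": ["secondary", "tertiary", "secondary", "primary", "unknown", "secondary"],
})

print(df["education"].nunique())               # number of distinct levels: 4
print(df["education"].value_counts())          # count per level
print((df["education"] == "secondary").sum())  # clients with secondary education: 3
```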


 7. What is the percentage split of the categories in the column “deposit”?

(Answer: Yes - 47%, No - 53%)
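A minimal pandas sketch of the percentage split. `value_counts(normalize=True)` returns proportions, which are multiplied by 100; the rows are made up.

```python
import pandas as pd

# Made-up rows standing in for the "deposit" column
df = pd.DataFrame({"deposit": ["yes", "no", "no", "yes", "no"]})

# Proportion of each category, converted to percent
split = df["deposit"].value_counts(normalize=True) * 100
print(split.round(0))
```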

8. Generate a scatter plot of “age” vs “balance”.
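A minimal sketch using pandas' built-in plotting, which relies on matplotlib. The rows are made up, and the off-screen "Agg" backend is only there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the banking data
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 60],
    "balance": [300, 1500, 800, 4200, 2500],
})

# One point per client: age on the x-axis, balance on the y-axis
ax = df.plot.scatter(x="age", y="balance", title="Age vs Balance")
plt.show()
```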


 
9. How many clients with a personal loan have a housing loan as well? (Answer: 397)
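A minimal pandas sketch combining two boolean conditions with `&`. The column names `loan` and `housing` and the "yes"/"no" encoding are assumptions; the rows are made up.

```python
import pandas as pd

# Made-up rows; "loan" and "housing" are assumed column names
df = pd.DataFrame({
    "loan": ["yes", "no", "yes", "yes", "no"],
    "housing": ["yes", "yes", "no", "yes", "no"],
})

# Clients with a personal loan AND a housing loan
both = df[(df["loan"] == "yes") & (df["housing"] == "yes")]
print(len(both))  # 2
```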

10. How many unemployed clients have not subscribed to deposit? (Answer: 78)
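The same boolean-mask approach answers question 10. The `job` column name, the "unemployed" label and the "no" value for non-subscribers are assumptions; the rows are made up.

```python
import pandas as pd

# Made-up rows; column names and labels are assumed
df = pd.DataFrame({
    "job": ["unemployed", "admin.", "unemployed", "unemployed", "services"],
    "deposit": ["no", "no", "yes", "no", "no"],
})

# True for unemployed clients who did not subscribe; summing counts the True values
mask = (df["job"] == "unemployed") & (df["deposit"] == "no")
print(mask.sum())  # 2
```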

