Statistical Analysis with Python - Data Science

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations. In this lab we will analyze data sets using Python and R.

Prerequisites:

  • Students should know the basics of the mathematics syllabus.

Course Objectives:

  1. To provide an overview of a new language, R, used for data science.
  2. To familiarize students with how statistics such as the mean and median can be computed for data exploration in Python.
  3. To provide a solid undergraduate foundation in both probability theory and mathematical statistics, and at the same time indicate the relevance and importance of the theory in solving practical real-world problems.

 Experiment 1: Functions in Python

  1. Develop a Python program to calculate the Greatest Common Divisor of two numbers. Create two separate procedures (functions) to illustrate the iterative and recursive solutions.
  2. Develop a Python program to calculate the next number in the Fibonacci series for a given number (which may or may not be in the Fibonacci series).
  3. Develop a Python program to calculate the square root of the (N+1)th prime number for a given number N using binary search with a precision of up to 7 decimal places. (Avoid the built-in square root function.)
  4. Design a Python program to determine the difference in days for two dates in YYYY:MM:DD format (0 <= YYYY <= 9999, 1 <= MM <= 12, 1 <= DD <= 31) following the leap year rules.
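
As a starting point for items 1 and 3, here is a minimal sketch. It assumes Euclid's algorithm for the GCD and plain trial division for finding primes; the function names are illustrative and not prescribed by the syllabus.

```python
def gcd_iterative(a, b):
    """Greatest common divisor using Euclid's algorithm with a loop."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def gcd_recursive(a, b):
    """Greatest common divisor using Euclid's algorithm with recursion."""
    a, b = abs(a), abs(b)
    return a if b == 0 else gcd_recursive(b, a % b)


def is_prime(n):
    """Trial division; compares d * d with n so no square root is needed."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def nth_prime(k):
    """Return the k-th prime, counting from nth_prime(1) == 2."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def sqrt_binary_search(x):
    """Approximate sqrt(x) for x >= 0 by repeatedly halving an interval."""
    if x < 0:
        raise ValueError("x must be non-negative")
    low, high = 0.0, max(1.0, float(x))
    # 100 halvings shrink the interval far below 1e-7 for realistic inputs.
    for _ in range(100):
        mid = (low + high) / 2
        if mid * mid < x:
            low = mid
        else:
            high = mid
    return round((low + high) / 2, 7)


n = 3
print(gcd_iterative(48, 18), gcd_recursive(48, 18))
print(sqrt_binary_search(nth_prime(n + 1)))
```

The loop in `sqrt_binary_search` uses a fixed number of halvings rather than a tolerance check, which avoids a non-terminating loop when the input is so large that floating-point spacing exceeds the tolerance.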

Experiment 2: Text file processing & Basic Statistics

1. Develop a Python program to count the words in a text file.

2. Write a Python program with functions to calculate the following for comma-separated numbers in a text file (.txt):

a) 3rd maximum number
b) 9th minimum number
c) Total unique count
d) Mean
e) Standard deviation
f) Number(s) with maximum frequency
g) Number(s) with minimum frequency

Note: The solutions to the first three parts (a, b, c) are in a single program.
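
One way to approach these tasks is sketched below using only the standard library. Several choices here are assumptions rather than requirements of the exercise: the "n-th maximum" is taken over distinct values, the standard deviation is the population form, and the input is an inline string standing in for the contents of the text file.

```python
from collections import Counter
from math import sqrt


def count_words(text):
    """Map each lower-cased, whitespace-separated word to its count."""
    return Counter(text.lower().split())


def parse_numbers(text):
    """Parse comma-separated numbers, possibly spread over several lines."""
    tokens = text.replace("\n", ",").split(",")
    return [float(tok) for tok in tokens if tok.strip()]


def nth_largest(nums, n):
    """n-th largest distinct value (n=1 is the maximum)."""
    return sorted(set(nums), reverse=True)[n - 1]


def nth_smallest(nums, n):
    """n-th smallest distinct value (n=1 is the minimum)."""
    return sorted(set(nums))[n - 1]


def mean(nums):
    return sum(nums) / len(nums)


def std_dev(nums):
    """Population standard deviation (divides by len(nums))."""
    m = mean(nums)
    return sqrt(sum((x - m) ** 2 for x in nums) / len(nums))


def with_frequency(nums, pick):
    """Values whose frequency equals pick(all frequencies); pick is max or min."""
    counts = Counter(nums)
    target = pick(counts.values())
    return sorted(value for value, c in counts.items() if c == target)


# In the lab, the text would come from a file, e.g. open("numbers.txt").read()
# (a hypothetical file name).
nums = parse_numbers("4, 1, 7, 7, 3\n9, 1, 7")
print(nth_largest(nums, 3), nth_smallest(nums, 2), len(set(nums)))
print(mean(nums), std_dev(nums))
print(with_frequency(nums, max), with_frequency(nums, min))
print(count_words("the quick fox and the lazy dog"))
```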

Experiment 3: Exploring the Numpy library for multi-dimensional array processing

1. Develop programs in Python to implement the following in NumPy:

a. Array slicing, reshaping, concatenation and splitting
b. Universal functions in NumPy
c. Aggregations
d. Broadcasting
e. Fast sorting
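
The five topics above can each be shown in a line or two. The following is a small sketch on a made-up 3 x 4 array, not a complete solution to the experiment.

```python
import numpy as np

a = np.arange(12)                  # 0, 1, ..., 11
m = a.reshape(3, 4)                # reshaping: 3 rows x 4 columns
first_two_cols = m[:, :2]          # slicing: all rows, first two columns

stacked = np.concatenate([m, m], axis=0)   # concatenation: shape (6, 4)
left, right = np.split(m, 2, axis=1)       # splitting: two (3, 2) halves

squares = np.square(m)             # universal function, applied elementwise
col_sums = m.sum(axis=0)           # aggregation down each column
centered = m - m.mean(axis=0)      # broadcasting: (3, 4) minus (4,)

values = np.array([3, 1, 2])
ascending = np.sort(values)        # returns a sorted copy
order = np.argsort(values)         # indices that would sort the array

print(col_sums, ascending, order)
```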

Experiment 4: Data cleaning and processing with Pandas

1. Develop the following programs in Python:
     a) Implementing and querying the Series data structure
     b) Implementing and querying the DataFrame data structure
     c) Merging two different DataFrames
     d) Performing DataFrame indexing
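
A compact sketch of these four tasks follows. The student names, roll numbers, and scores are invented for illustration.

```python
import pandas as pd

# a) Series: a labelled one-dimensional array, queried with a boolean mask
marks = pd.Series([72, 85, 90], index=["asha", "ravi", "kiran"], name="marks")
above_80 = marks[marks > 80]

# b) DataFrame: a table of named columns, queried by a column condition
students = pd.DataFrame({"roll_no": [1, 2, 3],
                         "name": ["Asha", "Ravi", "Kiran"]})
scores = pd.DataFrame({"roll_no": [1, 2, 4],
                       "score": [72, 85, 64]})
high_scores = scores[scores["score"] >= 70]

# c) Merge: keep only roll numbers present in both tables (inner join)
merged = students.merge(scores, on="roll_no", how="inner")

# d) Indexing: label-based with .loc, position-based with .iloc
by_roll = students.set_index("roll_no")
name_of_2 = by_roll.loc[2, "name"]
first_row = students.iloc[0]

print(merged)
```

Changing `how="inner"` to `"left"`, `"right"`, or `"outer"` keeps unmatched rows from one or both tables and fills the missing cells with NaN.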

Experiment 5: Advanced Data Processing and Transformation

Implement the following using the Pandas library:

  1. GroupBy
  2. Pivot tables
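
GroupBy and pivot tables can be sketched on a small, made-up sales table as follows; the column names are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "amount": [100, 150, 80, 20, 120],
})

# GroupBy: split rows by region, then aggregate each group
totals = sales.groupby("region")["amount"].sum()
summary = sales.groupby("region")["amount"].agg(["mean", "min", "max"])

# Pivot table: regions as rows, quarters as columns, summed amounts as cells
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="amount", aggfunc="sum")

print(totals)
print(summary)
print(pivot)
```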

 Experiment 6: Data Visualization-I in PYTHON

  1. Write programs to demonstrate different plots, such as a line chart, bar chart, scatter plot, pie chart, and box plot.
  2. Write programs to demonstrate how to create subplots.
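
The plot types and the subplot grid can be combined in one figure. The sketch below assumes matplotlib is the plotting library (the experiment does not name one) and uses invented data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# A 2 x 3 grid of subplots; one cell is left empty.
fig, axes = plt.subplots(2, 3, figsize=(10, 6))

axes[0, 0].plot(x, y)
axes[0, 0].set_title("Line chart")

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar chart")

axes[0, 2].scatter(x, y)
axes[0, 2].set_title("Scatter plot")

axes[1, 0].pie(y, labels=[str(v) for v in x])
axes[1, 0].set_title("Pie chart")

axes[1, 1].boxplot(y)
axes[1, 1].set_title("Box plot")

axes[1, 2].axis("off")  # hide the unused cell

fig.tight_layout()
# fig.savefig("plots.png")  # uncomment to write the figure to disk
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to open the figure in a window.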

Experiment 7: Data Visualization-II in Python

  1. Write programs to illustrate plotting data distributions, such as univariate and bivariate distributions.
  2. Write programs to demonstrate plotting categorical and time-series data.

Experiment 8: Probability Distributions

  1. Generate and visualize discrete and continuous distributions using the statistical environment.
  2. Demonstrate the normal, binomial, and Poisson distributions.
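
As a sketch of the generation step, the following draws samples from the three named distributions with NumPy and compares the sample mean and variance with the theoretical values. The parameters are arbitrary choices for illustration; the samples could then be passed to any histogram function for the visualization part.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded for reproducible output
n = 100_000

normal = rng.normal(loc=0.0, scale=1.0, size=n)   # continuous
binomial = rng.binomial(n=10, p=0.3, size=n)      # discrete
poisson = rng.poisson(lam=4.0, size=n)            # discrete

# (name, sample, theoretical mean, theoretical variance)
checks = [
    ("normal(0, 1)", normal, 0.0, 1.0),
    ("binomial(10, 0.3)", binomial, 10 * 0.3, 10 * 0.3 * 0.7),
    ("poisson(4)", poisson, 4.0, 4.0),
]

for name, sample, mean, var in checks:
    print(f"{name}: mean {sample.mean():.3f} (theory {mean}), "
          f"variance {sample.var():.3f} (theory {var:.2f})")
```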

Experiment 9: Building Confidence in Confidence Intervals 

  1. Populations Versus Samples 
  2. Large Sample Confidence Intervals 
  3. Simulating Data Sets 
  4. Evaluating the Coverage of Confidence Intervals
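
The four topics above fit together in one simulation: fix a population, draw many samples, build a large-sample interval from each, and count how often the interval contains the true mean. The sketch below assumes a normal population with made-up parameters and the usual 1.96 multiplier for a nominal 95% interval.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

mu, sigma = 50.0, 10.0     # hypothetical population mean and std deviation
n, trials = 100, 2000      # sample size and number of simulated data sets
z = 1.96                   # normal quantile for a 95% two-sided interval

# Each row is one simulated sample drawn from the population.
samples = rng.normal(mu, sigma, size=(trials, n))

means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

lower = means - z * std_errors
upper = means + z * std_errors

# Coverage: fraction of intervals that contain the true mean.
coverage = np.mean((lower <= mu) & (mu <= upper))
print(f"Empirical coverage: {coverage:.3f}")
```

Repeating the run with a small `n` (for example 5) shows the coverage falling below the nominal level, which is the motivation for using t quantiles with small samples.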

Experiment 10: Perform Tests of Hypotheses

  1. How to perform tests of hypotheses about the mean when the variance is known. 
  2. How to compute the p-value. 
  3. Explore the connection between the critical region, the test statistic, and the p-value
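
For a mean with known variance, the test statistic is z = (x̄ − μ₀) / (σ / √n). A minimal sketch of a two-sided z-test using only the standard library is given below; the sample figures are invented, and the normal CDF is computed from `math.erf`.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mean == mu0 when sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


alpha = 0.05
critical = 1.96   # two-sided critical value for alpha = 0.05

z, p = z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)

# The two decision rules agree: |z| falls in the critical region
# exactly when the p-value is below alpha.
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject by critical region:", abs(z) > critical)
print("reject by p-value:", p < alpha)
```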

 Data Science Additional Programs

Experiment 1: Basics of Python Programming

  1. Download and install a Python IDE.
  2. Write a program to illustrate basic arithmetic in Python.
  3. Write a program to illustrate variable assignment in Python.
  4. Write a program to illustrate data types in Python.

Experiment 2: Decision making, Looping Statement and Functions

  1. Write a program to illustrate if-elif-else in Python.
  2. Write a program to illustrate while and for loops in Python.
  3. Write a program to demonstrate working with functions in Python.
  1. Write a program to demonstrate installing and loading packages in Python.
  2. Write a program to demonstrate data reshaping in Python.

Experiment 5: Interfaces

  1. Write a program to demonstrate the following operations on the given data sets:
       1) Load data from different file types, such as CSV and Excel, in Python
       2) Data description

Experiment 6: Regression

  1. Write a program to demonstrate linear regression in Python for the given data set by following the steps below:

       1. Reading and understanding the data
       2. Hypothesis testing in linear regression
       3. Building a linear model
       4. Residual analysis and predictions
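
The steps above can be sketched with NumPy alone, using the closed-form least-squares formulas for a single predictor. The data points are made up; in the lab they would be read from the given data set. Only the t statistic for the slope is computed here, since turning it into a p-value needs a t distribution from a statistics library.

```python
import numpy as np

# 1. Reading and understanding the data (invented values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# 3. Building a linear model: y ~ intercept + slope * x
x_mean, y_mean = x.mean(), y.mean()
sxx = np.sum((x - x_mean) ** 2)
slope = np.sum((x - x_mean) * (y - y_mean)) / sxx
intercept = y_mean - slope * x_mean

# 4. Residual analysis
fitted = intercept + slope * x
residuals = y - fitted
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y_mean) ** 2)

# 2. Hypothesis testing: t statistic for H0: slope == 0
residual_variance = np.sum(residuals ** 2) / (n - 2)
t_slope = slope / np.sqrt(residual_variance / sxx)

# 4. Prediction for a new x value
prediction = intercept + slope * 6.0

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.4f}")
print(f"t statistic for slope={t_slope:.2f}, prediction at x=6: {prediction:.2f}")
```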

Experiment 6: Creating a NumPy Array

  1. Basic ndarray
  2. Array of zeros
  3. Array of ones
  4. Random numbers in ndarray
  5. An array of your choice
  6. Identity matrix in NumPy
  7. Evenly spaced ndarray
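
Each of the seven items corresponds to a NumPy constructor. A short sketch follows; the shapes and values are arbitrary.

```python
import numpy as np

basic = np.array([1, 2, 3, 4])           # 1. basic ndarray from a list
zeros = np.zeros((2, 3))                 # 2. array of zeros
ones = np.ones(5, dtype=int)             # 3. array of ones

rng = np.random.default_rng(seed=0)
random_values = rng.random((2, 2))       # 4. random floats in [0, 1)

filled = np.full((2, 2), 7)              # 5. array filled with a chosen value
identity = np.eye(3)                     # 6. 3 x 3 identity matrix

evenly_stepped = np.arange(0, 10, 2)     # 7. fixed step: 0, 2, 4, 6, 8
evenly_spaced = np.linspace(0, 1, 5)     #    fixed count: 5 points from 0 to 1

print(evenly_stepped, evenly_spaced)
```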

Experiment 7: The Shape and Reshaping of NumPy Array

  1. Dimensions of NumPy array
  2. Shape of NumPy array
  3. Size of NumPy array
  4. Reshaping a NumPy array
  5. Flattening a NumPy array
  6. Transpose of a NumPy array
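
The attributes and methods behind these six items are shown below on a small example array.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

dimensions = a.ndim              # number of axes
shape = a.shape                  # length along each axis
size = a.size                    # total number of elements

reshaped = a.reshape(3, 2)       # same data, new shape
inferred = a.reshape(-1, 2)      # -1 lets NumPy work out that axis length
flat = a.flatten()               # 1-D copy of the data
transposed = a.T                 # rows and columns swapped

print(dimensions, shape, size)
print(transposed)
```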

Experiment 8: Indexing and Slicing of NumPy Array

  1. Slicing 1-D NumPy arrays
  2. Slicing 2-D NumPy arrays
  3. Slicing 3-D NumPy arrays
  4. Negative slicing of NumPy arrays

Experiment 8: Perform the following operations using pandas

  1. Creating dataframe
  2. concat()
  3. Setting conditions
  4. Adding a new column

Experiment 9: Read the following file formats using pandas

  1. Text files
  2. CSV files
  3. Excel files
  4. JSON files

Experiment 10: Perform the following visualizations using matplotlib

  1. Bar Graph
  2. Pie Chart
  3. Box Plot
  4. Histogram
  5. Line Chart and Subplots
  6. Scatter Plot

APPLICATIONS OF PYTHON-Pandas

A) Pandas Data Series:

  1. Write a Pandas program to create and display a one-dimensional array-like object containing an array of data using Pandas module.
  2. Write a Pandas program to convert a Pandas Series to a Python list and display its type.
  3. Write a Pandas program to add, subtract, multiply and divide two Pandas Series.
  4. Write a Pandas program to convert a NumPy array to a Pandas series.

Sample Series:

NumPy array:
[10 20 30 40 50]
Converted Pandas series:
0 10
1 20
2 30
3 40
4 50
dtype: int64
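
The Series exercises above can be sketched as follows, reusing the sample NumPy array; the second Series is an invented operand for the arithmetic.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40, 50])
series = pd.Series(arr)          # NumPy array -> Pandas Series

as_list = series.tolist()        # Series -> Python list
print(type(series), type(as_list))

other = pd.Series([1, 2, 3, 4, 5])
print(series + other)            # elementwise, aligned on the index
print(series - other)
print(series * other)
print(series / other)            # division always yields floats
```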

B) Pandas Data Frames:
Consider Sample Python dictionary data and list labels:


exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

  1. Write a Pandas program to create and display a DataFrame from the specified dictionary data, using the given index labels.
  2. Write a Pandas program to change the name 'James' to 'Suresh' in the name column of the DataFrame.
  3. Write a Pandas program to insert a new column in an existing DataFrame.
  4. Write a Pandas program to get a list of the DataFrame's column headers.
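
A minimal sketch of these DataFrame exercises, using the sample dictionary and labels given above. The new column `first_try_pass` is a hypothetical choice; any column would do for the exercise.

```python
import numpy as np
import pandas as pd

exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
             'Matthew', 'Laura', 'Kevin', 'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Create the DataFrame with the given index labels
df = pd.DataFrame(exam_data, index=labels)

# Change 'James' to 'Suresh' in the name column
df['name'] = df['name'].replace('James', 'Suresh')

# Insert a new column (hypothetical: qualified on the first attempt)
df['first_try_pass'] = (df['attempts'] == 1) & (df['qualify'] == 'yes')

# List of column headers
columns = list(df.columns)

print(df)
print(columns)
```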

C) Pandas Index:
  1. Write a Pandas program to display the default index and set a column as the index in a given DataFrame.
  2. Write a Pandas program to create index labels using 64-bit integers and floating-point numbers in a given DataFrame.

D) Pandas String and Regular Expressions:
  1. Write a Pandas program to convert all the string values in a given Pandas Series to upper and lower case. Also find the length of the string values.
  2. Write a Pandas program to remove white space, left-side white space and right-side white space from the string values of a given Pandas Series.
  3. Write a Pandas program to count the occurrences of a specified substring in a DataFrame column.
  4. Write a Pandas program to swap the case of a specified character column in a given DataFrame.
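
All four string exercises use the `.str` accessor. The sketch below works on an invented Series; a DataFrame column is itself a Series, so the same calls apply to `df["column"]`.

```python
import pandas as pd

s = pd.Series(["  Data ", "Science", " with Pandas"])

upper = s.str.upper()
lower = s.str.lower()
lengths = s.str.len()            # length includes surrounding spaces

stripped = s.str.strip()         # both sides
left_stripped = s.str.lstrip()   # left side only
right_stripped = s.str.rstrip()  # right side only

a_counts = s.str.count("a")      # occurrences of the substring "a"
swapped = s.str.swapcase()       # upper <-> lower for every character

print(stripped.tolist(), lengths.tolist(), a_counts.tolist())
```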

E) Pandas Joining and Merging DataFrames:
  1. Write a Pandas program to join the two given DataFrames along rows and assign all data.
  2. Write a Pandas program to append a list of dictionaries or Series to an existing DataFrame and display the combined data.
  3. Write a Pandas program to join the two DataFrames with matching records from both sides where available.
F) Pandas Grouping Aggregate
 
Consider data set:
 

  1. Write a Pandas program to split the following DataFrame into groups based on school code. Also check the type of the GroupBy object.
  2. Write a Pandas program to split the following DataFrame by school code and get the mean, min, and max value of age for each school.

G) Pandas Styling:
  1. Create a dataframe of ten rows, four columns with random values. Write a Pandas program to highlight the negative numbers red and positive numbers black.
  2. Create a dataframe of ten rows, four columns with random values. Write a Pandas program to highlight the maximum value in each column.
  3. Create a dataframe of ten rows, four columns with random values. Write a Pandas program to highlight dataframe's specific columns.

H) Plotting:
  1. Write a Pandas program to create a horizontal stacked bar plot of the opening and closing stock prices of any stock data set between two specific dates.
  2. Write a Pandas program to create a histogram plot of the opening, closing, high, and low stock prices of a stock data set between two specific dates.
  3. Write a Pandas program to create a stacked histogram plot of the opening, closing, high, and low stock prices of a stock data set between two specific dates, with more bins.

 I) Exploratory data analysis on the bank marketing data set with Pandas Part-I

The bank marketing data set contains all the details. After studying the data set carefully, write code to answer the following:

  1. How many missing values are there in the data set?
  2. How many duplicate values are present in the data set?
  3. What is the shape of the data after dropping the feature “Unnamed: 0”, missing values and duplicated values?
  4. What is the average age of the clients who have subscribed to the deposit?
  5. What is the maximum number of contacts performed during the campaign for the clients who have not subscribed to the deposit?
  6. What is the difference between the maximum balance (in euros) for the clients who have subscribed to the deposit and for the clients who have not?
  7. What is the count of unique job levels in the data, and how many clients are at the management level?
  8. What is the percentage split of the categories in the column “deposit”?
  9. Generate a scatter plot of “age” vs “balance”.
  10. How many unemployed clients have subscribed to the deposit?
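
The pattern for several of these questions is sketched below on a tiny made-up table, since the real data set is not reproduced here. The column names “Unnamed: 0”, “age”, “balance” and “deposit” come from the questions; the `job` column name and its values are assumptions about the real file.

```python
import pandas as pd

# Six invented rows standing in for the bank marketing data set.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4, 5],
    "age": [35, 42, None, 29, 42, 51],
    "job": ["management", "unemployed", "services",
            "management", "unemployed", "unemployed"],   # assumed column
    "balance": [1200, 300, 50, 2500, 300, 800],
    "deposit": ["yes", "no", "yes", "yes", "no", "yes"],
})

missing = int(df.isna().sum().sum())        # total missing cells
duplicates = int(df.duplicated().sum())     # fully duplicated rows

cleaned = (df.drop(columns="Unnamed: 0")
             .dropna()
             .drop_duplicates())

subscribed = cleaned[cleaned["deposit"] == "yes"]
not_subscribed = cleaned[cleaned["deposit"] == "no"]

avg_age_subscribed = subscribed["age"].mean()
balance_gap = subscribed["balance"].max() - not_subscribed["balance"].max()
unique_jobs = cleaned["job"].nunique()
split = cleaned["deposit"].value_counts(normalize=True) * 100
unemployed_subscribed = int((subscribed["job"] == "unemployed").sum())

print(missing, duplicates, cleaned.shape)
print(avg_age_subscribed, balance_gap, unique_jobs, unemployed_subscribed)
print(split)
```

Note that the duplicate count is taken before the “Unnamed: 0” column is dropped; because that column numbers the rows, two otherwise identical records only show up as duplicates after it is removed.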

 J) Exploratory data analysis on the banking data set with Pandas Part-II

The banking data set contains all the details. After studying the data set carefully, write code to answer the following:

  1. How many missing values are there in the data set?
  2. How many duplicate values are present in the data set?
  3. What is the shape of the data after dropping the feature “Unnamed: 0”, missing values and duplicated values?
  4. What is the average age of the clients who have not subscribed to the deposit?
  5. What is the maximum number of contacts performed during the campaign for the clients who have subscribed to the deposit?
  6. What is the count of unique education levels in the data, and how many clients have completed secondary education?
  7. What is the percentage split of the categories in the column “deposit”?
  8. Generate a scatter plot of “age” vs “balance”.
  9. How many clients with a personal loan have a housing loan as well?
  10. How many unemployed clients have not subscribed to the deposit?