Statistical Analysis with Python - Data Science

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations. In this lab we will carry out a wide range of analyses on data sets using Python and R.

Prerequisites:

  • Students must have a working knowledge of basic mathematics.

Course Objectives:

  1. To provide an overview of R, a language used for data science.
  2. To familiarize students with how descriptive statistics such as the mean and median can be computed for data exploration in Python.
  3. To provide a solid undergraduate foundation in both probability theory and mathematical statistics, while showing the relevance and importance of the theory in solving practical, real-world problems.

Experiment 1: Basics of Python Programming

  1. Download and install a Python IDE
  2. Write a program to illustrate basic arithmetic in Python
  3. Write a program to illustrate variable assignment in Python
  4. Write a program to illustrate data types in Python
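The three programs for Experiment 1 can be combined into one short script, sketched below. The values and variable names are illustrative choices, not part of the syllabus.

```python
# Basic arithmetic: Python's operators on integers
a, b = 17, 5
print("a + b  =", a + b)    # addition
print("a - b  =", a - b)    # subtraction
print("a * b  =", a * b)    # multiplication
print("a / b  =", a / b)    # true division always returns a float
print("a // b =", a // b)   # floor division
print("a % b  =", a % b)    # remainder
print("a ** b =", a ** b)   # exponentiation

# Variable assignment: single, multiple, and swapping without a temporary
x = 10
y, z = 2.5, "text"
x, y = y, x                  # x is now 2.5 and y is now 10

# Data types: type() reports the class of each value
values = [42, 3.14, 2 + 3j, "hello", True, None, [1, 2], (1, 2), {1, 2}, {"k": 1}]
for v in values:
    print(repr(v), "->", type(v).__name__)
```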

Experiment 2: Decision Making, Looping Statements and Functions

  1. Write a program to illustrate if, elif and else statements in Python
  2. Write a program to illustrate while and for loops in Python
  3. Write a program to demonstrate working with functions in Python
  1. Write a program to demonstrate installing and loading packages in Python
  2. Write programs to demonstrate data reshaping in Python
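Decision making, loops, and functions (items 1 to 3) fit into one sketch. Python has no `else if` keyword; it uses `elif`. The function names below are hypothetical examples.

```python
def classify(n: int) -> str:
    """Label n using an if / elif / else chain."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def sum_with_for(numbers):
    """Add up a sequence with a for loop."""
    total = 0
    for value in numbers:
        total += value
    return total


def countdown_with_while(start: int):
    """Collect start, start - 1, ..., 1 with a while loop."""
    result = []
    while start > 0:
        result.append(start)
        start -= 1
    return result


if __name__ == "__main__":
    for n in (-3, 0, 7):
        print(n, "is", classify(n))
    print("sum:", sum_with_for([1, 2, 3, 4]))
    print("countdown:", countdown_with_while(5))
```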

Experiment 5: Functions in Python

  1. Develop a Python program to calculate the greatest common divisor (GCD) of two numbers. Create two separate procedures (functions) to illustrate the iterative and recursive solutions.
  2. Develop a Python program to calculate the next number in the Fibonacci series after a given number (which may or may not be in the Fibonacci series).
  3. Develop a Python program to calculate the square root of the (N+1)th prime number for a given number N using binary search, with a precision of up to 7 decimal places (avoid the built-in square root function).
  4. Design a Python program to determine the difference in days between two dates in YYYY:MM:DD format (0 <= YYYY <= 9999, 1 <= MM <= 12, 1 <= DD <= 31), following the leap year rules.
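Below is one possible sketch of all four Experiment 5 exercises. It makes several assumptions:

* "Next" Fibonacci number means the smallest one strictly greater than the input, and the input is assumed to be n >= 0.
* Primality is checked by trial division.
* The leap year rules are taken to be those of the proleptic Gregorian calendar, with year 0 counted as a leap year.
* Dates are counted by hand, because Python's `datetime` only supports years 1 to 9999.

```python
# 1. GCD by Euclid's algorithm, iterative and recursive
def gcd_iterative(a: int, b: int) -> int:
    """Replace (a, b) with (b, a mod b) until b is 0."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def gcd_recursive(a: int, b: int) -> int:
    """gcd(a, 0) = a; otherwise gcd(a, b) = gcd(b, a mod b)."""
    a, b = abs(a), abs(b)
    return a if b == 0 else gcd_recursive(b, a % b)


# 2. Smallest Fibonacci number strictly greater than n (assumes n >= 0)
def next_fibonacci(n: int) -> int:
    a, b = 0, 1
    while b <= n:
        a, b = b, a + b
    return b


# 3. Square root of the (N+1)th prime by binary search
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # compare d * d so no square root is needed
        if n % d == 0:
            return False
        d += 1
    return True


def nth_prime(k: int) -> int:
    """k-th prime, 1-based: nth_prime(1) == 2."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def sqrt_binary_search(x: float, places: int = 7) -> float:
    """Bisect [0, max(1, x)]; each step halves the interval containing sqrt(x)."""
    lo, hi = 0.0, max(1.0, float(x))
    for _ in range(200):   # far more halvings than 7 decimal places require
        mid = (lo + hi) / 2
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return round((lo + hi) / 2, places)


def sqrt_of_next_prime(n: int) -> float:
    return sqrt_binary_search(nth_prime(n + 1))


# 4. Days between two YYYY:MM:DD dates (proleptic Gregorian rules assumed)
def is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_since_epoch(date: str) -> int:
    """Days from 0000:01:01 up to the given date."""
    year, month, day = (int(part) for part in date.split(":"))
    month_lengths = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    if not (0 <= year <= 9999 and 1 <= month <= 12
            and 1 <= day <= month_lengths[month - 1]):
        raise ValueError(f"invalid date: {date}")
    days = sum(366 if is_leap(y) else 365 for y in range(year))
    return days + sum(month_lengths[:month - 1]) + (day - 1)


def date_difference(d1: str, d2: str) -> int:
    return abs(days_since_epoch(d2) - days_since_epoch(d1))


if __name__ == "__main__":
    print(gcd_iterative(48, 18), gcd_recursive(48, 18))   # 6 6
    print(next_fibonacci(10))                              # 13
    print(sqrt_of_next_prime(3))                           # square root of 7, the 4th prime
    print(date_difference("2024:01:01", "2024:03:01"))     # 60
```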

Experiment 6: Text file processing & Basic Statistics

1. Develop a Python program to count the words in a text file.

2. Write a program in Python with functions to calculate the following for comma-separated numbers in a text file (.txt):

a) 3rd maximum number
b) 9th minimum number
c) Total unique count
d) Mean
e) Standard deviation
f) Number(s) with maximum frequency
g) Number(s) with minimum frequency

Note: Parts a, b and c are solved in a single program.
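The Experiment 6 functions might be sketched as below. Several points are assumptions, since the task wording leaves them open:

* "3rd maximum" and "9th minimum" are computed over distinct values.
* "Total unique count" means the number of distinct values.
* The standard deviation is the population version (divide by n).
* Newlines in the input file are treated as extra separators.

A caller would use something like `read_numbers("numbers.txt")`, where the file name is hypothetical.

```python
import math
from collections import Counter


def count_words(path: str) -> int:
    """Number of whitespace-separated words in a text file."""
    with open(path, encoding="utf-8") as f:
        return sum(len(line.split()) for line in f)


def numbers_from_text(text: str) -> list[float]:
    """Parse comma-separated numbers; newlines also act as separators."""
    return [float(tok) for tok in text.replace("\n", ",").split(",") if tok.strip()]


def read_numbers(path: str) -> list[float]:
    with open(path, encoding="utf-8") as f:
        return numbers_from_text(f.read())


# a, b, c: rank statistics over the distinct values (assumption)
def kth_largest(nums, k):
    return sorted(set(nums), reverse=True)[k - 1]


def kth_smallest(nums, k):
    return sorted(set(nums))[k - 1]


def unique_count(nums):
    return len(set(nums))


# d, e: mean and population standard deviation (divides by n)
def mean(nums):
    return sum(nums) / len(nums)


def std_dev(nums):
    m = mean(nums)
    return math.sqrt(sum((x - m) ** 2 for x in nums) / len(nums))


# f, g: every value tied for the highest or lowest frequency
def most_frequent(nums):
    counts = Counter(nums)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def least_frequent(nums):
    counts = Counter(nums)
    low = min(counts.values())
    return sorted(v for v, c in counts.items() if c == low)
```

A list of ties is returned for parts f and g because several numbers can share the same frequency.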

Experiment 7: Exploring the NumPy library for multi-dimensional array processing

1. Develop programs in Python to implement the following in NumPy:

a. Array slicing, reshaping, concatenation and splitting
b. Universal functions in NumPy
e. Fast sorting
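The listed NumPy operations can be tried in a few lines. The arrays below are made-up examples.

* `np.concatenate` and `np.split` take an `axis` argument.
* `np.sort`, `np.argsort` and `np.partition` are NumPy's built-in sorting routines.

```python
import numpy as np

# Slicing and reshaping
a = np.arange(12)                 # [0, 1, ..., 11]
m = a.reshape(3, 4)               # 3 rows x 4 columns, same data
first_row = m[0]                  # one row
last_col = m[:, -1]               # one column
every_other = a[::2]              # step slicing

# Concatenation and splitting
joined = np.concatenate([m, m], axis=0)   # stack vertically -> shape (6, 4)
left, right = np.split(m, 2, axis=1)      # two halves of shape (3, 2)

# Universal functions work element-wise without explicit Python loops
squares = np.square(a)
roots = np.sqrt(a)
total = np.add.reduce(a)          # same result as a.sum()

# Fast sorting
rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=10)
sorted_data = np.sort(data)       # sorted copy
order = np.argsort(data)          # indices that would sort data
partitioned = np.partition(data, 3)   # the 3 smallest values come first, in no set order
```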

Experiment 8: Data Cleaning and Processing with Pandas

1. Develop the following programs in Python:
     a) Implementing and querying the Series data structure
     b) Implementing and querying the DataFrame data structure
     c) Merging two different DataFrames
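Here is a small pandas sketch covering Series queries, DataFrame queries, and a merge. The student data is hypothetical.

* `how="inner"` keeps only the keys that appear in both frames.
* `how="outer"` keeps all keys.
* `indicator=True` adds a `_merge` column that records which side each row came from.

```python
import pandas as pd

# Series: a 1-D labelled array
marks = pd.Series([78, 92, 65, 88], index=["Asha", "Ravi", "Meena", "John"], name="marks")
high = marks[marks > 80]            # boolean query
ravi = marks.loc["Ravi"]            # label-based lookup

# DataFrame: a 2-D labelled table
students = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Asha", "Ravi", "Meena", "John"],
    "dept": ["CSE", "ECE", "CSE", "MECH"],
})
cse = students.query("dept == 'CSE'")               # query with an expression string
names = students.loc[students["id"] > 2, "name"]    # boolean mask plus column label

# Merging two DataFrames on a shared key column
scores = pd.DataFrame({"id": [1, 2, 3, 5], "score": [78, 92, 65, 70]})
inner = students.merge(scores, on="id", how="inner")
outer = students.merge(scores, on="id", how="outer", indicator=True)
print(outer)
```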

Experiment 9: Advanced Data Processing and Transformation

  Implement the following using the Pandas library:

  1. GroupBy
  2. Pivot tables
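GroupBy and pivot tables can be illustrated on a small, hypothetical sales table.

* `groupby` splits the rows by key, applies an aggregation, and combines the results.
* `pivot_table` spreads one key across the columns.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South", "East"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "units": [10, 5, 8, 12, 7, 3],
})

# GroupBy: total units per region
by_region = sales.groupby("region")["units"].sum()

# GroupBy with several named aggregations at once
summary = sales.groupby("region").agg(
    total=("units", "sum"),
    average=("units", "mean"),
    orders=("units", "count"),
)

# Pivot table: regions as rows, products as columns, summed units in the cells
pivot = sales.pivot_table(index="region", columns="product", values="units",
                          aggfunc="sum", fill_value=0)
print(summary)
print(pivot)
```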

Experiment 10: Data Interfaces

  1. Write a program to demonstrate the following operations on the given datasets:
        1) Loading data from different file types, such as CSV and Excel, in Python
        2) Data description
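A sketch of loading and describing data with pandas follows. It picks `pd.read_csv` or `pd.read_excel` from the file extension. Reading `.xlsx` files also needs an Excel engine such as openpyxl to be installed. The helper names and the tiny CSV written to a temporary folder are hypothetical stand-ins for the given datasets.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Choose a pandas reader based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        # pandas needs an Excel engine installed for this, e.g. openpyxl for .xlsx
        return pd.read_excel(path)
    raise ValueError(f"unsupported file type: {suffix!r}")


def describe_data(df):
    """Print structure, summary statistics and missing values; return the summary."""
    print("shape:", df.shape)
    df.info()
    summary = df.describe(include="all")
    print(summary)
    print("missing values per column:")
    print(df.isna().sum())
    return summary


# Demo: write a tiny CSV to a temporary folder, then load and describe it
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "students.csv"
    csv_path.write_text("name,age,city\nAsha,21,Hyderabad\nRavi,23,Chennai\nMeena,22,\n")
    df = load_table(csv_path)

summary = describe_data(df)
```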

Experiment 11: Data Visualization-I in Python

  1. Write programs to demonstrate different plots, such as a line chart, bar chart, histogram, pie chart, stacked bar chart, scatter plot, box plot and heat map, by loading real-time data.
  2. Write programs to demonstrate creating subplots.
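All eight chart types and the subplot layout can be shown with matplotlib in one figure, as sketched below. The data here is synthetic, standing in for the real-time data the task mentions, and the output file name is hypothetical. `plt.subplots(2, 4)` returns a grid of Axes objects, and `.ravel()` flattens that grid so each panel can be indexed in turn.

```python
import matplotlib
matplotlib.use("Agg")                      # draw off-screen; no display required
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data; replace with the real dataset
rng = np.random.default_rng(42)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales_a = np.array([12, 15, 11, 18, 20, 17])
sales_b = np.array([8, 9, 14, 10, 12, 15])
heights = rng.normal(165, 8, size=200)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 2, size=50)
grid = rng.random((5, 5))

fig, axes = plt.subplots(2, 4, figsize=(18, 8))
ax = axes.ravel()                          # flat sequence of the 8 Axes

ax[0].plot(months, sales_a, marker="o")
ax[0].set_title("Line chart")
ax[1].bar(months, sales_a)
ax[1].set_title("Bar chart")
ax[2].hist(heights, bins=20)
ax[2].set_title("Histogram")
ax[3].pie(sales_a, labels=months, autopct="%1.0f%%")
ax[3].set_title("Pie chart")
ax[4].bar(months, sales_a, label="Product A")
ax[4].bar(months, sales_b, bottom=sales_a, label="Product B")   # stacked on top of A
ax[4].legend()
ax[4].set_title("Stacked bar chart")
ax[5].scatter(x, y)
ax[5].set_title("Scatter plot")
ax[6].boxplot([sales_a, sales_b])
ax[6].set_xticklabels(["Product A", "Product B"])
ax[6].set_title("Box plot")
image = ax[7].imshow(grid, cmap="viridis")
fig.colorbar(image, ax=ax[7])
ax[7].set_title("Heat map")

fig.tight_layout()
fig.savefig("plots_demo.png")              # hypothetical output file name
```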

Experiment 12: Data Visualization-II in Python

  1. Write programs to illustrate plotting different data distributions, such as univariate and bivariate distributions.
  2. Write programs to demonstrate plotting categorical and time-series data.
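The sketch below shows one plot per data type, all drawn with matplotlib and pandas on synthetic data:

* Univariate: a histogram of one variable.
* Bivariate: a hexbin of two correlated variables.
* Categorical: a bar chart of category counts.
* Time series: a daily random walk with its weekly mean from `resample`.

The chart choices and the output file name are illustrative assumptions. Libraries such as seaborn offer similar plots.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

scores = rng.normal(50, 10, size=500)                    # univariate
x = rng.normal(0, 1, size=500)                           # bivariate pair
y = 0.6 * x + rng.normal(0, 0.8, size=500)
grades = pd.Series(rng.choice(["A", "B", "C"], size=300, p=[0.5, 0.3, 0.2]))
grade_counts = grades.value_counts().sort_index()        # categorical counts
days = pd.date_range("2024-01-01", periods=90, freq="D")
daily = pd.Series(rng.normal(0, 1, size=90).cumsum(), index=days)
weekly = daily.resample("W").mean()                      # weekly average

fig, ax = plt.subplots(2, 2, figsize=(12, 8))
ax[0, 0].hist(scores, bins=30, density=True)
ax[0, 0].set_title("Univariate: histogram")
ax[0, 1].hexbin(x, y, gridsize=25, cmap="Blues")
ax[0, 1].set_title("Bivariate: hexbin")
ax[1, 0].bar(grade_counts.index, grade_counts.values)
ax[1, 0].set_title("Categorical: counts per grade")
ax[1, 1].plot(daily.index, daily.values, label="daily")
ax[1, 1].plot(weekly.index, weekly.values, label="weekly mean")
ax[1, 1].tick_params(axis="x", labelrotation=30)         # keep date labels readable
ax[1, 1].legend()
ax[1, 1].set_title("Time series")
fig.tight_layout()
fig.savefig("distributions_demo.png")                    # hypothetical output file name
```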

Experiment 13: Probability Distributions

  1. Generate and visualize discrete and continuous distributions using the statistical environment.
  2. Demonstrate the normal, binomial and Poisson distributions.
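The three distributions can be demonstrated by drawing samples with NumPy's random `Generator` and plotting each histogram against its theoretical formula:

* Normal pdf: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$
* Binomial pmf: $\binom{n}{k} p^k (1-p)^{n-k}$
* Poisson pmf: $e^{-\lambda}\lambda^k / k!$

The parameter values and the output file name are arbitrary choices for illustration. `scipy.stats` offers ready-made versions of these functions.

```python
import math

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2); works on scalars and NumPy arrays."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=10_000)    # continuous
binom_sample = rng.binomial(n=10, p=0.3, size=10_000)          # discrete
poisson_sample = rng.poisson(lam=4.0, size=10_000)             # discrete

fig, ax = plt.subplots(1, 3, figsize=(15, 4))

xs = np.linspace(-4, 4, 200)
ax[0].hist(normal_sample, bins=50, density=True, alpha=0.5, label="sample")
ax[0].plot(xs, normal_pdf(xs), label="pdf")
ax[0].set_title("Normal(0, 1)")

ks = list(range(11))
binom_freq = np.bincount(binom_sample, minlength=11) / binom_sample.size
ax[1].bar(ks, binom_freq, alpha=0.5, label="sample")
ax[1].plot(ks, [binomial_pmf(k, 10, 0.3) for k in ks], "o-", label="pmf")
ax[1].set_title("Binomial(10, 0.3)")

ks_p = list(range(15))
poisson_freq = np.bincount(poisson_sample, minlength=15)[:15] / poisson_sample.size
ax[2].bar(ks_p, poisson_freq, alpha=0.5, label="sample")
ax[2].plot(ks_p, [poisson_pmf(k, 4.0) for k in ks_p], "o-", label="pmf")
ax[2].set_title("Poisson(4)")

for panel in ax:
    panel.legend()
fig.tight_layout()
fig.savefig("probability_demo.png")    # hypothetical output file name
```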

Experiment 14: Building Confidence in Confidence Intervals 

  1. Populations Versus Samples 
  2. Large Sample Confidence Intervals 
  3. Simulating Data Sets 
  4. Evaluating the Coverage of Confidence Intervals
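The four steps of Experiment 14 might be sketched as follows:

1. A large, skewed "population" is simulated once.
2. Many random samples are drawn from it.
3. Each sample gets a large-sample interval $\bar{x} \pm 1.96\, s/\sqrt{n}$.
4. The coverage is the fraction of intervals that contain the true population mean.

The exponential population, the sample size of 50, and the repetition count are assumptions. For a skewed population like this one, the coverage usually comes out slightly below the nominal 95%.

```python
import numpy as np


def make_population(size=100_000, seed=0):
    """A fixed, skewed population (exponential with mean about 2)."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=2.0, size=size)


def large_sample_ci(sample, z=1.96):
    """Approximate 95% CI for the mean: x_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    center = sample.mean()
    half_width = z * sample.std(ddof=1) / np.sqrt(n)
    return center - half_width, center + half_width


def simulate_coverage(population, n=50, reps=1000, seed=1):
    """Fraction of simulated intervals that contain the true population mean."""
    rng = np.random.default_rng(seed)
    mu = population.mean()
    hits = 0
    for _ in range(reps):
        sample = rng.choice(population, size=n, replace=False)
        lo, hi = large_sample_ci(sample)
        hits += int(lo <= mu <= hi)
    return hits / reps


population = make_population()
print("population mean:", population.mean())
print("estimated coverage:", simulate_coverage(population))
```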

Experiment 15: Perform Tests of Hypotheses

  1. How to perform tests of hypotheses about the mean when the variance is known. 
  2. How to compute the p-value. 
  3. Explore the connection between the critical region, the test statistic, and the p-value 
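For a mean with known variance, the test statistic is $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$. The sketch below uses Python's standard-library `statistics.NormalDist` for the normal CDF and its inverse. It also shows how the three ideas are linked: at level $\alpha$, $Z$ falls in the critical region exactly when the p-value is below $\alpha$. The numbers in the example are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (x_bar - mu0) / (sigma / sqrt(n)), valid when sigma is known."""
    return (sample_mean - mu0) / (sigma / n ** 0.5)


def p_value(z, alternative="two-sided"):
    """Probability under H0 of a statistic at least as extreme as z."""
    if alternative == "two-sided":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if alternative == "greater":
        return 1 - STANDARD_NORMAL.cdf(z)
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError(f"unknown alternative: {alternative}")


def critical_value(alpha, alternative="two-sided"):
    """Boundary of the rejection region for a level-alpha z-test."""
    if alternative == "two-sided":
        return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)   # reject if |z| > c
    if alternative == "greater":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)       # reject if z > c
    if alternative == "less":
        return STANDARD_NORMAL.inv_cdf(alpha)           # reject if z < c
    raise ValueError(f"unknown alternative: {alternative}")


# Hypothetical example: H0: mu = 50 vs H1: mu != 50, sigma = 6 known, n = 36
z = z_statistic(sample_mean=52.1, mu0=50, sigma=6, n=36)
p = p_value(z)
c = critical_value(alpha=0.05)
print(f"z = {z:.3f}, p = {p:.4f}, critical value = {c:.3f}")
print("z in critical region:", abs(z) > c, "| p < alpha:", p < 0.05)
```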

Experiment 16: Regression

  1. Write a program to demonstrate linear regression in Python for the given dataset by following the steps below:

       1. Reading and understanding the data
       2. Hypothesis testing in linear regression
       3. Building a linear model
       4. Residual analysis and predictions
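The four regression steps can be sketched with plain NumPy for simple linear regression $y = \beta_0 + \beta_1 x$. The hypothesis test shown is the t-statistic for $H_0: \beta_1 = 0$; its p-value would come from a t distribution with $n - 2$ degrees of freedom, which this sketch does not compute. The synthetic data stands in for "the given dataset". Libraries such as statsmodels report the same quantities in an OLS summary.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x, plus a t-statistic for H0: b1 = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y_bar)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = y - (intercept + slope * x)
    rss = np.sum(residuals ** 2)
    residual_var = rss / (n - 2)                # unbiased estimate of the error variance
    se_slope = np.sqrt(residual_var / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "t_slope": slope / se_slope,            # compare with t distribution, n - 2 df
        "r2": 1 - rss / np.sum((y - y_bar) ** 2),
        "residuals": residuals,
    }


def predict(model, x_new):
    return model["intercept"] + model["slope"] * np.asarray(x_new, dtype=float)


# 1. Reading and understanding the data (synthetic stand-in for the given dataset)
rng = np.random.default_rng(3)
x = rng.uniform(0, 100, size=80)
y = 5 + 0.8 * x + rng.normal(0, 5, size=80)
print(f"x mean {x.mean():.2f}, y mean {y.mean():.2f}, "
      f"correlation {np.corrcoef(x, y)[0, 1]:.3f}")

# 2 and 3. Build the model and test H0: slope = 0
model = fit_simple_linear_regression(x, y)
print(f"y = {model['intercept']:.3f} + {model['slope']:.3f} x, "
      f"t = {model['t_slope']:.1f}, R^2 = {model['r2']:.3f}")

# 4. Residual analysis and predictions
res = model["residuals"]
print("residual mean:", res.mean(), "residual sd:", res.std(ddof=2))
print("prediction at x = 50:", predict(model, 50.0))
```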

