Saturday, 4 December 2021

Write a program to demonstrate working with Data Frames in Python (Part-I)

A Data Frame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, Boolean, etc.). A Data Frame has both a row index and a column index. The following operations can be performed on a Data Frame:

1) Slice Data Frame
2) Append a Column to Data Frame
3) Select a Column of a Data Frame
4) Subset a Data Frame
5) Finding unique elements
6) Sorting a Data frame
7) Merge Data Frames in Python

The following programs show how to perform these tasks; this part covers the first three.

1) Slice Data Frame 

Slice operations include selecting rows or columns based on labels or index numbers.

Pandas Data Frames provide the “loc” and “iloc” indexers, e.g. data_frame.loc[ ] and data_frame.iloc[ ]. Both are used to access rows and/or columns: “loc” accesses them by label and “iloc” accesses them by position, i.e. by integer index.

The main distinction between the two methods is:

  1. loc gets rows (and/or columns) with particular labels.
  2. iloc gets rows (and/or columns) at integer locations.
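
To make the distinction concrete, here is a minimal sketch using a small hypothetical frame whose row labels are strings, so labels and positions cannot be confused. The data and the variable names (by_label, by_position) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: the row labels are strings rather than 0, 1, 2, ...
df = pd.DataFrame(
    {"Age": [36, 38, 31, 34], "Weight": [75, 74, 70, 80]},
    index=["a", "b", "c", "d"],
)

# loc slices by label and includes the end label
by_label = df.loc["a":"c"]

# iloc slices by integer position and excludes the end position
by_position = df.iloc[0:2]

print(by_label)
print(by_position)
```

Here the loc slice returns the rows labelled a, b and c, while the iloc slice returns only the rows at positions 0 and 1.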

The following program shows how to slice rows and columns.

# importing pandas library
import pandas as pd
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])
print("Data Frame before slicing")
print("----------------------------------------")
print(df)
print("----------------------------------------")
print()
print()
print("1.Slicing rows in data frame")
print("----------------------------------------")
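# iloc is position-based: the end position 4 is excluded, so rows 0 to 3 are returned
df1 = df.iloc[0:4]
# loc is label-based: the end label 4 is included, so rows 0 to 4 are returned
df11 = df.loc[0:4]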
print("data frame after slicing")
print("----------------------------------------")
print(df1)
print("----------------------------------------")
print("slicing with loc")
print("----------------------------------------")
print(df11)
print("----------------------------------------")
print()
print()
print("2.Slicing columns in data frame")
# all rows, columns at positions 0 and 1
df2 = df.iloc[:, 0:2]
print("----------------------------------------")
print("data frame after slicing")
print("----------------------------------------")
print(df2)

The following is the output 

Data Frame before slicing
----------------------------------------
            Name  Age  Weight   Salary
0      M.S.Dhoni   36      75  5428000
1  A.B.D Villers   38      74  3428000
2        V.Kholi   31      70  8428000
3        S.Smith   34      80  4428000
4        C.Gayle   40     100  4528000
5         J.Root   33      72  7028000
6     K.Peterson   42      85  2528000
----------------------------------------


1.Slicing rows in data frame
----------------------------------------
data frame after slicing
----------------------------------------
            Name  Age  Weight   Salary
0      M.S.Dhoni   36      75  5428000
1  A.B.D Villers   38      74  3428000
2        V.Kholi   31      70  8428000
3        S.Smith   34      80  4428000
----------------------------------------
slicing with loc
----------------------------------------
            Name  Age  Weight   Salary
0      M.S.Dhoni   36      75  5428000
1  A.B.D Villers   38      74  3428000
2        V.Kholi   31      70  8428000
3        S.Smith   34      80  4428000
4        C.Gayle   40     100  4528000
----------------------------------------


2.Slicing columns in data frame
----------------------------------------
data frame after slicing
----------------------------------------
            Name  Age
0      M.S.Dhoni   36
1  A.B.D Villers   38
2        V.Kholi   31
3        S.Smith   34
4        C.Gayle   40
5         J.Root   33
6     K.Peterson   42
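
Rows and columns can also be sliced in a single step by passing both a row slice and a column slice. The sketch below uses a shortened copy of the player data; the variable names (players, top_left, same_block) are assumptions made for illustration.

```python
import pandas as pd

# A shortened copy of the player data used above
players = pd.DataFrame(
    [["M.S.Dhoni", 36, 75, 5428000],
     ["A.B.D Villers", 38, 74, 3428000],
     ["V.Kholi", 31, 70, 8428000]],
    columns=["Name", "Age", "Weight", "Salary"],
)

# iloc: first two rows and first two columns (end positions excluded)
top_left = players.iloc[0:2, 0:2]

# loc: row labels 0 to 1 and columns 'Name' to 'Age' (end labels included)
same_block = players.loc[0:1, "Name":"Age"]

print(top_left)
print(top_left.equals(same_block))
```

Both expressions select the same block of two rows and two columns; only the way the boundaries are written differs.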
 

2) Append a Column to Data Frame 

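A Data Frame is stored in the form of a table. A column can be added to a Data Frame by assigning a list to a new column name. The list must contain one value for each row of the Data Frame.

data_frame['column_name'] = list_name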

# Import pandas package
import pandas as pd
# Define a dictionary containing student data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)
print("----------------------------------------")
print("Data Frame before adding column")
print("----------------------------------------")
print(df)
print("----------------------------------------")
print()
print()
# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
# Using 'Address' as the column name and equating it to the list
df['Address'] = address
# Observe the result
print("----------------------------------------")
print("Data Frame after adding column")
print("----------------------------------------")
print(df)

The following is the output 

----------------------------------------
Data Frame before adding column
----------------------------------------
     Name  Height Qualification
0     Jai     5.1           Msc
1  Princi     6.2            MA
2  Gaurav     5.1           Msc
3    Anuj     5.2           Msc
----------------------------------------


----------------------------------------
Data Frame after adding column
----------------------------------------
     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
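
Bracket assignment places the new column at the end. As a hedged sketch of two alternatives that pandas also offers, DataFrame.insert() places a column at a chosen position and DataFrame.assign() returns a new Data Frame with the extra column. The column name 'Tall' and the 5.5 threshold are illustrative assumptions, not part of the program above.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Height": [5.1, 6.2, 5.1, 5.2]})

# insert() adds the column at position 1 and modifies df in place
df.insert(1, "Address", ["Delhi", "Bangalore", "Chennai", "Patna"])

# assign() returns a new DataFrame and leaves df unchanged
# ('Tall' is a hypothetical derived column)
df_new = df.assign(Tall=df["Height"] > 5.5)

print(df)
print(df_new)
```

insert() is convenient when column order matters, while assign() suits cases where the original Data Frame should stay untouched.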


3) Select a Column of a Data Frame

Columns are selected by their labels, i.e. their column names. Passing a list of names selects several columns at once.

# Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
print("----------------------------------------")
print(" Original Data Frame")
print("----------------------------------------")
# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)
print(df)
print()
print()
# select two columns
print("----------------------------------------")
print("Selecting two columns from Data Frame")
print("----------------------------------------")
df2=df[['Name', 'Qualification']]
print(df2)
print("----------------------------------------")
print("Selecting all rows and second to fourth column from Data Frame")
print("----------------------------------------")
# select all rows 
# and second to fourth column
df3=df[df.columns[1:4]]
print(df3)

The following is the output

----------------------------------------
 Original Data Frame
----------------------------------------
     Name  Age    Address Qualification
0     Jai   27      Delhi           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannauj           Phd


----------------------------------------
Selecting two columns from Data Frame
----------------------------------------
     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
----------------------------------------
Selecting all rows and second to fourth column from Data Frame
----------------------------------------
   Age    Address Qualification
0   27      Delhi           Msc
1   24     Kanpur            MA
2   22  Allahabad           MCA
3   32    Kannauj           Phd
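
One detail worth illustrating is the type of the result. In the minimal sketch below, selecting with a single label returns a Series, while selecting with a list of labels returns a Data Frame. Columns can also be picked by position with iloc. The shortened data and the variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"],
                   "Age": [27, 24],
                   "Qualification": ["Msc", "MA"]})

# Single brackets with one label return a Series
ages = df["Age"]

# Double brackets (a list of labels) return a DataFrame
ages_frame = df[["Age"]]

# iloc selects columns by position: all rows, columns at positions 1 and 2
by_position = df.iloc[:, 1:3]

print(type(ages).__name__)
print(type(ages_frame).__name__)
print(list(by_position.columns))
```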
