Sunday 30 January 2022

How to encode the labels and show the performance of encoded labels

In machine learning we often encounter data that has categorical labels in one or more columns. These labels may be strings or numbers. Most machine learning models cannot consume this kind of data in its raw form, so the labels are first converted using label encoding. Label encoding is a technique for converting labels into numeric form so that they can be fed to a machine learning model, and it is an important data preprocessing step for supervised learning. In this method, each distinct value in a categorical column is replaced with an integer from 0 to N-1, where N is the number of distinct values. LabelEncoder is a utility class that helps normalize labels so that they contain only values between 0 and n_classes-1.
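To make the "0 to N-1" idea concrete, here is a minimal sketch in plain Python. The helper encode_labels and the sample sizes list are hypothetical names made up for illustration; this is not how scikit-learn is implemented, only a picture of the mapping it produces.

```python
def encode_labels(values):
    """Map each distinct label to an integer from 0 to N-1, in sorted order."""
    classes = sorted(set(values))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[value] for value in values], mapping


# Hypothetical sample column (not taken from iris.csv)
sizes = ["small", "large", "medium", "small", "large"]
codes, mapping = encode_labels(sizes)

print(mapping)  # {'large': 0, 'medium': 1, 'small': 2}
print(codes)    # [2, 0, 1, 2, 0]
```

Note that the integers carry no meaning beyond identity: 'large' gets 0 only because it sorts first alphabetically.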

The example below uses the iris.csv file to demonstrate how to encode labels. You can download this file here.

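sklearn.preprocessing.LabelEncoder is used to perform label encoding. A detailed description can be found on the official website (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html). Note that the scikit-learn documentation describes it as a transformer for target values (y) rather than input features (X).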

Step-1: First, we find the unique labels in the column variety as follows:

import numpy as np
import pandas as pd

# Load the required data set
df = pd.read_csv('iris.csv')

df['variety'].unique()


Output

array(['Setosa', 'Versicolor', 'Virginica'], dtype=object)

 

Step-2: Now, using preprocessing.LabelEncoder(), we encode the labels in the variety column as follows:

# Import label encoder
from sklearn import preprocessing

# Create a LabelEncoder object, which maps word labels to integers.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'variety'.
df['variety'] = label_encoder.fit_transform(df['variety'])

df['variety'].unique()


Output 

array([0, 1, 2])

As the output above shows, Setosa is labeled as 0, Versicolor as 1, and Virginica as 2. LabelEncoder assigns the codes according to the sorted order of the labels, which is why they follow alphabetical order here.
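If you want to check the label-to-code mapping rather than read it off the output, the fitted encoder exposes it. The sketch below assumes scikit-learn is installed and uses a short hand-written list (varieties) as a stand-in for the variety column, so that it runs without iris.csv.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written stand-in for the 'variety' column (assumption: same label names)
varieties = ["Setosa", "Versicolor", "Virginica", "Setosa"]

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(varieties)

# classes_ holds the original labels; the position of each label is its code
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)  # {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}

# inverse_transform converts the integer codes back to the original labels
restored = label_encoder.inverse_transform(codes)
print(restored.tolist())  # ['Setosa', 'Versicolor', 'Virginica', 'Setosa']
```

Keeping the fitted label_encoder object around is useful, because it lets you turn a model's numeric predictions back into readable variety names later.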
