Sunday, 30 January 2022

How to encode the labels and show the performance of encoded labels

In machine learning, we often encounter data with multiple labels in one or more columns. These labels can be in character or numeric form. Such data cannot be fed to a machine learning model in its raw format. To make the data understandable for the model, the labels are often encoded using label encoding. Label encoding is a technique for converting labels into numeric form so that they can be ingested by a machine learning model. It is an important step in data preprocessing for supervised learning techniques. In this method, we generally replace each value in a categorical column with a number from 0 to N-1, where N is the number of distinct labels. LabelEncoder is a utility class that helps normalize labels so that they contain only values between 0 and n_classes-1.
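To make the idea of replacing each label with a number from 0 to N-1 concrete, here is a minimal sketch in plain Python. It is only an illustration of the concept, not how any particular library implements it; the sample list `labels` and the choice to number the labels in sorted order are assumptions made for this example.

```python
# Hypothetical sample labels, used only for illustration
labels = ["Versicolor", "Setosa", "Virginica", "Setosa"]

# Collect the distinct labels and sort them so the numbering is deterministic
classes = sorted(set(labels))

# Assign each distinct label a number from 0 to N-1
mapping = {label: code for code, label in enumerate(classes)}

# Replace every label with its number
encoded = [mapping[label] for label in labels]

print(mapping)  # {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}
print(encoded)  # [1, 0, 2, 0]
```

With three distinct labels, N is 3, so the encoded values range from 0 to 2.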

The following example demonstrates how to encode labels. Here I am using the iris.csv file as an example. You can download this file here.

sklearn.preprocessing.LabelEncoder is used for performing label encoding. A detailed description can be found on the official scikit-learn website.

Step-1: First, we find the unique labels in the column variety as follows:

import numpy as np
import pandas as pd

# Load the iris data set
df = pd.read_csv('iris.csv')

# List the unique labels in the 'variety' column
df['variety'].unique()

array(['Setosa', 'Versicolor', 'Virginica'], dtype=object)


Step-2: Now, using preprocessing.LabelEncoder(), we encode the above unique labels as follows:

# Import label encoder
from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'variety'.
df['variety'] = label_encoder.fit_transform(df['variety'])

# List the unique labels after encoding
df['variety'].unique()

array([0, 1, 2])

As you can observe in the above output, Setosa is labeled as 0, Versicolor as 1, and Virginica as 2.
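If you want to confirm which number belongs to which label, the fitted encoder keeps that information. The sketch below uses a small in-memory list (`varieties`, a hypothetical stand-in for the variety column) instead of iris.csv so that it runs on its own. It relies on the `classes_` attribute and the `inverse_transform` method of LabelEncoder.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the 'variety' column of iris.csv
varieties = ["Setosa", "Versicolor", "Virginica", "Setosa"]

label_encoder = LabelEncoder()
encoded = label_encoder.fit_transform(varieties)

# classes_ stores the original labels; the position of each label is its code
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)

# inverse_transform converts the numbers back to the original labels
decoded = label_encoder.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted `label_encoder` object around is useful because the same object can later translate model predictions back into readable label names.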

