Normalization is the process of scaling individual samples to have unit norm. This process can be useful if you plan to use a quadratic form such as the dot-product or any other kernel to quantify the similarity of any pair of samples.

This assumption underlies the Vector Space Model, which is often used in text classification and clustering contexts.

Data normalization is used when you want to adjust the values in the feature vector so that they can be measured on a common scale. One of the most common forms of normalization used in machine learning (l1 normalization) adjusts the values of a feature vector so that their absolute values sum to 1.
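For a single feature this can be checked by hand. The sketch below (a hypothetical example, not from the text above) divides each value by the sum of absolute values:

```python
import numpy as np

# Manual l1 normalization of one feature column: divide each value
# by the sum of absolute values so that |values| sum to 1.
col = np.array([3.0, 0.0, 1.0])
col_l1 = col / np.abs(col).sum()  # denominator = |3| + |0| + |1| = 4

print(col_l1)                # [0.75 0.   0.25]
print(np.abs(col_l1).sum())  # 1.0
```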

**Types of normalization**

To normalize data, the preprocessing.normalize() function can be used. It provides a quick and easy way to scale the input vectors of a single array-like dataset individually to a unit norm (vector length), using one of three norms: l1, l2, or max.
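To see how the three norms differ, here is a minimal sketch on a single made-up vector (the vector [3, 4] is a hypothetical example chosen for easy arithmetic):

```python
import numpy as np
from sklearn import preprocessing

v = np.array([[3.0, 4.0]])

# l1: divide by the sum of absolute values (|3| + |4| = 7)
print(preprocessing.normalize(v, norm='l1'))   # [[0.42857143 0.57142857]]

# l2: divide by the Euclidean length (sqrt(9 + 16) = 5)
print(preprocessing.normalize(v, norm='l2'))   # [[0.6 0.8]]

# max: divide by the largest absolute value (4)
print(preprocessing.normalize(v, norm='max'))  # [[0.75 1.  ]]
```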

**How it works**

1. As mentioned, the preprocessing.normalize() function can be used as follows:

import numpy as np

from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4], [0, 4, -0.3, 2.1], [1, 3.3, -1.9, -4.3]])

data_normalized = preprocessing.normalize(data, norm='l1', axis=0)

2. To display the normalized array, we will use the following code:

print(data_normalized)

The following output is returned:

[[ 0.75       -0.17045455  0.47619048 -0.45762712]
 [ 0.          0.45454545 -0.07142857  0.1779661 ]
 [ 0.25        0.375      -0.45238095 -0.36440678]]

3. As already mentioned, with the l1 norm the absolute values in each column (feature) of the normalized array must sum to 1. Let's check this for each column:

data_norm_abs = np.abs(data_normalized)

print(data_norm_abs.sum(axis=0))

In the first line of code, we used the np.abs() function to take the absolute value of each element in the array. In the second line, we used the sum() function with axis=0 to calculate the sum of each column. The following result is returned:

[1. 1. 1. 1.]
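For comparison, switching to the l2 norm makes each column a unit-length vector (its Euclidean length becomes 1) rather than making its absolute values sum to 1. A quick sketch on the same data:

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4],
                 [0, 4, -0.3, 2.1],
                 [1, 3.3, -1.9, -4.3]])

# l2 normalization: each column is divided by its Euclidean length
data_l2 = preprocessing.normalize(data, norm='l2', axis=0)

# Each column now has unit Euclidean norm
print(np.sqrt((data_l2 ** 2).sum(axis=0)))  # [1. 1. 1. 1.]
```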