How to Remove NaN from Array in Python?

While working on a project for one of our USA clients, I needed to remove NaN values from an array in Python. Removing NaN (Not a Number) values is often necessary to keep your analysis accurate. In this tutorial, we will go through detailed examples of different methods to clean your data efficiently.

What are NaN Values in Python?

NaN in Python stands for “Not a Number” and is used to represent missing or undefined values in a dataset. When working with large datasets, especially in fields such as finance and healthcare, you may encounter NaN values that can distort your calculations and analyses. For instance, in a dataset of average temperatures in various US cities, some data points might be missing.
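One property worth keeping in mind is that NaN never compares equal to anything, including itself, so an ordinary equality check cannot find it. The short sketch below illustrates this using standard `math` and NumPy calls; the variable names are only illustrative.

```python
import math
import numpy as np

value = np.nan

# np.nan is an ordinary Python float, and it is not equal to itself.
is_float = isinstance(value, float)
equals_itself = (value == value)

# A dedicated check is needed instead of ==.
found_with_math = math.isnan(value)
found_with_numpy = bool(np.isnan(value))

print(is_float, equals_itself, found_with_math, found_with_numpy)
# prints: True False True True
```

This is why the methods in this tutorial rely on functions such as numpy.isnan() rather than comparisons like `x == np.nan`.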

Prerequisites

Before we get into the methods, make sure you have NumPy installed. You can install it using pip:

pip install numpy

1. Use numpy.isnan() and Boolean Indexing

The simplest way to remove NaN values from a NumPy array is to use the numpy.isnan() function in combination with Boolean indexing. Let’s see how this works.

Example

Imagine you have an array representing average monthly rainfall in inches for New York City, but some months have missing data:

import numpy as np

rainfall = np.array([3.4, 4.2, np.nan, 2.9, 3.1, np.nan, 4.0, 3.8, 3.7, 4.1, np.nan, 3.9])

To remove the NaN values, you can use the following code:

clean_rainfall = rainfall[~np.isnan(rainfall)]
print(clean_rainfall)

This code will output:

[3.4 4.2 2.9 3.1 4.  3.8 3.7 4.1 3.9]

Here, numpy.isnan(rainfall) returns a boolean array that is True wherever a value is NaN. The ~ operator negates this array, and Boolean indexing then keeps only the elements that are not NaN.
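To make the individual steps visible, the one-liner can be split into its parts. This is a minimal sketch on a shortened version of the rainfall array; the names `nan_mask` and `keep_mask` are illustrative and not part of NumPy.

```python
import numpy as np

# Shortened, illustrative version of the rainfall data.
rainfall = np.array([3.4, 4.2, np.nan, 2.9, 3.1, np.nan])

nan_mask = np.isnan(rainfall)   # True where the value is NaN
keep_mask = ~nan_mask           # True where the value is a real number

# Boolean indexing returns a new array; rainfall itself is unchanged.
clean_rainfall = rainfall[keep_mask]

print(clean_rainfall)                         # [3.4 4.2 2.9 3.1]
print(int(nan_mask.sum()), "NaN values removed")
```

Counting the True entries in the mask is a quick way to see how much data was dropped before you continue with the analysis.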

2. Use numpy.nan_to_num()

Another approach is to replace NaN values with a specific number using the numpy.nan_to_num() function. This method is useful when you prefer to fill in missing values rather than remove them.

Example

Let’s use the same rainfall data for New York City:

rainfall = np.array([3.4, 4.2, np.nan, 2.9, 3.1, np.nan, 4.0, 3.8, 3.7, 4.1, np.nan, 3.9])

You can replace NaN values with zero (or any other value) as follows:

clean_rainfall = np.nan_to_num(rainfall, nan=0.0)
print(clean_rainfall)

This code will output:

[3.4 4.2 0.  2.9 3.1 0.  4.  3.8 3.7 4.1 0.  3.9]

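In this example, all NaN values are replaced with 0.0. Note that the nan keyword argument requires NumPy 1.17 or later, and that nan_to_num() also replaces positive and negative infinity with very large finite numbers by default.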
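Zero is not always a sensible stand-in for a missing measurement. As one possible alternative, the sketch below fills NaN values with the mean of the remaining values, using np.nanmean() and np.where(). The small array and the choice of the mean are assumptions made for illustration, not a recommendation for every dataset.

```python
import numpy as np

# Small illustrative array, not the full rainfall data.
rainfall = np.array([3.0, 4.0, np.nan, 5.0])

# np.nanmean ignores NaN entries: (3.0 + 4.0 + 5.0) / 3 = 4.0
mean_rainfall = np.nanmean(rainfall)

# np.where picks the mean where the value is NaN, else the original value.
filled_rainfall = np.where(np.isnan(rainfall), mean_rainfall, rainfall)

print(filled_rainfall)  # [3. 4. 4. 5.]
```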

3. Remove Rows or Columns with NaN Values

In some cases, you may want to remove entire rows or columns which contain NaN values. This method is particularly useful for 2D arrays or matrices.

Example

Consider a 2D array representing the average monthly temperatures for various cities in the USA:

temperatures = np.array([
    [32.0, 35.1, np.nan],
    [45.2, np.nan, 47.8],
    [np.nan, 50.5, 52.3],
    [55.1, 58.0, 60.2]
])

To remove rows with any NaN values, you can use the following code:

clean_temperatures = temperatures[~np.isnan(temperatures).any(axis=1)]
print(clean_temperatures)

This code will output:

[[55.1 58.  60.2]]

Here, np.isnan(temperatures).any(axis=1) returns a boolean array indicating which rows contain at least one NaN value. The ~ operator negates it, so only the rows without any NaN are kept.
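The same idea can be applied to columns by switching to axis=0 and indexing the second dimension. In the temperatures array above every column contains at least one NaN, so removing columns there would leave nothing; the sketch below therefore uses a modified, hypothetical array in which only the middle column has a missing value.

```python
import numpy as np

# Hypothetical data: only the middle column contains a NaN.
temperatures = np.array([
    [32.0, 35.1, 40.3],
    [45.2, np.nan, 47.8],
    [48.9, 50.5, 52.3],
    [55.1, 58.0, 60.2],
])

# axis=0 checks down each column, giving one flag per column.
columns_with_nan = np.isnan(temperatures).any(axis=0)

# Keep every row (:) but only the columns without NaN.
clean_columns = temperatures[:, ~columns_with_nan]

print(clean_columns.shape)  # (4, 2)
```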

4. Use Pandas for DataFrames

If you’re working with tabular data, the Pandas library provides more convenient methods to handle NaN values. You can easily remove or fill NaN values in DataFrames. If Pandas is not installed yet, you can add it with pip install pandas.

Example

Let’s say you have a DataFrame representing the average monthly temperatures for various US cities:

import numpy as np
import pandas as pd

data = {
    'New York': [32.0, 35.1, np.nan, 45.2, np.nan, 47.8, np.nan, 50.5, 52.3, 55.1, 58.0, 60.2],
    'Los Angeles': [58.4, 60.2, 62.1, np.nan, 65.3, 68.0, 70.2, np.nan, 72.4, 74.1, 75.8, 77.5],
    'Chicago': [28.2, 30.1, np.nan, 35.4, 37.6, np.nan, 40.3, 42.1, 44.0, 46.2, 48.5, np.nan]
}

df = pd.DataFrame(data)

To remove rows with any NaN values, you can use the dropna() method:

clean_df = df.dropna()
print(clean_df)

This code will output:

    New York  Los Angeles  Chicago
0       32.0         58.4     28.2
1       35.1         60.2     30.1
8       52.3         72.4     44.0
9       55.1         74.1     46.2
10      58.0         75.8     48.5
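Dropping every row that has any NaN can discard a lot of data. dropna() accepts a few parameters that narrow this down; the sketch below shows `subset`, `axis`, and `thresh` on a small, hypothetical DataFrame, not the full dataset above.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame for illustration.
df = pd.DataFrame({
    'New York': [32.0, np.nan, 45.2],
    'Los Angeles': [58.4, np.nan, np.nan],
    'Chicago': [28.2, 30.1, 35.4],
})

# Drop rows only when 'Los Angeles' is missing.
rows_with_la = df.dropna(subset=['Los Angeles'])   # keeps row 0

# Drop columns (axis=1) that contain any NaN.
complete_columns = df.dropna(axis=1)               # keeps only 'Chicago'

# Keep rows that have at least 2 non-NaN values.
mostly_complete = df.dropna(thresh=2)              # keeps rows 0 and 2

print(rows_with_la.index.tolist())
print(complete_columns.columns.tolist())
print(mostly_complete.index.tolist())
```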

Alternatively, to fill NaN values with a specific value, you can use the fillna() method:

filled_df = df.fillna(0.0)
print(filled_df)

This code will output:

    New York  Los Angeles  Chicago
0       32.0         58.4     28.2
1       35.1         60.2     30.1
2        0.0         62.1      0.0
3       45.2          0.0     35.4
4        0.0         65.3     37.6
5       47.8         68.0      0.0
6        0.0         70.2     40.3
7       50.5          0.0     42.1
8       52.3         72.4     44.0
9       55.1         74.1     46.2
10      58.0         75.8     48.5
11      60.2         77.5      0.0
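A temperature of 0.0 can be misleading as a fill value. One common alternative, shown here as a sketch on a small hypothetical DataFrame, is to fill each column with its own mean by passing the result of df.mean() to fillna().

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame for illustration.
df = pd.DataFrame({
    'New York': [30.0, np.nan, 50.0],
    'Chicago': [20.0, 40.0, np.nan],
})

# df.mean() skips NaN by default and returns one mean per column;
# fillna() then matches those means to the columns by name.
filled_df = df.fillna(df.mean())

print(filled_df['New York'].tolist())  # [30.0, 40.0, 50.0]
print(filled_df['Chicago'].tolist())   # [20.0, 40.0, 30.0]
```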

Conclusion

In this tutorial, I showed you how to remove NaN values from an array in Python. Whether you choose to remove or replace NaN values, NumPy and Pandas offer the tools to do it. The topics I covered are using numpy.isnan() with Boolean indexing, using numpy.nan_to_num(), removing rows or columns with NaN values, and using Pandas for DataFrames.
