Convert DataFrame To NumPy Array Without Index In Python

While I was working on a machine learning project, I needed to feed my pandas DataFrame into a model that required NumPy arrays. The challenge was that I needed to convert the DataFrame to a NumPy array without including the index values, as they would skew my model’s predictions.

While pandas DataFrames are great for data manipulation and analysis, NumPy arrays offer better performance for numerical computations.

In this article, I’ll share five effective methods to convert a DataFrame to a NumPy array without including the index.

Let's get started.


DataFrames and NumPy Arrays in Python

Before we get into the methods, I will quickly explain the difference between these two data structures:

pandas DataFrame: A 2D labeled data structure with columns that can be of different types
NumPy Array: A fast n-dimensional array for numerical computation in which every element shares a single data type (dtype)

When converting from a DataFrame to a NumPy array, we typically want just the data values without the index or column labels.
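To make the distinction concrete, here is a small sketch with a hypothetical DataFrame that uses names as its index. to_numpy() returns only the cell values. If you actually want the index as data, reset_index() turns it into an ordinary first column before the conversion.

```python
import pandas as pd

# Hypothetical DataFrame whose index holds string labels
df = pd.DataFrame(
    {"Age": [28, 32], "Salary": [75000, 85000]},
    index=["John", "Sarah"],
)

# Only the cell values are converted; the index labels are dropped
values_only = df.to_numpy()
print(values_only.shape)  # (2, 2)

# reset_index() moves the labels into a regular column first
with_index = df.reset_index().to_numpy()
print(with_index.shape)   # (2, 3)
print(with_index[0])      # ['John' 28 75000]
```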

Convert DataFrame To NumPy Array Without Index in Python

Now let's go through the methods for converting a DataFrame to a NumPy array without the index.


Method 1 – Use the values Attribute

The simplest way to convert a DataFrame to a NumPy array is the values attribute.

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'Name': ['John', 'Sarah', 'Mike', 'Emily'],
    'Age': [28, 32, 25, 30],
    'Salary': [75000, 85000, 62000, 92000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Convert to NumPy array without index
numpy_array = df.values
print("\nNumPy Array (using values):")
print(numpy_array)

Output:

Original DataFrame:
    Name  Age  Salary
0   John   28   75000
1  Sarah   32   85000
2   Mike   25   62000
3  Emily   30   92000

NumPy Array (using values):
[['John' 28 75000]
 ['Sarah' 32 85000]
 ['Mike' 25 62000]
 ['Emily' 30 92000]]


The values attribute gives us just the data, dropping both index and column names.
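One detail worth knowing: a NumPy array has a single dtype, so when the DataFrame mixes strings and numbers, values falls back to the generic object dtype. The short sketch below rebuilds the sample data from above to check this, and shows that selecting only the numeric columns gives a proper integer array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Sarah", "Mike", "Emily"],
    "Age": [28, 32, 25, 30],
    "Salary": [75000, 85000, 62000, 92000],
})

# Strings and integers together: NumPy has to use the generic object dtype
mixed = df.values
print(mixed.dtype)    # object

# Integer columns only: the result is a compact int64 array
numeric = df[["Age", "Salary"]].values
print(numeric.dtype)  # int64
```

Object arrays hold Python objects rather than raw numbers, so most numerical libraries either reject them or run much slower on them. Converting only the numeric columns is usually what a model needs.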


Method 2 – Use to_numpy() Method

Since pandas 0.24.0, the recommended way to convert a DataFrame to a NumPy array is the to_numpy() method. It offers more flexibility and makes the intent of the code explicit.

# Using to_numpy() method
numpy_array = df.to_numpy()
print("NumPy Array (using to_numpy()):")
print(numpy_array)

Output:

NumPy Array (using to_numpy()):
[['John' 28 75000]
 ['Sarah' 32 85000]
 ['Mike' 25 62000]
 ['Emily' 30 92000]]


The to_numpy() method also accepts parameters such as dtype and copy that give you more control over the conversion.
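Here is a minimal sketch of those parameters on a hypothetical all-float DataFrame with one missing value. It also uses na_value, which, as far as I know, was added to DataFrame.to_numpy() in pandas 1.1, so treat that part as an assumption on older installs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [28.0, 32.0, np.nan], "Salary": [75000.0, 85000.0, 62000.0]})

# dtype: cast every value during the conversion
as_float32 = df.to_numpy(dtype=np.float32)
print(as_float32.dtype)  # float32

# copy=True: always return a fresh array, so edits never reach df
independent = df.to_numpy(copy=True)
independent[0, 0] = -1.0
print(df.loc[0, "Age"])  # 28.0

# na_value (assumed pandas >= 1.1): replace missing values while converting
filled = df.to_numpy(na_value=0.0)
print(filled[2, 0])      # 0.0
```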

# Keep only the numeric columns, then cast them to float64
numeric_df = df.select_dtypes(include=[np.number])
float_array = numeric_df.to_numpy(dtype=float)
print("\nNumPy Array with float dtype:")
print(float_array)

Output:

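NumPy Array with float dtype:
[[2.8e+01 7.5e+04]
 [3.2e+01 8.5e+04]
 [2.5e+01 6.2e+04]
 [3.0e+01 9.2e+04]]

NumPy switches to scientific notation here because the values span more than three orders of magnitude. Call np.set_printoptions(suppress=True) before printing if you prefer fixed-point output.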


Method 3 – Use iloc for Selecting Only Data

If you want finer control over which rows and columns to include, use iloc. Combined with to_numpy() or values, it lets you extract specific portions of your DataFrame by position.

# Using iloc to select all rows and columns, then convert
numpy_array = df.iloc[:, :].values
print("NumPy Array (using iloc + values):")
print(numpy_array)

# Select every column from position 1 onward (Age and Salary, the numeric ones)
numpy_array_numeric = df.iloc[:, 1:].to_numpy()
print("\nNumPy Array (only numeric columns):")
print(numpy_array_numeric)

Output:

NumPy Array (using iloc + values):
[['John' 28 75000]
 ['Sarah' 32 85000]
 ['Mike' 25 62000]
 ['Emily' 30 92000]]

NumPy Array (only numeric columns):
[[   28 75000]
 [   32 85000]
 [   25 62000]
 [   30 92000]]


This approach is particularly useful when you need to convert only a subset of your DataFrame.
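As a sketch of that subset use case, the example below rebuilds the sample DataFrame and selects rows and columns by position with iloc, makes the same selection by label with loc, and picks non-contiguous columns by passing a list of positions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Sarah", "Mike", "Emily"],
    "Age": [28, 32, 25, 30],
    "Salary": [75000, 85000, 62000, 92000],
})

# First two rows, columns at positions 1 and 2 (Age, Salary)
by_position = df.iloc[:2, 1:3].to_numpy()

# Same selection by label; loc label slices include both endpoints
by_label = df.loc[0:1, ["Age", "Salary"]].to_numpy()

# Non-contiguous columns by position: Name and Salary
name_salary = df.iloc[:, [0, 2]].to_numpy()

print(by_position.tolist())  # [[28, 75000], [32, 85000]]
print(name_salary[0])        # ['John' 75000]
```

Note the slicing difference: iloc[:2] stops before position 2, while loc[0:1] includes label 1.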


Method 4 – Use the np.array() Function

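Another approach is to pass the DataFrame directly to NumPy's np.array() function. It gives you the same result, but because np.array() copies the data by default, it can be slightly slower than values or to_numpy(), which may return a view of the existing data without copying.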

# Using np.array() function
numpy_array = np.array(df)
print("NumPy Array (using np.array()):")
print(numpy_array)

Output:

NumPy Array (using np.array()):
[['John' 28 75000]
 ['Sarah' 32 85000]
 ['Mike' 25 62000]
 ['Emily' 30 92000]]

Using np.array() is a quick and simple way to convert a DataFrame to a NumPy array.
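To see what the copy means in practice, here is a small sketch with hypothetical numeric columns. Changing the array returned by np.array() leaves the DataFrame untouched, and the function also accepts a dtype argument, similar to to_numpy(dtype=...).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [28, 32, 25], "Salary": [75000, 85000, 62000]})

# np.array() copies the data by default, so the result is independent of df
arr = np.array(df)
arr[0, 0] = 99  # modifies the copy only

print(df.loc[0, "Age"])  # 28 (the DataFrame is unchanged)
print(arr[0, 0])         # 99

# A dtype can be requested during the conversion
float_arr = np.array(df, dtype=float)
print(float_arr.dtype)   # float64
```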


Method 5 – Use as_matrix() for Older Pandas Versions

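If you're working with a very old version of pandas (before 0.24.0, when to_numpy() was introduced), you might encounter the as_matrix() method. It was deprecated in pandas 0.23.0 and removed in pandas 1.0, so calling it on a current installation raises an AttributeError. I'm including it only for completeness.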

# Using as_matrix() (deprecated)
# numpy_array = df.as_matrix()  # Works only in pandas < 1.0 (removed in 1.0)
# print("NumPy Array (using as_matrix()):")
# print(numpy_array)

I recommend using to_numpy() instead of as_matrix() in all modern pandas code.
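If you maintain code that must run on both very old and current pandas, a small helper can avoid as_matrix() entirely. The frame_to_array() function below is hypothetical (the name is mine, not part of pandas): it prefers to_numpy() when the installed pandas has it and falls back to the values attribute otherwise.

```python
import pandas as pd


def frame_to_array(frame):
    """Return the DataFrame's values as a NumPy array, without the index.

    Hypothetical helper: uses to_numpy() when available (pandas >= 0.24)
    and falls back to the values attribute on older versions.
    """
    if hasattr(frame, "to_numpy"):
        return frame.to_numpy()
    return frame.values


df = pd.DataFrame({"Age": [28, 32], "Salary": [75000, 85000]})
arr = frame_to_array(df)
print(arr.shape)  # (2, 2)
```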

Real-World Application: Stock Price Analysis

Let’s look at a practical example of using these conversion methods in a real-world scenario. Imagine we’re analyzing stock prices for major US tech companies:

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Create a DataFrame of sample stock prices over five business days (freq='B')
stock_data = {
    'Date': pd.date_range(start='2023-01-01', periods=5, freq='B'),
    'AAPL': [182.63, 185.92, 183.29, 186.56, 189.30],
    'MSFT': [376.04, 374.65, 380.12, 379.89, 385.77],
    'GOOGL': [141.20, 142.65, 140.37, 143.50, 144.08],
    'AMZN': [145.24, 143.67, 146.89, 148.23, 151.94]
}

stock_df = pd.DataFrame(stock_data)
stock_df.set_index('Date', inplace=True)
print("Stock Price DataFrame:")
print(stock_df)

# Convert to NumPy array for numerical processing
stock_prices = stock_df.to_numpy()
print("\nStock Prices as NumPy Array:")
print(stock_prices)

# Standardize each column (zero mean, unit variance) using the NumPy array
scaler = StandardScaler()
standardized_prices = scaler.fit_transform(stock_prices)
print("\nStandardized Stock Prices (rounded to 2 decimals):")
print(standardized_prices.round(2))

# Correlation matrix; rowvar=False treats each column (one stock) as a variable
correlation_matrix = np.corrcoef(stock_prices, rowvar=False)
print("\nCorrelation Matrix (rounded to 2 decimals):")
print(correlation_matrix.round(2))

Output:

Stock Price DataFrame:
              AAPL    MSFT   GOOGL    AMZN
Date
2023-01-02  182.63  376.04  141.20  145.24
2023-01-03  185.92  374.65  142.65  143.67
2023-01-04  183.29  380.12  140.37  146.89
2023-01-05  186.56  379.89  143.50  148.23
2023-01-06  189.30  385.77  144.08  151.94

Stock Prices as NumPy Array:
[[182.63 376.04 141.2  145.24]
 [185.92 374.65 142.65 143.67]
 [183.29 380.12 140.37 146.89]
 [186.56 379.89 143.5  148.23]
 [189.3  385.77 144.08 151.94]]

Standardized Stock Prices (rounded to 2 decimals):
[[-1.21 -0.84 -0.84 -0.69]
 [ 0.16 -1.2   0.21 -1.25]
 [-0.94  0.21 -1.43 -0.11]
 [ 0.42  0.15  0.82  0.37]
 [ 1.57  1.67  1.24  1.68]]

Correlation Matrix (rounded to 2 decimals):
[[1.   0.66 0.93 0.71]
 [0.66 1.   0.47 0.98]
 [0.93 0.47 1.   0.57]
 [0.71 0.98 0.57 1.  ]]

In this example, we:

Created a DataFrame with stock price data
Converted it to a NumPy array using to_numpy()
Used the NumPy array for standardization with scikit-learn
Calculated a correlation matrix to see relationships between stock prices

This workflow is common in financial data analysis and machine learning pipelines.
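Dropping the index for the computation does not mean losing it: the labels stay on the original DataFrame and can be reattached to the result afterwards. The sketch below uses a two-stock subset of the sample prices and computes column-wise z-scores directly in NumPy with the population standard deviation (ddof=0, which is what StandardScaler uses), as a stand-in for the scikit-learn step.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {"AAPL": [182.63, 185.92, 183.29], "MSFT": [376.04, 374.65, 380.12]},
    index=pd.date_range("2023-01-02", periods=3, freq="B", name="Date"),
)

# Work on the bare values: no index, no column labels
arr = prices.to_numpy()

# Column-wise z-scores with the population standard deviation (ddof=0)
z = (arr - arr.mean(axis=0)) / arr.std(axis=0)

# Reattach the original dates and tickers for reporting
z_df = pd.DataFrame(z, index=prices.index, columns=prices.columns)
print(z_df.round(2))
```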


Performance Comparison

When working with large datasets, performance matters. Here’s a quick comparison of the methods we’ve discussed:

import timeit

import numpy as np
import pandas as pd

# Create a large single-dtype (float64) DataFrame
large_df = pd.DataFrame(np.random.rand(100000, 10))

# A single call finishes too quickly to time reliably with time.time(),
# so each method is run many times with timeit and averaged per call
methods = {
    "Method 1 (values)": lambda: large_df.values,
    "Method 2 (to_numpy)": lambda: large_df.to_numpy(),
    "Method 3 (iloc + values)": lambda: large_df.iloc[:, :].values,
    "Method 4 (np.array)": lambda: np.array(large_df),
}

runs = 100
for name, func in methods.items():
    seconds = timeit.timeit(func, number=runs) / runs
    print(f"{name}: {seconds:.6f} seconds per call")

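For a single-dtype DataFrame like this one, values and to_numpy() can return a view of the data pandas already holds, so they are usually the fastest. np.array() copies the data by default, so its cost grows with the size of the DataFrame. Exact timings depend on your machine and pandas version, so benchmark on your own data before drawing conclusions.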
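You can check the view-versus-copy behavior directly with np.shares_memory(). This is a small sketch on a hypothetical single-dtype DataFrame; with mixed dtypes, pandas has to build a new array even for values and to_numpy(), so the first check would no longer hold.

```python
import numpy as np
import pandas as pd

# A single float64 block, so no type conversion is needed
df = pd.DataFrame(np.random.rand(100, 4))

view_like = df.to_numpy()
also_view = df.values
copied = np.array(df)

# values and to_numpy() both point at the DataFrame's existing buffer
print(np.shares_memory(view_like, also_view))  # True
# np.array() allocated a new buffer and copied every element
print(np.shares_memory(view_like, copied))     # False
```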

Each method has its own advantages depending on your use case. The values attribute is quick and concise, to_numpy() is the recommended modern API and gives you more control through parameters such as dtype and copy, and iloc lets you choose precisely which portions of the DataFrame to convert. I also walked through a real-world example and a performance comparison.