Transform Pandas DataFrame with n-level hierarchical index into n-D Numpy array

Question

Is there a good way to transform a DataFrame with an n-level index into an n-D Numpy array (a.k.a. an n-tensor)?

Example

Suppose I set up a DataFrame like

from pandas import DataFrame, MultiIndex

index = range(2), range(3)
value = range(2 * 3)
frame = DataFrame(value, columns=['value'],
                  index=MultiIndex.from_product(index)).drop((1, 0))
print(frame)

which outputs

     value
0 0      0
  1      1
  2      2
1 1      4
  2      5

The index is a 2-level hierarchical index. I can extract a 2-D Numpy array from the data using

print(frame.unstack().values)

which outputs

[[  0.   1.   2.]
 [ nan   4.   5.]]

How does this generalize to an n-level index?

Playing with unstack(), it seems that it can only be used to massage the 2-D shape of the DataFrame, but not to add an axis.

I cannot use e.g. frame.values.reshape(x, y, z), since this would require that the frame contains exactly x * y * z rows, which cannot be guaranteed. This is what I tried to demonstrate by drop()ing a row in the above example.

Any suggestions are highly appreciated.

The answer to "how does it generalize" is it doesn't. A pandas DataFrame is fundamentally a two-dimensional object. As your example shows, it doesn't enforce equal sizes across index "dimensions", so if you try to expand it to more dimensions, there may be gaps. I think if you want to get an n-D array you may have to make it yourself by iterating over the index levels and creating a separate "slice" of the result array for each. Pandas just isn't targeted at that sort of structure.
– BrenBarn, Commented Jan 27, 2016 at 21:34
Thanks @Bren. I managed to address the problem of missing rows and to use reshape() (see below). This seems to work on my dataset, although I wouldn't be surprised if there are situations where it chokes.
– Igor Raush, Commented Jan 27, 2016 at 23:22

Igor Raush · Accepted Answer · 2020-10-19 19:36:18Z

18

Edit. This approach is much more elegant (and two orders of magnitude faster) than the one I gave below.

import numpy as np

# create an array of NaN with one axis per index level
shape = tuple(map(len, frame.index.levels))
arr = np.full(shape, np.nan)

# fill it using Numpy's advanced indexing (the index must be a tuple of code arrays)
arr[tuple(frame.index.codes)] = frame.values.flat
# ...or in Pandas < 0.24.0, use
# arr[tuple(frame.index.labels)] = frame.values.flat
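
For reference, here is a minimal end-to-end sketch of the advanced-indexing approach on the 3-level example from this answer. The helper name `to_ndarray` is hypothetical, and the sketch assumes a unique index and pandas >= 0.24 (for `codes`); it uses `MultiIndex.levshape` for the shape, as suggested in the comments.

```python
import numpy as np
import pandas as pd


def to_ndarray(series):
    """Scatter a MultiIndex-ed Series into a dense n-D array (hypothetical helper).

    Index combinations that are absent from the Series are left as NaN.
    """
    index = series.index
    # One axis per index level, sized by the number of labels in that level.
    arr = np.full(index.levshape, np.nan)
    # codes holds one integer array per level; together they address one cell per row.
    arr[tuple(index.codes)] = series.to_numpy()
    return arr


# Same setup as the 3-D example: a full 2 x 2 x 2 product with one row dropped.
index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": range(8)}, index=index).drop((1, 0, 1))

arr = to_ndarray(frame["value"])
print(arr)
```

The dropped row `(1, 0, 1)` ends up as NaN in the result, while every other cell keeps its original value.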

Original solution. Given a setup similar to above, but in 3-D,

from pandas import DataFrame, MultiIndex
from itertools import product

index = range(2), range(2), range(2)
value = range(2 * 2 * 2)
frame = DataFrame(value, columns=['value'],
                  index=MultiIndex.from_product(index)).drop((1, 0, 1))
print(frame)

we have

       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
  1 0      6
    1      7

Now, we proceed using the reshape() route, but with some preprocessing to ensure that the length along each dimension will be consistent.

First, reindex the data frame with the full cartesian product of all dimensions. NaN values will be inserted as needed. This operation can be slow and memory-hungry, depending on the number of dimensions and on the size of the data frame.

levels = map(tuple, frame.index.levels)
index = list(product(*levels))
frame = frame.reindex(index)
print(frame)

which outputs

       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
    1    NaN
  1 0      6
    1      7

Now, reshape() will work as intended.

shape = list(map(len, frame.index.levels))  # reshape() needs a sequence, not a map object
print(frame.values.reshape(shape))

which outputs

[[[  0.   1.]
  [  2.   3.]]

 [[  4.  nan]
  [  6.   7.]]]

The (rather ugly) one-liner is

frame.reindex(list(product(*map(tuple, frame.index.levels)))).values\
     .reshape(list(map(len, frame.index.levels)))
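
The reindex-then-reshape route can also be sketched as a small function. This is only an illustration under assumptions: the name `reshape_via_reindex` is hypothetical, the frame is assumed to have a single column and a unique index, and `MultiIndex.from_product` is swapped in for `itertools.product` to build the full index.

```python
import numpy as np
import pandas as pd


def reshape_via_reindex(frame):
    """Reindex onto the full cartesian product, then reshape (hypothetical helper)."""
    # Full product of the level values, in level order, so a row-major
    # reshape puts every row in the right cell.
    full_index = pd.MultiIndex.from_product(frame.index.levels,
                                            names=frame.index.names)
    # Missing combinations become NaN rows.
    dense = frame.reindex(full_index)
    return dense.to_numpy().reshape(full_index.levshape)


index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": range(8)}, index=index).drop((1, 0, 1))

result = reshape_via_reindex(frame)
print(result.shape)
```

As noted above, this materializes the full product of the levels, so it is likely to cost more time and memory than filling a preallocated array.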

edited Oct 19, 2020 at 19:36

answered Jan 27, 2016 at 23:08


6 Comments

hannes Over a year ago

works nicely! there's a minor typo: frame.reindex(levels) should be frame.reindex(index).

Ike Over a year ago

For us noobies; don't forget that in python3 you'll need to turn the result of 'map' into a list before any of this will work. ie. shape = list(map(len, frame.index.levels))

cbarrick Over a year ago

Getting the shape is more straight forward: frame.index.levshape. Neither this nor the given solution seem to work with non-unique indices.

CrabMan Over a year ago

Doing df.index.labels I get AttributeError: 'MultiIndex' object has no attribute 'labels'. What's up with that?

Zz'Rot Over a year ago

@CrabMan Very late response, but MultiIndex.labels has been deprecated in favor of MultiIndex.codes - using the latter should fix the error. (pandas-docs.github.io/pandas-docs-travis/whatsnew/…)


G-Higgins · Accepted Answer · 2021-09-10 15:13:40Z

This can be done quite nicely using the Python xarray package which can be found here: http://xarray.pydata.org/en/stable/. It has great integration with Pandas and is quite intuitive once you get to grips with it.

If you have a multiindex series you can call the built-in method multiindex_series.to_xarray() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_xarray.html). This will generate a DataArray object, which is essentially a name-indexed numpy array, using the index values and names as coordinates. Following this you can call .values on the DataArray object to get the underlying numpy array.

If you need your tensor to conform to a set of keys in a specific order, you can also call .reindex(index_name = index_values_in_order) (http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html) on the DataArray. This can be extremely useful and makes working with the newly generated tensor much easier!
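
A short sketch of the xarray route, assuming the xarray package is installed. The level names `a`, `b`, `c` and the series name `value` are made up for illustration; the data mirrors the 3-level example from the other answer.

```python
import numpy as np
import pandas as pd

# Named levels become the dimension names of the DataArray (names are illustrative).
index = pd.MultiIndex.from_product([range(2), range(2), range(2)],
                                   names=["a", "b", "c"])
series = pd.Series(np.arange(8.0), index=index, name="value").drop((1, 0, 1))

# Series.to_xarray() needs xarray installed; it returns an xarray.DataArray.
data_array = series.to_xarray()

# The underlying numpy array; the missing (1, 0, 1) combination is NaN.
arr = data_array.values

# Reorder the labels along dimension "c" before extracting the array.
reordered = data_array.reindex(c=[1, 0]).values
print(arr.shape)
```

Here `reindex` takes the dimension name as a keyword argument, so the level names chosen when building the index determine how the axes are addressed later.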
