Pandas column as index for numpy array

Question

How can I use a pandas column as an index into a numpy array? Say I have

>>> import numpy as np
>>> import pandas as pd
>>> grid = np.arange(10, 20)
>>> df = pd.DataFrame([0, 1, 1, 5], columns=['i'])

I would like to do

>>> df['j'] = grid[df['i']]
IndexError: unsupported iterator index

What is a short and clean way to actually perform this operation?

Update

To be precise, I want an additional column holding the grid values at the indices stored in the first column: df['j'][0] = grid[df['i'][0]] for row 0, and so on.

expected output:

   i   j
0  0  10
1  1  11
2  1  11
3  5  15

Parallel Case: Numpy-to-Numpy

Just to show where the idea comes from: in plain Python / numpy, if you have

>>> keys = [0, 1, 1, 5]
>>> grid = np.arange(10, 20)
>>> grid[keys]
Out[30]: array([10, 11, 11, 15])

then that is exactly what I want to do, except that my keys are stored in a DataFrame column rather than in a list.

Jeff · Accepted Answer · 2014-05-14 14:09:51Z

5

This is a numpy bug that surfaced with pandas 0.13.0 / numpy 1.8.0.

You can do:

In [5]: grid[df['i'].values]
Out[5]: array([10, 11, 11, 15])

In [6]: pd.Series(grid)[df['i']]
Out[6]: 
i
0    10
1    11
1    11
5    15
dtype: int64

This matches your expected output. You can assign an array to a column as long as the length of the array/list is the same as the length of the frame (otherwise, how would you align it?).

In [14]: grid[keys]
Out[14]: array([10, 11, 11, 15])

In [15]: df['j'] = grid[df['i'].values]


In [17]: df
Out[17]: 
   i   j
0  0  10
1  1  11
2  1  11
3  5  15
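
For reference, the same lookup as a self-contained script. This is only a sketch: it assumes a reasonably recent pandas in which `Series.to_numpy()` exists (it plays the same role as `.values` above). Converting the column to a plain array first means numpy only ever sees integer positions, not a pandas object.

```python
import numpy as np
import pandas as pd

grid = np.arange(10, 20)
df = pd.DataFrame([0, 1, 1, 5], columns=['i'])

# Convert the column to a plain integer array, then index positionally.
# .to_numpy() is assumed to be available (pandas 0.24+); .values works too.
positions = df['i'].to_numpy()
df['j'] = grid[positions]

print(df)
```

Because the result of `grid[positions]` is a bare array with one entry per row, the assignment is purely positional and the frame's index is left untouched.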

edited May 14, 2014 at 14:09

answered May 14, 2014 at 12:48

8 Comments

FooBar Over a year ago

Since I don't really understand the logic being applied here: how would I change the index? The result now has df['i'] as its index, but I'd like it to have the index of df.

Jeff Over a year ago

What are you trying to do? It's not really a good idea to index into a numpy array using a pandas structure, because the numpy array doesn't know about indexes or anything.

FooBar Over a year ago

I want to add the series to the original dataframe.

Jeff Over a year ago

df['i'] = grid will work as long as the length of the frame is the same as the length of the array

FooBar Over a year ago

df['j'] = pd.Series(grid)[df['i']] gave me TypeError: incompatible index of inserted column with frame index. And df['j'] = grid is probably not what you meant - the latter has a completely different size?
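
The error in the last comment appears to come from label alignment: `pd.Series(grid)[df['i']]` is labelled by the looked-up keys (0, 1, 1, 5), which include a duplicate and do not match the frame's own index. A minimal sketch of two ways to get a column that carries the index of df instead; the frame labels 'a' to 'd' and the column name 'k' are hypothetical, chosen only to make the alignment visible.

```python
import numpy as np
import pandas as pd

grid = np.arange(10, 20)
# A non-default index (hypothetical labels) makes the alignment visible.
df = pd.DataFrame({'i': [0, 1, 1, 5]}, index=['a', 'b', 'c', 'd'])

# Option 1: positional lookup, then reattach the frame's own index.
df['j'] = pd.Series(grid[df['i'].to_numpy()], index=df.index)

# Option 2: map each key through a Series built from grid; the result
# keeps df's index. Keys with no match in grid would become NaN.
df['k'] = df['i'].map(pd.Series(grid))

print(df)
```

Both columns end up aligned with df row by row. Option 1 raises an IndexError for out-of-range keys, while Option 2 silently yields NaN for them, so which one fits depends on how missing keys should be handled.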
