How can I use a pandas column as an index into a numpy array? Say I have

>>> import numpy as np
>>> import pandas as pd
>>> grid = np.arange(10, 20)
>>> df = pd.DataFrame([0, 1, 1, 5], columns=['i'])

I would like to do

>>> df['j'] = grid[df['i']]
IndexError: unsupported iterator index

What is a short and clean way to actually perform this operation?

Update

To be precise, I want an additional column holding the grid values at the indices stored in the first column: df['j'][0] = grid[df['i'][0]] in row 0, and so on for the other rows.

expected output:

index i j 
    0 0 10
    1 1 11
    2 1 11
    3 5 15 

Parallel Case: Numpy-to-Numpy

Just to show where the idea comes from: in plain Python / numpy, if you have

>>> keys = [0, 1, 1, 5]
>>> grid = np.arange(10, 20)
>>> grid[keys]
Out[30]: array([10, 11, 11, 15])

This is exactly what I want to do, except that my keys are stored in a DataFrame column rather than in a plain list.

1 Answer

This is a numpy bug that surfaced with pandas 0.13.0 / numpy 1.8.0.

You can do:

In [5]: grid[df['i'].values]
Out[5]: array([10, 11, 11, 15])

In [6]: pd.Series(grid)[df['i']]
Out[6]: 
i
0    10
1    11
1    11
5    15
dtype: int64

This matches your expected output. You can assign an array (or list) to a column as long as its length equals the length of the frame; otherwise there is nothing to align it against.

In [14]: grid[keys]
Out[14]: array([10, 11, 11, 15])

In [15]: df['j'] = grid[df['i'].values]


In [17]: df
Out[17]: 
   i   j
0  0  10
1  1  11
2  1  11
3  5  15
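For reference, here is a minimal self-contained sketch of the approach above, assuming a reasonably recent pandas (`to_numpy()` needs pandas 0.24 or later; on older versions `.values` does the same job). The column name `j_map` is only illustrative. The second variant, using `Series.map`, is an alternative not shown in the answer; it keeps the frame's own index, which may help with the index-alignment issue raised in the comments.

```python
import numpy as np
import pandas as pd

grid = np.arange(10, 20)
df = pd.DataFrame([0, 1, 1, 5], columns=["i"])

# Index the array with the raw integer values rather than the Series itself.
# The result is a plain ndarray, assigned positionally to the new column.
df["j"] = grid[df["i"].to_numpy()]

# Alternative (an assumption, not from the answer): treat grid as a lookup
# table and map each value of 'i' through it. The result keeps df's index.
df["j_map"] = df["i"].map(pd.Series(grid))

print(df)
```

Both columns should hold the same values; the difference is only in whether the lookup goes through numpy positions or through pandas labels.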
8 Comments

Since I don't really understand the logic being applied here: how would I control the index? The result now has df['i'] as its index, but I'd like it to have the index of df.
What are you trying to do? It's not really a good idea to index into a numpy array using a pandas structure, because the numpy array doesn't know about indexes or anything.
I want to add the series to the original dataframe.
df['i'] = grid will work as long as the length of the frame is the same as the length of the array.
df['j'] = pd.Series(grid)[df['i']] gave me TypeError: incompatible index of inserted column with frame index. And df['j'] = grid is probably not what you meant, since the latter has a completely different size?