I have a pandas dataframe where each value in the Items column is a list.

cust_id   Items
100     ['item1','item2','item3']
101     ['item5','item8','item9']
102     ['item2','item4']

I want to convert the above dataframe to the below format.

cust_id  Items
100     item1 item2 item3
101     item5 item8 item9
102     item2 item4

I tried using the pandas built-in replace function, but it returns the original column without actually performing the string replacement.

df['Items']=(df['Items'].astype(str)).replace({"['":"", "', '":" ", "']":"" },method='string')

Please advise

Update:

I used the below code to create the original dataframe.

df=df1.groupby(['cust_id'])['Items'].apply(list).reset_index()
  • Sorry, are the elements lists or strings representing lists? Can you post code to construct your df to avoid ambiguity? Commented Sep 23, 2015 at 12:08
  • Why not just use an apply on the column, and do something like lambda lst: ' '.join(lst)? Commented Sep 23, 2015 at 12:10
  • @EdChum I have added the code to reconstruct my original df. Commented Sep 23, 2015 at 13:08
  • @Brian This is new info for me, I will check it out. Commented Sep 23, 2015 at 13:09

1 Answer

If the elements really are lists, you can pass str.join to the Series.apply method, so that each list is joined into a single space-separated string. Example:

In [159]: df = pd.DataFrame([[100,['item1','item2','item3']],[101,['item5','item8','item9']],[102,['item2','item4']]],columns=['cust_id','Items'])

In [160]: df
Out[160]:
   cust_id                  Items
0      100  [item1, item2, item3]
1      101  [item5, item8, item9]
2      102         [item2, item4]

In [161]: df['Items'] = df['Items'].apply(' '.join)

In [162]: df
Out[162]:
   cust_id                Items
0      100    item1 item2 item3
1      101    item5 item8 item9
2      102          item2 item4
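
If it is unclear whether a column holds real lists or the string representation of lists (the ambiguity raised in the comments on the question), a small helper can handle both. This is only a sketch: join_items is a hypothetical name, not a pandas function, and it assumes any string value is a valid Python list literal that ast.literal_eval can parse.

```python
import ast

import pandas as pd


def join_items(value, sep=" "):
    """Join a list of strings; parse it first if it is stored as a string.

    Hypothetical helper for illustration, not part of pandas.
    """
    if isinstance(value, str):
        # Assumption: the string looks like "['item1', 'item2']"
        value = ast.literal_eval(value)
    return sep.join(value)


# One row holds a real list, the other a string that looks like a list
df = pd.DataFrame(
    {
        "cust_id": [100, 101],
        "Items": [["item1", "item2"], "['item5', 'item8']"],
    }
)

df["Items"] = df["Items"].apply(join_items)
```

When every value is known to be a real list, the plain df['Items'].apply(' '.join) above is simpler and sufficient.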

7 Comments

You can just use df['Items'].apply(' '.join) here.
What does ' ' mean in the apply function?
@mrcet007 It's the space character, which is used as the separator between the different elements.
Thanks! Can someone also explain why pd.replace doesn't work and what is wrong with the code I tried?
Because Series.replace() is for replacing whole values. What you actually wanted was Series.str.replace(), but there you cannot use a dict; you would have to provide a regular expression or chain multiple .str.replace() calls.
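
To illustrate the difference described in the last comment, here is a minimal sketch. It assumes the column holds the string form of each list (as produced by astype(str)) and a pandas version in which Series.str.replace accepts the regex keyword; the variable names are illustrative only.

```python
import pandas as pd

# String representations of lists, as produced by astype(str)
s = pd.Series(["['item1', 'item2']", "['item2', 'item4']"])

# Series.replace compares each key against the whole cell value,
# so none of these substrings match and the data comes back unchanged
unchanged = s.replace({"['": "", "', '": " ", "']": ""})

# Series.str.replace works on substrings; regex=False treats each
# pattern literally, so the brackets need no escaping
cleaned = (
    s.str.replace("['", "", regex=False)
    .str.replace("', '", " ", regex=False)
    .str.replace("']", "", regex=False)
)
```

If the values are real lists rather than strings, joining them with apply as in the answer avoids the string surgery altogether.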