I am new to pandas and have searched for a solution to my problem, but I did not find much help online.

I have a string column, shown below, that I want to convert into separate columns. I have tried splitting it, but the result is not in the form I need.

*-----------------------------------------------------------------------------*
|  Total Visitor                                                              |
*-----------------------------------------------------------------------------*
|  2x Adult, 1x Adult + Audio Guide                                           |
|  2x Adult, 2x Youth, 1x Children                                            | 
|  5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*

Here is the code I used to split the string; it did not give the expected output:

df = data["Total Visitor"].str.split(",", n = 1, expand = True)

My expected output after splitting the string is the following table:

*-------------------------------------------------------------------------------------------------------------------*
|  Adult    | Adult + Audio Guide    | Youth    | Children    | Children + AG             | Senior + AG             |
*-------------------------------------------------------------------------------------------------------------------*
|  2x Adult | 1x Adult + Audio Guide |    -     |      -      |             -             |            -            |
|  2x Adult |           -            | 2x Youth | 1x Children |             -             |            -            |
|      -    | 5x Adult + Audio Guide |    -     |      -      | 1x Children + Audio Guide | 1x Senior + Audio Guide |
*-------------------------------------------------------------------------------------------------------------------*

How can I do this? Any help or guidance would be great.

2 Answers

The idea is to build a list of dictionaries, one per row, whose keys are the items with the leading count removed by the regex ^\d+x\s+ (^ is the start of the string, \d+ is one or more digits, and \s+ is one or more whitespace characters), and then pass that list to the DataFrame constructor:

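import re
import pandas as pd

# Key: the item with the leading "<count>x " removed; value: the full item.
L = [dict([(re.sub(r'^\d+x\s+', "", y), y) for y in x.split(', ')]) for x in df['Total Visitor']]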

df = pd.DataFrame(L).fillna('-')
print (df)
      Adult     Adult + Audio Guide     Youth     Children  \
0  2x Adult  1x Adult + Audio Guide         -            -   
1  2x Adult                       -  2x Youth  1x Children   
2         -  5x Adult + Audio Guide         -            -   

      Children + Audio Guide     Senior + Audio Guide  
0                          -                        -  
1                          -                        -  
2  1x Children + Audio Guide  1x Senior + Audio Guide  
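
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the sample rows from the question and the frame name `data` used in the question's code; `prefix`, `rows`, and `result` are illustrative names only.

```python
import re
import pandas as pd

# Sample rows copied from the question; `data` is the name used in the question's code.
data = pd.DataFrame({
    "Total Visitor": [
        "2x Adult, 1x Adult + Audio Guide",
        "2x Adult, 2x Youth, 1x Children",
        "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
    ]
})

# Matches the leading "<count>x " so it can be stripped to form the column name.
prefix = re.compile(r"^\d+x\s+")

# One dict per row: key is the item without its count, value is the full item.
rows = [
    {prefix.sub("", item): item for item in cell.split(", ")}
    for cell in data["Total Visitor"]
]

# Keys missing from a row become NaN, which fillna replaces with "-".
result = pd.DataFrame(rows).fillna("-")
print(result)
```

The columns come out in the order in which each label first appears in the data.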

Another, similar idea is to split each item on 'x ' and use the second part as the dictionary key, which becomes the column name:

L = [dict([(y.split('x ')[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
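
One thing to watch with the split-based variant: an unbounded split on 'x ' also cuts inside any label that itself contains "x ". The sample data has no such label, so the sketch below uses a hypothetical one ("Relax Pass") purely to illustrate; `label_of` is an illustrative helper, not part of pandas.

```python
def label_of(item: str) -> str:
    """Return the text after the first 'x ' in an item such as '2x Adult'."""
    # maxsplit=1 splits only at the first "x ", so the rest of the label stays whole.
    return item.split("x ", 1)[1]

adult = label_of("2x Adult")
relax = label_of("1x Relax Pass")             # hypothetical label, not in the question's data
truncated = "1x Relax Pass".split("x ")[1]    # unbounded split cuts the label short

print(adult)      # Adult
print(relax)      # Relax Pass
print(truncated)  # Rela
```

Passing a maxsplit of 1 keeps the approach working if such labels ever appear.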

Here is a way using pandas methods:

dstack = df['Total Visitor'].str.split(',', expand=True).stack().str.strip().to_frame()
dstack['cols'] = dstack[0].str.extract(r'\d+x\s(.*)')
df_out = dstack.set_index('cols', append=True)[0].reset_index(level=1, drop=True).unstack()
df_out

Output:

cols     Adult     Adult + Audio Guide     Children     Children + Audio Guide     Senior + Audio Guide     Youth
0     2x Adult  1x Adult + Audio Guide          NaN                        NaN                      NaN       NaN
1     2x Adult                     NaN  1x Children                        NaN                      NaN  2x Youth
2          NaN  5x Adult + Audio Guide          NaN  1x Children + Audio Guide  1x Senior + Audio Guide       NaN
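
The output above has NaN for missing items and alphabetically sorted columns, whereas the question asks for "-" and a first-appearance order. Below is a hedged variant sketch that gets there; note that it swaps `split(expand=True).stack()` for `str.split` plus `explode`, which is a different route to the same long format. The sample rows and the name `data` are taken from the question; `items`, `long`, and `wide` are illustrative names.

```python
import pandas as pd

# Sample rows copied from the question; `data` is the name used in the question's code.
data = pd.DataFrame({
    "Total Visitor": [
        "2x Adult, 1x Adult + Audio Guide",
        "2x Adult, 2x Youth, 1x Children",
        "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
    ]
})

# One row per item; the original row number is repeated in the index.
items = data["Total Visitor"].str.split(",").explode().str.strip()

# Long format: the full item plus its label (the item without the leading count).
long = items.to_frame("item")
long["cols"] = long["item"].str.extract(r"^\d+x\s+(.*)$", expand=False)

# Pivot the labels into columns, keyed by the original row number.
wide = long.set_index("cols", append=True)["item"].unstack()

# Replace missing items with "-" and restore first-appearance column order.
wide = wide.fillna("-")[long["cols"].drop_duplicates().tolist()]
print(wide)
```

This assumes each label occurs at most once per row; a repeated label within one row would make `unstack` raise because of duplicate index entries.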
