How to split a string in a pandas dataframe, and return multiple dataframes

Question

I have a pandas dataframe containing strings:

import pandas as pd

df = pd.DataFrame({'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'], 'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'], 'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree']})
df
Out[0]:
               column1              column2              column3
0        One_Two_Three  nrOne_nrTwo_nrThree   First_Second_Third
1   First_Second_Third   First_Second_Third        One_Two_Three
2  nrOne_nrTwo_nrThree        One_Two_Three  nrOne_nrTwo_nrThree

I would like to end up with three dataframes: the first containing the part of each string before the first underscore, the second the part between the first and second underscores, and the third the last part. The first one should look like:

df_one
Out[1]:
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne

I've tried

df_temp = df.apply(lambda x: x.str.split('_'))

df_temp
Out[2]: 
                   column1                  column2                  column3
0        [One, Two, Three]  [nrOne, nrTwo, nrThree]   [First, Second, Third]
1   [First, Second, Third]   [First, Second, Third]        [One, Two, Three]
2  [nrOne, nrTwo, nrThree]        [One, Two, Three]  [nrOne, nrTwo, nrThree]

to split the strings into lists, and then

df_temp.apply(lambda x: x[0])
Out[3]: 
  column1  column2 column3
0     One    nrOne   First
1     Two    nrTwo  Second
2   Three  nrThree   Third

But this only picks out the first row instead of the first element of every cell. Does anyone have a solution?
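A minimal sketch of why `x[0]` behaves this way, assuming pandas is installed: `DataFrame.apply` passes each whole column to the lambda as a Series, so `x[0]` looks up row label 0 of that column and returns a complete list. The `.str[0]` fix at the end is a swapped-in suggestion, not something from the question; the `.str` accessor indexes into each list cell by cell.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})
df_temp = df.apply(lambda x: x.str.split('_'))

# apply() hands the lambda one column at a time, so x[0] is the
# cell in row 0 of that column: an entire list, not its first element.
print(df_temp['column1'][0])  # ['One', 'Two', 'Three']

# Indexing must happen inside each cell; .str[0] does that for a Series of lists.
df_one = df_temp.apply(lambda x: x.str[0])
print(df_one)
```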

The question "Apply pandas function to column to create multiple new columns?" might help. Moreover, I believe that df["column1"].apply(lambda s: pd.Series(s.split("_"))) should return multiple columns. – Felipe Whitaker, Dec 21, 2021 at 15:00

Thanks for the answer. I found that using pandas applymap instead of apply works well in my case. – DHJ, Dec 21, 2021 at 15:08
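The comment's idea of expanding one column into several columns can also be expressed with pandas' built-in `Series.str.split(..., expand=True)`, which returns a DataFrame whose columns 0, 1, 2 hold the pieces. The sketch below is a swapped-in variant, not the commenter's exact code: it expands every column and then regroups by position to build all three dataframes, assuming every string has exactly three parts.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})

# Expand each column into a small frame whose columns 0, 1, 2 are the pieces.
expanded = {col: df[col].str.split('_', expand=True) for col in df.columns}

# The i-th result takes piece i from every original column.
df_one, df_two, df_three = [
    pd.DataFrame({col: parts[i] for col, parts in expanded.items()})
    for i in range(3)
]
print(df_two)
```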

DHJ · Accepted Answer · 2021-12-21 15:23:42Z


One solution is to use applymap:

df_temp.applymap(lambda x: x[0])
Out[0]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne
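Note that `DataFrame.applymap` is deprecated as of pandas 2.1, where the same element-wise operation is called `DataFrame.map`. A hedged sketch that uses whichever name the installed pandas provides and builds all three dataframes from `df_temp`; the `i=i` default argument binds the current position into each lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})
df_temp = df.apply(lambda x: x.str.split('_'))

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = df_temp.map if hasattr(pd.DataFrame, 'map') else df_temp.applymap

# Apply cell[i] to every cell for i = 0, 1, 2.
df_one, df_two, df_three = [elementwise(lambda cell, i=i: cell[i]) for i in range(3)]
print(df_three)
```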

Another is to stack the frame into a Series, apply the function there, and unstack the result:

df_temp.stack().apply(lambda x: x[0]).unstack()
Out[0]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne
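The stacking idea extends to all three dataframes at once. A minimal sketch, assuming pandas is installed and every string has exactly three parts: stack the original `df` into one Series, split each string once with `expand=True`, then unstack each piece back to the original shape.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})

# stack() turns the 3x3 frame into one Series indexed by (row, column).
stacked = df.stack()

# Split every string once; columns 0, 1, 2 of the result hold the pieces.
pieces = stacked.str.split('_', expand=True)

# unstack() moves the column level back, restoring the 3x3 layout.
df_one, df_two, df_three = [pieces[i].unstack() for i in range(3)]
print(df_one)
```

Splitting once and unstacking three times avoids running a Python lambda per cell for each output frame.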

answered Dec 21, 2021 at 15:08 by DHJ, edited Dec 21, 2021 at 15:23
