I have a pandas dataframe containing strings:

import pandas as pd
df = pd.DataFrame({'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'], 'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'], 'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],})

df
Out[0]: 
               column1              column2              column3
0        One_Two_Three  nrOne_nrTwo_nrThree   First_Second_Third
1   First_Second_Third   First_Second_Third        One_Two_Three
2  nrOne_nrTwo_nrThree        One_Two_Three  nrOne_nrTwo_nrThree

I would like to end up with three dataframes: the first containing the characters before the first underscore, the second containing the characters between the first and second underscores, and the third containing the last part. For example, the first should look like:

df_one
Out[1]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne

I've tried

df_temp = df.apply(lambda x: x.str.split('_'))

df_temp
Out[2]: 
                   column1                  column2                  column3
0        [One, Two, Three]  [nrOne, nrTwo, nrThree]   [First, Second, Third]
1   [First, Second, Third]   [First, Second, Third]        [One, Two, Three]
2  [nrOne, nrTwo, nrThree]        [One, Two, Three]  [nrOne, nrTwo, nrThree]

to split the strings into lists, and then

df_temp.apply(lambda x: x[0])
Out[3]: 
  column1  column2 column3
0     One    nrOne   First
1     Two    nrTwo  Second
2   Three  nrThree   Third

But this only picks up the first row of each column instead of the first element of every list. Does anyone have a solution?

  • The question "Apply pandas function to column to create multiple new columns?" might help. Moreover, I believe that df["column1"].apply(lambda s: pd.Series(s.split("_"))) should return multiple columns. Commented Dec 21, 2021 at 15:00
  • Thanks for the answer. I found that using pandas applymap instead of apply works well in my case. Commented Dec 21, 2021 at 15:08

1 Answer

One solution is to use applymap:

df_temp.applymap(lambda x: x[0])
Out[0]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne
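
The attempt in the question fails because DataFrame.apply hands the lambda a whole column, so x[0] is that column's first row rather than the first item of each list; applymap calls the lambda once per cell instead. As a hedged side note, applymap is deprecated from pandas 2.1 onward in favour of DataFrame.map, so a version-tolerant sketch could look like the following. The name `elementwise` is only an illustrative helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})

# Split every string into a list of parts, column by column.
df_temp = df.apply(lambda col: col.str.split('_'))

# Pick whichever element-wise method this pandas version provides:
# DataFrame.map (pandas >= 2.1) or the older DataFrame.applymap.
elementwise = df_temp.map if hasattr(pd.DataFrame, 'map') else df_temp.applymap

# The lambda now receives one list per cell, so parts[0] is its first item.
df_one = elementwise(lambda parts: parts[0])
print(df_one)
```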

Another is to use apply on a Series, by stacking and unstacking:

df_temp.stack().apply(lambda x: x[0]).unstack()
Out[0]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne
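
Since the question asks for three dataframes, one per part, the same idea can be repeated for each position. The sketch below is one possible way to do that, using the .str accessor to index into the split lists; `part_frame` is a hypothetical helper name introduced here for illustration, and it assumes every string contains at least three underscore-separated parts.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})


def part_frame(frame, i, sep='_'):
    """Return a dataframe holding the i-th sep-separated part of every cell."""
    # col.str.split(sep) gives a Series of lists; .str[i] picks item i of each.
    return frame.apply(lambda col: col.str.split(sep).str[i])


# One dataframe per position: first, second, and third part.
df_one, df_two, df_three = (part_frame(df, i) for i in range(3))
print(df_two)
```

This skips the intermediate df_temp entirely; cells with fewer parts than expected would come back as NaN rather than raising an error.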