Adding values to Pandas dataframe from a second dataframe

Question

I have two pandas dataframes like this:

category    time         day      1         2           3

a            24           1      15         35          20        
a            22           1      40         35          15                       


category     day    time      total

 b            1      19         10 
 b            1      20         10   
 b            1      21         10  
 b            1      22         10  
 b            1      23         10 
 b            1      24         10   
 b            2      1          10

 a            1      0          10  
 a            1      1          10  
 a            1      2          10  
 a            1      3          10  
 a            1      4          10  
 a            1      5          10  
 a            1      6          10  
 a            1      7          10  
 a            1      8          10  
 a            1      9          10  
 a            1      10         10  
 a            1      11         10  
 a            1      12         10  
 a            1      13         10  
 a            1      14         10  
 a            1      15         10  
 a            1      16         10  
 a            1      17         10  
 a            1      18         10       
 a            1      19         10 
 a            1      20         10   
 a            1      21         10  
 a            1      22         10  
 a            1      23         10 
 a            1      24         10   
 a            2      1          10

I would like to add the element in column '2' in the first dataframe to the corresponding element in the second dataframe, the element in column '1' should be added to the cell above and the one in column '3' to the cell belove

Rendering this result:

category     day    time      total

 b            1      19         10 
 b            1      20         10   
 b            1      21         10  
 b            1      22         10  
 b            1      23         10 
 b            1      24         10   
 b            2      1          10

 a            1      0          10  
 a            1      1          10  
 a            1      2          10  
 a            1      3          10  
 a            1      4          10  
 a            1      5          10  
 a            1      6          10  
 a            1      7          10  
 a            1      8          10  
 a            1      9          10  
 a            1      10         10  
 a            1      11         10  
 a            1      12         10  
 a            1      13         10  
 a            1      14         10  
 a            1      15         10  
 a            1      16         10  
 a            1      17         10  
 a            1      18         10       
 a            1      19         10
 a            1      20         10   
 a            1      21         10      + 40
 a            1      22         10      + 35
 a            1      23         10 + 15 + 15 
 a            1      24         10 + 35   
 a            2      1          10 + 20

Right now I use a loop but it is to slow, and I need something faster:

for row in df_1:
            date = row[0]
            time = row[1]
            category = row[2]
            total = row[3:]
            index = df_2.index[(df_2['date'] == date) & (df_2['time'] == time)].[0] & (df_2['category'] == category)]

            df_2['total'].iloc[index - 1 : index + 1 + 1 ] = df_2['total'].iloc[index - 1 : index + 1 + 1] + total

Is there a good way to do this using pandas? Should I set the index of my second data frame to "day" and "time" to be able to retrieve them fast? I have a very large dataset so it is important to me that the approach I choose is fast.

Why has your first dataframe duplicate column names? Can you supply a minimal reproducible example? — jpp
– jpp, Commented Jun 12, 2018 at 15:01

Scott Boston · Accepted Answer · 2018-06-12 15:40:18Z

1

Let's see if this is faster:

df11 = df1.melt(['time','day'])

df12 = (df11.rename(columns={'value':'total'})
            .set_index(['day',
                     df11['time']+df11.groupby(['time','day']).cumcount()-1])
            .drop(['time','variable'], axis=1)
            .rename_axis(['day','time']).sum(level=[0,1]))

df_out = df2.set_index(['day','time']).add(df12, fill_value=0).reset_index()

print(df_out)

Output:

   day  time  total
0    1    14   10.0
1    1    15   50.0
2    1    16   45.0
3    1    17   40.0
4    1    18   45.0
5    1    19   30.0

edited Jun 12, 2018 at 15:40

answered Jun 12, 2018 at 15:23

Scott Boston

154k15 gold badges160 silver badges207 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

dagrun Over a year ago

Hi. I am sure this is a great solution, but unfortunately, I am having trouble understanding the steps to be able to adjust it to work for my case. I have changed my example a bit now to be closer to my use case. I have added an extra column that I am aslo using to look for the matching rows called category. In addition, my example now shows how the time changes when a new day starts. I have all the rows in correct order, so I still want to add to the next row when rapping around midnight. Can you help me by adjusting your answer to fit my edited question? Thank you so much for all the help!

Scott Boston Over a year ago

@dagrun this add a bit of complexity to the question. I think you're going to need to create a datetime column so you can add hours to handle the midnight to morning addition.

Collectives™ on Stack Overflow

Adding values to Pandas dataframe from a second dataframe

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related