0

I have two pandas dataframes like this:

category    time         day      1         2           3

a            24           1      15         35          20        
a            22           1      40         35          15                       


category     day    time      total

 b            1      19         10 
 b            1      20         10   
 b            1      21         10  
 b            1      22         10  
 b            1      23         10 
 b            1      24         10   
 b            2      1          10

 a            1      0          10  
 a            1      1          10  
 a            1      2          10  
 a            1      3          10  
 a            1      4          10  
 a            1      5          10  
 a            1      6          10  
 a            1      7          10  
 a            1      8          10  
 a            1      9          10  
 a            1      10         10  
 a            1      11         10  
 a            1      12         10  
 a            1      13         10  
 a            1      14         10  
 a            1      15         10  
 a            1      16         10  
 a            1      17         10  
 a            1      18         10       
 a            1      19         10 
 a            1      20         10   
 a            1      21         10  
 a            1      22         10  
 a            1      23         10 
 a            1      24         10   
 a            2      1          10

I would like to add the element in column '2' in the first dataframe to the corresponding element in the second dataframe, the element in column '1' should be added to the cell above and the one in column '3' to the cell belove

Rendering this result:

category     day    time      total

 b            1      19         10 
 b            1      20         10   
 b            1      21         10  
 b            1      22         10  
 b            1      23         10 
 b            1      24         10   
 b            2      1          10

 a            1      0          10  
 a            1      1          10  
 a            1      2          10  
 a            1      3          10  
 a            1      4          10  
 a            1      5          10  
 a            1      6          10  
 a            1      7          10  
 a            1      8          10  
 a            1      9          10  
 a            1      10         10  
 a            1      11         10  
 a            1      12         10  
 a            1      13         10  
 a            1      14         10  
 a            1      15         10  
 a            1      16         10  
 a            1      17         10  
 a            1      18         10       
 a            1      19         10
 a            1      20         10   
 a            1      21         10      + 40
 a            1      22         10      + 35
 a            1      23         10 + 15 + 15 
 a            1      24         10 + 35   
 a            2      1          10 + 20

Right now I use a loop but it is to slow, and I need something faster:

for row in df_1:
            date = row[0]
            time = row[1]
            category = row[2]
            total = row[3:]
            index = df_2.index[(df_2['date'] == date) & (df_2['time'] == time)].[0] & (df_2['category'] == category)]

            df_2['total'].iloc[index - 1 : index + 1 + 1 ] = df_2['total'].iloc[index - 1 : index + 1 + 1] + total

Is there a good way to do this using pandas? Should I set the index of my second data frame to "day" and "time" to be able to retrieve them fast? I have a very large dataset so it is important to me that the approach I choose is fast.

2
  • 1
    Why has your first dataframe duplicate column names? Can you supply a minimal reproducible example? Commented Jun 12, 2018 at 15:01
  • Ooops. A simple mistake on my part Commented Jun 13, 2018 at 7:20

1 Answer 1

1

Let's see if this is faster:

df11 = df1.melt(['time','day'])

df12 = (df11.rename(columns={'value':'total'})
            .set_index(['day',
                     df11['time']+df11.groupby(['time','day']).cumcount()-1])
            .drop(['time','variable'], axis=1)
            .rename_axis(['day','time']).sum(level=[0,1]))

df_out = df2.set_index(['day','time']).add(df12, fill_value=0).reset_index()

print(df_out)

Output:

   day  time  total
0    1    14   10.0
1    1    15   50.0
2    1    16   45.0
3    1    17   40.0
4    1    18   45.0
5    1    19   30.0
Sign up to request clarification or add additional context in comments.

2 Comments

Hi. I am sure this is a great solution, but unfortunately, I am having trouble understanding the steps to be able to adjust it to work for my case. I have changed my example a bit now to be closer to my use case. I have added an extra column that I am aslo using to look for the matching rows called category. In addition, my example now shows how the time changes when a new day starts. I have all the rows in correct order, so I still want to add to the next row when rapping around midnight. Can you help me by adjusting your answer to fit my edited question? Thank you so much for all the help!
@dagrun this add a bit of complexity to the question. I think you're going to need to create a datetime column so you can add hours to handle the midnight to morning addition.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.