creating a dataframe with a nested loop

Question

I’m new to pandas so forgive me if this is an easy question. I would like to create a dataframe by using elements from two different dataframes. What is the best way to do this? It looks like I will have to use some kind of nested for loop:

for a in df1.itertuples()
    for b in df2.itertuples()
        if df1.vega(a) = df2.vega(b)
            delta = df2.gamma(a) – df1.gamma(b)
                if delta > 0 or delta <0
                    zelda = df2.blabla(b)

EDIT: both dataframes have the same column names:

df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365]})
df2 = pd.DataFrame({'vega': [bla6, bla2, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654]})

EDIT2: In this example the new dataframe would be:

 zelda|
 ----- |
430000 |
-34619711 |

EDIT3: Sorry guys, your answers opened my eyes and made me rethink my problem as it seems no for loop at all is needed here. I added a ‘ranking’ column to my dataframes, shuffled the ‘vega’ column in df2 and added the ‘rho’ column to the resulting dataframe. (I actually have more columns in the dataframes, but they are all used in the same way as either ‘gamma’, ‘blabla’, or ‘ranking’ columns.)

Is this still possible using the merge solutions you guys came up with? Again, my apologies for the first incomplete question.

    zelda = df2.blabla(b)
    rho = df1.ranking(a)

df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365, ‘blabla’: [regina, cactus, galileo, viking], ‘ranking’: [11, 34,65,46]]})
df2 = pd.DataFrame({'vega': [bla2, bla6, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654], ‘blabla’: [lucy, fletnix, ingrid, pablo], ‘ranking’: [45, 4,5,3]})

In this new example, the new dataframe would be:

    delta|  zelda| rho|vega
---------|-------|----|----
2987803,8|fletnix|  34|bla2
-34619711|  pablo|  46|bla4

Can you provide sample data on the dataframes you are trying to bring together? — Andrew L
– Andrew L, Commented May 16, 2017 at 17:26
both dataframes have the same column names: df1 = pd.DataFrame({'vega': [bla1, bla2, bla3, bla4], 'gamma': [242340.1, 466530.2, 325346, 34654365]}) df2 = pd.DataFrame({'vega': [bla6, bla2, bla7, bla4], 'gamma': [3454334, 896530.2, 32543, 34654]}) — Jamie Lee
– Jamie Lee, Commented May 16, 2017 at 17:35

MaxU - stand with Ukraine · Accepted Answer · 2017-05-16 18:05:37Z

2

IIUC:

In [56]: df1.merge(df2, on='vega', suffixes=['1','2']) \
    ...:    .eval("zelda = gamma2 - gamma1", inplace=False)[['zelda']]
    ...:
Out[56]:
        zelda
0    430000.0
1 -34619711.0

edited May 16, 2017 at 18:05

answered May 16, 2017 at 17:46

MaxU - stand with Ukraine

212k37 gold badges402 silver badges436 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Kyle · Accepted Answer · 2017-05-16 17:53:12Z

1

Is this what you are looking for?

mask = df1.vega == df2.vega
delta = df1[mask].gamma - df2[mask].gamma

delta[delta != 0]
1     -430000.0
3    34619711.0
Name: gamma, dtype: float64

answered May 16, 2017 at 17:53

Kyle

2,9342 gold badges21 silver badges30 bronze badges

Comments

Scott Boston · Accepted Answer · 2017-05-16 18:12:10Z

1

Or you can use set_index and subract dataframes using intrinsic index alignment nature of Pandas.

df1.set_index('vega')
df2.set_index('vega')
(df2 - df1).dropna().rename(columns={'gamma':'zelda'})

output:

          zelda
vega            
bla2    430000.0
bla4 -34619711.0

answered May 16, 2017 at 18:12

Scott Boston

154k15 gold badges160 silver badges207 bronze badges

Comments

xgord · Accepted Answer · 2017-05-16 22:18:31Z

Another option, create a dataframe merged on the vega column and add columns to it for your new data. Setting how='inner' in the merge call keeps only the rows where that vega value appeared in both df1 and df2.

joint = df1.merge(df2, on='vega', how='inner')

      gamma_x  vega   gamma_y
0    466530.2  bla2  896530.2
1  34654365.0  bla4   34654.0

Then pandas lets you easily subtract the values in one column from another:

joint['delta'] = joint['gamma_y'] - joint['gamma_x']

      gamma_x  vega   gamma_y       delta
0    466530.2  bla2  896530.2    430000.0
1  34654365.0  bla4   34654.0 -34619711.0

And you can select the rows that meet your criteria:

zelda = joint[(joint['delta'] > 0) | (joint['delta'] < 0)]['delta']

Update: response to your edit 3

Yes, you can do what you want. Just try it out for yourself. When you merge the two dataframes together, the non-matching columns from the df1 will become colname_x and the non-matching columns from df2 will become colname_y. So you can select the columns you want based on that.

joint = df1.merge(df2, on='vega', how='inner')
joint['delta'] = joint['gamma_y'] - joint['gamma_x']
joint['rho'] = joint['ranking_x'] # select the ranking column from df1
joint['zelda'] = joint['blabla_y'] # select the blabla column from df2
joint[['delta', 'zelda', 'rho', 'vega']] # select only the columns from your example output

        delta  zelda  rho  vega
0   2987803.8   lucy   34  bla2
1 -34619711.0  pablo   46  bla4

Thanks a lot guys: I was able to solve two problems at once thnaks to your answers

Collectives™ on Stack Overflow

creating a dataframe with a nested loop

4 Answers 4

Comments

Comments

Comments

Update: response to your edit 3

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

Comments

Update: response to your edit 3

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related