I have a dataframe df1 that holds rows for thousands of trades between a certain country and the United Kingdom. I also have another dataframe df2 that holds a few rows for distances between certain countries and the United Kingdom.
Example:
x1 = [{'partner': "Afghanistan", 'trade_value':100},
{'partner':"Zambia",'trade_value':110},
{'partner': "Germany",'trade_value':120},
{'partner': "Afghanistan",'trade_value':150},
{'partner': "USA",'trade_value':1120}]
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'distance':1000},
{'country': "USA",'distance':1120}];
df2 = pd.DataFrame(x2)
I want to add a new column to df1 called 'distance' and assign each 'partner' in df1 with its appropriate 'distance' from df2.
Note: sometimes a 'partner' country in df1 is repeated due to multiple trades. Also not all distances are available in df2 so I don't mind leaving a cell in df1 empty.
so far I am trying this to no avail:
#Add new column
df1['distance'] = 0;
for index, row in df1.iterrows():
for index, row2 in df2.iterrows():
if row['partner'] == row2['country']:
df1['distance'].replace(row['distance'], row2['distance'],inplace=True);
I am getting all the distance column filled with 1000, which is the made up distance between Afghanistan and the UK, but all other countries are getting the same value (when some countries should not even be getting any value at all)