3

I have a dataframe df1 that holds rows for thousands of trades between a certain country and the United Kingdom. I also have another dataframe df2 that holds a few rows for distances between certain countries and the United Kingdom.

Example:

x1 = [{'partner': "Afghanistan", 'trade_value':100}, 
      {'partner':"Zambia",'trade_value':110}, 
      {'partner': "Germany",'trade_value':120},
      {'partner': "Afghanistan",'trade_value':150},
      {'partner': "USA",'trade_value':1120}]

df1 = pd.DataFrame(x1)

x2 = [{'country': "Afghanistan", 'distance':1000}, 
      {'country': "USA",'distance':1120}];

df2 = pd.DataFrame(x2)

I want to add a new column to df1 called 'distance' and assign each 'partner' in df1 with its appropriate 'distance' from df2.

Note: sometimes a 'partner' country in df1 is repeated due to multiple trades. Also not all distances are available in df2 so I don't mind leaving a cell in df1 empty.

so far I am trying this to no avail:

#Add new column
df1['distance'] = 0;

for index, row in df1.iterrows():
    for index, row2 in df2.iterrows():
        if row['partner'] == row2['country']:
            df1['distance'].replace(row['distance'], row2['distance'],inplace=True);

I am getting all the distance column filled with 1000, which is the made up distance between Afghanistan and the UK, but all other countries are getting the same value (when some countries should not even be getting any value at all)

1 Answer 1

5

IIUC you can use .map

x1 = [{'partner':"Afghanistan", 'trade_value':100}, {'partner':"Zambia",'trade_value':110},
      {'partner': "Germany",'trade_value':120},
      {'partner': "Afghanistan",'trade_value':150},{'partner': "USA",'trade_value':1120}]

df1 = pd.DataFrame(x1)

x2 = [{'country': "Afghanistan", 'distance':1000}, {'country': "USA",'distance':1120}]

df2 = pd.DataFrame(x2)

df1['distance'] = df1['partner'].map(df2.set_index('country')['distance'])

print (df1)

       partner  trade_value  distance
0  Afghanistan          100    1000.0
1       Zambia          110       NaN
2      Germany          120       NaN
3  Afghanistan          150    1000.0
4          USA         1120    1120.0
Sign up to request clarification or add additional context in comments.

2 Comments

You're a life saver, I just tested it on my actual dataframe and it not only worked, it took literally less than a second
@HassanDbouk map works faster compared to other alternatives.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.