
I'm trying to check the Cartesian distance from each point in one dataframe to a set of scattered points in another dataframe, to see whether the input point lies more than a threshold distance from my checking points.

I have this working with nested for loops, but it is painfully slow (~7 min for 40k input rows, each checked against ~180 other rows, plus some overhead operations).

Here is what I'm attempting in vectorized form: for every point (a, b) from df1, if the distance to ANY point (d, e) from df2 is > threshold, write "yes" into df1.c next to the input point.

But I'm getting unexpected behavior from this. With the given data, all but one of the distances are > 1, but only the first row of df1.c is getting 'yes'.

Thanks for any ideas. The problem is probably in the 'df1.loc...' line:

import numpy as np
from pandas import DataFrame

inp1 = [{'a':1, 'b':2, 'c':0}, {'a':1,'b':3,'c':0}, {'a':0,'b':3,'c':0}]
df1 = DataFrame(inp1)

inp2 = [{'d':2, 'e':0}, {'d':0,'e':3}, {'d':0,'e':4}]
df2 = DataFrame(inp2)

threshold = 1

df1.loc[np.sqrt((df1.a - df2.d) ** 2 + (df1.b - df2.e) ** 2) >   threshold, 'c'] = "yes"

print(df1)
print(df2)

   a  b    c
0  1  2  yes
1  1  3    0
2  0  3    0

   d  e
0  2  0
1  0  3
2  0  4
  • This IS the expected behavior. As you said, all but one of the distances are > 1, and this is the one marked as yes in the c column. Commented Oct 12, 2017 at 20:49
  • There are 3x3 distances to check, and so 8 out of 9 are >1. All input rows exceed dist = 1, so all should get the yes. Commented Oct 13, 2017 at 13:28
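The mismatch discussed in these comments can be illustrated with a minimal NumPy broadcasting sketch on the question's data. Subtracting two Series aligns them by index, so the one-liner only compares row i of df1 with row i of df2 (the diagonal of the full 3x3 distance matrix). The names `dist`, `row_wise` and `exceeds` are illustrative assumptions, not part of the original code.

```python
import numpy as np
from pandas import DataFrame

df1 = DataFrame([{'a': 1, 'b': 2, 'c': 0}, {'a': 1, 'b': 3, 'c': 0}, {'a': 0, 'b': 3, 'c': 0}])
df2 = DataFrame([{'d': 2, 'e': 0}, {'d': 0, 'e': 3}, {'d': 0, 'e': 4}])
threshold = 1

# A column vector of shape (3, 1) minus a row vector of shape (3,)
# broadcasts to a (3, 3) array: one entry per (df1 row, df2 row) pair.
dx = df1['a'].to_numpy()[:, None] - df2['d'].to_numpy()
dy = df1['b'].to_numpy()[:, None] - df2['e'].to_numpy()
dist = np.sqrt(dx ** 2 + dy ** 2)

# What the original one-liner effectively tests: only the diagonal,
# i.e. df1 row i against df2 row i.
row_wise = np.diag(dist) > threshold

# "Distance to ANY df2 point exceeds the threshold", per df1 row.
exceeds = (dist > threshold).any(axis=1)

# Cast to object first so a string can be stored in the integer column.
df1['c'] = df1['c'].astype(object)
df1.loc[exceeds, 'c'] = 'yes'
print(df1)
```

For the real data (about 40k x 180 pairs) the intermediate arrays hold a few million floats, which should fit in memory comfortably, though that has not been measured here.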

2 Answers

1

Here is an idea to help you get started.

Source DFs:

In [170]: df1
Out[170]:
   c  x  y
0  0  1  2
1  0  1  3
2  0  0  3

In [171]: df2
Out[171]:
   x  y
0  2  0
1  0  3
2  0  4

Helper DF with cartesian product:

In [172]: x = df1[['x','y']] \
                 .reset_index() \
                 .assign(k=0).merge(df2.assign(k=0).reset_index(), 
                                    on='k', suffixes=['1','2']) \
                 .drop(columns='k')


In [173]: x
Out[173]:
   index1  x1  y1  index2  x2  y2
0       0   1   2       0   2   0
1       0   1   2       1   0   3
2       0   1   2       2   0   4
3       1   1   3       0   2   0
4       1   1   3       1   0   3
5       1   1   3       2   0   4
6       2   0   3       0   2   0
7       2   0   3       1   0   3
8       2   0   3       2   0   4

Now we can calculate the distance:

In [169]: x.eval("D=sqrt((x1 - x2)**2 + (y1 - y2)**2)", inplace=False)
Out[169]:
   index1  x1  y1  index2  x2  y2         D
0       0   1   2       0   2   0  2.236068
1       0   1   2       1   0   3  1.414214
2       0   1   2       2   0   4  2.236068
3       1   1   3       0   2   0  3.162278
4       1   1   3       1   0   3  1.000000
5       1   1   3       2   0   4  1.414214
6       2   0   3       0   2   0  3.605551
7       2   0   3       1   0   3  0.000000
8       2   0   3       2   0   4  1.000000

Or filter for the pairs whose distance exceeds the threshold:

In [175]: x.query("sqrt((x1 - x2)**2 + (y1 - y2)**2) > @threshold")
Out[175]:
   index1  x1  y1  index2  x2  y2
0       0   1   2       0   2   0
1       0   1   2       1   0   3
2       0   1   2       2   0   4
3       1   1   3       0   2   0
5       1   1   3       2   0   4
6       2   0   3       0   2   0
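To get from the filtered pairs back to the 'yes' flag the question asks for, one possible follow-up is to group the pairwise result by `index1` and check whether any pair exceeds the threshold. This is only a sketch under the same column layout as above; the names `pairs` and `far` are assumptions made for illustration.

```python
from pandas import DataFrame

df1 = DataFrame({'c': [0, 0, 0], 'x': [1, 1, 0], 'y': [2, 3, 3]})
df2 = DataFrame({'x': [2, 0, 0], 'y': [0, 3, 4]})
threshold = 1

# Cartesian product via a constant helper key, as in the steps above.
pairs = (df1[['x', 'y']].reset_index()
         .assign(k=0)
         .merge(df2.reset_index().assign(k=0), on='k', suffixes=('1', '2'))
         .drop(columns='k'))

# Euclidean distance for every (df1 row, df2 row) pair.
pairs['D'] = ((pairs['x1'] - pairs['x2']) ** 2
              + (pairs['y1'] - pairs['y2']) ** 2) ** 0.5

# One boolean per df1 row: True if at least one df2 point is farther
# away than the threshold.
far = (pairs['D'] > threshold).groupby(pairs['index1']).any()

# Cast to object first so a string can be stored in the integer column.
df1['c'] = df1['c'].astype(object)
df1.loc[far[far].index, 'c'] = 'yes'
print(df1)
```

Replacing `.any()` with `.all()` would instead flag rows that are farther than the threshold from every point in df2, if that turns out to be the intended check.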

1 Comment

I've got that working, thanks. Maybe it is better to do this in a few steps instead of all at once, as I was trying with the one-liner. Being vectorized, it should still be faster once I try it on the real data.
1

Try the SciPy implementation; it is surprisingly fast:

scipy.spatial.distance.pdist

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

or

scipy.spatial.distance_matrix

https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.spatial.distance_matrix.html
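As a rough sketch of how the second option could apply here: `pdist` computes pairwise distances within a single set of points, so for two separate sets `distance_matrix` is the closer fit. The example below assumes SciPy is installed and uses the question's points as plain arrays; `pts1`, `pts2` and `exceeds` are illustrative names.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Points from the question: rows of df1 (a, b) and df2 (d, e).
pts1 = np.array([[1.0, 2.0], [1.0, 3.0], [0.0, 3.0]])
pts2 = np.array([[2.0, 0.0], [0.0, 3.0], [0.0, 4.0]])
threshold = 1

# dist[i, j] is the Euclidean distance from pts1[i] to pts2[j].
dist = distance_matrix(pts1, pts2)

# True for each input point that is farther than the threshold
# from at least one checking point.
exceeds = (dist > threshold).any(axis=1)
print(exceeds)
```

With dataframes, the arrays could be taken from the relevant columns, for example `df1[['a', 'b']].to_numpy()`.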

1 Comment

Thanks, I'll check it out, though I'm trying to minimize extra packages, as they are not trivial to set up in our environment.
