Consider the sample NumPy array below:

import numpy as np

arr = np.array([[1,2,5,  4,2,7,  5,2,9],
                [4,4,1,  4,2,0,  3,6,4],
                [1,2,1,  4,2,2,  5,2,0],
                [1,2,7,  2,4,1,  5,2,8],
                [1,2,9,  4,2,8,  5,2,1],
                [4,2,0,  4,4,1,  5,2,4],
                [4,4,0,  4,2,6,  3,6,6],
                [1,2,1,  4,2,2,  5,2,0]])

PROBLEM: We are concerned only with the first TWO columns of each element triplet. I want to remove rows that repeat an earlier row in these two elements of every triplet (in the same order).

In the example above, the rows with indices 0, 2, 4, and 7 are all of the form [1,2,_, 4,2,_, 5,2,_]. So we should keep arr[0] and drop the other three. Similarly, arr[6] is dropped because it has the same pattern as arr[1], namely [4,4,_, 4,2,_, 3,6,_]. In the example given, the output should look like:

               [[1,2,5,  4,2,7,  5,2,9],
                [4,4,1,  4,2,0,  3,6,4],
                [1,2,7,  2,4,1,  5,2,8],
                [4,2,0,  4,4,1,  5,2,4]]

The part I'm struggling with most is that the solution should be general enough to handle arrays of 3, 6, 9, 12... columns (always a multiple of 3, and we are always interested in duplications of the first two columns of each triplet).

  • What's the significance of the gap in columns? Is this array (8,9) or (8,3,3) shape? Commented Oct 4, 2020 at 23:34
  • Rather than focus on what you want to remove, pay more attention to what you want to keep. Even when you use a function like np.delete you are really constructing a new array with the selected rows or columns. So identifying what you want to keep (conversely drop) and actually creating the new array are separate steps. Commented Oct 4, 2020 at 23:37
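
The separation described in the comment above (first identify what to keep, then build the new array) can be sketched with boolean masks. This is only an illustration on a shortened six-column array; the names col_keep, key, and row_keep are hypothetical and do not come from the question.

```python
import numpy as np

# Shortened sample: two triplets per row instead of three.
arr = np.array([[1, 2, 5, 4, 2, 7],
                [4, 4, 1, 4, 2, 0],
                [1, 2, 1, 4, 2, 2]])

# Step 1: identify the columns that matter (first two of every triplet).
col_keep = np.arange(arr.shape[1]) % 3 != 2
key = arr[:, col_keep]

# Step 2: identify the rows to keep (first occurrence of each key pattern).
_, first_idx = np.unique(key, axis=0, return_index=True)
row_keep = np.zeros(len(arr), dtype=bool)
row_keep[first_idx] = True

# Step 3: build the new array from the original using the row mask.
result = arr[row_keep]
print(result)
```

Because the mask is applied to the original array, the kept rows retain their third-column values and their original order.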

1 Answer


If you can create an array with only the values you are interested in, you can pass it to np.unique(), which returns the index of the first occurrence of each unique value when called with return_index=True.

One way to get the groups you want is to delete every third column. Pass the result to np.unique() with axis=0, so that whole rows are compared, and get the indices:

import numpy as np

arr = np.array([[1,2,5,  4,2,7,  5,2,9],
                [4,4,1,  4,2,0,  3,6,4],
                [1,2,1,  4,2,2,  5,2,0],
                [1,2,7,  2,4,1,  5,2,8],
                [1,2,9,  4,2,8,  5,2,1],
                [4,2,0,  4,4,1,  5,2,4],
                [4,4,0,  4,2,6,  3,6,6],
                [1,2,1,  4,2,2,  5,2,0]])

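# Drop the third column of every triplet, leaving only the columns to compare.
unique_cols = np.delete(arr, slice(2, None, 3), axis=1)
# return_index=True gives the first row at which each unique pattern occurs.
vals, indices = np.unique(unique_cols, axis=0, return_index=True)

# Sort the indices so the kept rows stay in their original order.
arr[sorted(indices)]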

output:

array([[1, 2, 5, 4, 2, 7, 5, 2, 9],
       [4, 4, 1, 4, 2, 0, 3, 6, 4],
       [1, 2, 7, 2, 4, 1, 5, 2, 8],
       [4, 2, 0, 4, 4, 1, 5, 2, 4]])
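
Since the question asks for something that works for 3, 6, 9, 12... columns, the same steps could be wrapped in a small function. This is a minimal sketch, not a definitive implementation: the name drop_triplet_duplicates and the column-count check are assumptions added for illustration.

```python
import numpy as np

def drop_triplet_duplicates(arr):
    """Hypothetical helper: keep the first row for each pattern formed by
    the first two columns of every triplet."""
    arr = np.asarray(arr)
    # Assumption: the input is 2-D and its column count is a multiple of 3.
    if arr.ndim != 2 or arr.shape[1] % 3 != 0:
        raise ValueError("expected a 2-D array with a multiple of 3 columns")
    # Remove columns 2, 5, 8, ... so only the compared columns remain.
    key = np.delete(arr, slice(2, None, 3), axis=1)
    # Index of the first occurrence of each unique key row.
    _, first_idx = np.unique(key, axis=0, return_index=True)
    # Sort so the surviving rows keep their original order.
    return arr[np.sort(first_idx)]

three_cols = np.array([[1, 2, 5],
                       [1, 2, 9],
                       [4, 4, 1]])
# Repeat the triplet four times to get a 12-column array.
twelve_cols = np.hstack([three_cols] * 4)

print(drop_triplet_duplicates(three_cols))
print(drop_triplet_duplicates(twelve_cols).shape)
```

The slice(2, None, 3) step does not depend on the number of triplets, which is why the same code covers any width that is a multiple of 3.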

1 Comment

Works very efficiently on large arrays.
