Consider the sample NumPy array below:
import numpy as np
arr = np.array([[1,2,5, 4,2,7, 5,2,9],
[4,4,1, 4,2,0, 3,6,4],
[1,2,1, 4,2,2, 5,2,0],
[1,2,7, 2,4,1, 5,2,8],
[1,2,9, 4,2,8, 5,2,1],
[4,2,0, 4,4,1, 5,2,4],
[4,4,0, 4,2,6, 3,6,6],
[1,2,1, 4,2,2, 5,2,0]])
PROBLEM: I am concerned only with the first TWO columns of each triplet of elements. I want to remove rows whose first two elements in every triplet duplicate those of an earlier row (in the same order).
In the example above, the rows with indices 0, 2, 4, and 7 are all of the form [1,2,_, 4,2,_, 5,2,_]. So we should keep arr[0] and drop the other three. Similarly, arr[6] is dropped because it has the same pattern as arr[1], namely [4,4,_, 4,2,_, 3,6,_].
In the example given, the output should look like:
[[1,2,5, 4,2,7, 5,2,9],
[4,4,1, 4,2,0, 3,6,4],
[1,2,7, 2,4,1, 5,2,8],
[4,2,0, 4,4,1, 5,2,4]]
The part I'm struggling with most is that the solution should be general enough to handle arrays of 3, 6, 9, 12... columns (always a multiple of 3, and we are always interested in duplicates in the first two columns of each triplet).
With np.delete you are really constructing a new array with the selected rows or columns. So identifying what you want to keep (conversely, drop) and actually creating the new array are separate steps.
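One possible way to do this in two separate steps (first identify the rows to keep, then build the new array) is sketched below. It assumes the column count is always a multiple of 3 and that the first occurrence of each pattern is the one to keep. The function name drop_triplet_duplicates is made up for illustration. It relies on np.unique with axis=0 and return_index=True, which reports the index of the first occurrence of each unique row.

```python
import numpy as np

arr = np.array([[1,2,5, 4,2,7, 5,2,9],
                [4,4,1, 4,2,0, 3,6,4],
                [1,2,1, 4,2,2, 5,2,0],
                [1,2,7, 2,4,1, 5,2,8],
                [1,2,9, 4,2,8, 5,2,1],
                [4,2,0, 4,4,1, 5,2,4],
                [4,4,0, 4,2,6, 3,6,6],
                [1,2,1, 4,2,2, 5,2,0]])

def drop_triplet_duplicates(a):
    # Hypothetical helper name; assumes a.shape[1] is a multiple of 3.
    # Step 1a: boolean mask selecting the first two columns of every triplet
    # (column indices 0, 1, 3, 4, 6, 7, ...).
    key_cols = np.arange(a.shape[1]) % 3 != 2
    keys = a[:, key_cols]
    # Step 1b: index of the first occurrence of each unique key row.
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    # np.unique returns rows in sorted order, so sort the indices
    # to restore the original row order.
    keep = np.sort(first_idx)
    # Step 2: build the new array from the rows we decided to keep.
    return a[keep]

result = drop_triplet_duplicates(arr)
print(result)
```

Because the column mask is computed from a.shape[1], the same function should work unchanged for 3, 6, 9, 12... columns.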