
I've got a question on identifying patterns within an array. I'm working with the following array:

A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]

Now, this array is clearly made of elements clustering about ~1 and elements about ~9.

Is there a way to separate these clusters? I.e., to get to something like:

a_1 = [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.1]  # elements around ~1
a_2 = [9.0, 9.2, 9.1, 9.2, 8.9]  # elements around ~9

Thanks a lot. Best.

  • What would the delta between related values be? Commented Jan 19, 2018 at 12:14
  • Sorting sounds like a reasonable first step. Then, determine the mean and distribution? ... Isn't this a brainstorming kind of question? Commented Jan 19, 2018 at 12:15
  • Yeah I know, I was thinking about sorting as well, but I was wondering whether there is a function or something to identify clusters within a list or array. If not, I'll proceed by sorting and getting the maximum step among values :) Commented Jan 19, 2018 at 12:18

1 Answer


You can do that by checking which center each element is closer to, 1 or 9:

a_1 = [i for i in A if abs(i-1)<=abs(i-9)]
a_2 = [i for i in A if abs(i-1)>abs(i-9)]

But of course this is not a general solution for clustering. It only works in this case, where you know the centers of the clusters (1 and 9).
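When the centers are known, the same comparison extends to any number of them. A minimal sketch, assuming a hypothetical helper named assign_to_nearest (not part of the original answer), where ties go to the earlier center just as the `<=` does above:

```python
def assign_to_nearest(values, centers):
    """Group values by the index of the closest center (ties go to the earlier center)."""
    clusters = [[] for _ in centers]
    for v in values:
        # min() returns the first index with the smallest distance
        nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
        clusters[nearest].append(v)
    return clusters


A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]
a_1, a_2 = assign_to_nearest(A, [1, 9])
print(a_1)
print(a_2)
```

With centers [1, 9] this should give the same two lists as the two comprehensions above.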

If you don't know the centers of the clusters, I think you should use a clustering algorithm like K-Means.

This is a simple K-Means implementation (with k=2 and a limit of 100 iterations). You don't need to know the centers of the clusters; it picks them randomly at first.

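from random import sample

A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]

# Pick two distinct values as the initial centers; two equal centers
# would leave a_2 empty and cause a division by zero below
x, y = sample(sorted(set(A)), 2)
for _ in range(100):
    # Assign each element to the nearer center
    a_1 = [i for i in A if abs(i-x) <= abs(i-y)]
    a_2 = [i for i in A if abs(i-x) > abs(i-y)]
    print(x, y)
    # Move each center to the mean of its cluster
    x = sum(a_1)/len(a_1)
    y = sum(a_2)/len(a_2)

print(a_1)
print(a_2)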
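For one-dimensional data like this, the sorting idea raised in the comments also works, without any randomness: sort the values and cut at the largest step between neighbours. A rough sketch, assuming exactly two well-separated clusters; the helper name split_at_largest_gap is made up for illustration:

```python
def split_at_largest_gap(values):
    """Split values into two groups at the widest gap between sorted neighbours."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to split")
    # Differences between consecutive sorted values
    gaps = [ordered[k + 1] - ordered[k] for k in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    # Threshold halfway across the widest gap
    threshold = (ordered[cut - 1] + ordered[cut]) / 2
    # Filter the original list so the input order is preserved
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]
    return low, high


A = [1.0, 1.1, 9.0, 9.2, 0.9, 9.1, 1.0, 1.0, 1.2, 9.2, 8.9, 1.1]
a_1, a_2 = split_at_largest_gap(A)
print(a_1)
print(a_2)
```

This only makes sense when one gap clearly dominates; with more than two clusters or overlapping values, a proper clustering algorithm is the safer choice.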

6 Comments

Thanks for the answer! However, I don't know the 1 and 9 values a priori. I'd like to identify those values. That's the point of my question.
Instead of abs, you can use if round(i) == 1, right?
@VikasDamodar I don't think it works correctly when there are numbers near the middle, let's say 7.
@urgeo In that case, I think you need a clustering algorithm, maybe K-Means.
@urgeo What malioboro said is exactly correct. You need some basis for classifying these numbers; otherwise some numbers won't end up in either list, e.g. 5.