Before this question is downvoted or marked as a duplicate, let me explain the issue. I tried all the solutions from similar questions here on Stack Overflow, but none of them worked. I also checked the NumPy issue '"setting an array element with a sequence" error could be improved' (#6584).

I am training a random forest classifier on 3 different features, all with different dimensions. I reshaped them to (-1, 1) so that they can be used to train the RF (random forest) model, but it keeps giving the same error no matter what I try. Here are the feature functions I am using.

Here I compute the color features by taking the mean of the image in different color spaces: RGB, LAB, HSV and grayscale. As the code below shows, I flatten every feature array from each color space.

def extract_color_feature(rgb_roi, lab_roi, hsv_roi, gray_roi):
    avg_rgb_per_row = np.average(rgb_roi, axis=0)
    avg_rgb = np.average(avg_rgb_per_row, axis=0).flatten()  

    avg_lab_per_row = np.average(lab_roi, axis=0)
    avg_lab = np.average(avg_lab_per_row, axis=0).flatten()  

    h, s, _ = cv2.split(hsv_roi)
    h_avg = cv2.mean(h)
    s_avg = cv2.mean(s)
    avg_hs = np.hstack([h_avg, s_avg]).flatten() 

    lbp = extract_lbp(gray_roi).flatten()  

    avg_rgb = np.array(avg_rgb, dtype=np.float32).flatten()
    avg_lab = np.array(avg_lab, dtype=np.float32).flatten()
    avg_hs = np.array(avg_hs, dtype=np.float32).flatten()
    lbp = np.array(lbp, dtype=np.float32).flatten()

    avg_color = np.hstack([avg_rgb, avg_lab, avg_hs, lbp])


    return avg_color.flatten() 

The following function computes only histogram values, again from the RGB, LAB and HSV color spaces. Every histogram is computed on a single color channel, so the depth of every histogram feature is always 1.

def compute_hist_feature(rgb_seg, hsv_seg, lab_seg, mask):
    b, g, r = cv2.split(rgb_seg)
    h, s, v = cv2.split(hsv_seg)
    # Use a separate name for the LAB b channel so it does not overwrite the blue channel.
    l, a, lab_b = cv2.split(lab_seg)

    r_equ = cv2.equalizeHist(r)
    g_equ = cv2.equalizeHist(g)
    b_equ = cv2.equalizeHist(b)
    r_hist = cv2.calcHist([r_equ], [0], mask, [8],
                          [0, 256]).flatten()
    g_hist = cv2.calcHist([g_equ], [0], mask, [8],
                          [0, 256]).flatten()
    b_hist = cv2.calcHist([b_equ], [0], mask, [8],
                          [0, 256]).flatten()

    l_hist = cv2.calcHist([l], [0], mask, [8],
                          [0, 256]).flatten()
    a_hist = cv2.calcHist([a], [0], mask, [8],
                          [0, 256]).flatten()
    bb_hist = cv2.calcHist([lab_b], [0], mask, [8],
                           [0, 256]).flatten()

    h_hist = cv2.calcHist([h], [0], mask,
                          [8], [0, 256]).flatten()  
    s_hist = cv2.calcHist([s], [0], mask,
                          [8], [0, 256]).flatten()  

    h_hist = np.array(h_hist, dtype=np.float32).flatten()
    r_hist = np.array(r_hist, dtype=np.float32).flatten()
    g_hist = np.array(g_hist, dtype=np.float32).flatten()
    b_hist = np.array(b_hist, dtype=np.float32).flatten()
    s_hist = np.array(s_hist, dtype=np.float32).flatten()
    l_hist = np.array(l_hist, dtype=np.float32).flatten()
    a_hist = np.array(a_hist, dtype=np.float32).flatten()
    bb_hist = np.array(bb_hist, dtype=np.float32).flatten()


    hist = np.hstack([r_hist, g_hist, b_hist, h_hist, s_hist, l_hist, a_hist, bb_hist])

    return hist.flatten()  
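
Each histogram above uses 8 bins, so the stacked vector has 8 x 8 = 64 entries no matter how many pixels the mask covers. The following is a minimal NumPy sketch of that property. It uses np.histogram as a stand-in for cv2.calcHist, and the helper name masked_hist and the synthetic inputs are assumptions made only for illustration.

```python
import numpy as np


def masked_hist(channel, mask, bins=8):
    """8-bin histogram of the pixels where mask is non-zero (stand-in for cv2.calcHist)."""
    counts, _ = np.histogram(channel[mask > 0], bins=bins, range=(0, 256))
    return counts.astype(np.float32)


# Synthetic single-channel image (hypothetical data, not from the question).
rng = np.random.default_rng(0)
channel = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

small_mask = np.zeros((32, 32), dtype=np.uint8)
small_mask[:4, :4] = 255      # covers 16 pixels
large_mask = np.zeros((32, 32), dtype=np.uint8)
large_mask[:20, :20] = 255    # covers 400 pixels

small_hist = masked_hist(channel, small_mask)
large_hist = masked_hist(channel, large_mask)
print(small_hist.shape, large_hist.shape)  # (8,) (8,)
```

The bin counts change with the mask, but the length of the feature does not.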

Finally, I use location features: I flatten the list of (x, y) coordinates into a feature array that represents the location feature.

cords = [t[::-1] for t in clusters_.get(disc)]  # reversing the list of tuples

disc_pts = np.array(cords, dtype=np.int32)
loc_feat = np.array(cords, dtype=np.float32).flatten() 

Initially, cords is an array with depth 2, because every pixel has two coordinates, so I flattened it to form an array with a depth of 1.
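
As a minimal sketch of what this flattening produces (the helper name location_feature and the sample points are hypothetical), the result always has twice as many entries as there are points, so its length depends on the size of the cluster:

```python
import numpy as np


def location_feature(points):
    """Flatten a list of (x, y) tuples into a 1-D float32 vector."""
    return np.array(points, dtype=np.float32).flatten()


# Two hypothetical clusters with different numbers of pixels.
small = location_feature([(82, 209), (82, 210), (83, 210)])
large = location_feature([(82, 209), (82, 210), (83, 210), (82, 211), (83, 211)])

print(small.shape, large.shape)  # (6,) (10,)
```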

Finally, I stacked all three features to form a single feature vector:

feat_vec = np.hstack([loc_feat, color_feat, hist_feat]).flatten()

I manually checked the elements of all three feature vectors to confirm that the dtype and dimensions of the arrays are not ambiguous enough to trigger the error, and everything looks fine to me.

This is the first one, the location feature:

[  82.  209.   82.  210.   83.  210.   82.  211.   83.  211.   82.  212.
   83.  212.   84.  212.   81.  213.   82.  213.   83.  213.   84.  213.
   81.  214.   82.  214.   83.  214.   84.  214.   81.  215.   82.  215.
   83.  215.   84.  215.   81.  216.   82.  216.   83.  216.   84.  216.
   81.  217.   82.  217.   83.  217.   84.  217.   81.  218.   82.  218.
   83.  218.   84.  218.   85.  218.   81.  219.   82.  219.   83.  219.
   84.  219.   85.  219.   81.  220.   82.  220.   83.  220.   84.  220.
   85.  220.   81.  221.   82.  221.   83.  221.   84.  221.   85.  221.
   81.  222.   82.  222.   83.  222.   84.  222.   85.  222.   86.  222.
   81.  223.   82.  223.   83.  223.   84.  223.   85.  223.   86.  223.
   81.  224.   82.  224.   83.  224.   84.  224.   85.  224.   86.  224.
   81.  225.   82.  225.   83.  225.   84.  225.   85.  225.   86.  225.
   87.  225.   81.  226.   82.  226.   83.  226.   84.  226.   85.  226.
   86.  226.   87.  226.   81.  227.   82.  227.   83.  227.   84.  227.
   85.  227.   86.  227.   87.  227.   82.  228.   83.  228.   84.  228.
   85.  228.   86.  228.   87.  228.   82.  229.   83.  229.   84.  229.
   85.  229.   86.  229.   87.  229.   82.  230.   83.  230.   84.  230.
   85.  230.   86.  230.   87.  230.   82.  231.   83.  231.   84.  231.
   85.  231.   86.  231.   87.  231.   82.  232.   83.  232.   84.  232.
   85.  232.   86.  232.   87.  232.   82.  233.   83.  233.   84.  233.
   85.  233.   86.  233.   87.  233.   88.  233.   83.  234.   84.  234.
   85.  234.   86.  234.   87.  234.   88.  234.   83.  235.   84.  235.
   85.  235.   86.  235.   87.  235.   88.  235.   83.  236.   84.  236.
   85.  236.   86.  236.   87.  236.   88.  236.   83.  237.   84.  237.
   85.  237.   86.  237.   87.  237.   88.  237.   84.  238.   85.  238.
   86.  238.   87.  238.   84.  239.   85.  239.   86.  239.   87.  239.
   84.  240.   85.  240.   86.  240.   87.  240.   84.  241.   85.  241.
   86.  241.   87.  241.   85.  242.   86.  242.   87.  242.   85.  243.
   86.  243.]

This is the color feature vector:

[  3.35917592e-01   3.25945705e-01   3.25065553e-01   3.34438205e-01
   2.04288393e-01   1.97153553e-01   1.85440078e-01   0.00000000e+00
   0.00000000e+00   0.00000000e+00   1.32209742e-02   0.00000000e+00
   0.00000000e+00   0.00000000e+00   2.62172282e-04   3.93258437e-04
   1.31086141e-04   9.36329598e-05   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   9.98417616e-01   7.02247198e-04]

And this is the histogram feature vector:

[   0.    0.    0.    0.    0.    0.    0.  169.    0.    0.    0.    0.
    0.    0.    0.  169.    0.  163.    6.    0.    0.    0.    0.    0.
    0.    0.    0.  169.    0.    0.    0.    0.  169.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.   29.   93.   47.
    0.    0.    0.    0.  169.    0.    0.    0.    0.    0.    0.  169.
    0.    0.    0.    0.]

As can be seen, the datatype and dimensions of all three arrays are the same, but I still get the error when training with the RF or SVC classifier. When I leave out the location feature and train only with the color and histogram features, the error does not occur, and both the training and prediction programs work fine. Only when all three features are stacked does it give the error.

The error is thrown when the RF classifier is fitted. Here _data is a list of the feature vectors (feat_vec) computed previously, and _labels holds the corresponding labels, either 1 or 0, for each data (image) sample.

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(_data, _labels)

Complete error traceback:

Traceback (most recent call last):
  File "~/openCV/saliency_detection/svm_train.py", line 59, in <module>
    model.fit(_data, _labels)
  File "/usr/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/usr/lib/python2.7/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
  • You left out the most important thing. Where's your exception traceback? Furthermore, questions seeking help debugging must include a minimal reproducible example to reproduce the issue. This is not minimal. Clearly, all those lines of commented out code have nothing to do with this bug. It's just noise. Creating a minimal example by elimination is essential when you want to understand and fix a bug. Commented Feb 24, 2018 at 12:55
  • @HåkenLid thanks for the suggestion, post modified. Commented Feb 24, 2018 at 13:13
  • And what is _data and _labels when you get this error? Those variables are not used elsewhere in your question. Commented Feb 24, 2018 at 13:16
  • Make sure you are passing the expected argument types. Unfortunately, libraries such as scikit often have terribly uninformative error messages. scikit-learn.org/stable/modules/generated/… Commented Feb 24, 2018 at 13:19
  • @HåkenLid that's all fine i checked it, it's not a issue with sklearn, it's issue regarding numpy array, see this issue, github.com/numpy/numpy/issues/6584 Commented Feb 24, 2018 at 13:58

1 Answer


Most likely the error is caused by trying to create an array from lists or arrays of differing lengths.

Without the dtype, the following creates an object dtype array; with a numeric dtype it raises this error. (In NumPy 1.24 and later, the first form also raises a ValueError unless dtype=object is given.)

In [33]: np.array([[1,2,3],[4,5,6],[7,8,9,10]])
Out[33]: 
array([list([1, 2, 3]), list([4, 5, 6]), list([7, 8, 9, 10])],
      dtype=object)
In [34]: np.array([[1,2,3],[4,5,6],[7,8,9,10]], dtype=int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-677fde45dbde> in <module>()
----> 1 np.array([[1,2,3],[4,5,6],[7,8,9,10]], dtype=int)

ValueError: setting an array element with a sequence.

It can't create a 2d numeric array from 3 lists of differing length.

In [37]: np.array([[1,2,3],[4,5,6],[7,8,9]], dtype=int)
Out[37]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
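
To find out whether this is what is happening, it can help to compare the lengths of the rows before building the array. A minimal sketch follows; the helper name report_row_lengths is hypothetical, and rows stands in for whatever list is eventually passed to fit.

```python
import numpy as np
from collections import Counter


def report_row_lengths(rows):
    """Return (most common length, indices of rows whose length differs)."""
    lengths = [len(np.ravel(r)) for r in rows]
    expected, _ = Counter(lengths).most_common(1)[0]
    odd = [i for i, n in enumerate(lengths) if n != expected]
    return expected, odd


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
expected, odd = report_row_lengths(rows)
print(expected, odd)  # 3 [2]
```

An empty list of odd indices means every row has the same length and a 2d numeric array can be built from them.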

In the traceback the variable names change, but I'm guessing the problem can be traced back to the _data variable you pass to fit. You don't show the code that creates _data, and give only a vague description:

_data is a list of feature vectors( ~feat_vec~ )

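From your prints, color has 40 values and histogram has 64, but location clearly has many more: 338, which is two coordinates for each of 169 points. If the number of points differs from one image to the next, feat_vec will have a different length for each image. That's consistent with your statement that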

When I leave out the location feature and train only with the color and histogram features, the error does not occur, and both the training and prediction programs work fine. Only when all three features are stacked does it give the error.

The fact that you can hstack them doesn't tell us anything about how they will work in np.array(....).

In [35]: np.hstack([[1,2,3],[4,5,6],[7,8,9,10]])
Out[35]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
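
If the location vectors do turn out to differ in length, one possible workaround is to replace the raw coordinates with a fixed-length summary. The sketch below is only an illustration under that assumption, not the method used in the question: the function name location_summary and the choice of centroid plus bounding box are hypothetical, and whether such a summary is informative enough for the classifier would have to be tested.

```python
import numpy as np


def location_summary(points):
    """Fixed-length location descriptor: centroid plus bounding box (6 values)."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    centroid = pts.mean(axis=0)   # mean x, mean y
    lo = pts.min(axis=0)          # min x, min y
    hi = pts.max(axis=0)          # max x, max y
    return np.hstack([centroid, lo, hi])


# Two hypothetical regions with different numbers of pixels.
region_a = [(82, 209), (82, 210), (83, 210)]
region_b = [(10, 5), (11, 5), (12, 5), (12, 6), (13, 6)]

X = np.array([location_summary(region_a), location_summary(region_b)],
             dtype=np.float32)
print(X.shape)  # (2, 6)
```

Because every row now has 6 entries, np.array can build a regular 2d float array from them.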

Here's a list of previous times that I've answered a question about the same ValueError:

https://stackoverflow.com/search?q=user%3A901925+ValueError%2Bsequence


7 Comments

Thanks for the answer. Let me explain how I built _data and _labels: each is a list. The features computed from every image are appended to the _data list, which is later fed to the RF fit for training. I will post the contents of _data to be more specific.
What does np.array(_data) produce?
That simply converts a list to NumPy array format; _data is a list.
What's the dtype? I should have asked, what's np.array(_data, dtype=float).
I cast all values in _data to float; dtype specifies the type to convert to.