
I currently have a pandas DataFrame, read in from CSV files, where each column looks like the one below.

>>> train["question1"]
209174            [198, 87, 42, 1568, 193, 7461, 3143, 189]
166856       [198, 110, 1146, 87, 82, 1466, 7, 8, 123, 189]
335224    [198, 89, 42, 3393, 5, 193, 1109, 13, 42, 304,...
244308                      [15, 71360, 1439, 7, 8012, 189]
234779    [39, 15, 8, 440, 2227, 2, 179904, 29563, 47, 9...
213555                       [103, 33, 393, 2707, 291, 189]
288254       [198, 87, 42, 2369, 8, 1033, 26, 8, 1410, 189]
172107    [103, 15, 1, 2334, 119, 8, 201535, 6, 8, 46012...
259159    [198, 110, 70, 4162, 1, 14109, 65, 1, 180, 6, ...
376926    [103, 33, 1, 5395, 7646, 7, 1080, 4, 665, 4078...
376802                      [103, 33, 393, 2707, 1146, 189]
274396      [103, 15, 1, 255, 10820, 125, 83279, 4624, 189]
137372    [198, 87, 42, 311, 8, 127172, 232, 1531, 1293,...
377806    [103, 33, 78, 1421, 5, 1009, 8, 2373, 224, 6, ...
293271    [309002, 46, 198, 89, 82, 659, 8, 996, 14, 309...
102517    [103, 33, 78, 4104, 4, 1122, 6609, 112, 2155, ...
123516       [103, 15, 1, 2801, 4, 8, 1122, 1792, 717, 189]
337879                     [103, 1229, 15, 22208, 188, 189]
112974            [198, 87, 42, 15775, 8, 13837, 2712, 189]
159254    [15, 64, 30, 14673, 11, 17679, 13, 887, 10, 82...
366796    [33, 10058, 12715, 6, 10058, 5599, 1, 216, 874...
395723        [739, 261, 43580, 489, 37, 501, 131, 57, 189]
237095            [198, 6737, 15, 1, 642, 6805, 48605, 189]
337426      [103, 15, 1, 255, 242, 7, 526, 11, 103466, 189]
233527    [103, 120, 1927, 1053, 1703, 62, 19, 17, 29, 1...
155205    [198, 89, 42, 3134, 6385, 6, 4670, 729, 14, 8,...
289580    [190, 1, 298, 79, 496, 30, 240, 7265, 5, 45, 7...
222376    [198, 110, 544, 3483, 500, 7, 1, 96, 237, 63, ...
236585        [103, 1183, 36, 181, 5, 14944, 1, 14490, 189]
234172    [198, 120, 1, 29, 98, 3279, 98, 3279, 98, 1223...

If I take the column's values and check their shape, I get:

>>> train["question1"].values.shape
(283001,)

What I would like is to decompose each column into an ndarray with a shape of (283001, 144).
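A minimal sketch of why `.values` gives a 1-D shape, using a short hypothetical Series `s` as a stand-in for `train["question1"]`: each cell holds a whole Python list, so NumPy stores one object per row rather than one number per token.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for train["question1"]: each cell is a Python list of token ids
s = pd.Series([[198, 87, 42], [15, 71360, 1439]])

arr = s.values
print(arr.shape)  # (2,): one element per row, not per token
print(arr.dtype)  # object: each element is an entire Python list
```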

2 Answers


If your lists are all the same length:

np.array(train["question1"].values.tolist())
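A small sketch of the equal-length case on hypothetical toy data: `tolist()` turns the object array into a plain list of lists, and `np.array` then builds a regular 2-D integer array from it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row has exactly 3 token ids
s = pd.Series([[198, 87, 42], [15, 71360, 1439], [103, 33, 393]])

# list of equal-length lists -> rectangular 2-D array of shape (n_rows, row_length)
arr = np.array(s.values.tolist())
print(arr.shape)  # (3, 3)
```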

If they are not, let pd.DataFrame pad the shorter rows for you (missing positions become NaN):

pd.DataFrame(train["question1"].values.tolist()).values
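A sketch of the ragged case on hypothetical toy data. The plain `pd.DataFrame(...)` call pads with NaN (so the result is float64) and its width is the longest row. Assuming you want a fixed width such as the 144 in the question and 0 as the pad value, `reindex` plus `fillna` is one way to get there; `width` is an illustrative stand-in, and rows longer than `width` are truncated because `reindex` drops the extra columns.

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: rows have 3, 2 and 4 token ids
s = pd.Series([[198, 87, 42], [15, 189], [103, 33, 393, 2707]])

# pd.DataFrame pads short rows with NaN; width = longest row
padded = pd.DataFrame(s.values.tolist()).values
print(padded.shape)  # (3, 4)

# Assumption: a fixed width and 0 as the pad value are wanted
width = 6  # stand-in for 144
fixed = (
    pd.DataFrame(s.values.tolist())
    .reindex(columns=range(width), fill_value=0)  # add missing columns as 0 (drops extras)
    .fillna(0)                                     # replace the NaN padding inside rows
    .astype(np.int64)
    .values
)
print(fixed.shape)  # (3, 6)
```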

3 Comments

@TheM00s3 are the lists within the series the same length? I just looked, clearly not. How do you want to pad the missing values? An np.array has to be rectangular.
I totally missed that part and forgot to mention that they aren't. I wrote a processing function to make them the same length, and now the above answer works.
It's kinda ridiculous and counter-intuitive that pandas doesn't try to turn lists into a proper array when you call to_numpy. On the other hand, I hadn't thought about the challenge of lists that aren't the same length. A recursive option would be nice.
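The processing function mentioned in the comments is not shown. A minimal sketch of what such a helper might look like, where `pad_or_truncate` is a hypothetical name and 0 is an assumed pad value: each list is cut to `width` and right-padded, after which the equal-length approach applies.

```python
import numpy as np
import pandas as pd

def pad_or_truncate(tokens, width, pad_value=0):
    """Hypothetical helper: cut a token list to `width`, then right-pad with `pad_value`."""
    tokens = list(tokens)[:width]
    return tokens + [pad_value] * (width - len(tokens))

# Hypothetical ragged data
s = pd.Series([[198, 87, 42], [15, 189], [103, 33, 393, 2707, 291]])
width = 4  # stand-in for 144

rows = [pad_or_truncate(t, width) for t in s]
arr = np.array(rows)  # every row now has `width` entries, so this is 2-D
print(arr.shape)  # (3, 4)
```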

First, convert each row from a Python list (dtype object) to a NumPy array:

train["question1"] = train["question1"].apply(lambda x: np.array(x, dtype=np.float32))

Then convert the pandas column to a NumPy array:

train_array = train["question1"].to_numpy()

Finally, stack the array of arrays into a single 2-D array (this requires every row to have the same length):

train_array = np.stack(train_array)
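The three steps above can be run end to end on a small hypothetical DataFrame. `np.stack` joins the 1-D row arrays along a new first axis, giving shape (n_rows, row_length); the second part shows that it raises `ValueError` when rows differ in length, so ragged data needs padding first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with equal-length rows
df = pd.DataFrame({"question1": [[198, 87, 42], [15, 71360, 1439]]})

# Step 1: each cell becomes a float32 ndarray
df["question1"] = df["question1"].apply(lambda x: np.array(x, dtype=np.float32))
# Step 2: object array of length n whose elements are 1-D arrays
obj = df["question1"].to_numpy()
# Step 3: stack the 1-D arrays along a new first axis -> shape (n, k)
stacked = np.stack(obj)
print(stacked.shape, stacked.dtype)  # (2, 3) float32

# np.stack requires identical shapes; ragged rows raise ValueError
raised = False
try:
    np.stack([np.array([1, 2, 3]), np.array([4, 5])])
except ValueError as err:
    raised = True
    print("ragged input:", err)
```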
