I have a pandas DataFrame, built by reading in CSVs, where each column holds a list in every row, like the following column.
>>> train["question1"]
209174 [198, 87, 42, 1568, 193, 7461, 3143, 189]
166856 [198, 110, 1146, 87, 82, 1466, 7, 8, 123, 189]
335224 [198, 89, 42, 3393, 5, 193, 1109, 13, 42, 304,...
244308 [15, 71360, 1439, 7, 8012, 189]
234779 [39, 15, 8, 440, 2227, 2, 179904, 29563, 47, 9...
213555 [103, 33, 393, 2707, 291, 189]
288254 [198, 87, 42, 2369, 8, 1033, 26, 8, 1410, 189]
172107 [103, 15, 1, 2334, 119, 8, 201535, 6, 8, 46012...
259159 [198, 110, 70, 4162, 1, 14109, 65, 1, 180, 6, ...
376926 [103, 33, 1, 5395, 7646, 7, 1080, 4, 665, 4078...
376802 [103, 33, 393, 2707, 1146, 189]
274396 [103, 15, 1, 255, 10820, 125, 83279, 4624, 189]
137372 [198, 87, 42, 311, 8, 127172, 232, 1531, 1293,...
377806 [103, 33, 78, 1421, 5, 1009, 8, 2373, 224, 6, ...
293271 [309002, 46, 198, 89, 82, 659, 8, 996, 14, 309...
102517 [103, 33, 78, 4104, 4, 1122, 6609, 112, 2155, ...
123516 [103, 15, 1, 2801, 4, 8, 1122, 1792, 717, 189]
337879 [103, 1229, 15, 22208, 188, 189]
112974 [198, 87, 42, 15775, 8, 13837, 2712, 189]
159254 [15, 64, 30, 14673, 11, 17679, 13, 887, 10, 82...
366796 [33, 10058, 12715, 6, 10058, 5599, 1, 216, 874...
395723 [739, 261, 43580, 489, 37, 501, 131, 57, 189]
237095 [198, 6737, 15, 1, 642, 6805, 48605, 189]
337426 [103, 15, 1, 255, 242, 7, 526, 11, 103466, 189]
233527 [103, 120, 1927, 1053, 1703, 62, 19, 17, 29, 1...
155205 [198, 89, 42, 3134, 6385, 6, 4670, 729, 14, 8,...
289580 [190, 1, 298, 79, 496, 30, 240, 7265, 5, 45, 7...
222376 [198, 110, 544, 3483, 500, 7, 1, 96, 237, 63, ...
236585 [103, 1183, 36, 181, 5, 14944, 1, 14490, 189]
234172 [198, 120, 1, 29, 98, 3279, 98, 3279, 98, 1223...
If I get its values and then check the shape, it is one-dimensional:
>>> train["question1"].values.shape
(283001,)
What I would like is to expand each column into a 2-D ndarray, so that it actually has a shape of [283001, 144].
If your lists are all the same length:
np.array(train["question1"].values.tolist())
If they are not, use pd.DataFrame to pad the shorter rows for you (missing positions are filled with NaN):
pd.DataFrame(train["question1"].values.tolist()).values
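Here is a minimal sketch of both cases on a toy frame. The three-row `train` below is a made-up stand-in for the real data, and the final `fillna(0).astype(int)` step is an optional assumption about how you might want to treat the padding, not something the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train["question1"]: each cell is a list of ids.
train = pd.DataFrame({
    "question1": [[198, 87, 42], [103, 33], [15, 71360, 1439, 7]],
})

# Equal-length case: trim every list to 2 items so all rows match,
# then np.array can stack them into a regular 2-D array.
equal = train["question1"].apply(lambda ids: ids[:2])
dense = np.array(equal.values.tolist())
print(dense.shape)

# Ragged case: pd.DataFrame pads shorter rows with NaN,
# so the resulting array has a float dtype.
padded = pd.DataFrame(train["question1"].values.tolist()).values
print(padded.shape)

# Optional: swap the NaN padding for 0 and go back to integers.
padded_int = (
    pd.DataFrame(train["question1"].values.tolist())
    .fillna(0)
    .astype(int)
    .values
)
print(padded_int)
```

The width of the padded array equals the length of the longest list, so with the real data it would come out as 144 only if the longest row has 144 items.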