I want to get a 2D NumPy array from a column of a pandas DataFrame df that holds a NumPy vector in each row. But if I do
df.values.shape
I get (3,)
instead of (3, 5)
(assuming that each NumPy vector in the DataFrame has 5 elements and that the DataFrame has 3 rows).
What is the correct method?
If you have an array of shape (2, 4) and reshape it with (-1, 1), the result must have exactly one column. Since the array holds 2 × 4 = 8 elements, NumPy infers the missing dimension as 8 rows, giving shape (8, 1).
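A minimal sketch of that inference, using an arbitrary 2 × 4 array built with np.arange for illustration:

```python
import numpy as np

# A 2 x 4 array holding the 8 values 0..7
a = np.arange(8).reshape(2, 4)

# -1 tells NumPy to infer that dimension: 8 elements / 1 column = 8 rows
b = a.reshape(-1, 1)
print(b.shape)  # (8, 1)
```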
The melt() function reshapes a DataFrame from wide to long format. It is useful when one or more columns are identifier variables and the remaining columns should be unpivoted onto the row axis, leaving only two non-identifier columns, named variable and value by default.
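As a small illustration (the column names id, a and b are hypothetical), melt() keeps id as the identifier and unpivots a and b into the default variable and value columns:

```python
import pandas as pd

# Wide format: one row per id, one column per measurement
wide = pd.DataFrame({'id': [1, 2], 'a': [10, 20], 'b': [30, 40]})

# Long format: one row per (id, measurement) pair
long = wide.melt(id_vars='id')
print(long)
```

Each original row produces one output row per unpivoted column, so the 2 × 2 block of measurements becomes 4 rows.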
Ideally, avoid getting into this situation by finding a different way to define the DataFrame in the first place. However, if your DataFrame looks like this:
import numpy as np
import pandas as pd

s = pd.Series([np.random.randint(20, size=(5,)) for _ in range(3)])
df = pd.DataFrame(s, columns=['foo'])
# foo
# 0 [4, 14, 9, 16, 5]
# 1 [16, 16, 5, 4, 19]
# 2 [7, 10, 15, 13, 2]
then you could convert it to a DataFrame of shape (3, 5) by calling pd.DataFrame on a list of arrays:
pd.DataFrame(df['foo'].tolist())
# 0 1 2 3 4
# 0 4 14 9 16 5
# 1 16 16 5 4 19
# 2 7 10 15 13 2
pd.DataFrame(df['foo'].tolist()).values.shape
# (3, 5)
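If you only need the 2D NumPy array itself rather than an intermediate DataFrame, one alternative (not part of the answer above, and assuming every vector has the same length) is to stack the arrays directly with np.stack:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: one object column 'foo' holding length-5 vectors
s = pd.Series([np.random.randint(20, size=(5,)) for _ in range(3)])
df = pd.DataFrame({'foo': s})

# np.stack joins the 1-D arrays along a new first axis, giving shape (3, 5)
arr = np.stack(df['foo'].tolist())
print(arr.shape)  # (3, 5)
```

np.stack raises a ValueError if the vectors have different lengths, which is a useful early signal that the column cannot be represented as a regular 2D array.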