I have a program that outputs arrays.
For example:
[[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
I would like to turn these arrays into a DataFrame using pandas. However, when I do, the values become row values like this:
As you can see, each array within the overall array becomes its own row. I would like each array within the overall array to become its own column with a column name.
Furthermore, in my use case, the number of arrays within the array is variable. There could be 4 arrays or 70, which means there could be 4 columns or 70. This is problematic when it comes to column names, and I was wondering if there is any way to auto-increment column names in Python.
Check out my attempt below and let me know how I can solve this.
My desired outcome is simply to make each array within the overall array into its own column instead of row and to have titles for the column that increment with each additional array/column.
Thank you so much.
Need help. Please respond!
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
numpy_data = np.array(frame)
df = pd.DataFrame(data=numpy_data, columns=["column1", "column2", "column3"])
print(frame)
print(df)
Since a DataFrame is similar to a 2D NumPy array, we can create one from a NumPy ndarray. The input array must have at most two dimensions; an array with three or more dimensions raises a ValueError. If you pass a raw NumPy ndarray, the index and column names start at 0 by default.
NumPy arrays: since a DataFrame can be considered a two-dimensional data structure, we can use a two-dimensional NumPy array to create one. For example, an array A with 4 rows and 3 columns can be passed to the DataFrame constructor. Pandas assigns integer column labels by default.
You can convert a NumPy array to a pandas DataFrame using the constructor pd.DataFrame(array). When you print the resulting DataFrame, you will see the array's values laid out as rows and columns.
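To make the default labelling concrete, here is a minimal sketch using the 4x3 array from the question. The variable names (arr, df_default, raised) are only illustrative; the 3D case is included to show the ValueError mentioned above.

```python
import numpy as np
import pandas as pd

# The array from the question: 4 rows, 3 columns.
arr = np.array([[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]])

# With no column names given, pandas labels columns 0, 1, 2
# and rows 0, 1, 2, 3.
df_default = pd.DataFrame(arr)
default_columns = list(df_default.columns)
default_index = list(df_default.index)

# A 3D array cannot be turned into a DataFrame directly.
try:
    pd.DataFrame(np.zeros((2, 2, 2)))
    raised = False
except ValueError:
    raised = True
```

This also shows why the attempt in the question produces rows: each inner array is one row of the 2D array, so it stays a row in the DataFrame unless the data is transposed.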
A possible solution is to transpose the DataFrame and rename the columns after converting the NumPy array into a DataFrame. Here is the code:
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
numpy_data = np.array(frame)
# transpose so that each inner array becomes a column
df = pd.DataFrame(data=numpy_data).T
# build the column names from the actual number of columns, so no count is hard-coded
df.columns = [f'mycol{i}' for i in range(df.shape[1])]
print(df)
Output:
   mycol0  mycol1  mycol2  mycol3
0       0       0       1       2
1       1       0       3       4
2       0       0       3       4
Same code for 11 columns:
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4], [5, 2, 2], [6,7,8], [8,9,19] , [10,2,4], [2,6,5], [10,2,5], [11,2,9]]
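numpy_data = np.array(frame)
# transpose so that each inner array becomes a column
df = pd.DataFrame(data=numpy_data).T
# the same comprehension now yields 11 names
df.columns = [f'mycol{i}' for i in range(df.shape[1])]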
print(df)
Output:
   mycol0  mycol1  mycol2  mycol3  mycol4  mycol5  mycol6  mycol7  mycol8  mycol9  mycol10
0       0       0       1       2       5       6       8      10       2      10       11
1       1       0       3       4       2       7       9       2       6       2        2
2       0       0       3       4       2       8      19       4       5       5        9
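As an alternative to transposing, the columns can be built directly from a dict. This is a different technique from the one in the answer above, sketched under the assumption that all inner arrays have the same length; the name df_from_dict is only illustrative.

```python
import pandas as pd

frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]

# Each inner array becomes one column; enumerate supplies the
# incrementing number used in the column name.
df_from_dict = pd.DataFrame(
    {f"mycol{i}": values for i, values in enumerate(frame)}
)
```

Because the names come from enumerate, the same line works for 4 arrays or 70.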
You can transpose the array and use add_prefix:
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
pd.DataFrame(np.array(frame).T).add_prefix('column')
Out:
   column0  column1  column2  column3
0        0        0        1        2
1        1        0        3        4
2        0        0        3        4
This works with any number of arrays:
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4], [1,0,1], [2,0,3]]
pd.DataFrame(np.array(frame).T).add_prefix('column')
Out:
   column0  column1  column2  column3  column4  column5
0        0        0        1        2        1        2
1        1        0        3        4        0        0
2        0        0        3        4        1        3
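If the names should start at 1, as in the question's "column1", "column2", ..., the transpose can be wrapped in a small helper. This is only a sketch: arrays_to_columns, prefix, and start are hypothetical names, not part of pandas, and it assumes every inner array has the same length.

```python
import numpy as np
import pandas as pd

def arrays_to_columns(arrays, prefix="column", start=1):
    """Hypothetical helper: one column per inner array, numbered from `start`."""
    # Transpose so that each inner array runs down a column.
    data = np.array(arrays).T
    # One name per column, however many arrays were passed in.
    names = [f"{prefix}{i}" for i in range(start, start + data.shape[1])]
    return pd.DataFrame(data, columns=names)

frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
df_named = arrays_to_columns(frame)
```

Calling arrays_to_columns(frame, start=0) would reproduce the column0, column1, ... names from the add_prefix version.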