For a research project, I need to collect every individual's information from a website into an Excel file. I copied and pasted everything I need from the website into a single column of an Excel file and loaded that file with pandas. However, I need each individual's information laid out horizontally (one row per person) instead of vertically, as it is now. For example, this is what I have right now: a single column of unorganized data.
df = pd.read_csv("ior work.csv", encoding="ISO-8859-1")
Data:
0 Andrew
1 School of Music
2 Music: Sound of the wind
3 Dr. Seuss
4 Dr.Sass
5 Michelle
6 School of Theatrics
7 Music: Voice
8 Dr. A
9 Dr. B
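To try the answers below without the CSV file, the same column can be rebuilt in memory. This is a minimal sketch that assumes the column is named Data, as the printed header above suggests.

```python
import pandas as pd

# In-memory stand-in for "ior work.csv" (assumption: the single column is named "Data").
df = pd.DataFrame({"Data": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})
print(df)
```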
I want to transpose every 5 lines so the data ends up in the format below, where the labels are the column names:
Name School Music Mentor1 Mentor2
What is the most efficient way to do this?
Pandas DataFrame: transpose() function. The transpose() function swaps the index and the columns: it reflects the DataFrame over its main diagonal, writing rows as columns and vice versa. Its copy parameter controls copying: if True, the underlying data is copied; otherwise (the default), no copy is made if possible.
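Transposing flips the whole frame at once; it does not split the column into groups of 5. The sketch below, using the sample data from the question, shows that one column of 10 values simply becomes one row of 10 values, which is why the answer reshapes instead.

```python
import pandas as pd

df = pd.DataFrame({"Data": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})

# .T is shorthand for .transpose(): rows become columns for the entire frame.
wide = df.T
print(wide.shape)  # (1, 10): a single row, not one row per person
```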
To get the nth row of a pandas DataFrame, use the iloc indexer (it takes square brackets, not a method call). For example, df.iloc[4] returns the 5th row because row positions start at 0.
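A short illustration on the question's sample data (rebuilt in memory here): positional indexing with a step of 5 picks out the first field of every record, which is the same regular pattern the reshape answers rely on.

```python
import pandas as pd

df = pd.DataFrame({"Data": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})

fifth_row = df.iloc[4]          # a Series holding the 5th row (position 4)
print(fifth_row["Data"])        # Dr.Sass

names = df["Data"].iloc[::5]    # positions 0, 5, ...: the first line of each record
print(names.tolist())           # ['Andrew', 'Michelle']
```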
The values property returns all values in the DataFrame as a 2-dimensional NumPy array, with one inner array per row.
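For the single-column data in the question, that array has shape (10, 1), which is what gets reshaped below. Newer pandas documentation recommends to_numpy() for the same conversion; this sketch checks that both agree on the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Data": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})

arr = df.values
print(arr.shape)                            # (10, 1): 10 rows, one value each
print(arr[0, 0])                            # Andrew
print(np.array_equal(df.to_numpy(), arr))   # True
```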
If no data are missing, you can use numpy.reshape:
import numpy as np

print (np.reshape(df.values,(2,5)))
[['Andrew' 'School of Music' 'Music: Sound of the wind' 'Dr. Seuss'
'Dr.Sass']
['Michelle' 'School of Theatrics' 'Music: Voice' 'Dr. A' 'Dr. B']]
print (pd.DataFrame(np.reshape(df.values,(2,5)),
columns=['Name','School','Music','Mentor1','Mentor2']))
Name School Music Mentor1 Mentor2
0 Andrew School of Music Music: Sound of the wind Dr. Seuss Dr.Sass
1 Michelle School of Theatrics Music: Voice Dr. A Dr. B
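Reshaping assumes every record has exactly 5 lines. The sketch below wraps the same reshape in a hypothetical helper, to_records, that fails with a readable message when the line count is not a multiple of 5. It also adds a heuristic spot check, assuming (from the two sample records only) that every Music field starts with "Music:", to catch shifts that still leave a multiple of 5.

```python
import pandas as pd

COLUMNS = ["Name", "School", "Music", "Mentor1", "Mentor2"]

def to_records(df: pd.DataFrame, columns=COLUMNS) -> pd.DataFrame:
    """Hypothetical helper: one column of stacked fields -> one row per person."""
    n_fields = len(columns)
    values = df.iloc[:, 0].to_numpy()
    # reshape would also fail here, but this message says what is wrong.
    if len(values) % n_fields != 0:
        raise ValueError(
            f"{len(values)} lines is not a multiple of {n_fields}: "
            "some record is missing a line or has an extra one"
        )
    out = pd.DataFrame(values.reshape(-1, n_fields), columns=columns)
    # Heuristic (assumption): every Music field begins with "Music:".
    misaligned = ~out["Music"].astype(str).str.startswith("Music:")
    if misaligned.any():
        raise ValueError(f"rows look misaligned: {out.index[misaligned].tolist()}")
    return out

df = pd.DataFrame({"Data": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})
result = to_records(df)
print(result)
```

Dropping any single line from the sample (9 lines) makes to_records raise a ValueError instead of returning shifted columns.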
A more general solution computes the number of rows of the new array from df.shape, dividing the number of lines by the number of columns (floor division, so the row count stays an integer):
print (pd.DataFrame(np.reshape(df.values,(df.shape[0] // 5,5)),
columns=['Name','School','Music','Mentor1','Mentor2']))
Name School Music Mentor1 Mentor2
0 Andrew School of Music Music: Sound of the wind Dr. Seuss Dr.Sass
1 Michelle School of Theatrics Music: Voice Dr. A Dr. B
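Why the division operator matters: in Python 3, `/` always returns a float, and NumPy rejects float dimensions in a shape, while floor division `//` returns an int. This sketch uses a dummy (10, 1) array with the same shape as the sample's df.values.

```python
import numpy as np

arr = np.arange(10).reshape(10, 1)   # dummy array shaped like df.values
n_cols = 5

print(arr.shape[0] / n_cols)    # 2.0 (true division gives a float)
print(arr.shape[0] // n_cols)   # 2   (floor division gives an int)

float_rejected = False
try:
    np.reshape(arr, (arr.shape[0] / n_cols, n_cols))
except TypeError as exc:        # float dimensions are not accepted
    float_rejected = True
    print("float shape rejected:", exc)

rows = np.reshape(arr, (arr.shape[0] // n_cols, n_cols))
print(rows.shape)               # (2, 5)
```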
Thanks to piRSquared for another solution, where -1 lets NumPy infer the number of rows:
print (pd.DataFrame(df.values.reshape(-1, 5),
columns=['Name','School','Music','Mentor1','Mentor2']))
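For comparison, here is a pandas-only variant. This is a different technique from the answers above: each line is labeled with a record number (position // 5) and a field name (position % 5), and unstack() pivots the field names into columns. One difference worth knowing: if only the final record is incomplete, its missing fields come out as NaN rather than raising an error.

```python
import numpy as np
import pandas as pd

COLUMNS = ["Name", "School", "Music", "Mentor1", "Mentor2"]
s = pd.Series([
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
])

n = len(COLUMNS)
pos = np.arange(len(s))
# Outer level: record number; inner level: field name for that line.
s.index = pd.MultiIndex.from_arrays([pos // n, np.array(COLUMNS)[pos % n]])
# unstack() sorts the new columns alphabetically, so reselect the original order.
wide = s.unstack()[COLUMNS]
print(wide)
```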