
Transpose the data in a column every n rows in pandas

For a research project, I need to process each individual's information from a website into an Excel file. I copied and pasted everything I need from the website into a single column of an Excel file and loaded that file with pandas. However, I need to present each individual's information horizontally rather than vertically, as it is now. For example, this is what I have right now: a single column of unorganized data.

df = pd.read_csv("ior work.csv", encoding="ISO-8859-1")

Data:

0 Andrew
1 School of Music
2 Music: Sound of the wind
3 Dr. Seuss
4 Dr.Sass
5 Michelle
6 School of Theatrics
7 Music: Voice
8 Dr. A
9 Dr. B

I want to transpose every 5 lines so that each individual becomes one row in the following format; the labels below are the column names.

Name School Music Mentor1 Mentor2

What is the most efficient way to do this?

Molly Zhao asked Sep 29 '16 04:09





1 Answer

If no data are missing, you can use numpy.reshape:

print (np.reshape(df.values,(2,5)))
[['Andrew' 'School of Music' 'Music: Sound of the wind' 'Dr. Seuss'
  'Dr.Sass']
 ['Michelle' 'School of Theatrics' 'Music: Voice' 'Dr. A' 'Dr. B']]

print (pd.DataFrame(np.reshape(df.values,(2,5)), 
                    columns=['Name','School','Music','Mentor1','Mentor2']))
       Name               School                     Music    Mentor1  Mentor2
0    Andrew      School of Music  Music: Sound of the wind  Dr. Seuss  Dr.Sass
1  Michelle  School of Theatrics              Music: Voice      Dr. A    Dr. B

A more general solution computes the number of rows by dividing the length of the data by the number of columns (using integer division, since reshape requires integer dimensions):

print (pd.DataFrame(np.reshape(df.values,(df.shape[0] // 5,5)), 
                    columns=['Name','School','Music','Mentor1','Mentor2']))
       Name               School                     Music    Mentor1  Mentor2
0    Andrew      School of Music  Music: Sound of the wind  Dr. Seuss  Dr.Sass
1  Michelle  School of Theatrics              Music: Voice      Dr. A    Dr. B
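
The "no data are missing" condition matters because reshape only works when the number of rows is an exact multiple of 5. As a minimal sketch, the check could be wrapped in a small helper; `reshape_records` and `FIELDS` are hypothetical names introduced here, and the sample frame is built in memory instead of being read from the CSV:

```python
import pandas as pd

# Column labels taken from the question
FIELDS = ['Name', 'School', 'Music', 'Mentor1', 'Mentor2']

def reshape_records(df, fields=FIELDS):
    """Hypothetical helper: turn a one-column frame into one row per record."""
    n = len(fields)
    # reshape would fail anyway, but this gives a clearer message
    if len(df) % n != 0:
        raise ValueError(f"{len(df)} rows is not a multiple of {n}")
    values = df.iloc[:, 0].to_numpy()
    # Integer division keeps the row count an int
    return pd.DataFrame(values.reshape(len(df) // n, n), columns=fields)

# Sample data from the question, standing in for the loaded CSV
raw = pd.DataFrame({'Data': [
    'Andrew', 'School of Music', 'Music: Sound of the wind', 'Dr. Seuss', 'Dr.Sass',
    'Michelle', 'School of Theatrics', 'Music: Voice', 'Dr. A', 'Dr. B',
]})

wide = reshape_records(raw)
print(wide)
```

If an entry is missing for one individual, the helper raises instead of silently shifting every later record by one field.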

Thank you piRSquared for another solution, which passes -1 so that NumPy infers the number of rows:

print (pd.DataFrame(df.values.reshape(-1, 5), 
                    columns=['Name','School','Music','Mentor1','Mentor2']))
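
To make the effect of -1 concrete, here is a small sketch that prints the array shape before and after the reshape. It assumes a one-column frame like the one in the question; the column name `Data` and the variable names are illustrative only:

```python
import pandas as pd

# One-column frame standing in for the loaded CSV (column name is an assumption)
df = pd.DataFrame({'Data': [
    'Andrew', 'School of Music', 'Music: Sound of the wind', 'Dr. Seuss', 'Dr.Sass',
    'Michelle', 'School of Theatrics', 'Music: Voice', 'Dr. A', 'Dr. B',
]})

before = df.values.shape          # one column, ten rows
records = df.values.reshape(-1, 5)  # -1 asks NumPy to work out the row count
after = records.shape

print(before, '->', after)

result = pd.DataFrame(records,
                      columns=['Name', 'School', 'Music', 'Mentor1', 'Mentor2'])
print(result)
```

Since the goal in the question is an Excel file, the resulting frame could then be written out with `DataFrame.to_excel`, which needs an Excel engine such as openpyxl to be installed.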

jezrael answered Oct 14 '22 05:10
