Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python:pandas - How to combine first two rows of pandas dataframe to dataframe header?

I'm trying to read one excel file which looks like below:

enter image description here

I also have one script which will convert this xlsx file into the csv files with its sheet names (If three sheets available then it will create three different csv files).

It's csv file looks like below:

Unnamed: 0,Gender A,Unnamed: 2,Gender B,Unnamed: 4,Gender C,Gender D
date,Male,Female,Male,Female,Male,Female
2017-01-01 00:00:00,2,3,3,2,3,3
2017-01-02 00:00:00,5,7,7,42,3,5
2017-01-03 00:00:00,4,6,6,12,2,7
2017-01-04 00:00:00,6,7,3,6,4,8
2017-01-05 00:00:00,6,8,8,3,5,3
2017-01-06 00:00:00,54,3,3,6,3,5
2017-01-07 00:00:00,3,4,6,3,6,5
2017-01-08 00:00:00,3,6,6,3,6,4
2017-01-09 00:00:00,2,2,8,7,5,2
2017-01-10 00:00:00,4,3,2,4,5,5
2017-01-11 00:00:00,12,10,10,3,1,6
2017-01-12 00:00:00,9,7,7,3,4,1

So, my first question is which is better choice to process these files - xlsx or csv?

Next, I just want to read first two rows as a column header. So that I can understand In which Gender how many males and females are available.

Expected Output:

0                  date   Gender A_Male Gender A_Female   Gender B_Male  Gender B_Female   Gender C_Male  Gender D_Female 
1   2017-01-01 00:00:00        2              3           3                 2               3                  3
2   2017-01-02 00:00:00        5              7           7                42               3                  5
3   2017-01-03 00:00:00        4              6           6                12               2                  7
4   2017-01-04 00:00:00        6              7           3                 6               4                  8
5   2017-01-05 00:00:00        6              8           8                 3               5                  3
6   2017-01-06 00:00:00       54              3           3                 6               3                  5
7   2017-01-07 00:00:00        3              4           6                 3               6                  5
8   2017-01-08 00:00:00        3              6           6                 3               6                  4
9   2017-01-09 00:00:00        2              2           8                 7               5                  2
10  2017-01-10 00:00:00        4              3           2                 4               5                  5
11  2017-01-11 00:00:00       12             10          10                 3               1                  6
12  2017-01-12 00:00:00        9              7           7                 3               4                  1
like image 445
ketan Avatar asked Sep 22 '17 04:09

ketan


People also ask

How do I merge two rows in a DataFrame in Python?

We can use the concat function in pandas to append either columns or rows from one DataFrame to another. Let's grab two subsets of our data to see how this works. When we concatenate DataFrames, we need to specify the axis. axis=0 tells pandas to stack the second DataFrame UNDER the first one.

How do you append a row to the top of a DataFrame in Pandas?

Use concat() to Add a Row at Top of DataFrame Use pd. concat([new_row,df. loc[:]]). reset_index(drop=True) to add the row to the first position of the DataFrame as Index starts from zero.

How do I concatenate rows in a DataFrame?

Use DataFrame.append() method to concatenate DataFrames on rows. For E.x, df. append(df1) appends df1 to the df DataFrame.


2 Answers

Let's try:

df = pd.read_excel('Untitled 2.xlsx', header=[0,1])
df.columns = df.columns.map('_'.join)
df.rename_axis('Date').reset_index()

Output:

         Date  Gender A_Male  Gender A_Female  Gender B_Male  Gender B_Female  \
0  2017-01-01              2                3              3                2   
1  2017-01-02              5                7              7               42   
2  2017-01-03              4                6              6               12   
3  2017-01-04              6                7              3                6   
4  2017-01-05              6                8              8                3   
5  2017-01-06             54                3              3                6   
6  2017-01-07              3                4              6                3   
7  2017-01-08              3                6              6                3   
8  2017-01-09              2                2              8                7   
9  2017-01-10              4                3              2                4   
10 2017-01-11             12               10             10                3   
11 2017-01-12              9                7              7                3   

    Gender C_Male  Gender D_Female  
0               3                3  
1               3                5  
2               2                7  
3               4                8  
4               5                3  
5               3                5  
6               6                5  
7               6                4  
8               5                2  
9               5                5  
10              1                6  
11              4                1  
like image 164
Scott Boston Avatar answered Sep 28 '22 06:09

Scott Boston


I love @ScottBoston's approach. Here are some cosmetic alternatives. If you want your column headers to look pretty, especially when the second row contains the units for the quantity, you can do something like this:

df = pd.read_excel('Untitled 2.xlsx', header=[0,1], index_col=0)
df.columns = df.columns.map(lambda h: '{}\n({})'.format(h[0], h[1]))
df.rename_axis('Date')

And if you want to make sure your column names don't contain spaces (so you can access them as properties of the DataFrame):

df = pd.read_excel('Untitled 2.xlsx', header=[0,1], index_col=0)
df.columns = df.columns.map(lambda h: '  '.join(h).replace(' ', '_'))
df.rename_axis('Date')

Which gives:

                     Gender_A__Male  Gender_A__Female   ...         Gender_C__Male  Gender_D__Female
Date                                                    ...                                         
2017-01-01 00:00:00               2                 3   ...                      3                 3
2017-01-02 00:00:00               5                 7   ...                      3                 5
2017-01-03 00:00:00               4                 6   ...                      2                 7
2017-01-04 00:00:00               6                 7   ...                      4                 8
2017-01-05 00:00:00               6                 8   ...                      5                 3
2017-01-06 00:00:00              54                 3   ...                      3                 5
2017-01-07 00:00:00               3                 4   ...                      6                 5
2017-01-08 00:00:00               3                 6   ...                      6                 4
2017-01-09 00:00:00               2                 2   ...                      5                 2
2017-01-10 00:00:00               4                 3   ...                      5                 5
2017-01-11 00:00:00              12                10   ...                      1                 6
2017-01-12 00:00:00               9                 7   ...                      4                 1
like image 29
hobs Avatar answered Sep 28 '22 04:09

hobs