 

Repeat Pandas dataframe row labels

Is there a way to repeat row labels in a pandas DataFrame? My Excel output, created with xlsxwriter, currently doesn't repeat the labels of higher-level groupings. I appreciate any help.

How my Excel sheet looks now:

Country   State/Province    Population
US        California        38,802,500
          Texas             26,956,958
          Florida           19,893,297
...
CAN       Alberta            3,645,257
          Manitoba           4,400,057

I would like the output to repeat the country-level labels, like below:

Country      State/Province        Population
US           California            38,802,500
US           Texas                 26,956,958
US           Florida               19,893,297
...
CAN          Alberta                3,645,257
CAN          Manitoba               4,400,057
asked Jun 15 '15 18:06 by ohss117



1 Answer

You can import the Excel data and then forward fill the relevant column, so that each blank Country cell takes the last label above it:

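import pandas as pd

df = pd.read_excel('data.xlsx')
# Assign the result back; Series.ffill(inplace=True) on df.Country is chained
# assignment and may not update df in newer pandas versions
df['Country'] = df['Country'].ffill()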
>>> df
  Country State/Province  Population
0      US     California    38802500
1      US          Texas    26956958
2      US        Florida    19893297
3     CAN        Alberta     3645257
4     CAN       Manitoba     4400057
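To try the forward fill without an Excel file, here is a minimal sketch that rebuilds the same table in memory. It assumes the blank Country cells arrive as missing values (NaN), which is how pd.read_excel represents empty cells; the data simply mirrors the table in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the sheet in memory; np.nan stands in for the blank Country cells
df = pd.DataFrame({
    "Country": ["US", np.nan, np.nan, "CAN", np.nan],
    "State/Province": ["California", "Texas", "Florida", "Alberta", "Manitoba"],
    "Population": [38802500, 26956958, 19893297, 3645257, 4400057],
})

# Forward fill: each missing Country takes the last non-missing value above it
df["Country"] = df["Country"].ffill()

print(df)
```

Because ffill only propagates downward, this relies on the rows already being grouped, with each country's label on the first row of its group.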

If needed, you could then set the index to Country and State/Province.

>>> df.set_index(['Country', 'State/Province']) 
                        Population
Country State/Province            
US      California        38802500
        Texas             26956958
        Florida           19893297
CAN     Alberta            3645257
        Manitoba           4400057

The original DataFrame could then be retrieved via df.reset_index().
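As a sketch of that round trip, assuming the same example data: reset_index moves both index levels back into ordinary columns, so every row carries its own Country label again. Writing that flat frame with index=False should keep the repeated labels in the sheet; the to_excel line is left commented out because it needs xlsxwriter installed, and "output.xlsx" is only a hypothetical filename.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["US", "US", "US", "CAN", "CAN"],
    "State/Province": ["California", "Texas", "Florida", "Alberta", "Manitoba"],
    "Population": [38802500, 26956958, 19893297, 3645257, 4400057],
})

# Hierarchical view: pandas shows each Country label only once per group
indexed = df.set_index(["Country", "State/Province"])

# Move the index levels back into columns; each row now has its own Country value
flat = indexed.reset_index()
print(flat)

# Hypothetical output path; requires the xlsxwriter package
# flat.to_excel("output.xlsx", index=False, engine="xlsxwriter")
```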

answered Nov 05 '22 15:11 by Alexander