I have seen how to work with a double index, but I have not seen how to work with two-row column headers. Is this possible?
For example, row 1 is a repetitive series of dates: 2016, 2016, 2015, 2015
Row 2 is a repetitive series of data. Dollar Sales, Unit Sales, Dollar Sales, Unit Sales.
So each "Dollar Sales" heading is actually tied to the date in the row above.
Subsequent rows are individual items with data.
Is there a way to do a groupby, or some way that I can have two column headers? Ultimately, I want to line up the "Dollar Sales" as a series by date so that I can make a nice graph. Unfortunately, there are multiple columns before the next "Dollar Sales" value (more than just the one "Unit Sales" column). Also, if I delete the date row above, there is no link between each "Dollar Sales" column and its date.
When you print the dataframe using df.head(), you can see that the pandas DataFrame has two header rows for each column. If the labels you want as headers sit in one of the data rows instead, you can replace the header with the nth row.
The read_csv() function accepts a header parameter. You can pass header=[0, 1] to use the first two rows of the CSV file as the header of the DataFrame. This gives you a DataFrame with multiple header rows.
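As a minimal sketch of the header=[0, 1] approach, the snippet below reads a small in-memory CSV instead of a real file. The item names and numbers are made up for illustration. It also assumes the first column holds item names with empty header cells above it, so index_col=0 is passed as well.

```python
import io
import pandas as pd

# Hypothetical data: row 0 holds the years, row 1 holds the measures.
# The first column (item names) has empty cells in both header rows.
csv_text = (
    ",2016,2016,2015,2015\n"
    ",Dollar Sales,Unit Sales,Dollar Sales,Unit Sales\n"
    "Widget,100,10,90,9\n"
    "Gadget,200,20,180,18\n"
)

# header=[0, 1] turns the first two rows into a two-level column index.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

print(df.columns)  # a pandas.MultiIndex of (year, measure) pairs
print(df.head())
```

Each column is then addressed by a (year, measure) pair, so the link between a "Dollar Sales" column and its date is kept.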
DataFrame.loc[] retrieves rows from a pandas DataFrame by label. Rows can also be selected by integer position with iloc[]. Passing a single label or position to either one returns that row as a Series. To add a row to a pandas DataFrame, you can concat the old dataframe with a new one.
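A short sketch of those three operations follows. The frame contents are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with item names as the row index.
df = pd.DataFrame(
    {"Dollar Sales": [100, 200], "Unit Sales": [10, 20]},
    index=["Widget", "Gadget"],
)

by_label = df.loc["Widget"]   # select a row by label; returns a Series
by_position = df.iloc[1]      # select a row by integer position; returns a Series

# Add a row by concatenating a one-row DataFrame onto the original.
new_row = pd.DataFrame({"Dollar Sales": [300], "Unit Sales": [30]}, index=["Gizmo"])
extended = pd.concat([df, new_row])
print(extended)
```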
Convert Row to Column Header Using DataFrame.rename()
You can use DataFrame.rename() to rename the header and then use loc[] or iloc[] to remove that row from the data. The same approach works if you want to convert a middle row, or any nth row, to a column header.
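One way this could look is sketched below. It assumes a frame that was read without a header, so its columns are labelled 0, 1, 2. The data and the variable n are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose real header ended up as a data row.
raw = pd.DataFrame(
    [
        ["Item", "Dollar Sales", "Unit Sales"],
        ["Widget", 100, 10],
        ["Gadget", 200, 20],
    ]
)

n = 0  # position of the row to promote to the header

# Map the current column labels (0, 1, 2) to the values found in row n.
mapping = raw.iloc[n].to_dict()
df = raw.rename(columns=mapping)

# Drop the promoted row from the data and renumber the remaining rows.
df = df.drop(index=raw.index[n]).reset_index(drop=True)
print(df)
```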
If using pandas.read_csv() or pandas.read_table(), you can provide a list of row indices for the header argument to specify the rows you want to use for column headers. pandas will generate the pandas.MultiIndex for you in df.columns:
df = pandas.read_csv('DollarUnitSales.csv', header=[0,1])
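You can also use more than two rows, or non-consecutive rows, to specify the column headers (note that read_table assumes tab-delimited input by default, so pass sep=',' if the file is comma-separated):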
df = pandas.read_table('DataSheet1.csv', header=[0,2,3])
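To line up "Dollar Sales" by date, as the question asks, one option is to take a cross-section of the column MultiIndex with DataFrame.xs(). The sketch below uses made-up in-memory data in place of DollarUnitSales.csv. It assumes the first column holds item names with empty header cells above it.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file.
csv_text = (
    ",2016,2016,2015,2015\n"
    ",Dollar Sales,Unit Sales,Dollar Sales,Unit Sales\n"
    "Widget,100,10,90,9\n"
    "Gadget,200,20,180,18\n"
)
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# Keep only the "Dollar Sales" columns; level=1 is the second header row.
# The remaining column labels are the dates from the first header row.
dollar_sales = df.xs("Dollar Sales", axis=1, level=1)

# Transpose so each date is a row and each item is a column, then sort by date.
by_date = dollar_sales.T.sort_index()
print(by_date)
```

From here, by_date.plot() should draw one line per item across the dates, provided matplotlib is installed.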