Python Pandas, two rows as column headers?

Tags:

python-3.x

pandas

I have seen how to work with a double index, but I have not seen how to work with two-row column headers. Is this possible?

For example, row 1 is a repeating series of dates: 2016, 2016, 2015, 2015.

Row 2 is a repeating series of data labels: Dollar Sales, Unit Sales, Dollar Sales, Unit Sales.

So each "Dollar Sales" heading is actually tied to the date in the row above.

Subsequent rows are individual items with data.

Is there a way to do a groupby, or some other way that I can have two column headers? Ultimately, I want to line up the "Dollar Sales" values as a series by date so that I can make a nice graph. Unfortunately, there are multiple columns before the next "Dollar Sales" value (more than just the one "Unit Sales" column). Also, if I delete the date row above, nothing links each "Dollar Sales" column to its date.


asked Dec 06 '16 21:12

Stephen

1 Answer

If you are using pandas.read_csv() or pandas.read_table(), you can pass a list of row indices as the header argument to specify which rows to use as column headers. pandas will then build a pandas.MultiIndex for you in df.columns:

import pandas
df = pandas.read_csv('DollarUnitSales.csv', header=[0, 1])

You can also use more than two rows, or non-consecutive rows, to specify the column headers:

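# read_table defaults to tab-separated input, so pass sep=',' for a comma-separated file
df = pandas.read_table('DataSheet1.csv', sep=',', header=[0, 2, 3])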
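To connect this back to the goal in the question (lining up "Dollar Sales" by date), here is a minimal sketch. The item names and numbers are made up for illustration, and the data is read from an in-memory string rather than a real file. It assumes the first column holds the item names and that its two header cells are blank.

```python
import io

import pandas as pd

# Hypothetical sample data: two header rows (date, then measure),
# followed by one row per item. The first column holds the item name.
csv_text = """,2016,2016,2015,2015
,Dollar Sales,Unit Sales,Dollar Sales,Unit Sales
Widget,100.0,10,80.0,8
Gadget,250.0,25,200.0,20
"""

# header=[0, 1] turns the first two rows into a two-level column MultiIndex;
# index_col=0 uses the item names as the row index.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# Take every column whose second-level label is "Dollar Sales".
# The result keeps only the date level as its column labels.
dollar_sales = df.xs('Dollar Sales', axis=1, level=1)

# Transpose so that each row is a date and each column is an item,
# then convert the date labels (read as strings) to integers and sort.
by_date = dollar_sales.T
by_date.index = by_date.index.astype(int)
by_date = by_date.sort_index()

print(by_date)
```

Each column of by_date is then one item's "Dollar Sales" series indexed by year, so calling by_date.plot() should draw one line per item, provided matplotlib is installed.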

answered Nov 21 '22 13:11

Kevin
