
Pandas Dataframe add header without replacing current header

Tags: python, pandas

How can I add a header to a DataFrame without replacing the current one? In other words, I want to shift the current header down so that it becomes an ordinary record in the DataFrame, and then put a new header on top.

Secondary question: how do I add tables (e.g. an example DataFrame) to a Stack Overflow question?

I have this (note the header: it is really a data row that ended up as the column labels):

   0.213231  0.314544
0 -0.952928 -0.624646
1 -1.020950 -0.883333

I need this (all other records are shifted down and the old header is added as a new record). Also: I couldn't read the CSV properly because I'm using s3_text_adapter for the import, and I couldn't figure out how to pass an argument that ignores the header the way pandas read_csv can:

       A          B
0  0.213231  0.314544
1 -1.020950 -0.883333
horatio1701d asked Oct 23 '13 00:10



1 Answer

Another option is to add it as an additional level of the column index, to make it a MultiIndex:

In [11]: df = pd.DataFrame(np.random.randn(2, 2), columns=['A', 'B'])

In [12]: df
Out[12]: 
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333

In [13]: df.columns = pd.MultiIndex.from_tuples(list(zip(['AA', 'BB'], df.columns)))

In [14]: df
Out[14]: 
         AA        BB
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333

This has the benefit of keeping the correct dtypes for the DataFrame, so you can still do fast and correct calculations on your DataFrame, and allows you to access by both the old and new column names.
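
As a rough illustration of the "access by both the old and new column names" point, here is a minimal sketch. The labels 'AA' and 'BB' are the same placeholder names used above, and the random data is only for demonstration; `xs` is one of several ways to select on the inner level.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 2), columns=["A", "B"])

# Stack a new top level ('AA', 'BB') over the existing column names.
df.columns = pd.MultiIndex.from_tuples(list(zip(["AA", "BB"], df.columns)))

# Select by the new (top-level) name; the result keeps the old name "A".
by_new = df["AA"]

# Select by the old (second-level) name; the result keeps the new name "AA".
by_old = df.xs("A", axis=1, level=1)

# The values are still floats, so numeric operations behave as before.
print(df.dtypes)
print(df.sum())
```

Both selections return the same underlying column, just labelled by the other level.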


For completeness, here's DSM's (deleted) answer, which turns the column names into a row. As mentioned already, this is usually not a good idea:

In [21]: df_bad_idea = df.T.reset_index().T

In [22]: df_bad_idea
Out[22]: 
              0         1
index         A         B
0     -0.952928 -0.624646
1      -1.02095 -0.883333

Note that the dtype may change, as it does in this case, when the column names are labels rather than proper values. Be careful if you actually plan to do any work on this frame, as it will likely be slower and may even fail:

In [23]: df.sum()
Out[23]: 
A   -1.973878
B   -1.507979
dtype: float64

In [24]: df_bad_idea.sum()  # doh!
Out[24]: Series([], dtype: float64)
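
The dtype change can be checked directly. The sketch below rebuilds the frame from the values shown above and compares dtypes before and after the transpose trick; the name `restored` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [-0.952928, -1.020950], "B": [-0.624646, -0.883333]})

# Move the column labels into the data as the first row.
df_bad_idea = df.T.reset_index().T

print(df.dtypes)           # both columns are float64
print(df_bad_idea.dtypes)  # both columns are object: each mixes a str label with floats

# Dropping the label row and casting back recovers numeric columns.
restored = df_bad_idea.iloc[1:].astype(float)
print(restored.dtypes)
```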

If the column names are actually a data row that was mistaken for a header row, then you should correct this when reading in the data (e.g. pass header=None to read_csv).
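
A minimal sketch of that fix with pandas, assuming the data can be read with read_csv (the question uses s3_text_adapter, which is not covered here). The in-memory CSV text stands in for the real file, and the names 'A' and 'B' are taken from the desired output in the question.

```python
import io

import pandas as pd

# Stand-in for the real file: three data rows and no header line.
csv_text = "0.213231,0.314544\n-0.952928,-0.624646\n-1.020950,-0.883333\n"

# header=None tells read_csv that the first line is data, not column labels;
# names supplies the labels to use instead.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["A", "B"])

print(df)
print(df.dtypes)
```

Read this way, the first line stays in the data as row 0 and every column keeps a float dtype.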

Andy Hayden answered Nov 15 '22 20:11