Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple columns with the same name in Pandas

I am creating a dataframe from a CSV file. I have gone through the docs, multiple SO posts, links as I have just started Pandas but didn't get it. The CSV file has multiple columns with same names say a.

So after forming dataframe and when I do df['a'] which value will it return? It does not return all values.

Also only one of the values will have a string rest will be None. How can I get that column?

like image 267
vks Avatar asked Oct 11 '16 21:10

vks


People also ask

Can you have two columns with same name in pandas?

Pandas, however, can be tricked into allowing duplicate column names. Duplicate column names are a problem if you plan to transfer your data set to another statistical language. They're also a problem because it will cause unanticipated and sometimes difficult to debug problems in Python.

How do you drop a column with the same name in pandas?

To drop duplicate columns from pandas DataFrame use df. T. drop_duplicates(). T , this removes all columns that have the same data regardless of column names.

Can a pandas series have multiple columns?

In Pandas, we have the freedom to add columns in the data frame whenever needed. There are multiple ways to add columns to pandas dataframe.


1 Answers

the relevant parameter is mangle_dupe_cols

from the docs

mangle_dupe_cols : boolean, default True
    Duplicate columns will be specified as 'X.0'...'X.N', rather than 'X'...'X'

by default, all of your 'a' columns get named 'a.0'...'a.N' as specified above.

if you used mangle_dupe_cols=False, importing this csv would produce an error.

you can get all of your columns with

df.filter(like='a')

demonstration

from StringIO import StringIO
import pandas as pd

txt = """a, a, a, b, c, d
1, 2, 3, 4, 5, 6
7, 8, 9, 10, 11, 12"""

df = pd.read_csv(StringIO(txt), skipinitialspace=True)
df

enter image description here

df.filter(like='a')

enter image description here

like image 53
piRSquared Avatar answered Oct 22 '22 04:10

piRSquared