When using a pandas dataframe, how do I add a column if it does not exist?

Tags:

python

pandas

I'm new to using pandas and am writing a script where I read in a dataframe and then do some computation on some of the columns.

Sometimes I will have the column called "Met":

df = pd.read_csv(File, 
  sep='\t', 
  compression='gzip', 
  header=0, 
  names=["Chrom", "Site", "coverage", "Met"]
)

Other times I will have:

df = pd.read_csv(File, 
  sep='\t', 
  compression='gzip', 
  header=0, 
  names=["Chrom", "Site", "coverage", "freqC"]
)

I need to do some computation with the "Met" column so if it isn't present I will need to calculate it using:

df['Met'] = df['freqC'] * df['coverage'] 

Is there a way to check whether the "Met" column is present in the dataframe and, if not, add it?

user2165857 asked Sep 17 '14 17:09



3 Answers

You check it like this:

if 'Met' not in df:
    df['Met'] = df['freqC'] * df['coverage'] 
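
As a minimal, self-contained sketch of this check: the `in` operator on a DataFrame tests column labels, so `'Met' not in df` behaves like `'Met' not in df.columns`. The helper name `ensure_met` and the sample values below are assumptions made up for illustration; they stand in for the gzip file from the question.

```python
import pandas as pd


def ensure_met(df):
    """Add a 'Met' column (freqC * coverage) if it is missing.

    ensure_met is a hypothetical helper name. It modifies df in place
    and returns it for convenience.
    """
    # `in` on a DataFrame tests column labels, not cell values
    if 'Met' not in df:
        df['Met'] = df['freqC'] * df['coverage']
    return df


# Made-up sample data standing in for the two file layouts in the question
without_met = pd.DataFrame({
    "Chrom": ["chr1", "chr1"],
    "Site": [100, 200],
    "coverage": [10, 20],
    "freqC": [0.5, 0.25],
})
with_met = pd.DataFrame({
    "Chrom": ["chr1"],
    "Site": [100],
    "coverage": [10],
    "Met": [7.0],
})

ensure_met(without_met)  # 'Met' is computed from freqC and coverage
ensure_met(with_met)     # the existing 'Met' column is left untouched
```

Because the check runs before the assignment, a frame that already has "Met" never needs a "freqC" column, so no KeyError is raised for it.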
YS-L answered Oct 09 '22 14:10


When interested in conditionally adding columns in a method chain, consider using pipe() with a lambda:

df.pipe(lambda d: (
    d.assign(Met=d['freqC'] * d['coverage'])
    if 'Met' not in d else d
))
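
To show how this might sit inside a longer chain, here is a small runnable sketch. The sample data and the follow-up filter step are assumptions added for illustration; only the `pipe()` lambda comes from the snippet above.

```python
import pandas as pd

# Made-up sample data; the column names follow the question
df = pd.DataFrame({
    "Chrom": ["chr1", "chr2"],
    "Site": [100, 200],
    "coverage": [10, 20],
    "freqC": [0.5, 0.1],
})

result = (
    df
    .pipe(lambda d: (
        d.assign(Met=d['freqC'] * d['coverage'])
        if 'Met' not in d else d
    ))
    # Later steps in the chain can rely on 'Met' being present
    .loc[lambda d: d['Met'] >= 5]
)
```

Since `assign()` returns a new DataFrame, the original `df` is not modified here. When "Met" already exists, the lambda returns the original object rather than a copy.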
Jonatan Samoocha answered Oct 09 '22 14:10


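If you are creating the dataframe from scratch, you can create the missing columns without a loop simply by passing the full list of column names to the pd.DataFrame() call. When the input is a dict, any listed column that is not in the data is created and filled with NaN: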

cols = ['column 1','column 2','column 3','column 4','column 5']
df = pd.DataFrame(list_or_dict, index=['a',], columns=cols)
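
A runnable sketch of the same idea, assuming the input is a dict that supplies only some of the listed columns (the `data` values here are made up):

```python
import pandas as pd

cols = ['column 1', 'column 2', 'column 3', 'column 4', 'column 5']

# Made-up input that supplies only two of the five columns
data = {'column 1': [1], 'column 3': [3]}

# Columns named in `cols` but absent from `data` are created as NaN
df = pd.DataFrame(data, index=['a'], columns=cols)

missing = [c for c in cols if df[c].isna().all()]
```

Note that this only creates empty placeholder columns. It does not compute a value such as "Met" from other columns, so it complements the conditional check rather than replacing it.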
autonopy answered Oct 09 '22 14:10