I'm new to using pandas and am writing a script where I read in a dataframe and then do some computation on some of the columns.
Sometimes I will have the column called "Met":
df = pd.read_csv(File,
                 sep='\t',
                 compression='gzip',
                 header=0,
                 names=["Chrom", "Site", "coverage", "Met"]
                 )
Other times I will have:
df = pd.read_csv(File,
                 sep='\t',
                 compression='gzip',
                 header=0,
                 names=["Chrom", "Site", "coverage", "freqC"]
                 )
I need to do some computation with the "Met" column so if it isn't present I will need to calculate it using:
df['Met'] = df['freqC'] * df['coverage']
Is there a way to check whether the "Met" column is present in the dataframe, and add it if it is not?
Use the not in operator to check whether a column exists in pandas:
if 'Promoted' not in df.columns:
    print("Yes, it does not exist.")
else:
    print("No, it does exist.")
The code gives the following output:
No, it does exist.
You check it like this:
if 'Met' not in df:
    df['Met'] = df['freqC'] * df['coverage']
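To make the check above concrete, here is a minimal sketch that wraps it in a helper and runs it on two small frames, one of each shape from the question. The function name ensure_met and the sample values are made up for illustration; only the column names come from the question.

```python
import pandas as pd


def ensure_met(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame that has a 'Met' column, computing it if it is missing.

    Hypothetical helper: assumes 'freqC' and 'coverage' exist whenever 'Met' does not.
    """
    if 'Met' not in df.columns:
        # Work on a copy so the caller's frame is left untouched.
        df = df.copy()
        df['Met'] = df['freqC'] * df['coverage']
    return df


# Illustrative data only; the real frames would come from pd.read_csv.
with_freq = pd.DataFrame({
    "Chrom": ["chr1", "chr1"],
    "Site": [100, 200],
    "coverage": [10, 20],
    "freqC": [0.5, 0.25],
})
with_met = pd.DataFrame({
    "Chrom": ["chr1", "chr1"],
    "Site": [100, 200],
    "coverage": [10, 20],
    "Met": [7.0, 3.0],
})

filled = ensure_met(with_freq)   # 'Met' is computed from freqC * coverage
unchanged = ensure_met(with_met)  # 'Met' already present, values are kept
```

Note that 'Met' not in df and 'Met' not in df.columns are equivalent here, because membership tests on a DataFrame look at its column labels.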
When interested in conditionally adding columns in a method chain, consider using pipe() with a lambda:
df.pipe(lambda d: (
    d.assign(Met=d['freqC'] * d['coverage'])
    if 'Met' not in d else d
))
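As a rough illustration of why pipe() is convenient here, the sketch below puts the conditional step in the middle of a longer chain. The helper name add_met_if_missing, the sample data, and the follow-up MetFraction column are assumptions for the example, not part of the question.

```python
import pandas as pd


def add_met_if_missing(d: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the lambda, written as a named function for readability.
    if 'Met' in d:
        return d
    return d.assign(Met=d['freqC'] * d['coverage'])


# Illustrative input without a 'Met' column.
raw = pd.DataFrame({"coverage": [10, 20], "freqC": [0.5, 0.25]})

result = (
    raw
    .pipe(add_met_if_missing)
    # Later steps in the chain can rely on 'Met' being present.
    .assign(MetFraction=lambda d: d["Met"] / d["coverage"])
)
```

Because assign() returns a new frame, raw itself is not modified; only result carries the added columns.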
If you were creating the dataframe from scratch, you could create the missing columns without a loop merely by passing the column names into the pd.DataFrame() call; columns absent from the data are filled with NaN:
cols = ['column 1','column 2','column 3','column 4','column 5']
df = pd.DataFrame(list_or_dict, index=['a',], columns=cols)
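A small runnable version of the idea above, assuming the data arrives as a dict that only covers some of the requested columns. The dict contents are invented for illustration.

```python
import pandas as pd

cols = ['column 1', 'column 2', 'column 3', 'column 4', 'column 5']

# Illustrative data that only supplies the first two columns.
partial_data = {'column 1': [1], 'column 2': [2]}

# Columns listed in `cols` but missing from the data are created as NaN.
df = pd.DataFrame(partial_data, index=['a'], columns=cols)
```

This only helps at construction time; for a frame that already exists, the membership check shown earlier in the thread is the more direct route.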