
How to combine two columns with an if/else in Python pandas?

Tags:

python

pandas

I am very new to pandas (less than two days in), and I can't figure out the right syntax for combining two columns with an if/else condition.

I did find one way to do it using zip, and it produces the result I want, but I suspect pandas has a more efficient, vectorized way to do this.

For completeness' sake, here is the pre-processing I do, to make things clear:

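import pandas as pd

records_data = pd.read_csv('records.csv')

## pull out a year from the 'source' column using a regex
## (extract_year_from_source is defined elsewhere and not shown)
source_years = records_data['source'].map(extract_year_from_source)

## this is what I want to do more efficiently (if it's possible):
## use the extracted year when there is one, otherwise keep the existing 'year'
records_data['year'] = [s if s else y for (s, y) in zip(source_years, records_data['year'])]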
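The extract_year_from_source helper is not shown in the question. As a minimal sketch, assuming the year is a four-digit number starting with 19 or 20 somewhere in the source text (the sample data and regex below are hypothetical), the element-wise map can be swapped for the vectorized Series.str.extract, which returns NaN for rows with no match:

```python
import pandas as pd

# Hypothetical sample data standing in for records.csv
records_data = pd.DataFrame({
    "source": ["Journal A, 2001", "Unknown source", "Report (1999)"],
    "year": [1990, 1991, 1992],
})

# Vectorized regex extraction: capture a 4-digit year starting with 19 or 20.
# expand=False returns a Series; rows without a match become NaN.
source_years = records_data["source"].str.extract(r"((?:19|20)\d{2})", expand=False)

# Convert the extracted strings to numbers (NaN stays NaN, dtype becomes float64)
source_years = pd.to_numeric(source_years)

print(source_years.tolist())
```

Because "no year found" comes back as NaN rather than a falsy value here, the combining step would need to test source_years.notna() instead of plain truthiness; source_years.fillna(records_data["year"]) is one way to express that.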
asked Nov 28 '12 01:11 by pocketfullofcheese


2 Answers

In pandas >= 0.10.0, try (here df is the question's records_data):

df['year'] = source_years.where(source_years != 0, df['year'])

and see:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking

As noted in the comments, this does use np.where under the hood. The difference is that pandas aligns the Series with the output on the index (so, for example, you can update only some of the rows).
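A small self-contained check of the where idea, using made-up data in which 0 marks "no year extracted" (matching the != 0 test). Series.where(cond, other) keeps values where cond is True and takes them from other elsewhere; Series.mask is its inverse and gives the same result when called on df['year']:

```python
import pandas as pd

# Made-up data: 0 in source_years means "no year extracted"
df = pd.DataFrame({"year": [1990, 1991, 1992]})
source_years = pd.Series([2001, 0, 1999])
has_source_year = source_years != 0

# Series.where keeps values where the condition is True, else takes `other`
via_where = source_years.where(has_source_year, df["year"])

# Series.mask replaces values where the condition is True
via_mask = df["year"].mask(has_source_year, source_years)

df["year"] = via_where
print(df["year"].tolist())  # [2001, 1991, 1999]
```

Both other and the condition are aligned on the index, which is what makes partial updates with a differently indexed Series behave predictably.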

answered Oct 03 '22 05:10 by Jeff


Perhaps try np.where:

import numpy as np

# use source_years where it is truthy (non-zero), otherwise keep df['year']
df['year'] = np.where(source_years, source_years, df['year'])
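A minimal sketch of the np.where approach on made-up data. It also shows a pitfall worth assuming might apply: np.where treats its condition by truthiness, and NaN counts as True, so if "no year" is encoded as NaN (for example, from a regex extraction) the condition should be an explicit notna() test:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"year": [1990, 1991, 1992]})

# 0 as the "no year" sentinel: 0 is falsy, so np.where falls back to df['year']
source_years = pd.Series([2001, 0, 1999])
print(np.where(source_years, source_years, df["year"]))  # [2001 1991 1999]

# NaN as the sentinel: NaN is truthy, so test notna() explicitly
source_years_nan = pd.Series([2001.0, np.nan, 1999.0])
df["year"] = np.where(source_years_nan.notna(), source_years_nan, df["year"])
print(df["year"].tolist())  # [2001.0, 1991.0, 1999.0]
```

Note that np.where returns a plain NumPy array and does no index alignment, so both inputs must already be in the same row order; Series.where aligns on the index instead.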
answered Oct 03 '22 05:10 by unutbu