I am very new to Pandas (i.e., less than 2 days). However, I can't seem to figure out the right syntax for combining two columns with an if/else condition.
Actually, I did figure out one way to do it using 'zip'. This is what I want to accomplish, but it seems there might be a more efficient way to do this in pandas.
For completeness' sake, I include the pre-processing I do to make things clear:
import pandas as pd
records_data = pd.read_csv('records.csv')
## pull out a year from column using a regex
source_years = records_data['source'].map(extract_year_from_source)
## this is what I want to do more efficiently (if it's possible)
records_data['year'] = [s if s else y for (s,y) in zip(source_years, records_data['year'])]
In pandas >= 0.10.0 try
df['year'] = source_years.where(source_years != 0, df['year'])
and see:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking
As noted in the comments, this does use np.where under the hood. The difference is that pandas aligns the Series on their index before combining them (so, for example, you can do just a partial update).
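To make the alignment point concrete, here is a minimal sketch. The data is made up for illustration, and it assumes a missing source year is encoded as 0; the names fallback and partial are hypothetical.

```python
import pandas as pd

# Hypothetical data: 0 means "no year could be extracted from the source".
df = pd.DataFrame({"year": [1990, 1991, 1992, 1993]})
source_years = pd.Series([2001, 0, 2003, 0])

# Keep source_years where it is non-zero, otherwise fall back to df["year"].
combined = source_years.where(source_years != 0, df["year"])
print(combined.tolist())  # [2001, 1991, 2003, 1993]

# Alignment is by index label, not by position: a fallback Series stored
# in a different order still fills the right rows.
fallback = pd.Series([1993, 1992, 1991, 1990], index=[3, 2, 1, 0])
aligned = source_years.where(source_years != 0, fallback)
print(aligned.tolist())  # [2001, 1991, 2003, 1993]

# A fallback that covers only some labels gives a partial update:
# label 1 is filled, label 3 has no replacement and becomes NaN.
partial = pd.Series([1991], index=[1])
partly_filled = source_years.where(source_years != 0, partial)
```

If the "missing" marker in your data is None or an empty string rather than 0, the condition would need to change accordingly (for example, source_years.notna()).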
Perhaps try np.where:
import numpy as np
df['year'] = np.where(source_years,source_years,df['year'])
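As a quick sanity check, the sketch below compares np.where against the zip-based list comprehension from the question. The data is made up, and it again assumes that a missing source year is encoded as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 means "no year could be extracted from the source".
df = pd.DataFrame({"year": [1990, 1991, 1992, 1993]})
source_years = pd.Series([2001, 0, 2003, 0])

# The original approach from the question.
via_zip = [s if s else y for s, y in zip(source_years, df["year"])]

# Vectorised equivalent: pick source_years where non-zero, else df["year"].
via_np = np.where(source_years != 0, source_years, df["year"])
print(via_np.tolist())  # [2001, 1991, 2003, 1993]

# Both approaches agree element by element.
assert list(via_np) == via_zip

df["year"] = via_np
```

Note that np.where returns a plain NumPy array and works by position, so it ignores the index; both inputs need to be in the same row order.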