Speeding up Pandas apply function

For a relatively large pandas DataFrame (a few hundred thousand rows), I'd like to create a Series by applying a function to each row. The problem is that the function is slow, and I was hoping it could be sped up somehow.

import math
import numpy as np
import pandas as pd

df = pd.DataFrame({
 'value-1': [1, 2, 3, 4, 5],
 'value-2': [0.1, 0.2, 0.3, 0.4, 0.5],
 'value-3': somenumbers...,
 'value-4': more numbers...,
 'choice-index': [1, 1, np.nan, 2, 1]
})

def func(row):
  i = row['choice-index']
  return np.nan if math.isnan(i) else row['value-%d' % i]

# reduce=True was accepted by pandas when this was written; recent releases no longer take it
df['value'] = df.apply(func, axis=1)

# expected value = [1, 2, np.nan, 0.4, 5]

Any suggestions are welcome.

Update

A very small speedup (about 1.1x) can be achieved by pre-caching the column names. func would change to:

cached_columns = [None, 'value-1', 'value-2', 'value-3', 'value-4']
def func(row):
  i = row['choice-index']
  # i is a float here (the NaN forces a float column), so cast it before indexing the list
  return np.nan if math.isnan(i) else row[cached_columns[int(i)]]

But I was hoping for greater speedups...
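To check a speedup figure like this on your own machine, a small timing harness along these lines should do. It is only a sketch: the frame size, the random data and the name func_cached are assumptions made for illustration, and the ratio it prints will vary with hardware and pandas version.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Synthetic frame; the size and the random contents are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({'value-%d' % k: rng.random(n) for k in range(1, 5)})
choice = rng.integers(1, 5, size=n).astype(float)
choice[::10] = np.nan  # sprinkle in some missing choices
df['choice-index'] = choice

def func(row):
    i = row['choice-index']
    return np.nan if math.isnan(i) else row['value-%d' % i]

cached_columns = [None, 'value-1', 'value-2', 'value-3', 'value-4']

def func_cached(row):
    i = row['choice-index']
    # int() because the index arrives as a float
    return np.nan if math.isnan(i) else row[cached_columns[int(i)]]

# Both variants must produce the same series before timing them.
baseline = df.apply(func, axis=1)
cached = df.apply(func_cached, axis=1)
assert baseline.equals(cached)

t_plain = timeit.timeit(lambda: df.apply(func, axis=1), number=3)
t_cached = timeit.timeit(lambda: df.apply(func_cached, axis=1), number=3)
print('plain: %.3fs  cached: %.3fs  ratio: %.2f'
      % (t_plain, t_cached, t_plain / t_cached))
```

Checking that both variants return the same series before timing them guards against comparing a fast but wrong function with a slow but correct one.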

asked Jul 12 '15 02:07

orange

1 Answer

I think I found a good solution (a speedup of about 150x).

The trick is to avoid apply altogether and instead do one boolean-mask selection per possible choice index.

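choice_indices = [1, 2, 3, 4]
for idx in choice_indices:
  # rows whose choice-index selects column 'value-<idx>'
  mask = df['choice-index'] == idx
  result_column = 'value-%d' % idx
  # copy the selected column into 'value' for those rows only;
  # rows with a NaN choice-index never match a mask and are left untouched
  df.loc[mask, 'value'] = df.loc[mask, result_column]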
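For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The contents of value-3 and value-4 are made up, since the question elides them. The NumPy-based variant at the end (and the column name value_np) is not part of the answer above; it is an alternative that should give the same result under these assumptions.

```python
import numpy as np
import pandas as pd

# Example frame. 'value-3' and 'value-4' are invented placeholders;
# the question does not show their contents.
df = pd.DataFrame({
    'value-1': [1, 2, 3, 4, 5],
    'value-2': [0.1, 0.2, 0.3, 0.4, 0.5],
    'value-3': [10, 20, 30, 40, 50],
    'value-4': [100, 200, 300, 400, 500],
    'choice-index': [1, 1, np.nan, 2, 1],
})

# Mask-based selection, as in the answer: one pass per possible index.
df['value'] = np.nan  # rows with a NaN choice-index stay NaN
for idx in [1, 2, 3, 4]:
    mask = df['choice-index'] == idx
    df.loc[mask, 'value'] = df.loc[mask, 'value-%d' % idx]

# Alternative (an assumption, not from the answer): pick one column per row
# with NumPy integer indexing instead of looping over the indices.
value_columns = ['value-1', 'value-2', 'value-3', 'value-4']
values = df[value_columns].to_numpy(dtype=float)
choice = df['choice-index'].to_numpy()
valid = ~np.isnan(choice)
# Map 1-based choices to 0-based column positions; NaN rows get a dummy 0.
col_pos = np.where(valid, choice - 1, 0).astype(int)
picked = values[np.arange(len(df)), col_pos]
df['value_np'] = np.where(valid, picked, np.nan)

print(df[['choice-index', 'value', 'value_np']])
```

Both columns should match the expected value from the question, [1, 2, np.nan, 0.4, 5]. The mask loop costs one vectorized pass per distinct index, so it suits a small number of choices. The indexing variant does a single pass but first copies the value columns into one float array.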

answered Oct 13 '22 23:10

orange

Tags: performance, python, pandas, apply
