python pandas dataframe thread safe?

1 Answers

No, pandas is not thread safe. And it's not thread safe in surprising ways.

Can I delete from pandas dataframe while another thread is using?

Fuggedaboutit! Nope. And generally no. Not even for GIL-locked Python data structures.

Can I read from a pandas object while someone else is writing to it?
Can I copy a pandas dataframe in my thread, and work on the copy?

Definitely not. There's a long standing open issue: https://github.com/pandas-dev/pandas/issues/2728

Actually I think this is pretty reasonable (i.e. expected) behavior. I wouldn't expect to be able to simultaneously write to and read from, or copy, any data structure unless either: i) it had been designed for concurrency, or ii) I have an exclusive lock on that object and all the view objects derived from it (.loc and .iloc are views, and pandas has many others).
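Option (ii) can be sketched roughly as below. This is a minimal illustration, not a pandas API: `Guarded` and `locked` are hypothetical names, and a plain dict stands in for the DataFrame so the sketch runs without pandas. The idea is that every reader and writer only touches the shared object while holding the same lock.

```python
import threading
from contextlib import contextmanager


class Guarded:
    """Hypothetical wrapper: hands out the wrapped object only under a lock."""

    def __init__(self, obj):
        self._obj = obj
        self._lock = threading.RLock()

    @contextmanager
    def locked(self):
        # Callers must not keep references (or views) past the with-block.
        with self._lock:
            yield self._obj


# A plain dict stands in for the DataFrame (assumption, for illustration only).
shared = Guarded({"rows": 0})


def worker(n):
    for _ in range(n):
        with shared.locked() as table:
            table["rows"] += 1  # read-modify-write happens under the lock


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with shared.locked() as table:
    total = table["rows"]
```

The lock only helps if nothing derived from the object escapes the with-block. A view that leaks out and is used later is unguarded again, which is why the views need to be covered by the same lock.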

Can I read from a pandas object while no-one else is writing to it?

For almost all data structures in Python, the answer is yes. For pandas, no. And it seems it's not a design goal at present.

Typically, you can perform 'reading' operations on objects if no-one is performing mutating operations. You have to be a little cautious though. Some data structures, including pandas, perform memoization to cache expensive operations that are otherwise functionally pure. It's generally easy to implement lockless memoization in Python:

@property
def thing(self):
    # MISSING is a sentinel meaning "not computed yet"
    if self._thing is MISSING:
        self._thing = self._calc_thing()
    return self._thing

... it's simple and safe (assuming assignment is safely atomic -- which has not always been the case for every language, but is in CPython, unless you override __setattr__).
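For a runnable version of the pattern above, here is a small sketch. `Report`, `total` and `calc_calls` are made-up names for illustration. Several threads read the memoized property at once: two threads may both see the sentinel and compute the value twice, but because the calculation is pure they assign the same result, so every reader gets a consistent answer.

```python
import threading

MISSING = object()  # sentinel meaning "not computed yet"


class Report:
    """Hypothetical class with one lazily computed, memoized property."""

    def __init__(self, values):
        self._values = list(values)
        self._total = MISSING
        self.calc_calls = 0  # illustration only; this counter is not itself thread safe

    def _calc_total(self):
        self.calc_calls += 1
        return sum(self._values)  # pure: same inputs, same output

    @property
    def total(self):
        if self._total is MISSING:
            # A racing thread may also get here; both assign the same value.
            self._total = self._calc_total()
        return self._total


report = Report(range(1000))
results = []


def reader():
    results.append(report.total)  # list.append is atomic in CPython


threads = [threading.Thread(target=reader) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The cost of going lockless is possibly redundant work, not wrong answers. That only holds while `_calc_total` has no side effects that matter and nothing resets `_total` back to the sentinel.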

In pandas, Series and DataFrame indexes are computed lazily, on first use. I hope (but I do not see guarantees in the docs) that they're done in a similarly safe way.

For all libraries (including pandas) I would hope that all types of read-only operations (or more specifically, 'functionally pure' operations) would be thread safe if no-one is performing mutating operations. I think this is a reasonable, easily achievable, common lower bar for thread safety.

For pandas, however, you cannot assume this. Even if you can guarantee no-one is performing 'functionally impure' operations on your object (e.g. writing to cells, adding/deleting columns), pandas is not thread safe.

Here's a recent example: https://github.com/pandas-dev/pandas/issues/25870 (it's marked as a duplicate of the .copy-not-threadsafe issue, but it seems it could be a separate issue).

s = pd.Series(...)
f(s)  # Success!

# Thread 1:
   while True: f(s)  

# Thread 2:
   while True: f(s)  # Exception !

... fails for f(s): s.reindex(..., copy=True), which returns its result as a new object -- you would think it would be functionally pure and thread safe. Unfortunately, it is not.
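The two-thread pattern above can be turned into a small stress harness along these lines. This is only a sketch: `hammer` is a hypothetical helper, and it is shown with stand-in functions rather than pandas so that it runs anywhere. To try the case described here you would pass your own Series and a function that calls s.reindex on it. Whether and how often that fails will depend on the pandas version and on timing.

```python
import threading


def hammer(f, obj, n_threads=2, iterations=1000):
    """Call f(obj) repeatedly from several threads; return any exceptions raised."""
    errors = []

    def loop():
        for _ in range(iterations):
            try:
                f(obj)
            except Exception as exc:  # record the first failure and stop this thread
                errors.append(exc)
                return

    threads = [threading.Thread(target=loop) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


# Stand-in for a read-only operation that really is thread safe.
safe_errors = hammer(len, [1, 2, 3])


# Stand-in for an operation that raises, to show how failures are reported.
def always_fails(_):
    raise ValueError("boom")


failing_errors = hammer(always_fails, None)
```

An empty result from a run like this does not prove an operation is thread safe, since races are timing dependent. A non-empty result for an operation that succeeds on a single thread is strong evidence that it is not.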

The result of this is that we could not use pandas in production for our healthcare analytics system - and I now discourage it for internal development since it makes in-memory parallelization of read-only operations unsafe. (!!)

The reindex behavior is weird and surprising. If anyone has ideas about why it fails, please answer here: What's the source of thread-unsafety in this usage of pandas.Series.reindex(, copy=True)?

The maintainers marked this as a duplicate of https://github.com/pandas-dev/pandas/issues/2728 . I'm suspicious, but if .copy is the source, then almost all of pandas is not thread safe in any situation (which is their advice).

113

answered Sep 27 '22 20:09

user48956


Tags:

python

pandas

thread-safety

Andrew
