Does the quantile() function in Pandas ignore NaN?

Tags:

python

pandas

quantile

I have a DataFrame, dfAB:

import numpy as np
import pandas as pd
import random

A = [random.randint(0, 100) for _ in range(10)]
B = [random.randint(0, 100) for _ in range(10)]

dfAB = pd.DataFrame({'A': A, 'B': B})
dfAB

I can use the quantile function, since I want to know the 75th percentile of each column:

dfAB.quantile(0.75)
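For reference, with its default interpolation='linear', quantile places the q-th quantile at position q*(n-1) in the sorted values and interpolates linearly between the two neighbouring values. A minimal sketch of that rule, checked against pandas on a small fixed column (the helper name linear_quantile and the sample values are hypothetical, chosen only for illustration):

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: q-th quantile using linear interpolation,
    the rule behind pandas' default interpolation='linear'."""
    data = sorted(values)
    pos = q * (len(data) - 1)  # fractional position in the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Interpolate between the two neighbouring sorted values
    return data[lo] + (data[hi] - data[lo]) * frac


values = [5, 43, 86, 61, 2, 27]  # illustrative sample values
manual = linear_quantile(values, 0.75)
via_pandas = pd.Series(values).quantile(0.75)
print(manual, via_pandas)  # 56.5 56.5
```

Here the sorted data is [2, 5, 27, 43, 61, 86], the position is 0.75 * 5 = 3.75, so the result is 43 + 0.75 * (61 - 43) = 56.5.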

But say I now put some NaNs in dfAB and call the function again; obviously, the result is different:

dfAB.loc[5:8] = np.nan
dfAB.quantile(0.75)

When I calculated the mean of dfAB, I passed skipna=True to ignore NaNs, because I don't want them affecting my statistics (my data has quite a few of them on purpose, and replacing them with zero doesn't help):

dfAB.mean(skipna=True)
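Note that skipna=True is already the default for mean, so dfAB.mean() behaves the same way, while skipna=False makes any column that contains a NaN return NaN. A small sketch with a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few deliberate NaNs
df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 5.0],
                   'B': [np.nan, 4.0, 6.0, 8.0]})

skipped = df.mean(skipna=True)   # NaNs ignored: A -> 8/3, B -> 6.0
default = df.mean()              # same as skipna=True
strict = df.mean(skipna=False)   # a NaN anywhere in a column propagates

print(skipped.equals(default))   # True
print(strict.isna().all())       # True: both columns contain a NaN
```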

So what I'm getting at is: does the quantile function handle NaNs, and if so, how?


asked Sep 04 '18 17:09

Junaid Mohammad

1 Answer

Yes. DataFrame.quantile ignores NaN values and computes each column's quantile from its non-NaN entries only, much as mean(skipna=True) does. To illustrate, you can compare the result with np.nanpercentile, which, according to the NumPy docs, computes "the qth percentile of the data along the specified axis, while ignoring nan values":

>>> dfAB
      A     B
0   5.0  10.0
1  43.0  67.0
2  86.0   2.0
3  61.0  83.0
4   2.0  27.0
5   NaN   NaN
6   NaN   NaN
7   NaN   NaN
8   NaN   NaN
9  27.0  70.0

>>> dfAB.quantile(0.75)
A    56.50
B    69.25
Name: 0.75, dtype: float64

>>> np.nanpercentile(dfAB, 75, axis=0)
array([56.5 , 69.25])

As you can see, the two results are identical.
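Another way to confirm this is to drop the NaNs from each column yourself and compare. A minimal sketch that rebuilds the frame from the dfAB output shown above (instead of random data, so the result is reproducible):

```python
import numpy as np
import pandas as pd

# Values copied from the dfAB output above; rows 5-8 are NaN
dfAB = pd.DataFrame({
    'A': [5, 43, 86, 61, 2, np.nan, np.nan, np.nan, np.nan, 27],
    'B': [10, 67, 2, 83, 27, np.nan, np.nan, np.nan, np.nan, 70],
})

with_nans = dfAB.quantile(0.75)
# Quantile of each column after explicitly removing its NaNs
dropped = pd.Series({col: dfAB[col].dropna().quantile(0.75) for col in dfAB})
via_numpy = np.nanpercentile(dfAB, 75, axis=0)

print(with_nans.tolist())  # [56.5, 69.25]
print(np.allclose(with_nans, dropped), np.allclose(with_nans, via_numpy))  # True True
```

All three approaches agree, which is consistent with quantile simply skipping NaNs column by column.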

answered Sep 21 '22 02:09

sacuL
