Pandas - Handling NaNs in categorical data

Tags:

python

python-3.x

pandas

dataframe

categorical-data

I have a column in a dataframe that has categorical data, but some of the data is missing (NaN). I want to carry out linear interpolation on this data to fill the missing values, but I am not sure how to go about it. I can't drop the NaNs before turning the data into a categorical type, because I need to fill them. Here is a simple example to demonstrate what I am trying to do.

col1  col2
5     cloudy
3     windy
6     NaN
7     rainy
10    NaN
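To make the example reproducible, the frame above can be built like this. It is a minimal sketch: the values are copied from the table, with np.nan standing in for the missing entries.

```python
import numpy as np
import pandas as pd

# The example frame from the question; missing labels are np.nan
df = pd.DataFrame({
    "col1": [5, 3, 6, 7, 10],
    "col2": ["cloudy", "windy", np.nan, "rainy", np.nan],
})
print(df)
```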

Say I want to convert col2 to categorical data but retain the NaNs and fill them using linear interpolation. How do I go about it? Let's say that after converting the column to categorical data it looks like this:

col2
1
2
NaN
3
NaN
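One way to get numeric codes while keeping the gaps is to go through pd.Categorical, whose codes mark missing values with -1. A minimal sketch, assuming you want the codes numbered 1, 2, 3 in order of first appearance, as in the table above (without an explicit category list, pandas sorts the categories alphabetically instead):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [5, 3, 6, 7, 10],
    "col2": ["cloudy", "windy", np.nan, "rainy", np.nan],
})

# Fix the category order to order of first appearance (assumption: this is the desired numbering)
cat = pd.Categorical(df["col2"], categories=["cloudy", "windy", "rainy"])

# Missing values get code -1: mask them back to NaN, then shift to 1-based codes
codes = pd.Series(cat.codes, index=df.index)
numeric = codes.where(codes >= 0) + 1
print(numeric.tolist())  # [1.0, 2.0, nan, 3.0, nan]
```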

Then I can do linear interpolation and get something like this

col2
1
2
3
3
2

How can I achieve this?
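For reference, true linear interpolation of the sequence 1, 2, NaN, 3, NaN does not give the 3 and 2 sketched above: the interior gap gets the midpoint of 2 and 3 (2.5, which is not a valid code), and the trailing gap has no right-hand neighbour, so pandas' default (forward-only) interpolation copies the last valid value into it. A quick check with the default Series.interpolate():

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 3, np.nan])

# Default interpolate() is linear and fills forward:
# interior gap -> midpoint of 2 and 3; trailing gap -> last valid value (3)
print(s.interpolate().tolist())  # [1.0, 2.0, 2.5, 3.0, 3.0]
```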


asked Jan 26 '17 20:01

Wasswa Samuel

2 Answers

UPDATE:

Is there a way to convert the data back to its original form after interpolation, i.e. instead of 1, 2 or 3 you have cloudy, windy and rainy again?

Solution: I've intentionally added more rows to your original DF:

In [129]: df
Out[129]:
   col1    col2
0     5  cloudy
1     3   windy
2     6     NaN
3     7   rainy
4    10     NaN
5     5  cloudy
6    10     NaN
7     7   rainy

In [130]: df.dtypes
Out[130]:
col1       int64
col2    category
dtype: object

In [131]: df.col2 = (df.col2.cat.codes.replace(-1, np.nan)
     ...:              .interpolate().astype(int).astype('category')
     ...:              .cat.rename_categories(df.col2.cat.categories))
     ...:

In [132]: df
Out[132]:
   col1    col2
0     5  cloudy
1     3   windy
2     6   rainy
3     7   rainy
4    10  cloudy
5     5  cloudy
6    10  cloudy
7     7   rainy
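The solution above can be reproduced as a standalone script. This sketch assumes col2 already has category dtype, as Out[130] shows. It makes one deliberate swap relative to In [131]: it rebuilds the column with pd.Categorical.from_codes against the original categories instead of .astype('category').cat.rename_categories(...), because the latter raises if some code no longer occurs after filling (the number of categories then differs).

```python
import numpy as np
import pandas as pd

# The 8-row frame from In [129], with col2 already of category dtype
df = pd.DataFrame({
    "col1": [5, 3, 6, 7, 10, 5, 10, 7],
    "col2": pd.Categorical(["cloudy", "windy", np.nan, "rainy",
                            np.nan, "cloudy", np.nan, "rainy"]),
})

cats = df["col2"].cat.categories   # inferred and sorted: cloudy, rainy, windy
codes = df["col2"].cat.codes       # missing values are coded as -1

filled = (codes.where(codes >= 0)  # -1 sentinel -> NaN (result is float)
               .interpolate()      # linear interpolation between neighbouring codes
               .astype(int))       # truncate back to integer codes

# Rebuild against the ORIGINAL categories so the code -> label mapping
# stays correct even if some code no longer occurs after filling
df["col2"] = pd.Categorical.from_codes(filled.to_numpy(), categories=cats)
print(df["col2"].tolist())
# ['cloudy', 'windy', 'rainy', 'rainy', 'cloudy', 'cloudy', 'cloudy', 'rainy']
```

Two caveats apply to this approach in general. astype(int) truncates (1.5 becomes 1), and because the codes only follow the alphabetical order of the categories, interpolating between codes 0 and 2 yields 1, a category that neither neighbour has. A leading NaN has no left neighbour, stays NaN after interpolate(), and makes astype(int) raise.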

OLD "numerical" answer:

IIUC you can do this:

In [66]: df
Out[66]:
   col1    col2
0     5  cloudy
1     3   windy
2     6     NaN
3     7   rainy
4    10     NaN

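First, let's factorize col2 (note: the na_sentinel argument was deprecated in pandas 1.5 and removed in 2.0, where NaN always gets the code -1):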

In [67]: df.col2 = pd.factorize(df.col2, na_sentinel=-2)[0] + 1

In [68]: df
Out[68]:
   col1  col2
0     5     1
1     3     2
2     6    -1
3     7     3
4    10    -1
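Since na_sentinel no longer exists in pandas 2.0 and later, In [67] fails there. Here is a minimal sketch that produces the same 1, 2, -1, 3, -1 column using only the default sentinel (-1 for NaN), so it should work on both older and newer versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [5, 3, 6, 7, 10],
    "col2": ["cloudy", "windy", np.nan, "rainy", np.nan],
})

# NaN gets code -1; uniques holds the labels in order of first appearance
codes, uniques = pd.factorize(df["col2"])

# Shift real codes to 1-based and keep -1 as the "missing" marker, as in Out[68]
df["col2"] = np.where(codes >= 0, codes + 1, -1)
print(df["col2"].tolist())  # [1, 2, -1, 3, -1]
```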

Now we can interpolate it (after replacing the -1s with NaNs):

In [69]: df.col2.replace(-1, np.nan).interpolate().astype(int)
Out[69]:
0    1
1    2
2    2
3    3
4    3
Name: col2, dtype: int32

The same approach, but converting the interpolated series to category dtype:

In [70]: df.col2.replace(-1, np.nan).interpolate().astype(int).astype('category')
Out[70]:
0    1
1    2
2    2
3    3
4    3
Name: col2, dtype: category
Categories (3, int64): [1, 2, 3]
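To turn the numerical result back into labels (the follow-up question asked for cloudy, windy and rainy again), the uniques that pd.factorize returns alongside the codes can serve as the category list. A minimal sketch, assuming NaN is coded as -1 (the pandas 2.x default) and the same truncating astype(int) as In [69]:

```python
import numpy as np
import pandas as pd

labels = pd.Series(["cloudy", "windy", np.nan, "rainy", np.nan])
codes, uniques = pd.factorize(labels)  # codes: 0, 1, -1, 2, -1

# 1-based codes with NaN in the gaps, then linear interpolation and truncation
filled = (pd.Series(codes + 1, dtype=float)
            .where(codes >= 0)
            .interpolate()
            .astype(int))              # 1, 2, 2, 3, 3

# Map the 1-based codes back onto the labels factorize returned
restored = pd.Series(pd.Categorical.from_codes(filled.to_numpy() - 1, categories=uniques))
print(restored.tolist())  # ['cloudy', 'windy', 'windy', 'rainy', 'rainy']
```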


answered Oct 17 '22 04:10

MaxU - stop WAR against UA

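I know you're asking for linear interpolation, but this is another, simpler way to do it. Converting categories to numbers isn't such a good idea, because the numbers impose an order the categories don't have (a value halfway between two codes need not mean anything), so I suggest this instead.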

You can simply use pandas' interpolate method with method='pad', which fills each NaN with the last valid value before it:

df.interpolate(method='pad')

You can find the other methods, with examples of using them, in the pandas documentation for interpolate.
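The same padding can be done with ffill(), which also works directly on a categorical column; newer pandas versions (2.1 and later) deprecate the padding methods of interpolate in favour of ffill()/bfill(). A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [5, 3, 6, 7, 10],
    "col2": pd.Categorical(["cloudy", "windy", np.nan, "rainy", np.nan]),
})

# Forward fill: each NaN takes the last valid value above it (what method='pad' does)
print(df["col2"].ffill().tolist())  # ['cloudy', 'windy', 'windy', 'rainy', 'rainy']

# Backward fill is the mirror image; the trailing NaN has nothing below it and stays NaN
print(df["col2"].bfill().tolist())
```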

answered Oct 17 '22 04:10

Fatemeh Rahimi
