Pandas - filling NaNs in Categorical data

Tags:

python

pandas

I am trying to fill missing values (NaN) using the code below:

NAN_SUBSTITUTION_VALUE = 1
g = g.fillna(NAN_SUBSTITUTION_VALUE)

but I get the following error:

ValueError: fill value must be in categories.

Could anybody please shed some light on this error?
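A minimal reproduction, assuming g is a Series with the categorical dtype (the question does not show how g was built, so the data below is hypothetical). The exception type and wording depend on the pandas version: older releases raise "ValueError: fill value must be in categories", newer ones phrase it differently, so this sketch catches both TypeError and ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a categorical Series whose categories are "A" and "B".
g = pd.Series(["A", "B", np.nan], dtype="category")
NAN_SUBSTITUTION_VALUE = 1

try:
    g = g.fillna(NAN_SUBSTITUTION_VALUE)
    error = None
except (TypeError, ValueError) as exc:
    # 1 is not one of the existing categories, so pandas refuses the fill
    error = exc

print(type(error).__name__, error)
print(list(g.cat.categories))
```

Because the fill raised, g still contains its NaN and its categories are unchanged.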


asked Sep 22 '15 13:09

deega

5 Answers

Your question is missing an important detail: what g is, in particular that it has the categorical dtype ("category"). I assume it is something like this:

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

The problem you are experiencing is that fillna requires a value that already exists as a category. For instance, g.fillna("A") would work, but g.fillna("D") fails. To fill the series with a new value you can do:

g_without_nan = g.cat.add_categories("D").fillna("D")
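A short sketch of both cases from this answer, using the same example Series: filling with an existing category works directly, while a new value has to be registered with cat.add_categories first.

```python
import numpy as np
import pandas as pd

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

# "A" is already a category, so this fill is accepted as-is.
filled_existing = g.fillna("A")

# "D" is not a category yet: add it to the categories, then fill.
g_without_nan = g.cat.add_categories("D").fillna("D")

print(filled_existing.tolist())
print(g_without_nan.tolist())
print(list(g_without_nan.cat.categories))
```

Note that add_categories returns a new Series; the original g keeps its three categories.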

answered Oct 19 '22 23:10

bluenote10

Add the category before you fill:

g = g.cat.add_categories([1])
g = g.fillna(1)
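One caveat with this approach: cat.add_categories raises a ValueError if the value is already a category, so calling it unconditionally can fail on data where the fill value already occurs. A minimal sketch of a guard, written as a hypothetical helper named fillna_categorical (not part of pandas):

```python
import numpy as np
import pandas as pd


def fillna_categorical(s: pd.Series, value) -> pd.Series:
    """Hypothetical helper: fill NaNs in a categorical Series with `value`,
    adding `value` as a category only when it is not one already."""
    if value not in s.cat.categories:
        # add_categories would raise ValueError for an existing category
        s = s.cat.add_categories([value])
    return s.fillna(value)


g = pd.Series([2, 3, np.nan], dtype="category")
print(fillna_categorical(g, 1).tolist())   # 1 is added as a new category
print(fillna_categorical(g, 2).tolist())   # 2 is already a category
```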

answered Oct 19 '22 22:10

G. Cheng

Once you create categorical data, you can only insert values that are already among its categories.

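>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
>>> df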
    ID  value
0    0     20
1    1     43
2    2     45

>>> df["cat"] = df["value"].astype("category")
>>> df
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45

>>> df.loc[1, "cat"] = np.nan
>>> df
    ID  value    cat
0    0     20     20
1    1     43    NaN
2    2     45     45

>>> df.fillna(1)
ValueError: fill value must be in categories
>>> df.fillna(43)
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45
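If the column should be filled with a value that is not yet a category (1 in the session above), one option is to add the category to that column only and fill just that column, leaving the others untouched. A sketch, rebuilding the DataFrame from the values shown in the session:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
df["cat"] = df["value"].astype("category")
df.loc[1, "cat"] = np.nan

# Register 1 as a category of the "cat" column, then fill only that column.
df["cat"] = df["cat"].cat.add_categories([1]).fillna(1)

print(df)
print(list(df["cat"].cat.categories))
```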

answered Oct 19 '22 22:10

pacholik

As others have said, this error occurs because the column's dtype is 'category'.
I suggest converting it to string first, calling fillna, and then converting it back to category if needed.

g = g.astype('string')
g = g.fillna(str(NAN_SUBSTITUTION_VALUE))  # a 'string' column expects a string fill value
g = g.astype('category')
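A small sketch of this round trip on hypothetical data. Two side effects are worth checking: the fill value is passed as a string (assuming a 'string' column should be filled with string values), and after converting back, the categories are re-inferred from the strings, so any non-string categories come back as their string representations and the category order is re-sorted.

```python
import numpy as np
import pandas as pd

NAN_SUBSTITUTION_VALUE = 1

g = pd.Series(["A", "B", np.nan], dtype="category")

g = g.astype("string")                     # NaN becomes <NA>
g = g.fillna(str(NAN_SUBSTITUTION_VALUE))  # fill with the string "1"
g = g.astype("category")                   # categories are inferred again

print(g.tolist())
print(list(g.cat.categories))
```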

answered Oct 19 '22 22:10

Yves

Sometimes you may want to replace the NaNs with values already present in your dataset. In that case you can use:

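# create a random permutation of the categorical values
permutation = np.random.permutation(df[field])

# remove the missing values so that NaN is never drawn as a replacement
permutation = permutation[~pd.isnull(permutation)]

# replace every missing value of df[field] with a randomly chosen observed value;
# apply runs on an object copy because on a categorical column apply maps over
# the categories rather than the values, so the NaN branch would never run
end = len(permutation)
filled = df[field].astype(object).apply(lambda x: permutation[np.random.randint(end)] if pd.isnull(x) else x)
df[field] = filled.astype("category")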

In my experience it works quite efficiently.
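The same idea (fill each NaN with a randomly drawn observed value, weighted by how often each value occurs) can also be written without a per-row apply. This is an alternative sketch using NumPy's Generator.choice to draw all replacements at once; the column name "field" and the data are hypothetical. Since the replacements come from the column's own values, they are already categories and no add_categories call is needed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the draw is reproducible

field = "field"  # hypothetical column name
df = pd.DataFrame({field: pd.Series(["A", "B", np.nan, "A", np.nan], dtype="category")})

observed = df[field].dropna().to_numpy()  # values the column already contains
missing = df[field].isna()                # boolean mask of the NaN rows

# Draw one observed value (with replacement) for every missing row.
df.loc[missing, field] = rng.choice(observed, size=int(missing.sum()))

print(df[field].tolist())
```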

answered Oct 19 '22 23:10

Victor Zuanazzi
