Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Pandas, How to use fillna to fill the whole columns with string if the column is empty originally?

Tags:

python

pandas

My table:

In [15]: csv=u"""a,a,,a
   ....: b,b,,b
   ....: c,c,,c
   ....: """

In [18]: df = pd.read_csv(io.StringIO(csv), header=None)

Fill the empty columns as 'UNKNOWN'

In [19]: df
Out[19]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [20]: df.fillna({2:'UNKNOWN'})

Got the error

ValueError: could not convert string to float: UNKNOWN
like image 913
waitingkuo Avatar asked Apr 17 '13 18:04

waitingkuo


People also ask

How do I fill empty columns in pandas?

Fill Data in an Empty Pandas DataFrame by Appending Rows First, create an empty DataFrame with column names and then append rows one by one. The append() method can also append rows. When creating an empty DataFrame with column names and row indices, we can fill data in rows using the loc() method.

How do I fill NaN based on another column?

Using fillna() to fill values from another column Here, we apply the fillna() function on “Col1” of the dataframe df and pass the series df['Col2'] as an argument. The above code fills the missing values in “Col1” with the corresponding values (based on the index) from “Col2”.


1 Answers

Your 2 column probably has a float dtype:

>>> df
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
>>> df.dtypes
0     object
1     object
2    float64
3     object
dtype: object

Hence the problem. If you don't mind converting the whole frame to object, you could:

>>> df.astype(object).fillna("UNKNOWN")
   0  1        2  3
0  a  a  UNKNOWN  a
1  b  b  UNKNOWN  b
2  c  c  UNKNOWN  c

Depending on whether there's non-string data you might want to be more selective about converting column dtypes, and/or specify the dtypes on read, but the above should work, anyhow.


Update: if you have dtype information you want to preserve, rather than switching it back, I'd go the other way and only fill on the columns that you wanted to, either using a loop with fillna:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.dtypes
0      int64
1     object
2     object
3    float64
4     object
5    float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
...         df[col] = df[col].astype(object).fillna("UNKNOWN")
...     
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
>>> df.dtypes
0     int64
1    object
2    object
3    object
4    object
5    object
dtype: object

Or (if you're using all), then maybe not even use fillna at all:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.ix[:,pd.isnull(df).all()] = "UNKNOWN"
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
like image 155
DSM Avatar answered Sep 28 '22 19:09

DSM