My table:
In [15]: csv=u"""a,a,,a
....: b,b,,b
....: c,c,,c
....: """
In [18]: df = pd.read_csv(io.StringIO(csv), header=None)
I want to fill the empty column with 'UNKNOWN':
In [19]: df
Out[19]:
0 1 2 3
0 a a NaN a
1 b b NaN b
2 c c NaN c
In [20]: df.fillna({2:'UNKNOWN'})
But I get this error:
ValueError: could not convert string to float: UNKNOWN
Your column 2 probably has a float dtype, because a column that contains only missing values is read as NaN, which is a float:
>>> df
0 1 2 3
0 a a NaN a
1 b b NaN b
2 c c NaN c
>>> df.dtypes
0 object
1 object
2 float64
3 object
dtype: object
Hence the problem. If you don't mind converting the whole frame to object, you could:
>>> df.astype(object).fillna("UNKNOWN")
0 1 2 3
0 a a UNKNOWN a
1 b b UNKNOWN b
2 c c UNKNOWN c
Depending on whether there's non-string data you might want to be more selective about converting column dtypes, and/or specify the dtypes on read, but the above should work, anyhow.
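As a rough sketch of the "specify the dtypes on read" option, assuming every column in the file is meant to be text, you could pass dtype=object to read_csv so the empty column never becomes float64. The variable name filled is only for illustration.

```python
import io
import pandas as pd

csv = u"""a,a,,a
b,b,,b
c,c,,c
"""

# Assumption: all columns hold text, so read everything as object.
df = pd.read_csv(io.StringIO(csv), header=None, dtype=object)

# Column 2 is object rather than float64, so a string fill value is accepted.
filled = df.fillna({2: "UNKNOWN"})
print(filled)
```

If only some columns are text, reading everything as object would throw away the numeric dtypes, which is where the more selective approach comes in.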
Update: if you have dtype information you want to preserve, then rather than converting everything and switching it back, I'd go the other way and fill only the columns you want to, either using a loop with fillna:
>>> df
0 1 2 3 4 5
0 0 a a NaN a NaN
1 1 b b NaN b NaN
2 2 c c NaN c NaN
>>> df.dtypes
0 int64
1 object
2 object
3 float64
4 object
5 float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
... df[col] = df[col].astype(object).fillna("UNKNOWN")
...
>>> df
0 1 2 3 4 5
0 0 a a UNKNOWN a UNKNOWN
1 1 b b UNKNOWN b UNKNOWN
2 2 c c UNKNOWN c UNKNOWN
>>> df.dtypes
0 int64
1 object
2 object
3 object
4 object
5 object
dtype: object
Or, if you're selecting the columns with all anyway, maybe don't use fillna at all:
>>> df
0 1 2 3 4 5
0 0 a a NaN a NaN
1 1 b b NaN b NaN
2 2 c c NaN c NaN
>>> df.loc[:, pd.isnull(df).all()] = "UNKNOWN"  # .ix was removed in pandas 1.0; use .loc
>>> df
0 1 2 3 4 5
0 0 a a UNKNOWN a UNKNOWN
1 1 b b UNKNOWN b UNKNOWN
2 2 c c UNKNOWN c UNKNOWN
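For reuse, the loop version above could be wrapped in a small helper. This is only a sketch: the function name fill_all_null_columns and the sample frame are made up for illustration, and it assumes you only want to touch columns that are entirely null.

```python
import numpy as np
import pandas as pd

def fill_all_null_columns(frame, value):
    """Return a copy where columns that are entirely null are filled with value."""
    out = frame.copy()
    all_null = out.isna().all()              # one boolean per column
    for col in all_null[all_null].index:     # labels of the all-null columns
        # Convert to object first so a string fill value is accepted.
        out[col] = out[col].astype(object).fillna(value)
    return out

# Hypothetical sample data: an integer column, a text column, an empty column.
df = pd.DataFrame({0: [0, 1, 2], 1: ["a", "b", "c"], 2: [np.nan] * 3})
result = fill_all_null_columns(df, "UNKNOWN")
print(result)
print(result.dtypes)
```

The integer column keeps its dtype because it is never converted, and the original frame is left unchanged since the helper works on a copy.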