
Retain None in pandas DataFrame (in spite of astype() and to_parquet())

How can I force a pandas DataFrame to retain None values, even when using astype()?

Details

Since the dtype parameter of the pd.DataFrame constructor accepts only a single dtype rather than one per column, I fix the types (required for to_parquet()) with the following function:

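import numpy as np
import pandas as pd

def _typed_dataframe(data: list) -> pd.DataFrame:
    # Target dtype for each column.
    typing = {
        'name': str,
        'value': np.float64,
        'info': str,
        'scale': np.int8,
    }
    result = pd.DataFrame(data)
    # Cast every column that is present to its target dtype.
    for label in result.keys():
        result[label] = result[label].astype(typing[label])
    return result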

Unfortunately, result['info'] = result['info'].astype(str) transforms all None values in info to "None" strings. How can I prevent this, i.e. retain None values?

To be more precise: None values in data become np.nan in the result DataFrame, which become "nan" by astype(str), which become "None" when extracted from result.
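The conversion described above can be reproduced on a small frame. This is only a minimal sketch: the sample rows are made up for illustration, and what exactly the missing entry turns into after the cast may depend on the pandas version, so no output is claimed here.

```python
import pandas as pd

# Hypothetical sample rows; only 'info' contains a missing value.
data = [{"name": "a", "info": None}, {"name": "b", "info": "x"}]
df = pd.DataFrame(data)

# Before the cast, the missing entry is still recognised as missing.
missing_before = df["info"].isna().tolist()
print(missing_before)

# After the cast, inspect what the missing entry has become.
converted = df["info"].astype(str)
print(repr(converted.iloc[0]))
print(converted.isna().tolist())
```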

DaveFar asked Oct 15 '25 04:10



1 Answer

Following @frosty's comment, we can use pd.StringDtype() instead of str for the info column:

    typing = {
        'name': str,
        'value': np.float64,
        'info': pd.StringDtype(),
        'scale': np.int8,
    }    

However, this requires pandas >= 1.0.0, the release that introduced pd.StringDtype().
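As a rough illustration of why this helps, the sketch below casts a single column to the nullable string dtype and checks that the missing entry stays missing. The sample Series is hypothetical and not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample column with one missing entry.
info = pd.Series([None, "x"], dtype=object)

# Cast to the nullable string dtype instead of the built-in str.
typed = info.astype(pd.StringDtype())

# The missing entry is still flagged as missing rather than turned into text.
missing = typed.isna().tolist()
print(missing)  # [True, False]
```

Missing values can then be detected with isna() as usual, which the plain str cast does not allow.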


As a better solution, you can replace

for label in result.keys():
    result[label] = result[label].astype(typing[label])

by

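result = result.astype(typing)

Note that calling result.astype(typing) on its own has no effect: astype() returns a new DataFrame instead of modifying result in place, so the return value must be assigned. DataFrame.astype() does accept a dict that maps each column label to its dtype.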
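Putting both points together, the function from the question might look roughly like the sketch below. It assumes pandas >= 1.0.0 and that every column in the data has an entry in the dict; the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

def _typed_dataframe(data: list) -> pd.DataFrame:
    # Per-column target dtypes; pd.StringDtype() keeps missing strings missing.
    typing = {
        "name": str,
        "value": np.float64,
        "info": pd.StringDtype(),
        "scale": np.int8,
    }
    result = pd.DataFrame(data)
    # Only pass dtypes for columns that are present, mirroring the original loop.
    present = {label: typing[label] for label in result.columns}
    # astype() returns a new DataFrame, so its return value is what we hand back.
    return result.astype(present)

# Hypothetical sample rows for illustration.
rows = [
    {"name": "a", "value": 1.5, "info": None, "scale": 1},
    {"name": "b", "value": 2.0, "info": "note", "scale": 2},
]
df = _typed_dataframe(rows)
print(df.dtypes)
print(df["info"].isna().tolist())  # [True, False]
```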

DaveFar answered Oct 17 '25 19:10
