In pandas, how can I convert a column of a DataFrame into dtype object? Or better yet, into a factor? (For those who speak R, in Python, how do I as.factor()?)
Also, what's the difference between pandas.Factor and pandas.Categorical?
The dtype passed to astype can be a built-in Python, numpy, or pandas dtype. Suppose we want to convert column A (which currently holds strings, so its dtype is object) into a column holding integers. To do so, we call astype on the pandas DataFrame object and explicitly specify the dtype to which we wish to cast the column.
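A minimal sketch of that conversion, assuming a small made-up DataFrame whose column A holds numeric strings (the data and the extra column B are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical data: column A holds numeric strings stored with dtype object.
df = pd.DataFrame({
    "A": pd.Series(["1", "2", "3"], dtype=object),
    "B": [0.5, 1.5, 2.5],
})

# Cast only column A to integers by passing a {column: dtype} mapping.
df = df.astype({"A": int})

print(df.dtypes)
```

Passing a mapping leaves the other columns untouched; `df["A"].astype(int)` on the single column would work as well. The cast raises a ValueError if any string cannot be parsed as an integer.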
You can use the astype method to cast a Series (one column):
df['col_name'] = df['col_name'].astype(object)
Or the entire DataFrame:
df = df.astype(object)
Since version 0.15, you can use the category datatype in a Series/column:
df['col_name'] = df['col_name'].astype('category')
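As a rough illustration of what the category dtype gives you, here is a sketch on a made-up column b (the values are assumed for the example). The .cat accessor exposes the levels and the integer code behind each row, which is roughly what R's levels() and as.integer() show for a factor:

```python
import pandas as pd

# Hypothetical column of string labels.
df = pd.DataFrame({"b": ["yes", "no", "yes", "no", "absent"]})
df["b"] = df["b"].astype("category")

# The distinct levels; pandas sorts them when it infers them from the data.
levels = list(df["b"].cat.categories)
# The integer code stored for each row (an index into the levels).
codes = df["b"].cat.codes.tolist()

print(levels)  # ['absent', 'no', 'yes']
print(codes)   # [2, 1, 2, 1, 0]
```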
Note: pd.Factor was deprecated and has since been removed in favor of pd.Categorical.
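If you need control over the levels and their order, as with factor(x, levels=...) in R, pd.Categorical can be constructed directly. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical labels with an explicit, ordered set of levels.
sizes = pd.Categorical(
    ["low", "high", "medium", "low"],
    categories=["low", "medium", "high"],
    ordered=True,
)

# Codes index into the categories in the order given above.
size_codes = sizes.codes.tolist()
print(size_codes)   # [0, 2, 1, 0]

# Because the categorical is ordered, min and max follow the level order.
print(sizes.min(), sizes.max())  # low high
```

Values that are not listed in categories become missing (code -1), so the explicit list also acts as a whitelist of allowed labels.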
There's also the pd.factorize function, which returns the integer codes together with the array of unique values:
# use the df data from @herrfz
In [150]: pd.factorize(df.b)
Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))

In [152]: df['c'] = pd.factorize(df.b)[0]

In [153]: df
Out[153]:
   a       b  c
0  1     yes  0
1  2      no  1
2  3     yes  0
3  4      no  1
4  5  absent  2
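The session above relies on a DataFrame defined in another answer. A self-contained version might look like the following, where the data is reconstructed from the printed output rather than taken from that answer:

```python
import pandas as pd

# Data reconstructed from the output shown above (an assumption, not the original definition).
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": ["yes", "no", "yes", "no", "absent"],
})

# factorize returns (codes, uniques); codes are assigned in order of first appearance.
codes, uniques = pd.factorize(df["b"])
df["c"] = codes

print(list(uniques))    # ['yes', 'no', 'absent']
print(df["c"].tolist()) # [0, 1, 0, 1, 2]
```

Note the difference from astype('category'): factorize numbers the values in the order they first appear, whereas the category dtype sorts the inferred levels, so the same column can get different integer codes from the two approaches.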