This behavior seems odd to me: the id column (a string) gets converted to a timedelta upon transposing the df if the other column is a timedelta.
import pandas as pd
df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': [12, 14, 170]})
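# in the pandas version used here, 'Minutes' was read as months (12 -> 365 days 05:49:12); unit='min' gives minutes
df['val'] = pd.to_timedelta(df.val, unit='Minutes')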
print(df.T)
# 0 1 2
#id 0 days 00:00:00.000000 0 days 00:00:00.000001 0 days 00:00:00.000032
#val 365 days 05:49:12 426 days 02:47:24 5174 days 06:27:00
type(df.T[0][0])
#pandas._libs.tslib.Timedelta
Without the timedelta it works as I'd expect: the id column remains a string, even though the other column holds integers and all of the strings could be safely cast to integers.
df2 = pd.DataFrame({'id': ['00115', '01222', '32333'],
                    'val': [1, 1231, 1413]})
type(df2.T[0][0])
#str
Why does the type of id get changed in the first instance, but not the second?
When a column has the object dtype, that does not mean every value in it is a string or text. The values may be numbers, or a mixture of strings, integers, and floats. Because of this, string operations on such a column may fail or may silently return NaN for the values that are not strings.
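A minimal sketch of that point, using a small made-up Series: an object-dtype column can mix str, int, and float values, and the .str accessor then only transforms the actual strings.

```python
import pandas as pd

# An object-dtype Series can hold any mix of Python objects, not just strings.
s = pd.Series(['a', 1, 2.5], dtype=object)
print(s.dtype)                        # object
print([type(v).__name__ for v in s])  # ['str', 'int', 'float']

# The .str accessor still runs on mixed values, but non-string entries become NaN.
upper = s.str.upper()
print(upper.tolist())
```

If an object column contains no strings at all (for example only integers), the .str accessor raises an AttributeError instead of returning NaN.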
By default, pandas stores strings using the object dtype, meaning the column is a NumPy array of pointers to ordinary Python objects. pandas 1.0 added a dedicated "string" dtype, but it still stores Python string objects underneath, so it did not reduce memory usage.
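The sketch below contrasts the two representations. It passes dtype=object explicitly so it does not depend on which default string dtype a given pandas version infers; the sample values are just the ids from the question.

```python
import pandas as pd

# Object dtype: the underlying NumPy array holds references to Python str objects.
s_obj = pd.Series(['00115', '01222'], dtype=object)
values = s_obj.to_numpy()
print(values.dtype)              # object
print(type(values[0]).__name__)  # str

# The dedicated string dtype introduced in pandas 1.0 (pd.StringDtype).
s_str = pd.Series(['00115', '01222'], dtype='string')
print(s_str.dtype)
```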
The standard way to convert one or more columns of a DataFrame to numeric values is pandas.to_numeric(). It tries to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
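A short sketch applying pd.to_numeric to the id strings from the question; the extra 'code' column is hypothetical and only there to show errors='coerce' on a value that cannot be parsed.

```python
import pandas as pd

df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'code': ['7', '8', 'x']})  # 'code' is a made-up column

# Convert a single column: the strings become integers (leading zeros are dropped).
ids = pd.to_numeric(df['id'])
print(ids.tolist())  # [115, 1222, 32333]

# Convert several columns at once; errors='coerce' turns unparseable values into NaN.
converted = df[['id', 'code']].apply(pd.to_numeric, errors='coerce')
print(converted.dtypes)
```

Note that converting the ids this way loses the leading zeros, so it only makes sense if the ids really are numbers rather than fixed-width codes.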
A DataFrame should be thought of in columns, and each column must have a single data type. When you transpose, you change which cells end up together in the new columns. Before the transpose you had a string column and a timedelta column; after it, each new column holds one string and one timedelta. Pandas has to decide how to cast each new column, and it chose timedelta, parsing the strings as nanosecond counts (which is why '00115' became 0 days 00:00:00.000000, i.e. 115 ns). In my opinion, this is a goofy choice.
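To see why pandas has to choose, you can look at the raw transposed values. This sketch rebuilds the question's frame with unit='min' (an assumption: minutes were presumably intended) and shows that each new column mixes a str with a Timedelta.

```python
import pandas as pd

df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': pd.to_timedelta([12, 14, 170], unit='min')})

# Transposing swaps rows and columns, so each new column gets one value from
# every original column: here one string and one Timedelta.
arr = df.to_numpy().T
print(arr.dtype)  # object
print(isinstance(arr[0, 0], str), isinstance(arr[1, 0], pd.Timedelta))  # True True

# pandas still assigns one dtype per new column; what it infers here
# may differ between pandas versions.
print(df.T.dtypes)
```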
You can avoid this by constructing a new DataFrame from the transposed values and setting its dtype to object explicitly.
pd.DataFrame(df.values.T, index=df.columns, columns=df.index, dtype=object)
0 1 2
id 00115 01222 32333
val 365 days 05:49:12 426 days 02:47:24 5174 days 06:27:00
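A self-contained check of that fix, again assuming unit='min' for the sample data. It uses to_numpy(), the recommended equivalent of .values, and confirms that every new column is object dtype and the id strings survive untouched.

```python
import pandas as pd

df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': pd.to_timedelta([12, 14, 170], unit='min')})

# Rebuild the transposed frame from the raw values, forcing object dtype
# so pandas keeps each cell as-is instead of casting the new columns.
out = pd.DataFrame(df.to_numpy().T, index=df.columns, columns=df.index, dtype=object)

print(out.dtypes.tolist())     # [dtype('O'), dtype('O'), dtype('O')]
print(repr(out.loc['id', 0]))  # '00115'
```

The trade-off is that the resulting columns are plain object columns, so vectorized timedelta operations on the val row are no longer available without converting back.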