
Why does transposing a DataFrame with strings and timedeltas convert the dtype?

Tags: python, pandas

This behavior seems odd to me: the id column (a string) gets converted to a timedelta upon transposing the df if the other column is a timedelta.

import pandas as pd
df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': [12, 14, 170]})
df['val'] = pd.to_timedelta(df.val, unit='Minutes')

print(df.T)
#                         0                      1                      2
#id  0 days 00:00:00.000000 0 days 00:00:00.000001 0 days 00:00:00.000032
#val      365 days 05:49:12      426 days 02:47:24     5174 days 06:27:00

type(df.T[0][0])
#pandas._libs.tslib.Timedelta

Without the timedelta it works as I'd expect, and the id column remains a string, even though the other column is an integer and all of the strings could be safely cast to integers.

df2 = pd.DataFrame({'id': ['00115', '01222', '32333'],
                    'val': [1, 1231, 1413]})

type(df2.T[0][0])
#str

Why does the type of id get changed in the first instance, but not the second?

ALollz asked Jun 15 '18 20:06



1 Answer

A dataframe should be thought of in columns. Each column must have a single data type. When you transpose, you change which cells are grouped together in the new columns. Before the transpose, you had a string column and a timedelta column. After the transpose, each column holds one string and one timedelta, so pandas has to decide how to cast the new columns. It decided to go with timedelta. With a string and an integer, the new columns simply fall back to object, which is why the id values stay strings in your second example. In my opinion, casting to timedelta is a goofy choice.
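
To make the "one dtype per column" point concrete, here is a minimal sketch that reuses df2 from the question and compares the column dtypes before and after transposing. The variable names dtypes_before, dtypes_after and transposed are only illustrative, and the exact dtype names printed can differ between pandas versions, so no output is shown.

```python
import pandas as pd

# Same data as df2 in the question: a string column and an integer column.
df2 = pd.DataFrame({'id': ['00115', '01222', '32333'],
                    'val': [1, 1231, 1413]})

# Before transposing, each column has its own single dtype.
dtypes_before = df2.dtypes

# After transposing, every new column holds one id and one val,
# so pandas needs a single dtype that can hold both values.
transposed = df2.T
dtypes_after = transposed.dtypes

print(dtypes_before)
print(dtypes_after)
```

Here the transposed columns should come out as object, and the id values are still Python strings.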

You can work around this by constructing a new dataframe from the transposed values and setting its dtype to object explicitly.

pd.DataFrame(df.values.T, df.columns, df.index, dtype=object)

                     0                  1                   2
id               00115              01222               32333
val  365 days 05:49:12  426 days 02:47:24  5174 days 06:27:00
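
For anyone who wants to check the result, the following is a self-contained sketch of the same workaround that also inspects the element types afterwards. It assumes unit='min' for the timedelta column instead of the unit string used in the question, so the timedelta values differ from the output above. The names fixed, id_type and val_type are illustrative.

```python
import pandas as pd

# Assumption: unit='min' is used here, so 12 becomes 12 minutes.
df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': pd.to_timedelta([12, 14, 170], unit='min')})

# Build the transposed frame by hand and force object dtype,
# so pandas does not try to infer a timedelta dtype for the new columns.
fixed = pd.DataFrame(df.values.T, index=df.columns, columns=df.index,
                     dtype=object)

# Each cell keeps its original Python-level type.
id_type = type(fixed.loc['id', 0])
val_type = type(fixed.loc['val', 0])
print(id_type, val_type)
```

Passing index and columns by keyword is equivalent to the positional call above; it just makes the swap of rows and columns easier to read.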

piRSquared answered Oct 19 '22 12:10