Why does transposing a DataFrame with strings and timedeltas convert the dtype?

Tags:

pandas

This behavior seems odd to me: the id column (a string) gets converted to a timedelta when the df is transposed, if the other column is a timedelta.

import pandas as pd
df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': [12, 14, 170]})
df['val'] = pd.to_timedelta(df.val, unit='Minutes')

print(df.T)
#                         0                      1                      2
#id  0 days 00:00:00.000000 0 days 00:00:00.000001 0 days 00:00:00.000032
#val      365 days 05:49:12      426 days 02:47:24     5174 days 06:27:00

type(df.T[0][0])
#pandas._libs.tslib.Timedelta

Without the timedelta it works as I'd expect, and the id column remains a string, even though the other column is an integer and all of the strings could be safely cast to integers.

df2 = pd.DataFrame({'id': ['00115', '01222', '32333'],
                    'val': [1, 1231, 1413]})

type(df2.T[0][0])
#str

Why does the type of id get changed in the first instance, but not the second?


asked Jun 15 '18 20:06

ALollz

1 Answer

A DataFrame should be thought of in terms of columns: each column must have a single dtype. Transposing changes which cells end up together in a column. Before the transpose, you had a string column and a timedelta column. After the transpose, each column holds one string and one timedelta, so pandas has to decide how to cast the new columns. It decided to go with timedelta. In my opinion, this is a goofy choice.
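To make the "one dtype per column" point concrete, here is a minimal sketch using two small frames that are made up for illustration (`nums` and `mixed` are not from the question). It only shows the general upcasting that a transpose forces; the timedelta result in the question came from the asker's pandas version, and other versions may choose differently.

```python
import pandas as pd

# Hypothetical example frames (assumptions, not taken from the question).
nums = pd.DataFrame({'a': [1, 2], 'b': [1.5, 2.5]})
print(nums.dtypes)    # one dtype per column: an integer column and a float column
print(nums.T.dtypes)  # each transposed column mixes an int and a float,
                      # so pandas settles on a common dtype for all of them

mixed = pd.DataFrame({'id': ['00115', '01222'], 'val': [1, 1231]})
print(mixed.T.dtypes)             # strings and integers share no numeric dtype
print(type(mixed.T[0]['id']))     # the id value is still a Python str
```

When a string shares a column with an integer, the common dtype is object, which is why the second example in the question keeps id as a string.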

You can avoid this by constructing the transposed DataFrame yourself and passing dtype=object, so pandas does not cast the values:

pd.DataFrame(df.values.T, df.columns, df.index, dtype=object)

                     0                  1                   2
id               00115              01222               32333
val  365 days 05:49:12  426 days 02:47:24  5174 days 06:27:00
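As a quick check of the workaround, the sketch below rebuilds the frame and inspects the element types after the manual transpose. It assumes `unit='min'` (minutes) for `pd.to_timedelta`, so the timedelta values differ from the output shown above; the name `transposed` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': ['00115', '01222', '32333'],
                   'val': [12, 14, 170]})
# Assumption: minutes, spelled 'min'; the question's output used a different unit.
df['val'] = pd.to_timedelta(df['val'], unit='min')

# Transpose the raw values and forbid any casting with dtype=object.
# Positional arguments are data, index, columns.
transposed = pd.DataFrame(df.values.T, df.columns, df.index, dtype=object)

print(transposed.dtypes)                 # every column is object
print(type(transposed.loc['id', 0]))     # the id entries stay strings
print(type(transposed.loc['val', 0]))    # the val entries stay timedelta-like
```

Keeping everything as object gives up vectorized timedelta operations on the transposed frame, which is usually acceptable when the transpose is only for display.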

answered Oct 19 '22 12:10

piRSquared
