I have a csv file that contains a number of columns. Using pandas, I read this csv file into a dataframe and have a datetime index and five or six other columns.
One of the columns is a list of timestamps (example below with index)
CreateDate TimeStamps
4/1/11 [Timestamp('2012-02-29 00:00:00'), Timestamp('2012-03-31 00:00:00'), Timestamp('2012-04-25 00:00:00'), Timestamp('2012-06-30 00:00:00')]
4/2/11 [Timestamp('2014-01-31 00:00:00')]
6/8/11 [Timestamp('2012-08-31 00:00:00'), Timestamp('2012-09-30 00:00:00'), Timestamp('2012-11-07 00:00:00'), Timestamp('2013-01-10 00:00:00'), Timestamp('2013-07-25 00:00:00')]
What I'd like to do is convert the timestamp column into separate rows for each timestamp listed. For example, for row 1 it would convert to 4 rows and row 2 would convert to 1 row. I realize I'd need to reset the index to be able to do this, which is fine.
Everything I've tried just ends up getting out into left field (taking the values and create a list outside of pandas, etc)
Any suggestions appreciated.
split() Pandas provide a method to split string around a passed separator/delimiter. After that, the string can be stored as a list in a series or it can also be used to create multiple column data frames from a single separated string.
We can use Pandas . str accessor, it does fast vectorized string operations for Series and Dataframes and returns a string object. Pandas str accessor has number of useful methods and one of them is str. split , it can be used with split to get the desired part of the string.
Pandas explode() to separate list elements into separate rows() Now that we have column with list as elements, we can use Pandas explode() function on it. Pandas explode() function will split the list by each element and create a new row for each of them.
The way I did it was split the list into seperate columns, and then melt
ed it to put each timestamp in a separate row.
In [48]: df = pd.DataFrame([[1,2,[1,2,4]],[4,5,[1,3]],],columns=['a','b','TimeStamp'])
...: df
Out[48]:
a b TimeStamp
0 1 2 [1, 2, 4]
1 4 5 [1, 3]
You can convert the column to a list and then back to a DataFrame
to split it into columns:
In [53]: TScolumns = pd.DataFrame(df.TimeStamp.tolist(), )
...: TScolumns
Out[53]:
0 1 2
0 1 2 4
1 1 3 NaN
And then splice it onto the original dataframe
In [90]: df = df.drop('TimeStamp',axis=1)
In [58]: split = pd.concat([df, TScolumns], axis=1)
...: split
Out[58]:
a b 0 1 2
0 1 2 1 2 4
1 4 5 1 3 NaN
Finally, use melt
to get it into the shape you want:
In [89]: pd.melt(split, id_vars=['a', 'b'], value_name='TimeStamp')
Out[89]:
a b variable TimeStamp
0 1 2 0 1
1 4 5 0 1
2 1 2 1 2
3 4 5 1 3
4 1 2 2 4
5 4 5 2 NaN
A newer way to do this would be to use explode
(documentation)
import pandas as pd
d = {'date': ['4/1/11', '4/2/11'], 'ts': [[pd.Timestamp('2012-02-29 00:00:00'), pd.Timestamp('2012-03-31 00:00:00'), pd.Timestamp('2012-04-25 00:00:00'), pd.Timestamp('2012-06-30 00:00:00')], [pd.Timestamp('2014-01-31 00:00:00')]]}
test_df = pd.DataFrame(d)
result_df = test_df.explode('ts')
result_df.head()
Output
date ts
0 4/1/11 2012-02-29
0 4/1/11 2012-03-31
0 4/1/11 2012-04-25
0 4/1/11 2012-06-30
1 4/2/11 2014-01-31
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With