Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting a List inside a Pandas DataFrame

Tags:

python

pandas

csv

I have a csv file that contains a number of columns. Using pandas, I read this csv file into a dataframe and have a datetime index and five or six other columns.

One of the columns is a list of timestamps (example below with index)

CreateDate     TimeStamps
4/1/11         [Timestamp('2012-02-29 00:00:00'), Timestamp('2012-03-31 00:00:00'), Timestamp('2012-04-25 00:00:00'), Timestamp('2012-06-30 00:00:00')]
4/2/11         [Timestamp('2014-01-31 00:00:00')]
6/8/11         [Timestamp('2012-08-31 00:00:00'), Timestamp('2012-09-30 00:00:00'), Timestamp('2012-11-07 00:00:00'), Timestamp('2013-01-10 00:00:00'), Timestamp('2013-07-25 00:00:00')]

What I'd like to do is convert the timestamp column into separate rows for each timestamp listed. For example, for row 1 it would convert to 4 rows and row 2 would convert to 1 row. I realize I'd need to reset the index to be able to do this, which is fine.

Everything I've tried just ends up getting out into left field (taking the values and create a list outside of pandas, etc)

Any suggestions appreciated.

like image 997
Eric D. Brown Avatar asked Feb 10 '15 21:02

Eric D. Brown


People also ask

How do you split a list into two columns in Python?

split() Pandas provide a method to split string around a passed separator/delimiter. After that, the string can be stored as a list in a series or it can also be used to create multiple column data frames from a single separated string.

How do you split an object in a DataFrame in Python?

We can use Pandas . str accessor, it does fast vectorized string operations for Series and Dataframes and returns a string object. Pandas str accessor has number of useful methods and one of them is str. split , it can be used with split to get the desired part of the string.

How do you split a list into a row in Python?

Pandas explode() to separate list elements into separate rows() Now that we have column with list as elements, we can use Pandas explode() function on it. Pandas explode() function will split the list by each element and create a new row for each of them.


2 Answers

The way I did it was split the list into seperate columns, and then melted it to put each timestamp in a separate row.

In [48]: df = pd.DataFrame([[1,2,[1,2,4]],[4,5,[1,3]],],columns=['a','b','TimeStamp'])
    ...: df
Out[48]: 
   a  b  TimeStamp
0  1  2  [1, 2, 4]
1  4  5     [1, 3]

You can convert the column to a list and then back to a DataFrame to split it into columns:

In [53]: TScolumns = pd.DataFrame(df.TimeStamp.tolist(), )
    ...: TScolumns
Out[53]: 
   0  1   2
0  1  2   4
1  1  3 NaN

And then splice it onto the original dataframe

In [90]: df = df.drop('TimeStamp',axis=1)
In [58]: split = pd.concat([df, TScolumns], axis=1)
    ...: split
Out[58]: 
   a  b  0  1   2
0  1  2  1  2   4
1  4  5  1  3 NaN

Finally, use melt to get it into the shape you want:

In [89]: pd.melt(split, id_vars=['a', 'b'], value_name='TimeStamp')
Out[89]: 
   a  b variable  TimeStamp
0  1  2        0          1
1  4  5        0          1
2  1  2        1          2
3  4  5        1          3
4  1  2        2          4
5  4  5        2        NaN
like image 81
ari Avatar answered Oct 08 '22 17:10

ari


A newer way to do this would be to use explode (documentation)

import pandas as pd

d = {'date': ['4/1/11', '4/2/11'], 'ts': [[pd.Timestamp('2012-02-29 00:00:00'), pd.Timestamp('2012-03-31 00:00:00'), pd.Timestamp('2012-04-25 00:00:00'), pd.Timestamp('2012-06-30 00:00:00')], [pd.Timestamp('2014-01-31 00:00:00')]]}

test_df = pd.DataFrame(d)

result_df = test_df.explode('ts')

result_df.head()

Output

    date    ts
0   4/1/11  2012-02-29
0   4/1/11  2012-03-31
0   4/1/11  2012-04-25
0   4/1/11  2012-06-30
1   4/2/11  2014-01-31
like image 37
Rahul P Avatar answered Oct 08 '22 15:10

Rahul P