Splitting a List inside a Pandas DataFrame

Tags: python, pandas, csv

I have a CSV file that contains a number of columns. Using pandas, I read this CSV file into a DataFrame, which has a datetime index and five or six other columns.

One of the columns holds a list of timestamps in each row (example below, with the index):

CreateDate     TimeStamps
4/1/11         [Timestamp('2012-02-29 00:00:00'), Timestamp('2012-03-31 00:00:00'), Timestamp('2012-04-25 00:00:00'), Timestamp('2012-06-30 00:00:00')]
4/2/11         [Timestamp('2014-01-31 00:00:00')]
6/8/11         [Timestamp('2012-08-31 00:00:00'), Timestamp('2012-09-30 00:00:00'), Timestamp('2012-11-07 00:00:00'), Timestamp('2013-01-10 00:00:00'), Timestamp('2013-07-25 00:00:00')]

What I'd like to do is expand the timestamp column so that each timestamp in a list gets its own row. For example, row 1 would become 4 rows and row 2 would stay as 1 row. I realize I'd need to reset the index to be able to do this, which is fine.

Everything I've tried ends up way out in left field (pulling the values out and building a list outside of pandas, etc.).

Any suggestions appreciated.


asked Feb 10 '15 21:02

Eric D. Brown

2 Answers

The way I did it was to split the lists into separate columns and then melt the result so that each timestamp ends up in its own row.

In [48]: df = pd.DataFrame([[1,2,[1,2,4]],[4,5,[1,3]],],columns=['a','b','TimeStamp'])
    ...: df
Out[48]: 
   a  b  TimeStamp
0  1  2  [1, 2, 4]
1  4  5     [1, 3]

You can convert the column to a list and then back to a DataFrame to split it into columns:

In [53]: TScolumns = pd.DataFrame(df.TimeStamp.tolist())
    ...: TScolumns
Out[53]: 
   0  1   2
0  1  2   4
1  1  3 NaN

Then drop the original list column and concatenate the new columns onto the DataFrame:

In [90]: df = df.drop('TimeStamp',axis=1)
In [58]: split = pd.concat([df, TScolumns], axis=1)
    ...: split
Out[58]: 
   a  b  0  1   2
0  1  2  1  2   4
1  4  5  1  3 NaN
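One caveat: pd.concat(..., axis=1) lines rows up by index label. The example above uses the default 0..n-1 index, so it works, but the question's frame has a datetime index (CreateDate), while pd.DataFrame(df.TimeStamp.tolist()) always gets a fresh 0..n-1 index, so the two frames would not line up. A minimal sketch, assuming a frame shaped like the question's (the `other` column and the sample values are hypothetical), that passes index=df.index to keep the rows aligned:

```python
import pandas as pd

# Hypothetical frame shaped like the question's: a DatetimeIndex plus a list column
df = pd.DataFrame(
    {
        "other": [10, 20],
        "TimeStamp": [
            [pd.Timestamp("2012-02-29"), pd.Timestamp("2012-03-31")],
            [pd.Timestamp("2014-01-31")],
        ],
    },
    index=pd.to_datetime(["2011-04-01", "2011-04-02"]),
)
df.index.name = "CreateDate"

# Reuse df's index so concat matches each new row to its original row;
# the shorter list is padded with NaT in the extra column
ts_columns = pd.DataFrame(df["TimeStamp"].tolist(), index=df.index)
split = pd.concat([df.drop("TimeStamp", axis=1), ts_columns], axis=1)
print(split)
```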

Finally, use melt to get it into the shape you want:

In [89]: pd.melt(split, id_vars=['a', 'b'], value_name='TimeStamp')
Out[89]: 
   a  b variable  TimeStamp
0  1  2        0          1
1  4  5        0          1
2  1  2        1          2
3  4  5        1          3
4  1  2        2          4
5  4  5        2        NaN
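The last row of that melt output is only padding from the shorter list. A minimal end-to-end sketch of the same split-and-melt approach, using the toy data from this answer, that also drops those padding rows and the helper `variable` column:

```python
import pandas as pd

# Same toy data as above: 'TimeStamp' holds lists of different lengths
df = pd.DataFrame([[1, 2, [1, 2, 4]], [4, 5, [1, 3]]], columns=["a", "b", "TimeStamp"])

# Split the lists into columns 0, 1, 2 (shorter lists are padded with NaN)
ts_columns = pd.DataFrame(df["TimeStamp"].tolist(), index=df.index)
split = pd.concat([df.drop("TimeStamp", axis=1), ts_columns], axis=1)

# Melt to long form, then drop the padding rows and the 'variable' column
long_df = (
    pd.melt(split, id_vars=["a", "b"], value_name="TimeStamp")
    .dropna(subset=["TimeStamp"])
    .drop(columns="variable")
    .reset_index(drop=True)
)
print(long_df)
```

Because the padding forces NaN into the values, the TimeStamp column comes out as floats here; with real Timestamp values the padding would be NaT instead.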

answered Oct 08 '22 17:10

ari

A newer way to do this is DataFrame.explode, which was added in pandas 0.25:

import pandas as pd

d = {
    'date': ['4/1/11', '4/2/11'],
    'ts': [
        [pd.Timestamp('2012-02-29 00:00:00'), pd.Timestamp('2012-03-31 00:00:00'),
         pd.Timestamp('2012-04-25 00:00:00'), pd.Timestamp('2012-06-30 00:00:00')],
        [pd.Timestamp('2014-01-31 00:00:00')],
    ],
}

test_df = pd.DataFrame(d)

result_df = test_df.explode('ts')

result_df.head()

Output

    date    ts
0   4/1/11  2012-02-29
0   4/1/11  2012-03-31
0   4/1/11  2012-04-25
0   4/1/11  2012-06-30
1   4/2/11  2014-01-31
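explode repeats the original index label for every element of the list, so there is no need to reset the index first. A minimal sketch, assuming a frame that mirrors the question (CreateDate as a DatetimeIndex; the sample values are hypothetical), that resets the index afterwards only to get CreateDate back as a column, and converts the exploded column to datetime64 since explode typically leaves it as object dtype. Per the explode documentation, an empty list becomes a single NaN row:

```python
import pandas as pd

# Hypothetical frame mirroring the question: CreateDate index, list-valued TimeStamps
df = pd.DataFrame(
    {
        "TimeStamps": [
            [pd.Timestamp("2012-02-29"), pd.Timestamp("2012-03-31")],
            [pd.Timestamp("2014-01-31")],
            [],  # an empty list turns into one NaN row after explode
        ]
    },
    index=pd.to_datetime(["2011-04-01", "2011-04-02", "2011-06-08"]),
)
df.index.name = "CreateDate"

# One row per timestamp; the CreateDate label is repeated for each element.
# reset_index() just moves CreateDate back into an ordinary column.
result = df.explode("TimeStamps").reset_index()

# Convert the exploded column to a proper datetime64 column (NaN becomes NaT)
result["TimeStamps"] = pd.to_datetime(result["TimeStamps"])
print(result)
```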

answered Oct 08 '22 15:10

Rahul P
