How can I copy DataFrames with datetimes from Stack Overflow into Python?

Tags:

I see frequent pandas examples on SO using time series that have spaces within the timestamps:

                     A
2020-01-01 09:20:00  0
2020-01-01 09:21:00  1
2020-01-01 09:22:00  2
2020-01-01 09:23:00  3
2020-01-01 09:24:00  4

Or this where the times aren't part of the index:

                dates    values cat
0 2020-01-01 09:20:00  0.758513   a
1 2020-01-01 09:21:00  0.337325   b
2 2020-01-01 09:22:00  0.618372   b
3 2020-01-01 09:23:00  0.878714   b
4 2020-01-01 09:24:00  0.311069   b

Is there a good way to copy these (or similar) data back into Python to work with? I have found posts like this and this which were lifesavers for getting many examples out of SO, yet I usually fail to find a copy/paste approach (using pd.read_clipboard() or pd.read_table()) that works for such data. This often then discourages me from attempting an answer¹.

The examples above were created this way:

#one
import pandas as pd
import numpy

dr = pd.date_range('01-01-2020 9:20', '01-01-2020 9:24', freq='1T')
df1 = pd.DataFrame(index=dr, data=range(len(dr)), columns=['A'])

#two
df2 = pd.DataFrame({'dates':dr,
                    'values':np.random.rand(len(dr)),
                    'cat':np.random.choice(['a','b'],len(dr))})

_{1. And for the record, I reckon the onus should be on posters to post their data in a more copy-able format, else not get answered. For time series information, rather than pasting the string representation of the DataFrame I always try to post constructing code (using pd.date_range() or w/e). And I imagine using something like df.to_dict() would be better if there are specific (irregular interval) dates that need to be copied for the example.}

344

asked Jul 05 '20 17:07

Tom

1 Answers

I usually copy the entire string and then parse it. It's not perfect and you usually have to edit both the string and the dataframe to make it usable. Here is one example. This solution was already provided in this answer. I've only added the bit about parsing date/time.

import pandas as pd
from io import StringIO
from dateutil.parser import parse

# I added two more column names `date` and `time`.
# An advantage of having the string in your python code is that
# you can edit it in your text editor/jupyter notebook quickly and directly.
s = """date time A
2020-01-01 09:20:00  0
2020-01-01 09:21:00  1
2020-01-01 09:22:00  2
2020-01-01 09:23:00  3
2020-01-01 09:24:00  4"""

# Parse using whitespace separator. This will still not be perfect as we can
# see below.
df = pd.read_csv(StringIO(s), sep="\s+", index_col=False)
df
#          date      time  A
# 0  2020-01-01  09:20:00  0
# 1  2020-01-01  09:21:00  1
# 2  2020-01-01  09:22:00  2
# 3  2020-01-01  09:23:00  3
# 4  2020-01-01  09:24:00  4

# Combine date and time column together and drop the individual columns.
df['datetime'] = df['date'] + " " + df['time']
df = df.drop(['date', 'time'], axis=1)

# Use a somewhat universal parser in dateutil.parser.parse to parse the
# dates into proper dateime object.
df['datetime'] = df['datetime'].apply(parse)
df
#    A            datetime
# 0  0 2020-01-01 09:20:00
# 1  1 2020-01-01 09:21:00
# 2  2 2020-01-01 09:22:00
# 3  3 2020-01-01 09:23:00
# 4  4 2020-01-01 09:24:00

df.index
# RangeIndex(start=0, stop=5, step=1)

df.dtypes
# A                    int64
# datetime    datetime64[ns]
# dtype: object

df.columns
# Index(['A', 'datetime'], dtype='object')

One method to provide formatted and parseable dataframe on StackOverflow is by outputting a csv-formatted string.

# Continued from above
print(df.to_csv(index=False))
# A,datetime
# 0,2020-01-01 09:20:00
# 1,2020-01-01 09:21:00
# 2,2020-01-01 09:22:00
# 3,2020-01-01 09:23:00
# 4,2020-01-01 09:24:00

# We can indeed parse nicely from the csv-formatted string 
s_redux = df.to_csv(index=False)
pd.read_csv(StringIO(s_redux))
#    A             datetime
# 0  0  2020-01-01 09:20:00
# 1  1  2020-01-01 09:21:00
# 2  2  2020-01-01 09:22:00
# 3  3  2020-01-01 09:23:00
# 4  4  2020-01-01 09:24:00

Here is one attempt at parsing the second example dataframe. As before, we do need to make some "edits" to the dataframe to make it usable.

import pandas as pd
from io import StringIO
from dateutil.parser import parse

s="""                dates    values cat
0 2020-01-01 09:20:00  0.758513   a
1 2020-01-01 09:21:00  0.337325   b
2 2020-01-01 09:22:00  0.618372   b
3 2020-01-01 09:23:00  0.878714   b
4 2020-01-01 09:24:00  0.311069   b"""

df = pd.read_csv(StringIO(s), sep="\s+").reset_index()
df
#    level_0     level_1     dates    values cat
# 0        0  2020-01-01  09:20:00  0.758513   a
# 1        1  2020-01-01  09:21:00  0.337325   b
# 2        2  2020-01-01  09:22:00  0.618372   b
# 3        3  2020-01-01  09:23:00  0.878714   b
# 4        4  2020-01-01  09:24:00  0.311069   b

df['dates'] = df['level_1'] + " " + df['dates']
df = df.drop(['level_0', 'level_1'], axis=1)
df['dates'] = df['dates'].apply(parse)

df
#                 dates    values cat
# 0 2020-01-01 09:20:00  0.758513   a
# 1 2020-01-01 09:21:00  0.337325   b
# 2 2020-01-01 09:22:00  0.618372   b
# 3 2020-01-01 09:23:00  0.878714   b
# 4 2020-01-01 09:24:00  0.311069   b

df.dtypes
# dates     datetime64[ns]
# values           float64
# cat               object
# dtype: object

131

answered Sep 27 '22 22:09

Ankur

Related questions
                            
                                How to count the presence of a set of numbers in a set of intervals efficiently
                            
                                Django+Postgres FATAL: sorry, too many clients already
                            
                                Quantization aware training in TensorFlow version 2 and BatchNorm folding
                            
                                Is the ordering of pathlib's `glob` method consistent between runs?
                            
                                Define action values in keras-rl
                            
                                Flask - unable to get cookies
                            
                                Open CV Contour area miscalculation
                            
                                tkinter ttk treeview colored rows
                            
                                How should I investigate a memory leak when using Google Cloud Datastore Python libraries?
                            
                                Pytest: Finding when each test started and ended
                            
                                vispy visual.HistogramVisual
                            
                                NotImplementedError: Learning rate schedule must override get_config
                            
                                python asyncio - RuntimeError: await wasn't used with future
                            
                                Pytorch RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
                            
                                Ansible no longer works
                            
                                how to change string matrix to a integer matrix
                            
                                Pip install fails with “connection error" ssl problem
                            
                                Get current zoom and center from mapbox in dash
                            
                                Automatically refactor python lambdas to named functions
                            
                                How to fix "WARNING: Hidden import "pygame._view" not found!" when converting .py to .exe using PyInstaller?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I copy DataFrames with datetimes from Stack Overflow into Python?

Tags:

python

clipboard

copy-paste

pandas

time-series

Tom

People also ask

1 Answers

Ankur

Recent Activity

Donate For Us