Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I copy DataFrames with datetimes from Stack Overflow into Python?

I see frequent pandas examples on SO using time series that have spaces within the timestamps:

                     A
2020-01-01 09:20:00  0
2020-01-01 09:21:00  1
2020-01-01 09:22:00  2
2020-01-01 09:23:00  3
2020-01-01 09:24:00  4

Or this where the times aren't part of the index:

                dates    values cat
0 2020-01-01 09:20:00  0.758513   a
1 2020-01-01 09:21:00  0.337325   b
2 2020-01-01 09:22:00  0.618372   b
3 2020-01-01 09:23:00  0.878714   b
4 2020-01-01 09:24:00  0.311069   b

Is there a good way to copy these (or similar) data back into Python to work with? I have found posts like this and this which were lifesavers for getting many examples out of SO, yet I usually fail to find a copy/paste approach (using pd.read_clipboard() or pd.read_table()) that works for such data. This often then discourages me from attempting an answer1.

The examples above were created this way:

#one
import pandas as pd
import numpy

dr = pd.date_range('01-01-2020 9:20', '01-01-2020 9:24', freq='1T')
df1 = pd.DataFrame(index=dr, data=range(len(dr)), columns=['A'])

#two
df2 = pd.DataFrame({'dates':dr,
                    'values':np.random.rand(len(dr)),
                    'cat':np.random.choice(['a','b'],len(dr))})

1. And for the record, I reckon the onus should be on posters to post their data in a more copy-able format, else not get answered. For time series information, rather than pasting the string representation of the DataFrame I always try to post constructing code (using pd.date_range() or w/e). And I imagine using something like df.to_dict() would be better if there are specific (irregular interval) dates that need to be copied for the example.

like image 344
Tom Avatar asked Jul 05 '20 17:07

Tom


People also ask

How do you copy and paste a DataFrame in Python?

pandas: Copy DataFrame to the clipboard with to_clipboard() The to_clipboard() method of pandas. DataFrame copies its contents to the clipboard. You can paste it directly to spreadsheets such as Excel or Numbers. It is very useful when used with IPython or Jupyter Notebook.

How do I save a DataFrame from CSV to Python?

Exporting the DataFrame into a CSV filePandas DataFrame to_csv() function exports the DataFrame to CSV format. If a file argument is provided, the output will be the CSV file. Otherwise, the return value is a CSV format like string. sep: Specify a custom delimiter for the CSV output, the default is a comma.

How do I add months to pandas?

In pandas, a string is converted to a datetime object using the pd. to_datetime() method and pd. DateOffset() method is used to add months to the created pandas object.


1 Answers

I usually copy the entire string and then parse it. It's not perfect and you usually have to edit both the string and the dataframe to make it usable. Here is one example. This solution was already provided in this answer. I've only added the bit about parsing date/time.

import pandas as pd
from io import StringIO
from dateutil.parser import parse

# I added two more column names `date` and `time`.
# An advantage of having the string in your python code is that
# you can edit it in your text editor/jupyter notebook quickly and directly.
s = """date time A
2020-01-01 09:20:00  0
2020-01-01 09:21:00  1
2020-01-01 09:22:00  2
2020-01-01 09:23:00  3
2020-01-01 09:24:00  4"""

# Parse using whitespace separator. This will still not be perfect as we can
# see below.
df = pd.read_csv(StringIO(s), sep="\s+", index_col=False)
df
#          date      time  A
# 0  2020-01-01  09:20:00  0
# 1  2020-01-01  09:21:00  1
# 2  2020-01-01  09:22:00  2
# 3  2020-01-01  09:23:00  3
# 4  2020-01-01  09:24:00  4

# Combine date and time column together and drop the individual columns.
df['datetime'] = df['date'] + " " + df['time']
df = df.drop(['date', 'time'], axis=1)

# Use a somewhat universal parser in dateutil.parser.parse to parse the
# dates into proper dateime object.
df['datetime'] = df['datetime'].apply(parse)
df
#    A            datetime
# 0  0 2020-01-01 09:20:00
# 1  1 2020-01-01 09:21:00
# 2  2 2020-01-01 09:22:00
# 3  3 2020-01-01 09:23:00
# 4  4 2020-01-01 09:24:00

df.index
# RangeIndex(start=0, stop=5, step=1)

df.dtypes
# A                    int64
# datetime    datetime64[ns]
# dtype: object

df.columns
# Index(['A', 'datetime'], dtype='object')

One method to provide formatted and parseable dataframe on StackOverflow is by outputting a csv-formatted string.

# Continued from above
print(df.to_csv(index=False))
# A,datetime
# 0,2020-01-01 09:20:00
# 1,2020-01-01 09:21:00
# 2,2020-01-01 09:22:00
# 3,2020-01-01 09:23:00
# 4,2020-01-01 09:24:00

# We can indeed parse nicely from the csv-formatted string 
s_redux = df.to_csv(index=False)
pd.read_csv(StringIO(s_redux))
#    A             datetime
# 0  0  2020-01-01 09:20:00
# 1  1  2020-01-01 09:21:00
# 2  2  2020-01-01 09:22:00
# 3  3  2020-01-01 09:23:00
# 4  4  2020-01-01 09:24:00

Here is one attempt at parsing the second example dataframe. As before, we do need to make some "edits" to the dataframe to make it usable.

import pandas as pd
from io import StringIO
from dateutil.parser import parse

s="""                dates    values cat
0 2020-01-01 09:20:00  0.758513   a
1 2020-01-01 09:21:00  0.337325   b
2 2020-01-01 09:22:00  0.618372   b
3 2020-01-01 09:23:00  0.878714   b
4 2020-01-01 09:24:00  0.311069   b"""

df = pd.read_csv(StringIO(s), sep="\s+").reset_index()
df
#    level_0     level_1     dates    values cat
# 0        0  2020-01-01  09:20:00  0.758513   a
# 1        1  2020-01-01  09:21:00  0.337325   b
# 2        2  2020-01-01  09:22:00  0.618372   b
# 3        3  2020-01-01  09:23:00  0.878714   b
# 4        4  2020-01-01  09:24:00  0.311069   b

df['dates'] = df['level_1'] + " " + df['dates']
df = df.drop(['level_0', 'level_1'], axis=1)
df['dates'] = df['dates'].apply(parse)

df
#                 dates    values cat
# 0 2020-01-01 09:20:00  0.758513   a
# 1 2020-01-01 09:21:00  0.337325   b
# 2 2020-01-01 09:22:00  0.618372   b
# 3 2020-01-01 09:23:00  0.878714   b
# 4 2020-01-01 09:24:00  0.311069   b

df.dtypes
# dates     datetime64[ns]
# values           float64
# cat               object
# dtype: object
like image 131
Ankur Avatar answered Sep 27 '22 22:09

Ankur