Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas read DataFrame with datetime columns from clipboard

I see a lot of DataFrames posted to StackOverflow that look like:

          a                  dt         b
0 -0.713356 2015-10-01 00:00:00 -0.159170
1 -1.636397 2015-10-01 00:30:00 -1.038110
2 -1.390117 2015-10-01 01:00:00 -1.124016

I still haven't figured out a good way to copy these into my interpreter using .read_clipboard (argument list in .read_table docs).

I thought the key was the parse_dates parameter:

parse_dates : boolean or list of ints or names or list of lists or dict, default False
* boolean. If True -> try parsing the index.
* list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
* list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
* dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

pd.read_clipboard(parse_dates={'dt': [1, 2]}) raises the exception NotImplementedError: file structure not yet supported.

When I try skipping the first row pd.read_clipboard(parse_dates=[[1, 2]], names=['a', 'dt1', 'dt2', 'b'], skiprows=1, header=None) I get the same exception.

How do others do this?

like image 554
Alex Avatar asked Jan 25 '18 18:01

Alex


2 Answers

This is what I do. First, ensure that your columns have two spaces in between them:

          a                  dt         b
0 -0.713356  2015-10-01 00:00:00  -0.159170
1 -1.636397  2015-10-01 00:30:00  -1.038110
2 -1.390117  2015-10-01 01:00:00  -1.124016

Notice, the datetime column has one space between the date and time. This is important. Next, I use something like this to load it in:

df = pd.read_clipboard(sep='\s{2,}', parse_dates=[1], engine='python')
df

             a                  dt         b
0  0 -0.713356 2015-10-01 00:00:00 -0.159170
1  1 -1.636397 2015-10-01 00:30:00 -1.038110
2  2 -1.390117 2015-10-01 01:00:00 -1.124016

df.dtypes

a             object
dt    datetime64[ns]
b            float64
dtype: object

Yes, this isn't a fully automated process, but as long as you're dealing with small dataframes to copy in, it isn't that bad. Although I'm open to seeing better alternatives.

like image 116
cs95 Avatar answered Oct 10 '22 07:10

cs95


In case it helps someone, here is what I do now:

df = pd.read_clipboard(skiprows=1, names=['a', 'dt1', 'dt2', 'b'])
df['dt'] = pd.to_datetime(df['dt1'] + ' ' + df['dt2'])
df = df[['a', 'dt', 'b']]
like image 45
Alex Avatar answered Oct 10 '22 07:10

Alex