
How to copy/paste DataFrame from Stack Overflow into Python

In questions and answers, users very often post an example DataFrame which their question/answer works with:

    In []: x
    Out[]:
       bar  foo
    0    4    1
    1    5    2
    2    6    3

It'd be really useful to be able to get this DataFrame into my Python interpreter so I can start debugging the question, or testing the answer.

How can I do this?

asked Jul 24 '15 12:07 by LondonRob



2 Answers

Pandas is written by people that really know what people want to do.

Since version 0.13 there's a function pd.read_clipboard which is absurdly effective at making this "just work".

Copy the part of the output in the question that starts with bar foo (i.e. the DataFrame itself) to your clipboard, then run this in a Python interpreter:

    In [53]: import pandas as pd
    In [54]: df = pd.read_clipboard()

    In [55]: df
    Out[55]:
       bar  foo
    0    4    1
    1    5    2
    2    6    3

Caveats

  • Don't include the IPython In or Out prompts, or it won't work
  • If you have a named index, you currently need to add engine='python' (see this issue on GitHub). The 'c' engine is currently broken when the index is named.
  • It's not brilliant at MultiIndexes:

Try this:

                          0         1         2
    level1 level2
    foo    a       0.518444  0.239354  0.364764
           b       0.377863  0.912586  0.760612
    bar    a       0.086825  0.118280  0.592211

which doesn't work at all, or this:

                  0         1         2
    foo a  0.859630  0.399901  0.052504
        b  0.231838  0.863228  0.017451
    bar a  0.422231  0.307960  0.801993

which parses without an error, but returns something totally incorrect!
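One possible workaround for the second layout, which is not part of the original answer: parse the text by hand and carry the outer label forward whenever a row omits it. This is only a sketch. It assumes exactly two index levels, labels without spaces, and purely numeric values; the variable names are made up for illustration.

```python
import pandas as pd

# The second MultiIndex example, pasted as a plain string.
text = """\
              0         1         2
foo a  0.859630  0.399901  0.052504
    b  0.231838  0.863228  0.017451
bar a  0.422231  0.307960  0.801993"""

header, *body = text.splitlines()
columns = header.split()  # column labels stay strings: '0', '1', '2'

index, data = [], []
outer = None
for line in body:
    fields = line.split()
    # A full row carries both index labels; a short row omits the
    # repeated outer label, so reuse the last one seen.
    if len(fields) == len(columns) + 2:
        outer, inner, *values = fields
    else:
        inner, *values = fields
    index.append((outer, inner))
    data.append([float(v) for v in values])

df = pd.DataFrame(data, index=pd.MultiIndex.from_tuples(index), columns=columns)
print(df)
```

The first layout (with the level1/level2 name row) would need that extra line skipped or parsed separately before the same loop applies.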

answered Oct 07 '22 02:10 by LondonRob


pd.read_clipboard() is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df will outlive the contents of your clipboard:

    # py3 only, see below for py2
    import pandas as pd
    from io import StringIO

    d = '''0   1   2   3   4
    A   Y   N   N   Y
    B   N   Y   N   N
    C   N   N   N   N
    D   Y   Y   N   Y
    E   N   Y   Y   Y
    F   Y   Y   N   Y
    G   Y   N   N   Y'''

    # raw string, so the backslash in the regex is not treated as a string escape
    df = pd.read_csv(StringIO(d), sep=r'\s+')

A few notes:

  • The triple-quoted string preserves the newlines in the output.
  • StringIO wraps the string in a file-like object. read_csv needs this because it would otherwise interpret a plain string as a file path.
  • Setting sep to \s+ makes it so that each contiguous block of whitespace is treated as a single delimiter.
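As a rough illustration of these notes, here is the same approach applied to the DataFrame from the question, pasted with its IPython prompts still attached. The strip_prompts helper and its regex are hypothetical, written for this sketch only; they are not part of pandas.

```python
import re
from io import StringIO

import pandas as pd

# Pasted exactly as it appears in the question, prompts included.
pasted = """\
In []: x
Out[]:
   bar  foo
0    4    1
1    5    2
2    6    3
"""

# Hypothetical helper: drop lines that start with an IPython
# "In [n]:" or "Out[n]:" prompt.
PROMPT = re.compile(r"^\s*(In|Out)\s*\[\d*\]:")


def strip_prompts(text):
    return "\n".join(
        line for line in text.splitlines() if not PROMPT.match(line)
    )


# The header has two fields and each data row has three, so read_csv
# uses the first column of every row as the index.
df = pd.read_csv(StringIO(strip_prompts(pasted)), sep=r"\s+")
print(df)
```

Because the header row has one field fewer than the data rows, the leading 0, 1, 2 become the index rather than a column, which matches the original frame.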

update

The code above is Python 3 only. If you're stuck on Python 2, replace the import line:

from io import StringIO 

with instead:

from StringIO import StringIO 

If you have an old version of pandas (v0.24 or older) there's an easy way to write a Py2/Py3 compatible version of the above code:

    import pandas as pd

    d = ...
    df = pd.read_csv(pd.compat.StringIO(d), sep=r'\s+')

The newest versions of pandas have dropped the compat module along with Python 2 support.

answered Oct 07 '22 00:10 by tel