
How to copy/paste DataFrame from Stack Overflow into Python

Tags: python, clipboard, pandas

In questions and answers, users very often post an example DataFrame which their question/answer works with:

    In []: x
    Out[]:
       bar  foo
    0    4    1
    1    5    2
    2    6    3

It'd be really useful to be able to get this DataFrame into my Python interpreter so I can start debugging the question, or testing the answer.

How can I do this?

587

asked Jul 24 '15 12:07

LondonRob

2 Answers

Pandas is written by people that really know what people want to do.

Since version 0.13 there's a function pd.read_clipboard which is absurdly effective at making this "just work".

Copy the part of the code in the question that starts with bar foo (i.e. the DataFrame itself), then run this in a Python interpreter:

    In [53]: import pandas as pd

    In [54]: df = pd.read_clipboard()

    In [55]: df
    Out[55]:
       bar  foo
    0    4    1
    1    5    2
    2    6    3

Caveats

Don't include the IPython In or Out prompts, or it won't work.
If you have a named index, you currently need to add engine='python' (see this issue on GitHub). The 'c' engine is currently broken when the index is named.
It's not brilliant at MultiIndexes:

Try this:

                          0         1         2
    level1 level2
    foo    a       0.518444  0.239354  0.364764
           b       0.377863  0.912586  0.760612
    bar    a       0.086825  0.118280  0.592211

which doesn't work at all, or this:

                  0         1         2
    foo a  0.859630  0.399901  0.052504
        b  0.231838  0.863228  0.017451
    bar a  0.422231  0.307960  0.801993

which works, but returns something totally incorrect!
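One possible workaround for the second layout, offered as a rough sketch rather than anything read_clipboard does for you: paste the text into a string, parse it as fixed-width columns with pd.read_fwf, and forward-fill the blank outer level. The names level1 and level2 are borrowed from the first example and are an assumption, as is the variable name text. Column boundaries are inferred from the data rows, so less regular output may need explicit colspecs.

```python
from io import StringIO

import pandas as pd

# Pasted DataFrame output with a sparse (blank when repeated) outer index level.
text = """\
              0         1         2
foo a  0.859630  0.399901  0.052504
    b  0.231838  0.863228  0.017451
bar a  0.422231  0.307960  0.801993
"""

# Hypothetical labels: two index levels followed by the three data columns.
names = ["level1", "level2", "0", "1", "2"]

# Skip the header line and supply the names ourselves; the column
# boundaries are inferred from the remaining rows.
df = pd.read_fwf(StringIO(text), skiprows=1, header=None, names=names)

# Blank cells in the outer level are read as missing, so carry the
# previous label down before building the MultiIndex.
df["level1"] = df["level1"].ffill()
df = df.set_index(["level1", "level2"])

print(df)
```

The forward-fill step is what restores the repeated outer label that the printed output leaves blank.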

121

answered Oct 07 '22 02:10

LondonRob

pd.read_clipboard() is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df will outlive the contents of your clipboard:

    # py3 only, see below for py2
    import pandas as pd
    from io import StringIO

    d = '''0   1   2   3   4
    A   Y   N   N   Y
    B   N   Y   N   N
    C   N   N   N   N
    D   Y   Y   N   Y
    E   N   Y   Y   Y
    F   Y   Y   N   Y
    G   Y   N   N   Y'''

    # raw string, so that \s is not treated as a string escape
    df = pd.read_csv(StringIO(d), sep=r'\s+')

A few notes:

The triple-quoted string preserves the newlines in the output.
StringIO wraps the output in a file-like object, which read_csv requires.
Setting sep to \s+ makes it so that each contiguous block of whitespace is treated as a single delimiter.
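As a small illustration of these notes, here is the same approach applied to the DataFrame from the question. The variable name pasted is just a placeholder. Because the header line has one fewer field than the data rows, read_csv uses the first column as the index, so the printed frame should round-trip without an index_col argument.

```python
from io import StringIO

import pandas as pd

# The DataFrame output copied from the question, without the In/Out prompts.
pasted = """\
   bar  foo
0    4    1
1    5    2
2    6    3
"""

# Each run of whitespace counts as one delimiter; the header has two
# fields and the rows have three, so the first column becomes the index.
df = pd.read_csv(StringIO(pasted), sep=r"\s+")

print(df)
```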

Update

The above answer is Python 3 only. If you're stuck on Python 2, replace the import line:

    from io import StringIO

with:

    from StringIO import StringIO

If you have an old version of pandas (v0.24 or older) there's an easy way to write a Py2/Py3 compatible version of the above code:

    import pandas as pd

    d = ...
    df = pd.read_csv(pd.compat.StringIO(d), sep=r'\s+')

The newest versions of pandas have dropped the compat module along with Python 2 support.

answered Oct 07 '22 00:10

tel
