
A NumPy equivalent of pandas read_clipboard?

Suppose a question or answer you come across posts an array like this:

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47]
 [48 49 50 51 52 53 54 55]
 [56 57 58 59 60 61 62 63]]

How would you load it into a variable in a REPL session without having to add commas everywhere?

cs95 asked Dec 10 '22 10:12


2 Answers

For a one-time occasion, I might do this:

  • Copy the text containing the array to the clipboard.
  • In an IPython shell, type s = """ but do not press Enter.
  • Paste the text from the clipboard.
  • Type the closing triple quote.

That gives me:

In [16]: s = """[[ 0  1  2  3  4  5  6  7]
    ...:  [ 8  9 10 11 12 13 14 15]
    ...:  [16 17 18 19 20 21 22 23]
    ...:  [24 25 26 27 28 29 30 31]
    ...:  [32 33 34 35 36 37 38 39]
    ...:  [40 41 42 43 44 45 46 47]
    ...:  [48 49 50 51 52 53 54 55]
    ...:  [56 57 58 59 60 61 62 63]]"""

Then strip the brackets from each line and pass the lines to np.loadtxt():

In [17]: a = np.loadtxt([line.lstrip(' [').rstrip(']') for line in s.splitlines()], dtype=int)

In [18]: a
Out[18]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])
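
If this comes up more than once, the same steps could be wrapped in a small helper. This is only a minimal sketch: the name parse_printed_2d is made up for illustration, and it assumes a 2-D array whose rows each fit on a single line.

```python
import numpy as np

def parse_printed_2d(text, dtype=int):
    """Parse the str() form of a 2-D array whose rows each fit on one line."""
    # Drop blank lines, then strip brackets and padding from both ends of each row.
    rows = [line.strip(" []") for line in text.splitlines() if line.strip()]
    # np.loadtxt accepts a list of strings, one row per entry.
    return np.loadtxt(rows, dtype=dtype, ndmin=2)

# Stand-in for the pasted text: the printed form of the array from the question.
s = str(np.arange(64).reshape(8, 8))
a = parse_printed_2d(s)
print(a.shape)
```

Rows that wrap across several lines would break this, since np.loadtxt expects the same number of columns on every line.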
Warren Weckesser answered Dec 12 '22 22:12


If you have pandas, pyperclip, or another way to read from the clipboard, you could use something like this:

from pandas.io.clipboard import clipboard_get
# import pyperclip
import numpy as np
import re
import ast

def numpy_from_clipboard():
    inp = clipboard_get()
    # inp = pyperclip.paste()
    inp = inp.strip()
    # if it starts with "array(" we just need to remove the
    # leading "array(" and remove the optional ", dtype=xxx)"
    if inp.startswith('array('):
        inp = re.sub(r'^array\(', '', inp)
        dtype = re.search(r', dtype=(\w+)\)$', inp)
        if dtype:
            return np.array(ast.literal_eval(inp[:dtype.start()]), dtype=dtype.group(1))
        else:
            return np.array(ast.literal_eval(inp[:-1]))
    else:
        # In case it's the string representation it's a bit harder.
        # We need to remove all spaces between closing and opening brackets
        inp = re.sub(r'\]\s+\[', '],[', inp)
        # We need to remove all whitespaces following an opening bracket
        inp = re.sub(r'\[\s+', '[', inp)
        # and all leading whitespaces before closing brackets
        inp = re.sub(r'\s+\]', ']', inp)
        # replace all remaining whitespaces with ","
        inp = re.sub(r'\s+', ',', inp)
        return np.array(ast.literal_eval(inp))

Then copy the array and call the function to read it from the clipboard:

>>> numpy_from_clipboard()
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

This should be able to parse most arrays from your clipboard, in both their str and repr forms. It should even work for arrays whose rows wrap across several lines (where np.loadtxt fails):

[[ 0.34866207  0.38494993  0.7053722   0.64586156  0.27607369  0.34850162
   0.20530567  0.46583039  0.52982216  0.92062115]
 [ 0.06973858  0.13249867  0.52419149  0.94707951  0.868956    0.72904737
   0.51666421  0.95239542  0.98487436  0.40597835]
 [ 0.66246734  0.85333546  0.072423    0.76936201  0.40067016  0.83163118
   0.45404714  0.0151064   0.14140024  0.12029861]
 [ 0.2189936   0.36662076  0.90078913  0.39249484  0.82844509  0.63609079
   0.18102383  0.05339892  0.3243505   0.64685352]
 [ 0.803504    0.57531309  0.0372428   0.8308381   0.89134864  0.39525473
   0.84138386  0.32848746  0.76247531  0.99299639]]

>>> numpy_from_clipboard()
array([[ 0.34866207,  0.38494993,  0.7053722 ,  0.64586156,  0.27607369,
         0.34850162,  0.20530567,  0.46583039,  0.52982216,  0.92062115],
       [ 0.06973858,  0.13249867,  0.52419149,  0.94707951,  0.868956  ,
         0.72904737,  0.51666421,  0.95239542,  0.98487436,  0.40597835],
       [ 0.66246734,  0.85333546,  0.072423  ,  0.76936201,  0.40067016,
         0.83163118,  0.45404714,  0.0151064 ,  0.14140024,  0.12029861],
       [ 0.2189936 ,  0.36662076,  0.90078913,  0.39249484,  0.82844509,
         0.63609079,  0.18102383,  0.05339892,  0.3243505 ,  0.64685352],
       [ 0.803504  ,  0.57531309,  0.0372428 ,  0.8308381 ,  0.89134864,
         0.39525473,  0.84138386,  0.32848746,  0.76247531,  0.99299639]])
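
The wrapped-row case can also be checked without a clipboard by applying the same substitutions to a plain string. The sketch below is an illustration only: parse_array_str is a hypothetical name, and the input is generated with str() rather than copied from a post.

```python
import ast
import re

import numpy as np

def parse_array_str(text):
    """Turn the str() form of an array into an ndarray using the regex steps above."""
    text = text.strip()
    text = re.sub(r'\]\s+\[', '],[', text)  # separate consecutive rows with a comma
    text = re.sub(r'\[\s+', '[', text)      # drop padding after an opening bracket
    text = re.sub(r'\s+\]', ']', text)      # drop padding before a closing bracket
    text = re.sub(r'\s+', ',', text)        # remaining whitespace separates elements
    return np.array(ast.literal_eval(text))

# Ten floats per row do not fit in NumPy's default line width, so each row wraps.
original = np.arange(50, dtype=float).reshape(5, 10) / 7
printed = str(original)
restored = parse_array_str(printed)
print(restored.shape)
```

The printed form keeps only eight decimal places by default, so the restored values match the original approximately rather than exactly.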

However, I'm not too good with regexes, so this probably isn't foolproof, and using ast.literal_eval feels a bit awkward (but it avoids doing the parsing yourself).

Feel free to suggest improvements.
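
One possible variation, for the string representation only, avoids both the regexes and ast.literal_eval by splitting on whitespace and recovering the row count from the closing brackets. This is a rough sketch that assumes a 2-D array; the name parse_2d_by_splitting is hypothetical.

```python
import numpy as np

def parse_2d_by_splitting(text, dtype=float):
    """Parse the str() form of a 2-D array without regexes or literal_eval."""
    text = text.strip()
    # Every row ends with ']' and the whole array adds one more closing bracket.
    n_rows = text.count(']') - 1
    # Replace brackets with spaces, then split on any whitespace (newlines included).
    tokens = text.replace('[', ' ').replace(']', ' ').split()
    return np.array(tokens, dtype=dtype).reshape(n_rows, -1)

# Rows of ten floats wrap in the printed form, which this approach tolerates.
original = np.arange(50, dtype=float).reshape(5, 10) / 7
parsed = parse_2d_by_splitting(str(original))
print(parsed.shape)
```

It does not handle 1-D or higher-dimensional input, or the repr form with its array( prefix, so it is narrower than the function above.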

MSeifert answered Dec 12 '22 23:12