
Creating a 2d numpy array to hold characters

Tags:

python

numpy

I have the following setup: a word queue, a temporary variable 'temp' that stores one word, and a 2D numpy array. The word needs to be "put", letter by letter, into the 2D numpy array:

from collections import deque
import numpy as np 
message=input("Write a message:")
wordqueue=message.split()
queue=deque(wordqueue)
print(wordqueue)

for i in range(1):
  temp=wordqueue.pop(0) #store the removed item in the temporary variable 'temp'
print(wordqueue)
print(temp)
display = np.zeros((4,10)) #create a 2d array that is to store the words from the queue
print(display)
display[0, 0] = temp #add the word from the temp variable to fill the array (each character in each sequential position in the array)
print(display)

Unfortunately, the output is as follows:

Write a message: This is a message for the display array
['This', 'is', 'a', 'message', 'for', 'the', 'display', 'array']
['is', 'a', 'message', 'for', 'the', 'display', 'array']
This
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
Traceback (most recent call last):
  File "python", line 20, in <module>
ValueError: could not convert string to float: 'This'

I did try to define the 2d array and define the data type, but that too wasn't very obvious and I kept getting various errors.

What I would like help with is the following:

1. Ideally, I would like the numpy array to be set up with "*"s instead of zeros/ones (the documentation didn't help with this setup).
2. Replace the *s in the array with the contents of the temp variable, one letter for each *.

EXAMPLE:

Display array: (4 x 20)

* * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * *

Enter Message: This is a test message
temp: This

Updated display would show:

t h i s * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * *

For subsequent words, it would then populate the array (truncating a word that is too long and moving to the next line if necessary).

So far: https://repl.it/IcJ3/7

I tried this, for instance, to create the char array:

display = np.chararray((4,10)) #create a 2d array that is to store the letters in the words from the queue
display[:]="*"

but it produced this, with the erroneous "b" inserted. Cannot see why ...

[[b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*']
 [b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*']
 [b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*']
 [b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*' b'*']]

Updated (working on) repl.it here: https://repl.it/IcJ3/8

Compoot asked Sep 18 '25 22:09


1 Answer

First things first: if you want a "character" array, you have to be careful about what exactly you expect. In Python 3, strings are sequences of Unicode code points. In Python 2, strings were the classic "sequence of bytes" strings from languages like C. This means that, from a memory point of view, unicode types can be much more memory intensive:

In [1]: import numpy as np

In [2]: chararray = np.zeros((4,10), dtype='S1')

In [3]: unicodearray =  np.zeros((4,10), dtype='U1')

In [4]: chararray.itemsize, unicodearray.itemsize
Out[4]: (1, 4)

In [5]: chararray.nbytes
Out[5]: 40

In [6]: unicodearray.nbytes
Out[6]: 160

So if you know you only want to work with ASCII characters, you can use the S1 dtype to cut your memory use to a quarter. Also note that, since S1 in Python 3 actually corresponds to the bytes data type (which is equivalent to Python 2 str), the representation is prefixed with a b, as in b'this is a bytes object'. This is the b you saw with np.chararray, which stores bytes by default:

In [7]: chararray
Out[7]:
array([[b'', b'', b'', b'', b'', b'', b'', b'', b'', b''],
       [b'', b'', b'', b'', b'', b'', b'', b'', b'', b''],
       [b'', b'', b'', b'', b'', b'', b'', b'', b'', b''],
       [b'', b'', b'', b'', b'', b'', b'', b'', b'', b'']],
      dtype='|S1')

In [8]: unicodearray
Out[8]:
array([['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', ''],
       ['', '', '', '', '', '', '', '', '', '']],
      dtype='<U1')
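If you already have a bytes array and only want to get rid of the b prefix, one option is to convert it to a unicode array with astype. This is just a small sketch; the names byte_grid and text_grid are made up for the example, and it assumes every cell holds ASCII data:

```python
import numpy as np

# Bytes array: one byte per cell, displayed with the b'' prefix.
byte_grid = np.full((2, 3), b'*', dtype='S1')

# Convert to a unicode array; the cells become ordinary str values.
text_grid = byte_grid.astype('U1')

print(byte_grid[0, 0])            # a bytes value
print(text_grid[0, 0])            # a str value
print(byte_grid.dtype.itemsize)   # bytes per cell in the S1 array
print(text_grid.dtype.itemsize)   # bytes per cell in the U1 array
```

The price is the memory difference shown above: each cell grows from 1 byte to 4.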

Now, suppose you have a message you want to assign to your array. If the message consists only of characters that are representable as ASCII, then you can play fast and loose with the dtype:

In [15]: message = 'This'

In [16]: unicodearray.reshape(-1)[:len(message)] = list(message)

In [17]: unicodearray
Out[17]:
array(['T', 'h', 'i', 's', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', ''],
      dtype='<U1')

In [18]: chararray.reshape(-1)[:len(message)] = list(message)

In [19]: chararray
Out[19]:
array([[b'T', b'h', b'i', b's', b'', b'', b'', b'', b'', b''],
       [b'', b'', b'', b'', b'', b'', b'', b'', b'', b''],
       [b'', b'', b'', b'', b'', b'', b'', b'', b'', b''],
       [b'', b'', b'', b'', b'', b'', b'', b'', b'', b'']],
      dtype='|S1')
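The slice assignment works because reshape(-1) hands back a view of a contiguous array, so writing to the flattened view writes into the 2D array. Here is a rough sketch that wraps this in a helper and also truncates a message that is too long. The function name write_message is hypothetical, and the sketch assumes the array is contiguous (as arrays fresh from np.zeros or np.full are); for a non-contiguous array reshape may return a copy and the write would be lost:

```python
import numpy as np

def write_message(display, message):
    """Copy message into display in row-major order, truncating overflow."""
    flat = display.reshape(-1)            # a view when display is contiguous
    chars = list(message[:display.size])  # drop anything that will not fit
    if chars:                             # nothing to do for an empty message
        flat[:len(chars)] = chars
    return display

display = np.full((2, 5), '*', dtype='U1')
write_message(display, "Hi there, world")
print(display)
```

Only the first 10 characters fit into the 2 x 5 array here; the rest of the message is silently dropped.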

However, if that is not the case:

In [22]: message = "กขฃคฅฆงจฉ"

In [23]: len(message)
Out[23]: 9

In [24]: unicodearray.reshape(-1)[:len(message)] = list(message)

In [25]: unicodearray
Out[25]:
array(['ก', 'ข', 'ฃ', 'ค', 'ฅ', 'ฆ', 'ง', 'จ', 'ฉ', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', ''],
      dtype='<U1')

In [26]: chararray.reshape(-1)[:len(message)] = list(message)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-26-7d7cdb93de1f> in <module>()
----> 1 chararray.reshape(-1)[:len(message)] = list(message)

UnicodeEncodeError: 'ascii' codec can't encode character '\u0e01' in position 0: ordinal not in range(128)
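If you don't know in advance whether the input is pure ASCII, you could check before choosing the dtype. A minimal sketch, assuming Python 3.7+ for str.isascii(); the helper name pick_char_dtype is made up for illustration:

```python
import numpy as np

def pick_char_dtype(message):
    """Return 'S1' for pure-ASCII text, otherwise 'U1'."""
    return 'S1' if message.isascii() else 'U1'

for message in ("This", "กขฃ"):
    dtype = pick_char_dtype(message)
    display = np.full((2, 4), '*', dtype=dtype)
    display.reshape(-1)[:len(message)] = list(message)
    print(dtype)
    print(display)
```

Whether the saved memory is worth the extra branch is a judgment call; for a small display, always using U1 is the simpler choice.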


Note that if you want to initialize the array with an element other than the default that np.zeros gives you, you can use np.full:

In [27]: chararray = np.full((4,10), '*', dtype='S1')

In [28]: chararray
Out[28]:
array([[b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*'],
       [b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*'],
       [b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*'],
       [b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*', b'*']],
      dtype='|S1')
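To print such an array in the space-separated layout from the question rather than as a numpy repr, something along these lines might do. The function name render is hypothetical, and bytes cells are assumed to be ASCII so that they can be converted with astype:

```python
import numpy as np

def render(display):
    """Return the grid as text: cells separated by spaces, one row per line."""
    if display.dtype.kind == 'S':
        display = display.astype('U1')   # turn bytes cells into str cells
    return '\n'.join(' '.join(row) for row in display)

display = np.full((2, 4), '*', dtype='U1')
display[0, :2] = list('hi')
print(render(display))
```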

Finally, to do this long-form with for-loops:

In [17]: temp = "a test"

In [18]: display = np.full((4,10), '*', dtype='U1')

In [19]: display
Out[19]:
array([['*', '*', '*', '*', '*', '*', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*']],
      dtype='<U1')

In [20]: it = iter(temp) # give us a single-pass iterator
    ...: for i in range(display.shape[0]):
    ...:     for j, c in zip(range(display.shape[1]), it):
    ...:         display[i, j] = c
    ...:

In [21]: display
Out[21]:
array([['a', ' ', 't', 'e', 's', 't', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*']],
      dtype='<U1')

And another test for good measure, one that spans rows:

In [36]: temp = "this is a test, a test this is"

In [37]: display = np.full((4,10), '*', dtype='U1')

In [38]: it = iter(temp) # give us a single-pass iterator
    ...: for i in range(display.shape[0]):
    ...:     for j, c in zip(range(display.shape[1]), it):
    ...:         display[i, j] = c
    ...:

In [39]: display
Out[39]:
array([['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' '],
       ['t', 'e', 's', 't', ',', ' ', 'a', ' ', 't', 'e'],
       ['s', 't', ' ', 't', 'h', 'i', 's', ' ', 'i', 's'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*']],
      dtype='<U1')

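Warning: the order of the arguments passed to zip matters, since it (the variable created with iter(temp)) is a single-pass iterator:

zip(range(display.shape[1]), it)

The iterator it should be the last argument, or else characters will be skipped between rows! zip pulls from its arguments left to right, so if it came first, zip would consume one more character before noticing that the range is exhausted, and that character would be lost.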
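To see the skipped characters for yourself, here is a small pure-Python sketch that collects the rows as strings with both argument orders. The function fill_rows and its iterator_last flag are made up for this demonstration:

```python
def fill_rows(text, rows, cols, iterator_last=True):
    """Split text into rows of cols characters using a shared iterator."""
    it = iter(text)  # single-pass iterator shared by all rows
    out = []
    for _ in range(rows):
        if iterator_last:
            # range is checked first, so `it` is not touched once the row is full
            row = ''.join(c for _, c in zip(range(cols), it))
        else:
            # `it` is advanced first; that character is dropped when range runs out
            row = ''.join(c for c, _ in zip(it, range(cols)))
        out.append(row)
    return out

print(fill_rows("abcdefgh", 2, 3, iterator_last=True))   # ['abc', 'def']
print(fill_rows("abcdefgh", 2, 3, iterator_last=False))  # ['abc', 'efg']
```

With the iterator first, the 'd' is consumed at the end of the first row and never written anywhere.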

Finally, note that numpy provides a convenience function for iterating over arrays sequentially:

In [49]: temp = "this is yet another test"

In [50]: display = np.full((4,10), '*', dtype='U1')

In [51]: for c, x in zip(temp, np.nditer(display, op_flags=['readwrite'])):
    ...:     x[...] = c
    ...:

In [52]: display
Out[52]:
array([['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'y', 'e'],
       ['t', ' ', 'a', 'n', 'o', 't', 'h', 'e', 'r', ' '],
       ['t', 'e', 's', 't', '*', '*', '*', '*', '*', '*'],
       ['*', '*', '*', '*', '*', '*', '*', '*', '*', '*']],
      dtype='<U1')

There is a minor complication of having to pass op_flags=['readwrite'] to the function, to make sure that the returned iterator allows modification of the underlying array, but it greatly simplifies the code, and we don't need a single-pass iterator. I still prefer slice assignment, though.
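For the word queue from the question, slice assignment could look roughly like the sketch below. The layout rules are my assumptions, since the question leaves them open: words are placed left to right with one untouched cell between them, a word that does not fit in the rest of a row starts the next row, a word wider than a row is truncated, and anything left over once the display is full is dropped. The name place_words is hypothetical:

```python
from collections import deque
import numpy as np

def place_words(display, words):
    """Write words into display row by row using slice assignment."""
    rows, cols = display.shape
    queue = deque(words)
    row, col = 0, 0
    while queue and row < rows:
        word = queue.popleft()[:cols]   # truncate words wider than a row
        if not word:
            continue                    # skip empty strings
        if col + len(word) > cols:      # does not fit: start the next row
            row, col = row + 1, 0
            if row == rows:
                break                   # display is full
        display[row, col:col + len(word)] = list(word)
        col += len(word) + 1            # leave one cell as a separator
    return display

display = np.full((2, 6), '*', dtype='U1')
place_words(display, "This is a message".split())
print(display)
```

Note that deque.popleft() removes from the front in constant time, whereas the list.pop(0) in the question has to shift every remaining element.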

juanpa.arrivillaga answered Sep 20 '25 10:09