
'b' character added when using numpy loadtxt [duplicate]

I tried to create an array from a text file. I saw earlier that numpy has a loadtxt function, so I tried it, but it adds some junk characters around each row:

# my txt file

    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

# my python v3.4 program

import numpy as np
f = open('tile', 'r')
a = np.loadtxt(f, dtype=str, delimiter='\n')
print(a)

# my print output

["b'    .--``--.    '"
 "b'.--`        `--.'"
 "b'|              |'"
 "b'|              |'"
 "b'`--.        .--`'"
 "b'    `--..--`    '"]

What are these 'b' characters and double quotes, and where do they come from? I tried some solutions found on the internet, such as opening the file with codecs and changing the dtype to 'S20' or 'S11', plus a lot of other things that don't work. What I expect is an array of unicode strings that looks like this:

[['    .--``--.    ']
 ['.--`        `--.']
 ['|              |']
 ['|              |']
 ['`--.        .--`']
 ['    `--..--`    ']]

Info: I'm using Python 3.4 and the numpy package from the Debian stable repository.

asked Nov 11 '15 at 16:11 by krshk



1 Answer

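np.loadtxt and np.genfromtxt operate in byte mode: they read the file as bytestrings, which are the default string type in Python 2. Python 3 strings are unicode, and it marks bytestrings with this b prefix. So the b and the extra quotes in your output are the printed form of a bytestring that ended up stored as text.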
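
To see where the b and the quotes come from without numpy at all, here is a minimal sketch in plain Python 3. The variable names are only for illustration, and the sample line is copied from the question's file:

```python
# Python 3 keeps bytes and text (str) as two separate types.
raw = b'    .--``--.    '

# str() on a bytes object does not decode it. It returns the printed form
# of the bytestring, with the b prefix and the quotes as literal characters.
as_text = str(raw)
print(repr(as_text))

# Decoding is what turns bytes into real text.
decoded = raw.decode('ascii')
print(repr(decoded))
```

The first print shows the same kind of string the question got back from loadtxt with dtype=str; the second shows the clean text.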

I tried some variations in a Python 3 IPython session:

In [508]: np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[508]: b'    .--``--.'
In [509]: np.loadtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[509]: "b'    .--``--.'"
...
In [511]: np.genfromtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[511]: '.--``--.'
In [512]: np.genfromtxt('stack33655641.txt',dtype=None,delimiter='\n')[0]
Out[512]: b'.--``--.'
In [513]: np.genfromtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[513]: b'.--``--.'

genfromtxt with dtype=str gives the cleanest display, except that it strips the leading blanks. I may have to use a converter to turn that off. These functions are meant to read CSV data where (white)spaces are separators, not part of the data.

loadtxt and genfromtxt are overkill for simple text like this. A plain file read does nicely:

In [527]: with open('stack33655641.txt') as f:a=f.read()
In [528]: print(a)
    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

In [530]: a=a.splitlines()
In [531]: a
Out[531]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']

(My text editor is set to strip trailing blanks, hence the ragged lines.)
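
If you still want a numpy array of unicode strings shaped like the one in the question, the plain read can be wrapped in np.array. This is only a sketch: it writes a temporary copy of the art first so that it runs on its own, and the file name tile.txt and the variable names are placeholders, not anything from the question.

```python
import os
import tempfile

import numpy as np

# The six rows from the question, trailing blanks included.
rows = [
    "    .--``--.    ",
    ".--`        `--.",
    "|              |",
    "|              |",
    "`--.        .--`",
    "    `--..--`    ",
]

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder path standing in for the question's 'tile' file.
    path = os.path.join(tmp, "tile.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(rows) + "\n")

    # Plain text-mode read: Python decodes to str, so no b prefix appears.
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

# One string per row, reshaped into a single column as in the question.
a = np.array(lines).reshape(-1, 1)
print(a)
print(a.shape, a.dtype)
```

splitlines() only removes the line endings, so leading and trailing blanks inside each row survive as long as the file itself still has them.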


@DSM's suggestion:

In [556]: a=np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n').astype(str)
In [557]: a
Out[557]: 
array(['    .--``--.', '.--`        `--.', '|              |',
       '|              |', '`--.        .--`', '    `--..--`'], 
      dtype='<U16')
In [558]: a.tolist()
Out[558]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']
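
The astype(str) step in that suggestion can be tried on its own, without a file. A small sketch, assuming the data is plain ASCII (the two sample rows and the names byte_rows, text_rows and decoded are just for illustration); np.char.decode is shown as an alternative that lets you name the encoding:

```python
import numpy as np

# A bytes array like the one loadtxt returns with dtype=bytes.
byte_rows = np.array([b'    .--``--.', b'.--`        `--.'], dtype=bytes)

# astype(str) converts the bytestring dtype to a unicode one.
# It expects ASCII data; other bytes raise a decoding error.
text_rows = byte_rows.astype(str)
print(text_rows.dtype.kind, text_rows.tolist())

# np.char.decode does the same job with an explicit encoding.
decoded = np.char.decode(byte_rows, 'ascii')
print(decoded.tolist())
```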

answered Oct 11 '22 at 11:10 by hpaulj