Convert text document to numpy array of ASCII numbers in python

Question

I have a large plain text document (UTF-8) that contains letters, numbers, spaces, and special characters.

I want to convert all the individual characters in the text document into numbers, and then represent the document as a numpy array.

Can I use the built-in Python ord() function for this?

My understanding is that it returns an integer representing the Unicode code point of a character, but it only takes one character at a time, and I'm wondering if there's a better way to convert a large text document to numbers.

Or should I just iterate through the entire document, calling ord() on each character?

Edit

I basically want to do exactly what this tool does, but natively in Python: https://www.browserling.com/tools/text-to-ascii

This is what I currently have:

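import numpy as np

def convert_to_ascii(text):
    # Build a comma-separated string of code points, one per character
    return ",".join(str(ord(char)) for char in text)

with open('test.txt', 'r', encoding='utf-8') as myfile:
    data = myfile.read()

x = convert_to_ascii(data)

# Parse the comma-separated string back into integers
values = [int(i) for i in x.split(',')]

array = np.array(values)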

Is there a better way to do this?

Bill Smith · Accepted Answer

I've been working on the same issue, and came across a much simpler and faster technique, demonstrated below:

import numpy as np

text = 'abcABC00'

letter_array = np.fromiter(text, dtype='c')
letter_array.shape, letter_array.dtype

    ((8,), dtype('S1'))


ascii_array = letter_array.view(np.int8)
ascii_array.shape, ascii_array.dtype, ascii_array

    ((8,), dtype('int8'), array([97, 98, 99, 65, 66, 67, 48, 48], dtype=int8))

I included the intermediate values just to show what's going on, but the production code could be reduced to a single line:

ascii_array = np.fromiter(text, dtype='c').view(np.int8)
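
Since the question mentions a UTF-8 document, the text may contain characters outside the ASCII range, and int8 can only hold values from -128 to 127. A minimal sketch of two alternatives that avoid per-character iteration, assuming you want either one Unicode code point per character or the raw UTF-8 byte values (the helper names text_to_code_points and text_to_utf8_bytes are mine, purely for illustration):

```python
import numpy as np

def text_to_code_points(text):
    """Return a uint32 array with one Unicode code point per character."""
    # UTF-32 stores every character in exactly four bytes, so the encoded
    # buffer can be reinterpreted directly as little-endian 32-bit integers.
    return np.frombuffer(text.encode("utf-32-le"), dtype="<u4")

def text_to_utf8_bytes(text):
    """Return a uint8 array with the UTF-8 byte values of the text."""
    # Non-ASCII characters occupy more than one byte here, so the array
    # can be longer than the string.
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8)

sample = "abcABC00é"  # hypothetical sample; 'é' is outside the ASCII range
code_points = text_to_code_points(sample)
utf8_bytes = text_to_utf8_bytes(sample)

print(code_points)  # values match ord() for each character
print(utf8_bytes)   # 'é' shows up as two bytes
```

For a file, the same idea should apply to the result of open('test.txt', encoding='utf-8').read(). Note that np.frombuffer returns a read-only view of the encoded bytes, so call .copy() on the result if you need to modify it.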

Tags:

python-3.x

ascii

utf-8

numpy

nlp

borkbork
