How to read binary files in Python using NumPy?

I know how to read binary files in Python using NumPy's np.fromfile() function. The issue I'm faced with is that when I do so, the array contains exceedingly large numbers, on the order of 10^100 or so, along with seemingly random nan and inf values.

I need to apply machine learning algorithms to this dataset and I cannot work with this data. I cannot normalise the dataset because of the nan values.

I've tried np.nan_to_num() but that doesn't seem to help. After doing so, my min and max values are 3e-38 and 3e+38 respectively, so I still could not normalize it.

Is there any way to scale this data down? If not, how should I deal with this?

Thank you.

EDIT:

Some context: I'm working on a malware classification problem. My dataset consists of live malware binaries. They are files of type .exe, .apk, etc. My idea is to store these binaries as a NumPy array, convert them to grayscale images, and then perform pattern analysis on them.

asked Sep 29 '16 05:09

Suyash Shetty

1 Answer

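If you want to make an image out of a binary file, you need to read it in as integers, not floats. np.fromfile() defaults to dtype=float, so the raw bytes get reinterpreted as 64-bit floating-point numbers, which is where the huge values, nan, and inf come from. Currently, the most common format for images is unsigned 8-bit integers.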
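To see the effect of the dtype, here is a minimal sketch that reads the same eight bytes twice: once with the default dtype and once as uint8. The byte values and the temporary file are made up for illustration; they are not taken from any real binary.

```python
import os
import tempfile

import numpy as np

# Eight arbitrary bytes standing in for the start of a binary file
# (illustrative values only).
raw = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x7F])

# Write them to a temporary file so np.fromfile() has something to read.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(raw)
    path = f.name

as_float = np.fromfile(path)                  # default dtype is float (64-bit)
as_bytes = np.fromfile(path, dtype=np.uint8)  # one array element per byte
os.remove(path)

print(as_float)  # a single element, and it is nan
print(as_bytes)  # eight elements, each between 0 and 255
```

With uint8, every value is guaranteed to lie between 0 and 255, so there is nothing for np.nan_to_num() to clean up.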

As an example, let's make an image out of the first 10,000 bytes of /bin/bash:

>>> import numpy as np
>>> import cv2
>>> xbash = np.fromfile('/bin/bash', dtype='uint8')
>>> xbash.shape
(1086744,)
>>> cv2.imwrite('bash1.png', xbash[:10000].reshape(100,100))

In the above, we used the OpenCV library to write the integers to a PNG file. Any of several other imaging libraries could have been used.

This is what the first 10,000 bytes of bash "looks" like:

[Image: the first 10,000 bytes of /bin/bash rendered as a 100x100 grayscale image]
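For files of arbitrary length, such as the malware binaries in the question, the byte count will rarely be a perfect square. One possible approach, sketched below, is to zero-pad the data to a multiple of a fixed row width before reshaping. The helper name bytes_to_image, the width of 256, and the synthetic input are assumptions made for illustration, not part of NumPy or of the example above.

```python
import numpy as np


def bytes_to_image(data: np.ndarray, width: int = 256) -> np.ndarray:
    """Zero-pad a 1-D uint8 array to a multiple of `width`, then reshape to 2-D.

    Hypothetical helper; the default width of 256 is an arbitrary choice.
    """
    rows = -(-data.size // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:data.size] = data
    return padded.reshape(rows, width)


# Synthetic stand-in for np.fromfile(path, dtype=np.uint8): 1000 byte values.
data = (np.arange(1000) % 256).astype(np.uint8)

img = bytes_to_image(data, width=256)

# Because every value is in 0..255, scaling to [0, 1] is a single division.
scaled = img.astype(np.float32) / 255.0

print(img.shape)                   # (4, 256)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

The uint8 array img can be passed to an image writer as in the example above, while scaled is the kind of normalized float array that most machine learning libraries expect.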

answered Sep 20 '22 19:09

John1024

Tags: python, machine-learning, numpy, data-mining