I have some very large txt files (about 1.5 GB) which I want to load into Python as an array. The problem is that this data uses a comma as the decimal separator. For smaller files I came up with this solution:
import numpy as np

# read everything as strings first, then clean up and convert
data = np.loadtxt(file, dtype=np.str, delimiter='\t', skiprows=1)
data = np.char.replace(data, ',', '.')   # use '.' as the decimal separator
data = np.char.replace(data, '\'', '')   # strip stray quote characters
data = np.char.replace(data, 'b', '').astype(np.float64)  # strip bytes prefix, convert to float
But for the large files Python runs into a MemoryError. Is there a more memory-efficient way to load this data?
The problem with np.loadtxt(file, dtype=np.str, delimiter='\t', skiprows=1) is that it stores Python string objects instead of float64 values, which is very memory-inefficient. You can use pandas read_table (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html#pandas.read_table) to read your file and set decimal=',' to change the default behaviour. This lets pandas parse the comma-decimal strings into floats while reading, with no post-processing. After loading the DataFrame, use df.values to get a NumPy array, as sketched below. If it is still too large for your memory, read it in chunks (http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking).
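A minimal sketch of that approach, assuming a tab-separated file with a single header row (the file name data.txt is a placeholder):

import pandas as pd

# let pandas treat ',' as the decimal separator while parsing
df = pd.read_table('data.txt', sep='\t', decimal=',', header=0)
data = df.values   # plain float64 NumPy array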
If you still run out of memory, try np.float32, which halves the memory footprint compared to float64.
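A sketch combining both ideas, chunked reading plus a float32 result array; the chunk size and file name are assumptions:

import numpy as np
import pandas as pd

chunks = []
for chunk in pd.read_table('data.txt', sep='\t', decimal=',', header=0,
                           chunksize=100000):
    # convert each chunk to float32 before keeping it, so peak memory stays low
    chunks.append(chunk.values.astype(np.float32))

data = np.vstack(chunks)   # one float32 array, half the size of float64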