
Filling missing values using numpy.genfromtxt

Despite the advice from the previous questions:

-9999 as missing value with numpy.genfromtxt()

Using genfromtxt to import csv data with missing values in numpy

I am still unable to process a text file whose last line ends with a missing value:

a.txt:

1 2 3
4 5 6
7 8

I've tried multiple combinations of the missing_values and filling_values options and cannot get this to work:

import numpy as np

sol = np.genfromtxt("a.txt", 
                    dtype=float,
                    invalid_raise=False, 
                    missing_values=None,
                    usemask=True,
                    filling_values=0.0)
print(sol)

What I would like to get is:

[[1.0 2.0 3.0]
 [4.0 5.0 6.0]
 [7.0 8.0 0.0]]

but instead I get:

/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py:1641: ConversionWarning: Some errors were detected !
    Line #3 (got 2 columns instead of 3)
  warnings.warn(errmsg, ConversionWarning)
[[1.0 2.0 3.0]
 [4.0 5.0 6.0]]
asked Jun 25 '13 20:06 by Hooked



1 Answer

Using pandas:

import pandas as pd

df = pd.read_table('a.txt', sep=r'\s+', header=None)
df.fillna(0, inplace=True)
print(df)
#    0  1  2
# 0  1  2  3
# 1  4  5  6
# 2  7  8  0

pandas.read_table replaces missing data with NaNs. You can replace those NaNs with some other value using df.fillna.

df is a pandas.DataFrame. You can access the underlying NumPy array with df.values:

print(df.values)
# [[ 1.  2.  3.]
#  [ 4.  5.  6.]
#  [ 7.  8.  0.]]
answered Oct 04 '22 12:10 by unutbu
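
If pandas is not an option, the same padding can be sketched with plain NumPy. With whitespace splitting, the short line simply yields two fields (which is what the warning in the question reports), so one workaround is to split the lines yourself and copy each row into a pre-filled array. This is only a minimal sketch: load_ragged is a hypothetical helper, not part of NumPy, and the sample text stands in for a.txt.

```python
import io

import numpy as np


def load_ragged(source, fill=0.0):
    """Hypothetical helper (not a NumPy function): parse whitespace-separated
    rows from an iterable of lines and pad short rows on the right with `fill`."""
    # Split every non-blank line into its fields.
    rows = [line.split() for line in source if line.strip()]
    # The widest row decides the number of columns.
    width = max(len(row) for row in rows)
    # Start from an array that already holds the fill value everywhere.
    out = np.full((len(rows), width), fill, dtype=float)
    for i, row in enumerate(rows):
        # Overwrite only the leading entries that are actually present.
        out[i, :len(row)] = [float(x) for x in row]
    return out


# Stand-in for the contents of a.txt from the question.
sample = io.StringIO("1 2 3\n4 5 6\n7 8\n")
arr = load_ragged(sample)
print(arr)
```

To read the real file, pass an open file object instead, e.g. with open("a.txt") as f: arr = load_ragged(f). Note that this only pads values missing at the end of a line; a value missing in the middle of a whitespace-separated line cannot be detected this way.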