How to preserve column names while importing data using numpy?

Question

I am using the NumPy library in Python to import CSV file data into an ndarray as follows:

import numpy as np

data = np.genfromtxt('mydata.csv',
                     delimiter=',', dtype=None, names=True)

The result provides the following column names:

print(data.dtype.names)

('row_label',
 'MyDataColumn1_0',
 'MyDataColumn1_1')

The original column names are:

row_label, My-Data-Column-1.0, My-Data-Column-1.1

It appears that NumPy is forcing my column names to adopt C-style variable name formatting. Many of my Python scripts access columns by name, so I need the column names to remain constant. To accomplish this, either NumPy needs to preserve the original column names, or I need to convert my column names to the format NumPy uses.

Is there a way to preserve the original column names during import?
If not, is there an easy way to convert column labels to use the format NumPy is using, preferably using some NumPy function?

askewchan · Accepted Answer

If you set names=True, the first line of your data file is passed through a NameValidator, which genfromtxt creates like this:

validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)

These are the options you can supply:

excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.

You could try passing an empty string as deletechars. But you'd be better off modifying the default set and passing that instead:

defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")

Just take out the period and minus sign from that set, and pass it as:

np.genfromtxt(..., names=True, deletechars="""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
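
A minimal, self-contained sketch of the call above, assuming a recent NumPy release. The in-memory CSV is a hypothetical stand-in for mydata.csv: the header is based on the question, and the data rows are invented for illustration. The deletechars value is written as a raw string so the backslash stays literal.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for mydata.csv; the header is based on the
# question, the data rows are invented for illustration.
csv_text = (
    "row_label,My-Data-Column-1.0,My-Data-Column-1.1\n"
    "1,0.5,1.5\n"
    "2,2.5,3.5\n"
)

# The default deletechars with the period and minus sign taken out.
# A raw string keeps the backslash literal.
keep_dot_and_minus = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=None,
                     names=True, deletechars=keep_dot_and_minus)

# The field names should now match the header, so columns can be
# looked up by their original labels.
print(data.dtype.names)
print(data['My-Data-Column-1.0'])
```

Any other character you want to keep in field names can be removed from the string in the same way.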

Here's the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245
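
For the second part of the question (converting existing labels to the format NumPy uses), one option is to call the same NameValidator directly. This is only a sketch: numpy.lib._iotools is a private module, so the import path is an assumption that may break between NumPy versions, and the exact converted names depend on the defaults of the installed version.

```python
# Assumption: NameValidator is importable from the private module
# numpy.lib._iotools; private modules can move or change without notice.
from numpy.lib._iotools import NameValidator

original_names = ['row_label', 'My-Data-Column-1.0', 'My-Data-Column-1.1']

# With no arguments, the validator uses its default excludelist,
# deletechars and replace_space settings.
validate_names = NameValidator()
converted_names = validate_names(original_names)

# Map each original label to the name the validator produces for it.
name_map = dict(zip(original_names, converted_names))
for original, converted in name_map.items():
    print(original, '->', converted)
```

A script could then keep using the original labels and look up the converted field name through name_map.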

Tags: python, numpy

holocronweaver
