I am using the NumPy library in Python to import CSV file data into an ndarray as follows:
import numpy as np
data = np.genfromtxt('mydata.csv', delimiter=',', dtype=None, names=True)
The result provides the following column names:
print(data.dtype.names)
('row_label',
'MyDataColumn1_0',
'MyDataColumn1_1')
The original column names are:
row_label, My-Data-Column-1.0, My-Data-Column-1.1
It appears that NumPy is forcing my column names to adopt C-style variable name formatting. Yet there are many cases where my Python scripts require access to columns according to column name, so I need to ensure that column names remain constant. To accomplish this either NumPy needs to preserve the original column names or else I need to convert my column names to the format NumPy is using.
Is there a way to preserve the original column names during import?
If not, is there an easy way to convert column labels to use the format NumPy is using, preferably using some NumPy function?
If you set names=True, the first line of your data file is passed through a name validator that genfromtxt constructs like this:
validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)
These are the options you can supply:
excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.
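To see what the validator does to a given header, you can call it directly. This is only a sketch: NameValidator lives in numpy.lib._iotools, which is a private module, so the import path is an assumption that may not hold across NumPy versions. The names `original`, `default_names`, `kept_names`, and `reserved` are made up for illustration.

```python
# Assumption: NameValidator is importable from the private module
# numpy.lib._iotools (the file linked in this answer).
from numpy.lib._iotools import NameValidator

original = ["row_label", "My-Data-Column-1.0", "My-Data-Column-1.1"]

# Default validator: characters in the default deletechars set
# (which includes '-' and '.') are removed from each name.
default_names = NameValidator().validate(original)
print(default_names)

# Validator with a custom deletechars string that no longer
# contains '-' or '.', so those characters survive.
keep = NameValidator(deletechars=r"""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
kept_names = keep.validate(original)
print(kept_names)

# Names on the default exclude list get a trailing underscore.
reserved = NameValidator().validate(["file", "print"])
print(reserved)
```

Running the default validator over your original header is also one way to get the converted labels without loading the file.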
You could try passing an empty string as deletechars. But you'd be better off starting from the default set below and passing a modified version of it:
defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")
Just take out the period and minus sign from that set, and pass it as:
np.genfromtxt(..., names=True, deletechars=r"""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
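Here is a small self-contained check of that approach, as a sketch rather than a definitive recipe. It reads an in-memory CSV instead of mydata.csv; the sample rows, the variable names `csv_text` and `keep_dash_and_dot`, and the encoding="utf-8" argument are assumptions added for the example.

```python
import io

import numpy as np

# Hypothetical sample data with the header from the question
# (no spaces after the commas).
csv_text = (
    "row_label,My-Data-Column-1.0,My-Data-Column-1.1\n"
    "a,1.5,2.5\n"
    "b,3.5,4.5\n"
)

# Default deletechars with '-' and '.' taken out. A raw string
# keeps the backslash literal without an escape warning.
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,
    names=True,
    encoding="utf-8",  # assumption: read the label column as str
    deletechars=keep_dash_and_dot,
)

print(data.dtype.names)
print(data["My-Data-Column-1.0"])
```

Columns can then be accessed with the original labels, as in the last line.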
Here's the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245