I am using the NumPy library in Python to import CSV file data into an ndarray as follows:
import numpy as np
data = np.genfromtxt('mydata.csv', delimiter=',', dtype=None, names=True)
The result provides the following column names:
print(data.dtype.names)
('row_label', 'MyDataColumn1_0', 'MyDataColumn1_1')
The original column names are:
row_label, My-Data-Column-1.0, My-Data-Column-1.1
It appears that NumPy is forcing my column names to adopt C-style variable name formatting. Yet there are many cases where my Python scripts require access to columns according to column name, so I need to ensure that column names remain constant. To accomplish this either NumPy needs to preserve the original column names or else I need to convert my column names to the format NumPy is using.
Is there a way to preserve the original column names during import?
If not, is there an easy way to convert column labels to use the format NumPy is using, preferably using some NumPy function?
If you set names=True, then the first line of your data file is passed through this function:
validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)
These are the options you can supply:
excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.
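To make the effect of those options concrete, here is a rough pure-Python sketch of the renaming rules as the descriptions above state them. It is not NumPy's implementation: sketch_validate_names is a hypothetical helper name, the order of the steps is my assumption, and defaultfmt, autostrip and case_sensitive are not modelled.

```python
# Simplified sketch of the renaming rules described above.
# NOT NumPy's code; sketch_validate_names is a hypothetical helper.
DEFAULT_DELETECHARS = set(r"""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")
DEFAULT_EXCLUDELIST = ["return", "file", "print"]


def sketch_validate_names(names, deletechars=None, excludelist=None,
                          replace_space="_"):
    """Return a tuple of cleaned-up column names."""
    delete = DEFAULT_DELETECHARS if deletechars is None else set(deletechars)
    exclude = DEFAULT_EXCLUDELIST + list(excludelist or [])
    cleaned = []
    for name in names:
        name = name.strip()
        # Assumption: spaces are replaced before characters are deleted,
        # otherwise the space in the delete set would remove them first.
        if replace_space:
            name = name.replace(" ", replace_space)
        name = "".join(ch for ch in name if ch not in delete)
        # Excluded names get a trailing underscore.
        if name in exclude:
            name += "_"
        cleaned.append(name)
    return tuple(cleaned)


print(sketch_validate_names(["file", "max temp", "min-temp"]))
# ('file_', 'max_temp', 'mintemp')
print(sketch_validate_names(["min-temp"], deletechars=""))
# ('min-temp',)
```

The second call shows why deletechars is the option that matters for your question: once '-' is no longer in the delete set, the name comes through unchanged.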
You could try passing an empty string as deletechars, but you'd be better off modifying the default set and passing that:
defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")
Just take out the period and minus sign from that set, and pass it as:
np.genfromtxt(..., names=True, deletechars=r"""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
Here's the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245
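For a quick check on your own install, something like the following should do. It is only a sketch: the CSV contents are made up as a stand-in for mydata.csv, io.StringIO replaces the real file, and encoding="utf-8" is my addition so the label column is read as text. If deletechars behaves as described, the printed names should match the original header.

```python
import io

import numpy as np

# Made-up stand-in for mydata.csv, using the header from the question.
csv_text = (
    "row_label,My-Data-Column-1.0,My-Data-Column-1.1\n"
    "a,1.5,2.5\n"
    "b,3.5,4.5\n"
)

# Default delete set with '-' and '.' taken out. A raw string keeps the
# backslash literal and avoids an invalid-escape warning.
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,
    names=True,
    deletechars=keep_dash_and_dot,
    encoding="utf-8",
)

print(data.dtype.names)
# Columns can then be looked up by their original names.
print(data["My-Data-Column-1.0"])
```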