I'm trying to import a simple CSV file with NumPy's genfromtxt, but I can't manage to convert the data in the first column to dates.
Here is my code:
import numpy as np
from datetime import datetime
str2date = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
data = np.genfromtxt('C:\\data.csv', dtype=None, names=True, delimiter=',', converters={0: str2date})
I get the following error in str2date:
TypeError: must be str, not bytes
There are many columns, so I'd prefer to avoid specifying all the column types (which are basically numerical).
The problem is that the argument passed to str2date is a bytes object, that is, text in the form '%Y-%m-%d %H:%M:%S' wrapped as b'...'. datetime.strptime expects a str, so it rightfully refuses to parse bytes. The solution is quite simple though: decode the byte string to a UTF-8 string before parsing it:
str2date = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d %H:%M:%S')
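A slightly more defensive variant of this converter, as a sketch: it accepts both bytes and str, so it should keep working whether genfromtxt hands the converter raw bytes or already-decoded text. The name to_datetime and the sample timestamp are illustrative and do not come from the question.

```python
from datetime import datetime

DATE_FORMAT = '%Y-%m-%d %H:%M:%S'


def to_datetime(value):
    """Parse a timestamp given as bytes or str (hypothetical helper)."""
    if isinstance(value, bytes):
        # Converters may receive raw bytes; decode before parsing.
        value = value.decode('utf-8')
    return datetime.strptime(value.strip(), DATE_FORMAT)


# Both input types give the same datetime object.
print(to_datetime(b'2021-12-05 10:30:00'))
print(to_datetime('2021-12-05 10:30:00'))
```

It can be passed to genfromtxt in the same way as the lambda, for example converters={0: to_datetime}.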
When reading a CSV column whose values represent dates, we must take into account how the dates are written, for example:
- 2021/12/05 = %Y/%m/%d
- 21/12/05 = %y/%m/%d
- 05/12/2021 = %d/%m/%Y
- 05/12/21 = %d/%m/%y
- 05-12-21 = %d-%m-%y
- ...
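Choosing the right format string matters because the same text can be parsed as different dates. A small illustration using only datetime.strptime (the sample string is made up for this sketch):

```python
from datetime import datetime

text = '05/12/21'

# Read as day/month/two-digit year.
as_day_first = datetime.strptime(text, '%d/%m/%y')
# Read as two-digit year/month/day.
as_year_first = datetime.strptime(text, '%y/%m/%d')

print(as_day_first.date())   # 2021-12-05
print(as_year_first.date())  # 2005-12-21
```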
The date format must be reflected in the lambda function that we pass as a converter to NumPy's genfromtxt() function. genfromtxt() accepts several parameters; one of them is converters, which can be used in different ways. Here we use it to convert the values of a column into datetime values.
converters : variable, optional
The set of functions that convert the data of a column to a value. The converters can also be used to provide a default value for missing data:
converters = {num_col: lambda_function}
num_col - the index of the column to which the function will be applied (counting from 0, so 0 is the first column)
lambda_function - the function that we will build for the conversion
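As a rough sketch of what such a dictionary can look like, the snippet below builds one converter per column, including one that substitutes a default for an empty field. The helper names, the '%d-%m-%y' format, and the -999.0 placeholder are assumptions made for illustration; the functions are exercised directly here rather than through genfromtxt.

```python
from datetime import datetime

MISSING_LEVEL = -999.0  # assumed placeholder for an empty field


def _as_text(value):
    """Return value as a stripped str, decoding bytes if needed."""
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    return value.strip()


def str2date(value):
    """Convert text such as '02-03-15' to a datetime."""
    return datetime.strptime(_as_text(value), '%d-%m-%y')


def level_or_default(value):
    """Convert text to float, using MISSING_LEVEL for empty fields."""
    text = _as_text(value)
    return float(text) if text else MISSING_LEVEL


# Keys are zero-based column indices, values are the converter functions.
converters = {0: str2date, 1: level_or_default}

print(converters[0]('02-03-15'))
print(converters[1]('232.8'), converters[1](''))
```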
For this example, we will have a UTF-8 encoded file with two columns, date and level, separated by a semicolon (;):
date | level |
---|---|
02-03-15 | 232.8 |
09-03-15 | 233.0 |
16-03-15 | 233.2 |
23-03-15 | 233.6 |
30-03-15 | 233.9 |
06-04-15 | 234.3 |
13-04-15 | 234.8 |
20-04-15 | 235.3 |
27-04-15 | 235.9 |
Our code should be:
import numpy as np
from datetime import datetime
str2date = lambda x: datetime.strptime(x, '%d-%m-%y')
data = np.genfromtxt(file_path, delimiter=';', dtype=None, names=True, converters = {0: str2date}, encoding='utf-8')
The variable file_path must be replaced by the path to the file, including the file name and its extension.
delimiter : str, int, or sequence, optional. The string used to separate values. By default, any consecutive whitespace acts as a delimiter. An integer or sequence of integers can also be provided as width(s) of each field.
dtype : dtype, optional. Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.
names : {None, True, str, sequence}, optional. If names is True, the field names are read from the first line after the first skip_header lines. This line can optionally be preceded by a comment delimiter. If names is a sequence or a single string of comma-separated names, the names will be used to define the field names in a structured dtype. If names is None, the names of the dtype fields will be used, if any.
encoding : str, optional. The encoding used to decode the input file. Because an encoding is given here, the converter receives str values rather than bytes, so no decode() call is needed in str2date.
To extract the columns and work with them, we can index the resulting array by field name:
levels = data['level']
dates = data['date']
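To try the whole flow without a file on disk, here is a minimal sketch that feeds a few of the sample rows to genfromtxt through io.StringIO. It assumes a NumPy version that supports the encoding argument (1.14 or later); the in-memory sample stands in for file_path, and the isinstance check is only a precaution in case bytes are passed.

```python
import io
from datetime import datetime

import numpy as np


def str2date(value):
    """Convert text such as '02-03-15' to a datetime."""
    if isinstance(value, bytes):
        # Precaution: some configurations pass bytes to converters.
        value = value.decode('utf-8')
    return datetime.strptime(value.strip(), '%d-%m-%y')


# In-memory stand-in for the semicolon-separated file.
sample = io.StringIO(
    "date;level\n"
    "02-03-15;232.8\n"
    "09-03-15;233.0\n"
    "16-03-15;233.2\n"
)

data = np.genfromtxt(sample, delimiter=';', dtype=None, names=True,
                     converters={0: str2date}, encoding='utf-8')

# Columns are accessed by the names taken from the header line.
dates = data['date']
levels = data['level']
print(dates[0], levels[0])
print(levels.mean())
```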