
Loading a date in Numpy genfromtxt

I'm trying to import a simple CSV file with NumPy's genfromtxt, but I can't manage to convert the data in the first column to dates.

Here is my code:

import numpy as np
from datetime import datetime

str2date = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

data = np.genfromtxt('C:\\\\data.csv',dtype=None,names=True, delimiter=',', converters = {0: str2date})

I get the following error in str2date:

TypeError: must be str, not bytes

The problem is there are many columns, so I'd prefer avoiding the specification of all the column types (which are basically numerical).

Mark Morrisson asked Oct 18 '22 18:10


2 Answers

The problem is that the argument passed to str2date is a bytes object of the form b'YYYY-MM-DD HH:MM:SS', not a str. datetime.strptime only accepts str, which is why it raises the TypeError. The fix is simple: decode the bytes to a UTF-8 string before parsing:

str2date = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d %H:%M:%S')
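A converter can also be written to accept both bytes and str, so that it keeps working whether or not an encoding is passed to genfromtxt. The following is only a minimal sketch: the in-memory sample data, the column names and the helper name str2date_any are made up for illustration and are not from the question.

```python
import io
from datetime import datetime

import numpy as np


def str2date_any(x):
    # Hypothetical helper: decode only when genfromtxt hands us bytes.
    if isinstance(x, bytes):
        x = x.decode("utf-8")
    return datetime.strptime(x, "%Y-%m-%d %H:%M:%S")


# Made-up sample standing in for the CSV file from the question.
sample = io.StringIO(
    "timestamp,a,b\n"
    "2022-10-18 18:10:00,1.5,2\n"
    "2022-10-19 09:30:00,2.5,4\n"
)

data = np.genfromtxt(
    sample,
    dtype=None,        # let NumPy infer the numeric columns
    names=True,
    delimiter=",",
    converters={0: str2date_any},
    encoding="utf-8",  # with an explicit encoding the converter receives str
)

print(data["timestamp"][0])
```

Only the date column needs a converter here; the remaining columns are still inferred because dtype=None.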

Niels Wouda answered Oct 21 '22 15:10


When reading a CSV column whose values represent dates, we must take into account how the dates are written, for example:

- 2021/12/05 = %Y/%m/%d
- 21/12/05 = %y/%m/%d
- 05/12/2021 = %d/%m/%Y
- 05/12/21 = %d/%m/%y
- 05-12-21 = %d-%m-%y
- ...
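The choice of format string matters because the same text can parse to different dates. A small sketch using only the standard library (the sample string is made up for illustration):

```python
from datetime import datetime

raw = "05/12/21"

# Read as day/month/year: 5 December 2021
day_first = datetime.strptime(raw, "%d/%m/%y")

# Read as year/month/day: 21 December 2005
year_first = datetime.strptime(raw, "%y/%m/%d")

print(day_first.date(), year_first.date())  # 2021-12-05 2005-12-21
```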

The representation determines the format string used in the lambda function that we pass as a converter to NumPy's genfromtxt() function. This function accepts several parameters. One of them is converters, which can be used in different ways; here it converts the values of a column into date values.

converters : variable, optional

    The set of functions that convert the data of a column to a value. The converters can also be used to provide a default value for missing data:
     converters = {num_col: lambda_function}

num_col - the zero-based index of the column to which the function will be applied

lambda_function - the function that we will build for the conversion
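The "default value for missing data" use of converters can be sketched as follows. The sample data, the sentinel value -999 and the name fill_blank are assumptions chosen for illustration.

```python
import io

import numpy as np

# Made-up data: the second field of the first row is empty.
sample = io.StringIO("1, , 3\n4, 5, 6")

# Hypothetical converter: fall back to -999 when the field is blank.
fill_blank = lambda s: float(s.strip() or -999)

arr = np.genfromtxt(
    sample,
    delimiter=",",
    converters={1: fill_blank},  # applied to column index 1 only
    encoding="utf-8",            # so the converter receives str, not bytes
)
print(arr)
```

The blank field comes back as -999.0, while the other columns are parsed as floats as usual.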

For this example, the file has two columns, date and level, separated by semicolons (;) and encoded in UTF-8:

date level
02-03-15 232.8
09-03-15 233.0
16-03-15 233.2
23-03-15 233.6
30-03-15 233.9
06-04-15 234.3
13-04-15 234.8
20-04-15 235.3
27-04-15 235.9

Our code should be:

import numpy as np
from datetime import datetime

str2date = lambda x: datetime.strptime(x, '%d-%m-%y')
data = np.genfromtxt(file_path, delimiter=';', dtype=None, names=True, converters = {0: str2date}, encoding='utf-8')

Replace the variable file_path with the full path to the file, including the file name and its extension.

The delimiter: str, int, or sequence, optional. The string used to separate values. By default, any run of consecutive whitespace acts as the delimiter. An integer or sequence of integers can also be provided as the width(s) of each field.

The dtype : dtype, optional. Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.

The names : {None, True, str, sequence}, optional. If names is True, the field names are read from the first line after the first skip_header lines. This line can optionally be preceded by a comment delimiter. If names is a sequence or a single string of comma-separated names, the names will be used to define the field names in a structured dtype. If names is None, the names of the dtype fields will be used, if any.

The encoding: str, optional. The encoding used to decode the input file.

To extract the columns and work with them, index the structured array by field name:

levels = data['level']
dates = data['date']
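
As a rough end-to-end sketch, the steps above can be run against an in-memory copy of the first few sample rows. The io.StringIO buffer stands in for file_path, and the datetime64 conversion and the per-day calculation are additions for illustration, not part of the answer's code.

```python
import io
from datetime import datetime

import numpy as np

# First rows of the sample above, written with ';' as the separator.
sample = io.StringIO(
    "date;level\n"
    "02-03-15;232.8\n"
    "09-03-15;233.0\n"
    "16-03-15;233.2\n"
)

str2date = lambda x: datetime.strptime(x, "%d-%m-%y")
data = np.genfromtxt(sample, delimiter=";", dtype=None, names=True,
                     converters={0: str2date}, encoding="utf-8")

levels = data["level"]                         # float column
dates = data["date"].astype("datetime64[us]")  # datetime objects -> datetime64

# Average level change per day between consecutive readings.
days = np.diff(dates) / np.timedelta64(1, "D")
print(np.diff(levels) / days)
```

Converting the date column to datetime64 is optional, but it makes date arithmetic such as np.diff straightforward.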

Luis Emilio Fdez. answered Oct 21 '22 16:10