
How can I write and read a DataFrame that contains a column of datetime in Julia?

2nd UPDATE: Confirmed as a bug by user @Matt B. See his answer below for more detail.

UPDATE: @waTeim has demonstrated that one can write and read a DataFrame that contains a column of type date (confirmed on my setup). This is important, as it means Julia can write and read some composite types that are in the column of a data-frame. However, the case of a type datetime (which is different to type date) still throws an error, so at this point the question remains unanswered.

In Julia, using the HDF5 and JLD packages, it is possible to save and load DataFrames in a .jld file using, for example:

#Preamble
using HDF5, JLD, DataFrames
FP = "/home/colin/Test.jld";

#Save the data-frame
fid1 = jldopen(FP, "w");
write(fid1, "MyDataFrame", MyDataFrame);
close(fid1);

#Come back later and load the data-frame
fid1 = jldopen(FP, "r");
X = read(fid1, "MyDataFrame");
close(fid1);

This works nicely as long as the columns of the data-frame are all vectors of a base Julia type such as Float64 or Int64. In practice, however, we will often want the first column of a data-frame to be a datetime, which is not a base type (although it might become one in a future release). In this situation, the code above fails for me on the read operation with a long error message (I'll add it to the bottom if anyone asks in the comments). Following the documentation for the JLD package, I tried the following when saving:

#Save the data-frame
fid1 = jldopen(FP, "w");
addrequire(fid1, "/home/colin/.julia/v0.2/DataFrames/src/dataframe.jl")
addrequire(fid1, "/home/colin/.julia/v0.2/Datetime/src/Datetime.jl")
write(fid1, "MyDataFrame", MyDataFrame);
close(fid1);

but this did not help.

Am I doing something stupid, or is this functionality simply not available?

Note: HDF5 tag included because the JLD package uses it.

Colin T Bowers asked Jul 18 '14 08:07

1 Answer

When HDF5 support for a particular Julia datatype is lacking, one can expect this error. In this case the problem was not specifically DataFrames using Datetime, but a lack of support for the Datetime type itself. Apparently the same error appears whenever the library is unable to load a type, for whatever reason (see here and here too for other examples). The exact reason and fix were different for each type, but reporting the bug led to prompt fixes (see below).

The error

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 182 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 438 in H5D__read(): unable to set up type info
    major: Dataset
    minor: Unable to initialize object
  #002: H5Dio.c line 939 in H5D__typeinfo_init(): unable to convert between src and dest datatype
    major: Dataset
    minor: Feature is unsupported
  #003: H5T.c line 4525 in H5T_path_find(): no appropriate function for conversion path
    major: Datatype
    minor: Unable to initialize object

Historical

Version 0.2.25

I would suggest that you migrate to Julia version 0.3, as it is at release-candidate status now, and update your package repository. My setup is different: I am using different versions of HDF5, JLD, DataFrames, and Datetime. That being said, I made two significant changes. First, I pass the module name instead of the filename in the call to addrequire. Second, I use the @read and @write macros rather than the corresponding functions, as the latter seem to be buggy.

Version 0.3.0-rc1+4263 (2014-07-19 02:59 UTC)

Pkg.status()
- DataFrames                    0.5.7
- HDF5                          0.2.25
- Datetime                      0.1.6

Create the datafile

using HDF5,JLD,DataFrames,Datetime

testFile = jldopen("test.jld","w")
addrequire(testFile,"DataFrames")
addrequire(testFile,"Datetime")
df = DataFrame()
df[:column1] = today() 
@write testFile df
close(testFile)

Restarting Julia and reading....

julia> using HDF5,JLD,DataFrames,Datetime

julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld

julia> @read testFile df
1x1 DataFrame
|-------|------------|
| Row # | column1    |
| 1     | 2014-07-19 |

julia> df[:column1]
 1-element DataArray{Date{ISOCalendar},1}:
 2014-07-19

Version 0.2.25+ (prerelease)

Indeed I can confirm that trying to store Datetime was failing and using the latest from the repo fixes the problem.

 HDF5                          0.2.25+            master

If the above is modified only by changing today() to now():

df[:column1] = now()

then the read succeeds:

julia> using HDF5,JLD,DataFrames,Datetime

julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld

julia> @read testFile df
1x1 DataFrame
|-------|-------------------------|
| Row # | column1                 |
| 1     | 2014-07-26T03:38:45 UTC |
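The only difference between the two runs is the element type of the column: today() yields a date, while now() yields a date-time. A small sketch of that distinction follows. It assumes a current Julia, where the Dates standard library has replaced the Datetime package used in this answer, so the type names differ from the Date{ISOCalendar} shown earlier; the variable names are illustrative.

```julia
using Dates  # assumption: Dates standard library (Julia 0.7+), not the Datetime package from this thread

d  = Date(2014, 7, 19)                  # calendar date only, the kind of value today() returns
dt = DateTime(2014, 7, 26, 3, 38, 45)   # date plus time of day, the kind of value now() returns

println(typeof(d))        # Date
println(typeof(dt))       # DateTime
println(Date(dt))         # 2014-07-26 (the time of day is dropped)
println(d isa DateTime)   # false: the two are distinct types
```

Because the two are separate types, a serialization layer needs support for each of them, which is consistent with the date column working before the datetime column did.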

However, the same general-looking error message that occurred for Datetime also occurs for complex values, despite this fix.

c = 1 + im;
@write testFile c

Version 0.2.26

By this version complex was also supported. Originally it appeared that the problem was a lack of support for complex types generally, but it was more likely a special problem of the complex value being initialized from 1 + im rather than 1.0 + im.

- HDF5                          0.2.26

julia> using HDF5, JLD

julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld

julia> @read testFile c
1 + 1im
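For reference, the two literals mentioned above are equal in value but have different types. A minimal sketch using only base Julia (the variable names are illustrative, and the Int64 in the first printed type assumes a 64-bit machine):

```julia
# Illustrative names; only base Julia is used here.
c_int   = 1 + im      # integer real and imaginary parts
c_float = 1.0 + im    # promoted to floating-point parts

println(typeof(c_int))     # Complex{Int64} on a 64-bit machine
println(typeof(c_float))   # Complex{Float64}
println(c_int == c_float)  # true: same value, different storage type
```

This only illustrates the type difference; it does not show which of the two the older HDF5 package could store.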
waTeim answered Jan 01 '23 18:01