2nd UPDATE: Confirmed as a bug by user @Matt B. See his answer below for more detail.
UPDATE: @waTeim has demonstrated that one can write and read a DataFrame that contains a column of type date (confirmed on my setup). This is important, as it means Julia can write and read some composite types that are in the column of a data-frame. However, the case of a type datetime (which is different to type date) still throws an error, so at this point the question remains unanswered.
In Julia, using the HDF5 and JLD package, it is possible to save and load DataFrames in a .jld file using, for example:
#Preamble
using HDF5, JLD, DataFrames
FP = "/home/colin/Test.jld";
#Save the data-frame
fid1 = jldopen(FP, "w");
write(fid1, "MyDataFrame", MyDataFrame);
close(fid1);
#Come back later and load the data-frame
fid1 = jldopen(FP, "r");
X = read(fid1, "MyDataFrame");
close(fid1);
This works nicely, as long as the columns of the data-frame are all vectors of a base Julia type like Float64 or Int64. However, in practice, we will often want the first column of a data-frame to be a datetime, which is not a base type (although it might become one in future releases). In this situation, the code above fails for me on the read operation, with a long error message (I'll add it to the bottom if anyone asks in the comments). Following the documentation for the JLD package, I tried the following when saving:
#Save the data-frame
fid1 = jldopen(FP, "w");
addrequire(fid1, "/home/colin/.julia/v0.2/DataFrames/src/dataframe.jl")
addrequire(fid1, "/home/colin/.julia/v0.2/Datetime/src/Datetime.jl")
write(fid1, "MyDataFrame", MyDataFrame);
close(fid1);
but this did not help.
Am I doing something stupid, or is this functionality simply not available?
Note: HDF5 tag included because the JLD package uses it.
When HDF5 support for a particular Julia datatype is lacking, one can expect this error. In this case the problem was not specifically DataFrames using Datetime, but lack of support for the type Datetime itself. Apparently the same error appears whenever the library is unable to load the type, for whatever reason (see here and here too for other examples). The exact reason and fix were different for each type, but reporting the bug led to prompt fixes (see below).
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: H5Dio.c line 182 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: H5Dio.c line 438 in H5D__read(): unable to set up type info
major: Dataset
minor: Unable to initialize object
#002: H5Dio.c line 939 in H5D__typeinfo_init(): unable to convert between src and dest datatype
major: Dataset
minor: Feature is unsupported
#003: H5T.c line 4525 in H5T_path_find(): no appropriate function for conversion path
major: Datatype
minor: Unable to initialize object
I would suggest that you migrate to Julia version 0.3, as it is at release candidate status now, and update your package repository. My setup is different; I am using different versions of HDF5, JLD, DataFrames, and Datetime. That said, the two significant changes I made were to pass the module name instead of the filename in the call to addrequire, and to use the @read and @write macros rather than the corresponding functions, as the latter seem to be buggy.
Version 0.3.0-rc1+4263 (2014-07-19 02:59 UTC)
Pkg.status()
- DataFrames 0.5.7
- HDF5 0.2.25
- Datetime 0.1.6
Create the datafile
using HDF5,JLD,DataFrames,Datetime
testFile = jldopen("test.jld","w")
addrequire(testFile,"DataFrames")
addrequire(testFile,"Datetime")
df = DataFrame()
df[:column1] = today()
@write testFile df
close(testFile)
Restarting Julia and reading....
julia> using HDF5,JLD,DataFrames,Datetime
julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld
julia> @read testFile df
1x1 DataFrame
|-------|------------|
| Row # | column1 |
| 1 | 2014-07-19 |
julia> df[:column1]
1-element DataArray{Date{ISOCalendar},1}:
2014-07-19
Indeed I can confirm that trying to store Datetime was failing and using the latest from the repo fixes the problem.
HDF5 0.2.25+ master
If the above is modified only by changing today() to now():
df[:column1] = now()
then reading the file back also works:
julia> using HDF5,JLD,DataFrames,Datetime
julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld
julia> @read testFile df
1x1 DataFrame
|-------|-------------------------|
| Row # | column1 |
| 1 | 2014-07-26T03:38:45 UTC |
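The only difference between the two runs above is the type of value stored in the column. As a minimal sketch of that distinction, the snippet below uses the `Dates` standard library from later Julia releases. That is an assumption: the post itself uses the older Datetime package, whose type names differ (for example `Date{ISOCalendar}`).

```julia
using Dates  # assumption: stdlib in later Julia releases, not the Datetime package used above

d  = today()   # calendar date only
dt = now()     # date plus time of day

println(typeof(d))
println(typeof(dt))

# The two are distinct concrete types, so serialization support for one
# does not imply support for the other.
println(typeof(d) == typeof(dt))
```

This is why a file containing a date column could load correctly while a datetime column still failed.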
But it appears that the same general-looking error message that occurred for Datetime also occurs for type complex, despite this fix.
c = 1 + im;
@write testFile c
By this version complex was also supported. Originally it appeared that the problem was a lack of support for type complex generally, but it was more likely a special problem of complex being initialized from 1 + im rather than 1.0 + im.
- HDF5 0.2.26
julia> using HDF5, JLD
julia> testFile = jldopen("test.jld","r")
Julia data file version 0.0.2: test.jld
julia> @read testFile c
1 + 1im
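The distinction between 1 + im and 1.0 + im mentioned above is a difference in concrete type, which can be checked in plain Julia without any packages. This is only a sketch of the type difference; whether it was the actual cause of the earlier failure is the guess stated above, not something this snippet demonstrates.

```julia
# Plain Base Julia; no packages needed.
c_int   = 1 + im      # integer real part, so the result has integer components
c_float = 1.0 + im    # floating-point real part, so the result has Float64 components

println(typeof(c_int))
println(typeof(c_float))

# The values compare equal, but the concrete types differ, and a
# serializer has to handle each element type separately.
println(c_int == c_float)
println(typeof(c_int) == typeof(c_float))
```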