R - readRDS() & load() fail to give identical data.tables as the original

Tags: r, save, data.table, load

Background

I tried to replace some CSV output files with rds files to improve efficiency. These are intermediate files that will serve as inputs to other R scripts.

Question

I started investigating when my scripts failed and found that readRDS() and load() do not return identical data tables as the original. Is this supposed to happen? Or did I miss something?

Sample code

library( data.table )

# data.table round trip through saveRDS() / readRDS()
aDT <- data.table( a=1:10, b=LETTERS[1:10] )
saveRDS( aDT, file = "aDT.rds")
bDT <- readRDS( file = "aDT.rds" )
identical( aDT, bDT, ignore.environment = TRUE )  # Gives FALSE

# Same round trip with a plain data.frame
aDF <- data.frame( a=1:10, b=LETTERS[1:10] )
saveRDS( aDF, file = "aDF.rds")
bDF <- readRDS( file = "aDF.rds" )
identical( aDF, bDF, ignore.environment = TRUE )  # Gives TRUE

# Using save() and load() doesn't help either
aDT2 <- data.table( a=1:10, b=LETTERS[1:10] )
save( aDT2, file = "aDT2.RData")
bDT2 <- aDT2; rm( aDT2 )   # keep the original under another name
load( file = "aDT2.RData" ) # restores aDT2 from disk
identical( aDT2, bDT2, ignore.environment = TRUE )  # Gives FALSE

I am running R ver 3.2.0 on Linux Mint and have tested with data.table ver 1.9.4 and 1.9.5 (latest).

Searching SO and Google returned this and this, but I don't think they answer this issue. I am still trying to figure out why my scripts failed when I switched to rds, but I am starting with this.

Would appreciate it very much if knowledgeable SO members can help. Thanks!

Edit:

Hi everyone, I happened to find a way to resolve the issue - have posted the solution below. I apologise if it's rather inelegant. Now, I have 2 further questions:

(1) Is there a better way?

(2) Can something be done at the R and/or data.table code to resolve this? I mean, this issue causes unpredictable bugs and is not the first thing that comes to mind. My 2 cents worth.


asked Jul 06 '15 16:07

NoviceProg

2 Answers

Probably, this has to do with pointers:

> attributes(aDT)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.table" "data.frame"

$.internal.selfref
<pointer: 0x0000000000390788>

> attributes(bDT)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.table" "data.frame"

$.internal.selfref
<pointer: (nil)>

> attributes(bDF)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.frame"

> attributes(aDF)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.frame"

You can take a closer look at what's going on with the .Internal(inspect(.)) command:

.Internal(inspect(aDT))

.Internal(inspect(bDT))
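If the pointer really is the only difference, the two tables should compare equal once it is left out. Here is a minimal sketch of that check, not a definitive test. `drop_selfref()` is a hypothetical helper written for this example (it is not part of data.table), the file goes to a temporary path, and the `all.equal()` line assumes a recent data.table version whose method skips the self-reference attribute.

```r
library(data.table)

aDT <- data.table(a = 1:10, b = LETTERS[1:10])
rds_path <- tempfile(fileext = ".rds")  # temporary file instead of the working directory
saveRDS(aDT, rds_path)
bDT <- readRDS(rds_path)

# Hypothetical helper: return a copy of x without the self-reference attribute.
# The argument is modified inside the function, so the caller's table is untouched.
drop_selfref <- function(x) {
  attr(x, ".internal.selfref") <- NULL
  x
}

# Compare names, class, row names and column contents, but not the pointer.
same_without_ptr <- identical(drop_selfref(aDT), drop_selfref(bDT))

# Assumption: a recent data.table, where all.equal() on data.tables
# does not compare the self-reference.
same_data <- isTRUE(all.equal(aDT, bDT))
```

For scripts that only need to know whether the data survived the round trip, a comparison along these lines may be a safer check than `identical()` on the raw objects.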


answered Oct 27 '22 08:10

user227710

The newly loaded data.table doesn't carry the self-reference pointer that the original has, because the pointer is not restored when the object is read back from disk. You could copy it over from the original with

attributes(bDT)$.internal.selfref <- attributes(aDT)$.internal.selfref
identical( aDT, bDT, ignore.environment = TRUE )
# [1] TRUE

A data.frame doesn't keep this attribute, probably because data frames are not modified in place.
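Copying the attribute by hand only works while the original table is still in memory, which is not the case when another script reads the file. A possible alternative, sketched here under the assumption that `setDT()` behaves as in current data.table releases, is to call `setDT()` on the loaded table so that data.table rebuilds the self-reference itself. Recent versions also document `setalloccol()` (formerly `alloc.col()`) for tables loaded with `readRDS()` or `load()`.

```r
library(data.table)

aDT <- data.table(a = 1:10, b = LETTERS[1:10])
rds_path <- tempfile(fileext = ".rds")  # temporary file instead of the working directory
saveRDS(aDT, rds_path)
bDT <- readRDS(rds_path)

# Rebuild the self-reference and the spare column slots of the loaded table.
# No copy of the original aDT is needed for this.
setDT(bDT)

# After the repair the table has over-allocated column slots again ...
has_spare_slots <- truelength(bDT) > length(bDT)

# ... so := can add a column by reference.
bDT[, c := a * 2L]
```

Doing this once, right after each `readRDS()` or `load()` call, keeps the fix in one place rather than scattered through the downstream scripts.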

answered Oct 27 '22 09:10

Rorschach
