R: serialize objects to text file and back again

Tags:

serialization

r

I have a process in R that creates a bunch of objects, serializes them, and puts them into plain text files. This seemed like a really good way to handle things since I am working with Hadoop and all output needs to stream through stdin and stdout.

The problem I am left with is how to read these objects out of the text file and back into R on my desktop machine. Here's a working example that illustrates the challenge:

Let's create a tmp file and write a single object into it. This object is just a vector:

outCon <- file("c:/tmp", "w")
mychars <- rawToChar(serialize(1:10, NULL, ascii=T))
cat(mychars, file=outCon)
close(outCon)

The mychars object looks like this:

> mychars
[1] "A\n2\n133633\n131840\n13\n10\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"

When written to the text file, it looks like this:

A
2
133633
131840
13
10
1
2
3
4
5
6
7
8
9
10

I'm probably overlooking something terribly obvious, but how do I read this file back into R and unserialize the object? When I try scan() or readLines(), both treat the newline characters as record delimiters, and I end up with a vector where each element is a row from the text file. What I really want is a single text string with the whole contents of the file. Then I can unserialize the string.

Perl will read line breaks back into a string, but I can't figure out how to override the way R treats line breaks.

asked Feb 13 '10 18:02

JD Long

1 Answer

JD, we do that in the digest package via serialize() to/from raw. That is nice, as you can store serialized objects in SQL and other places. I would actually store this as RData as well, which is way quicker to load() (no parsing!) and save().
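As a rough sketch of those two options (the scratch file from tempfile() and the variable names are only illustrative assumptions), a round trip through a raw vector and through an .RData file could look like this:

```r
# Sketch only: round-trip an object through a raw vector and an .RData file.
x <- 1:10

# serialize() with connection = NULL returns a raw vector
r <- serialize(x, NULL)
stopifnot(is.raw(r), identical(unserialize(r), x))

# save()/load() store and restore objects by name
f <- tempfile(fileext = ".RData")   # illustrative scratch file
save(x, file = f)
rm(x)
load(f)                             # recreates `x` in the current environment
stopifnot(identical(x, 1:10))
unlink(f)
```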

Or, if it has to be rawToChar() and ascii, then use something like this (taken straight from help(digest), where we compare digests of the file COPYING):

 library(digest)   # provides digest()
 # test 'length' parameter and file input
 fname <- file.path(R.home(),"COPYING")
 x <- readChar(fname, file.info(fname)$size) # read file
 for (alg in c("sha1", "md5", "crc32")) {
   # partial file
   h1 <- digest(x    , length=18000, algo=alg, serialize=FALSE)
   h2 <- digest(fname, length=18000, algo=alg, serialize=FALSE, file=TRUE)
   h3 <- digest( substr(x,1,18000) , algo=alg, serialize=FALSE)
   stopifnot( identical(h1,h2), identical(h1,h3) )
   # whole file
   h1 <- digest(x    , algo=alg, serialize=FALSE)
   h2 <- digest(fname, algo=alg, serialize=FALSE, file=TRUE)
   stopifnot( identical(h1,h2) )
 }

So with that, your example becomes this:

R> outCon <- file("/tmp/jd.txt", "w")
R> mychars <- rawToChar(serialize(1:10, NULL, ascii=T))
R> cat(mychars, file=outCon); close(outCon)
R> fname <- "/tmp/jd.txt"
R> readChar(fname, file.info(fname)$size)
[1] "A\n2\n133633\n131840\n13\n10\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"
R> unserialize(charToRaw(readChar(fname, file.info(fname)$size)))
[1]  1  2  3  4  5  6  7  8  9 10
R>
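If this gets used in more than one place, the transcript above could be wrapped into a pair of small helpers. The following is only a sketch: the function names write_ascii_object(), read_ascii_object() and read_ascii_object_lines() are made up for illustration and do not come from any package. The last helper shows that readLines() can also work, as long as the lines are pasted back together with newlines and a trailing newline is restored.

```r
# Hypothetical helpers; names are illustrative, not from any package.

# Write an object to `path` as ASCII-serialized text.
write_ascii_object <- function(obj, path) {
  txt <- rawToChar(serialize(obj, NULL, ascii = TRUE))
  con <- file(path, "wb")            # binary mode: "\n" is written unchanged
  on.exit(close(con))
  writeChar(txt, con, eos = NULL)    # eos = NULL: no terminator appended
  invisible(path)
}

# Read the whole file as one string, then unserialize it.
read_ascii_object <- function(path) {
  txt <- readChar(path, file.info(path)$size, useBytes = TRUE)
  unserialize(charToRaw(txt))
}

# Variant using readLines(): re-join the lines and restore the final newline.
read_ascii_object_lines <- function(path) {
  lines <- readLines(path)
  txt <- paste0(paste(lines, collapse = "\n"), "\n")
  unserialize(charToRaw(txt))
}

# Usage with a scratch file
path <- tempfile(fileext = ".txt")
write_ascii_object(1:10, path)
stopifnot(identical(read_ascii_object(path), 1:10))
stopifnot(identical(read_ascii_object_lines(path), 1:10))
```

Reading with readChar() keeps the file contents as a single string, which is what unserialize(charToRaw(...)) needs; the readLines() variant only rebuilds that same string after the fact.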

answered Nov 29 '22 22:11

Dirk Eddelbuettel
