Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert long string to data.frame

This is a noob question, but I am getting crazy on this. I have a character vector named bars.list which I download from a FTP Sever. The vector looks like this:

"\"\",\"times\",\"open\",\"high\",\"low\",\"close\",\"numEvents\",\"volume\"\r\n\"1\",2015-05-18 06:50:00,23.98,23.98,23.5,23.77,421,0\r\n\"2\",2015-05-18 07:50:00,23.77,23.9,23.34,23.6,720,0\r\n\"3\",2015-05-18 08:50:00,23.6,23.6,23.32,23.42,720,0\r\n\"4\",2015-05-18 09:50:00,23.44,23.91,23.43,23.66,720,0\r\n\"5\",2015-05-18 10:50:00,23.67,24.06,23.59,24.02,720,0\r\n\"6\",2015-05-18 11:50:00,24.02,24.04,23.32,23.33,720,0\r\n\"7\",2015-05-18 12:50:00,23.33,23.42,22.74,22.81,720,0\r\n\"8\",2015-05-18 13:50:00,22.79,22.92,22.49,22.69,720,0\r\n\"9\",2015-05-18 14:50:00,22.69,22.7,22.14,22.14,481,0\r\n\"10\",2015-05-19 06:50:00,21.09,21.49,20.82,21.47,421,0\r\n\"11\",2015-05-19 07:50:00,21.48,21.68,21.46,21.51,720,0\r\n\"12\",2015-05-19 08:50:00,21.51,21.93,21.45,21.92,720,0\r\n\"13\",2015-05-19 09:50:00,21.92,21.92,21.55,21.55,720,0\r\n\"

I need to have this vector converted into usable format but

> read.table(bars.list, header = TRUE, sep = ",", quote = "", dec = ".")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '"","times","open","high","low","close","numEvents","volume"
"1",2015-05-18 06:50:00,23.98,23.98,23.5,23.77,421,0
"2",2015-05-18 07:50:00,23.77,23.9,23.34,23.6,720,0
"3",2015-05-18 08:50:00,23.6,23.6,23.32,23.42,720,0
"4",2015-05-18 09:50:00,23.44,23.91,23.43,23.66,720,0

It is not clear to me why R tells me that some Connection cannot be opened as the object is already pasted as an argument to the function. The output R Shows me with a warning sign is already pretty close to what I need...

like image 894
Pat Avatar asked Nov 29 '15 18:11

Pat


1 Answers

Here are two options. The first one offers a fix to your current code, and the second looks at a simpler more efficient alternative.

Option 1: The first argument in read.table() is file. You are reading from a vector not from a file, so you need to use the text argument, with text = bars.list.

Also, we can get rid of all the quotes with gsub() first and then use read.csv() instead of read.table() since header = TRUE and sep = "," are the defaults there.

read.csv(text = gsub("\"", "", bars.list), row.names = 1)
#                  times  open  high   low close numEvents volume
# 1  2015-05-18 06:50:00 23.98 23.98 23.50 23.77       421      0
# 2  2015-05-18 07:50:00 23.77 23.90 23.34 23.60       720      0
# 3  2015-05-18 08:50:00 23.60 23.60 23.32 23.42       720      0
# 4  2015-05-18 09:50:00 23.44 23.91 23.43 23.66       720      0
# 5  2015-05-18 10:50:00 23.67 24.06 23.59 24.02       720      0
# 6  2015-05-18 11:50:00 24.02 24.04 23.32 23.33       720      0
# 7  2015-05-18 12:50:00 23.33 23.42 22.74 22.81       720      0
# 8  2015-05-18 13:50:00 22.79 22.92 22.49 22.69       720      0
# 9  2015-05-18 14:50:00 22.69 22.70 22.14 22.14       481      0
# 10 2015-05-19 06:50:00 21.09 21.49 20.82 21.47       421      0
# 11 2015-05-19 07:50:00 21.48 21.68 21.46 21.51       720      0
# 12 2015-05-19 08:50:00 21.51 21.93 21.45 21.92       720      0
# 13 2015-05-19 09:50:00 21.92 21.92 21.55 21.55       720      0

For me this worked better than using the quote argument in read.csv().

Option 2: fread() from the data.table package works nicely too. It is faster and the code is cleaner. No need to use gsub() with it. We can put bars.list in directly and drop the first column.

data.table::fread(bars.list, drop = 1)

Now, you will get a warning with this method due to the final \" quote. You can either live with it or get a warning-less result by removing that last quote mark.

data.table::fread(sub("\"$", "", bars.list), drop = 1)

Data:

bars.list <- "\"\",\"times\",\"open\",\"high\",\"low\",\"close\",\"numEvents\",\"volume\"\r\n\"1\",2015-05-18 06:50:00,23.98,23.98,23.5,23.77,421,0\r\n\"2\",2015-05-18 07:50:00,23.77,23.9,23.34,23.6,720,0\r\n\"3\",2015-05-18 08:50:00,23.6,23.6,23.32,23.42,720,0\r\n\"4\",2015-05-18 09:50:00,23.44,23.91,23.43,23.66,720,0\r\n\"5\",2015-05-18 10:50:00,23.67,24.06,23.59,24.02,720,0\r\n\"6\",2015-05-18 11:50:00,24.02,24.04,23.32,23.33,720,0\r\n\"7\",2015-05-18 12:50:00,23.33,23.42,22.74,22.81,720,0\r\n\"8\",2015-05-18 13:50:00,22.79,22.92,22.49,22.69,720,0\r\n\"9\",2015-05-18 14:50:00,22.69,22.7,22.14,22.14,481,0\r\n\"10\",2015-05-19 06:50:00,21.09,21.49,20.82,21.47,421,0\r\n\"11\",2015-05-19 07:50:00,21.48,21.68,21.46,21.51,720,0\r\n\"12\",2015-05-19 08:50:00,21.51,21.93,21.45,21.92,720,0\r\n\"13\",2015-05-19 09:50:00,21.92,21.92,21.55,21.55,720,0\r\n\""
like image 83
Rich Scriven Avatar answered Oct 22 '22 19:10

Rich Scriven