
Single text file with multiple tables

Tags:

r

I am trying to import data from a single text file that contains multiple tables. The tables vary in length, but there is a common separator between them: a number followed by a character string. For example,

19,EOP
1,10.,92.9144,202.1271,0,B,10-Dec-2014 11:46
2,5.,0.,153.3754,0.,,10-Dec-2014 11:52
3,5.,20380.8867,162.0626,24555.9395,,10-Dec-2014 11:58
4,5.,21941.2773,197.9289,25361.4414,,10-Dec-2014 12:04

10,EOP
1,0.98,164702.1563,179.828,0,B,10-Dec-2014 09:46
2,1.08,0.,180.6869,0.,,10-Dec-2014 09:48
3,1.07,0.,190.6853,0.,,10-Dec-2014 09:50
4,1.32,0.,163.7527,0.,,10-Dec-2014 09:52
5,1.29,0.,167.3766,0.,,10-Dec-2014 09:54
I have been trying to use the read.table function, but I cannot seem to make it recognize the table separator lines.

MadmanLee asked Dec 11 '14 16:12

2 Answers

You can try to use read.mtable from my GitHub-only "SOfun" package.

Using the sample data you shared, saved in a file called "test.txt" in my current working directory, I tried the following:

library(SOfun) ## Or just copy and paste the function for your session...
read.mtable("test.txt", chunkId = "\\d+,EOP", header = FALSE, sep = ",")
# $`19,EOP`
#   V1 V2         V3       V4       V5 V6                V7
# 1  1 10    92.9144 202.1271     0.00  B 10-Dec-2014 11:46
# 2  2  5     0.0000 153.3754     0.00    10-Dec-2014 11:52
# 3  3  5 20380.8867 162.0626 24555.94    10-Dec-2014 11:58
# 4  4  5 21941.2773 197.9289 25361.44    10-Dec-2014 12:04
# 
# $`10,EOP`
#   V1   V2       V3       V4 V5 V6                V7
# 1  1 0.98 164702.2 179.8280  0  B 10-Dec-2014 09:46
# 2  2 1.08      0.0 180.6869  0    10-Dec-2014 09:48
# 3  3 1.07      0.0 190.6853  0    10-Dec-2014 09:50
# 4  4 1.32      0.0 163.7527  0    10-Dec-2014 09:52
# 5  5 1.29      0.0 167.3766  0    10-Dec-2014 09:54

As you can see if you view the source, the function is a basic wrapper for read.table that has a few other lines to help identify the number of lines to skip with each round of read.table.
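If you would rather not depend on the package, that idea can be sketched in a few lines. Note this is a simplified re-implementation of the concept, not the actual SOfun source, and read.mtable itself likely handles more edge cases:

```r
# Simplified sketch of the read.mtable idea: locate the separator
# lines, then read each chunk with read.table(text = ...).
read_mtable_sketch <- function(file, chunkId, ...) {
  lines <- readLines(file)
  starts <- grep(chunkId, lines)              # indices of "19,EOP"-style lines
  ends   <- c(starts[-1] - 1L, length(lines)) # each chunk ends where the next begins
  out <- Map(function(s, e) read.table(text = lines[(s + 1L):e], ...),
             starts, ends)
  names(out) <- lines[starts]                 # keep "19,EOP" etc. as list names
  out
}
```

It is called the same way, e.g. `read_mtable_sketch("test.txt", "\\d+,EOP", header = FALSE, sep = ",")`.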


Obviously, change your chunkId argument to be representative of what your table separators actually look like :-)

A5C1D2H2I1M1N2O1R2T1 answered Nov 12 '22 20:11


You can't do this with any of the base R functions I know of. What you can do is read all the data in, find the break points with a regular expression (or something else), and then parse each chunk separately. For example,

lines <- readLines("data.csv")
group <- cumsum(grepl("^\\d+,\\w+$", lines))  #number,character

lapply(split(lines, group), function(x) read.table(text=x[-1], sep=","))

to get

$`1`
  V1 V2         V3       V4       V5 V6                V7
1  1 10    92.9144 202.1271     0.00  B 10-Dec-2014 11:46
2  2  5     0.0000 153.3754     0.00    10-Dec-2014 11:52
3  3  5 20380.8867 162.0626 24555.94    10-Dec-2014 11:58
4  4  5 21941.2773 197.9289 25361.44    10-Dec-2014 12:04

$`2`
  V1   V2       V3       V4 V5 V6                V7
1  1 0.98 164702.2 179.8280  0  B 10-Dec-2014 09:46
2  2 1.08      0.0 180.6869  0    10-Dec-2014 09:48
3  3 1.07      0.0 190.6853  0    10-Dec-2014 09:50
4  4 1.32      0.0 163.7527  0    10-Dec-2014 09:52
5  5 1.29      0.0 167.3766  0    10-Dec-2014 09:54
MrFlick answered Nov 12 '22 19:11