How to extract data from a text file using R or PowerShell?

Tags: powershell, r, powershell-2.0, text-processing

I have a text file containing data like this:

This is just text
-------------------------------
Username:          SOMETHI           C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEGG            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEYY            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        202223DB          Start time:        15-DEC-2010 20:27:15.30

Is there a way to extract the Username, Finish time and Start time fields from this kind of data? I'm looking for a starting point using R or PowerShell.

asked Jan 24 '12 13:01

jrara

1 Answer

R may not be the best tool for processing text files, but you can proceed as follows: read the file as a fixed-width file to separate the two columns, split each string on the colon to separate the field names from their values, add an "id" column that identifies each record, and reshape everything into one row per record.

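# Read the file: the first 37 characters hold the left-hand column,
# the rest of the line holds the right-hand column
d <- read.fwf("A.txt", widths = c(37, 100), stringsAsFactors = FALSE)

# Keep only "field: value" lines, then split each column on a colon
# followed by whitespace (times such as 00:31:58 are not split)
d <- d[grep(":", d$V1), ]
d <- cbind(
  do.call(rbind, strsplit(d$V1, ":\\s+")),
  do.call(rbind, strsplit(d$V2, ":\\s+"))
)

# Add an id column: a new record starts at each "Username" line
d <- cbind(d, cumsum(d[, 1] == "Username"))

# Stack the left and right parts into long format
d <- rbind(d[, c(5, 1, 2)], d[, c(5, 3, 4)])
colnames(d) <- c("id", "field", "value")
d <- as.data.frame(d, stringsAsFactors = FALSE)
d$id <- as.integer(d$id)               # numeric ids sort correctly past 9 records
d$value <- gsub("\\s+$", "", d$value)  # drop the trailing padding

# Convert to a wide data.frame, one row per record
library(reshape2)
d <- dcast(d, id ~ field, value.var = "value")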
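If you only need the three fields from the question, a shorter route is to match them directly with regular expressions instead of reshaping every field. This is an alternative sketch, not part of the answer above: it uses base R only (`regexec`, `regmatches`, `as.POSIXct`), an inline hypothetical sample in place of `A.txt`, and a helper `extract_field()` introduced here for illustration. It assumes every record contains each label exactly once, that a value contains at most one internal space, and that month names are English abbreviations.

```r
# Hypothetical sample in the question's layout; with a real file use
# lines <- readLines("A.txt") instead
lines <- c(
  "This is just text",
  "-------------------------------",
  "Username:          SOMETHI           C:                 [Text]",
  "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91",
  "Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30",
  "",
  "This is just text",
  "-------------------------------",
  "Username:          SOMEGG            C:                 [Text]",
  "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91",
  "Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30"
)

# Hypothetical helper: return the value after "label:" on every line that
# contains the label. A value is one token, optionally followed by a second
# token after a single space, so "1-JAN-2011 00:31:58.91" is kept whole.
# Assumes every record contains each label exactly once.
extract_field <- function(lines, label) {
  pattern <- paste0(label, ":[[:space:]]+([^ ]+( [^ ]+)?)")
  m <- regmatches(lines, regexec(pattern, lines))
  hits <- Filter(length, m)  # drop lines without a match
  vapply(hits, function(x) x[2], character(1))
}

records <- data.frame(
  Username    = extract_field(lines, "Username"),
  Start_time  = extract_field(lines, "Start time"),
  Finish_time = extract_field(lines, "Finish time"),
  stringsAsFactors = FALSE
)

# Parse times like "31-DEC-2010 20:27:15.30"; %b expects English month
# abbreviations, so LC_TIME is switched to the C locale for this session
invisible(Sys.setlocale("LC_TIME", "C"))
fmt <- "%d-%b-%Y %H:%M:%OS"
records$Start_time  <- as.POSIXct(records$Start_time,  format = fmt, tz = "UTC")
records$Finish_time <- as.POSIXct(records$Finish_time, format = fmt, tz = "UTC")

# Elapsed time per record, in seconds
records$Duration_s <- as.numeric(
  difftime(records$Finish_time, records$Start_time, units = "secs")
)
print(records)
```

Each row then holds the username, the parsed start and finish times, and the elapsed time in seconds (14683.61 for the first sample record). Keeping the times as `POSIXct` rather than strings makes sorting and arithmetic straightforward.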

answered Sep 20 '22 14:09

Vincent Zoonekynd

Related questions
                            
                                Should "while loops" be preferred to "for loops" for large, necessary loops in R?
                            
                                Draw hyperplane in R?
                            
                                understanding dates/times (POSIXc and POSIXct) in R
                            
                                How to merge two data.frames together in R, referencing a lookup table
                            
                                How to draw a chart with sorted horizontal error bars (sorted barcharts with error marks)?
                            
                                How would you program Pascal's triangle in R?
                            
                                How do I calculate the length of consecutive runs of events, e.g. wins, visits, in R
                            
                                Improvements to the base R graphics
                            
                                Two stage least square in R
                            
                                If an R package's licence X is, do all the content in that package have to be licenced under X? [closed]
                            
                                In R how can you tell if a string includes escape sequences?
                            
                                Regression in R -- 4 features, 4 million instances
                            
                                How to read multiple excel sheets in R programming? [closed]
                            
                                Running out of memory with merge
                            
                                read.delim() - errors "more columns than column names" and "header and ''col.names" are of different lengths"
                            
                                Negative subscripts error in R
                            
                                Cox regression output in xtable - choosing rows/columns and adding a confidence interval
                            
                                Selecting a non-contiguous submatrix in Rcpp
                            
                                How do I extract lmer fixed effects by observation?
                            
                                Milliseconds puzzle when calling strptime in R

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to extract data from a text file using R or PowerShell?

Tags:

powershell

r

powershell-2.0

text-processing

jrara

People also ask

1 Answers

Vincent Zoonekynd

Recent Activity

Donate For Us