
How to extract data from a text file using R or PowerShell?

I have a text file containing data like this:

This is just text
-------------------------------
Username:          SOMETHI           C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEGG            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEYY            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        202223DB          Start time:        15-DEC-2010 20:27:15.30

Is there a way to extract Username, Finish time, and Start time from this kind of data? I'm looking for a starting point using R or PowerShell.

jrara asked Jan 24 '12 13:01




1 Answer

R may not be the best tool for processing text files, but you can proceed as follows: read the file as fixed-width to identify the two columns, split each string on the colon to separate the field names from their values, add an "id" column, and put everything back in order.

# Read the file
d <- read.fwf("A.txt", c(37,100), stringsAsFactors=FALSE)

# Separate fields and values
d <- d[grep(":", d$V1),]
d <- cbind( 
  do.call( rbind, strsplit(d$V1, ":\\s+") ), 
  do.call( rbind, strsplit(d$V2, ":\\s+") ) 
)

# Add an id column
d <- cbind( d, cumsum( d[,1] == "Username" ) )

# Stack the left and right parts
d <- rbind( d[,c(5,1,2)], d[,c(5,3,4)] )
colnames(d) <- c("id", "field", "value")
d <- as.data.frame(d)
d$value <- gsub("\\s+$", "", d$value)

# Convert to a wide data.frame
library(reshape2)
d <- dcast( d, id ~ field, value.var = "value" )
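
If only the three requested fields are needed, a regular-expression approach in base R may be enough. The following is a minimal sketch, not part of the original answer: `extract_field` is a hypothetical helper introduced here for illustration, the sample lines are copied from the question, and it assumes every record contains each of the three labels exactly once (otherwise the columns would not line up).

```r
# Sample records copied from the question; in practice you would use
# lines <- readLines("A.txt")   # file name as in the code above
lines <- c(
  "This is just text",
  "-------------------------------",
  "Username:          SOMETHI           C:                 [Text]",
  "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91",
  "Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30",
  "",
  "This is just text",
  "-------------------------------",
  "Username:          SOMEGG            C:                 [Text]",
  "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91",
  "Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30"
)

# Hypothetical helper: for every line containing `label`, return the text
# that follows the label (after whitespace) and matches `pattern`.
extract_field <- function(lines, label, pattern) {
  hits <- grep(label, lines, fixed = TRUE, value = TRUE)
  sub(paste0(".*", label, "\\s+(", pattern, ").*"), "\\1", hits)
}

# Assumption: each record has exactly one Username, Start time and
# Finish time, so the three vectors have the same length and order.
result <- data.frame(
  username    = extract_field(lines, "Username:",    "\\S+"),
  start_time  = extract_field(lines, "Start time:",  "\\S+ \\S+"),
  finish_time = extract_field(lines, "Finish time:", "\\S+ \\S+"),
  stringsAsFactors = FALSE
)

print(result)
```

The timestamps are left as character strings here; converting them to date-time values is a separate step that depends on the locale's month abbreviations.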
Vincent Zoonekynd answered Sep 20 '22 14:09
