
R command to check for an ALL CAPS sequence in every row of a file

Tags:

r

I have a CSV file populated with a lot of raw data, as follows:

data.frame(
  id=1:4,
  data=c(
         "it's a programming language",
         "this data is JUNK",
         "refer www.google.com",
         "check for more information")
)

I need to check every row for an ALL CAPS sequence and populate a new column with 1 if one is found and 0 otherwise.

Output file is as follows:

id  data                         all_caps
1   it's a programming language         0
2   this data is JUNK                   1
3   refer www.google.com                0
4   check for more information          0

How can I achieve this in R? I have been searching for a while but have not found anything useful on processing each row.

N2M asked Mar 22 '23 19:03


1 Answer

Assuming your data.frame is called test:

# TRUE where the text contains two or more consecutive capital letters
test$all_caps <- grepl("[A-Z]{2,}", test$data)

  id                        data all_caps
1  1 it's a programming language    FALSE
2  2           this data is JUNK     TRUE
3  3        refer www.google.com    FALSE
4  4  check for more information    FALSE

You can convert the TRUE/FALSE values to 1s and 0s by wrapping the call in as.numeric:

test$all_caps <- as.numeric(grepl("[A-Z]{2,}", test$data))

  id                        data all_caps
1  1 it's a programming language        0
2  2           this data is JUNK        1
3  3        refer www.google.com        0
4  4  check for more information        0
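
Since the question starts from a CSV file and wants an output file, the whole round trip might look roughly like the sketch below. The file paths are hypothetical (temporary files stand in for the real input and output), and the `all_caps_word` column is an extra illustration that is not part of the original answer: it assumes "ALL CAPS sequence" means a whole word in capitals, which the question does not state.

```r
# Hypothetical input path: a temp file stands in for the real CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    id = 1:4,
    data = c("it's a programming language",
             "this data is JUNK",
             "refer www.google.com",
             "check for more information")
  ),
  csv_path,
  row.names = FALSE
)

# Read the file, keeping the text column as character rather than factor.
test <- read.csv(csv_path, stringsAsFactors = FALSE)

# grepl() is vectorised, so it returns one TRUE/FALSE per row with no loop.
# "[A-Z]{2,}" matches two or more consecutive capitals anywhere in the string.
test$all_caps <- as.integer(grepl("[A-Z]{2,}", test$data))

# Stricter variant (assumption): only whole words made of capitals count,
# so a mixed-case token such as "iPHONE" would not be flagged.
test$all_caps_word <- as.integer(grepl("\\b[A-Z]{2,}\\b", test$data))

# Hypothetical output path for the processed file.
out_path <- tempfile(fileext = ".csv")
write.csv(test, out_path, row.names = FALSE)

print(test)
```

Here `as.integer` is used instead of `as.numeric` only so the column is stored as whole numbers; either gives 0/1. R's regex documentation also cautions that ranges such as `[A-Z]` can behave differently across locales, so `[[:upper:]]{2,}` may be the safer choice for non-ASCII text.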

thelatemail answered Apr 29 '23 19:04