
For R: How to exclude some data files based on file language

Tags:

r

I'm rather new to R (and in law school, so this is all very new to me), so apologies if this is poorly worded. I have a series of about 1500 documents that I am importing into R to categorize and analyze later. The first thing that I need to do is exclude all documents that are written in French, which are labelled with an "FR" in the title/doc.info. I was curious what kind of code I could use to exclude them before importing the files, so that I have a clean data set before analyzing anything (since French text will obviously make a mess of processes like sentiment analysis). Any help is appreciated (even if that help is explaining how to better talk about coding). Kind regards!

Edit 1: The code that I am using is readtext(folder), which you can see below:

folder<-"C:/[pathway]"
submissions<-readtext(folder)

submissions_text<-submissions$text

submission_number<- numeric()
submission_person<- factor()
submission_code<- factor()
submission_language<-factor()
submission_location<-factor()

for (submission_name in submissions$doc_id) {
  submission_name<-gsub(".txt","",submission_name, fixed = TRUE)
  number<-as.numeric(strsplit(submission_name, "_|-")[[1]][1])
  submission_number<-c(submission_number,number)
  person<-strsplit(submission_name, "_")[[1]][2]
  submission_person<-c(submission_person, person)
  code<-strsplit(submission_name, "_")[[1]][3]
  submission_code<-c(submission_code, code)
  lang<-strsplit(submission_name, "_")[[1]][4]
  submission_language<-c(submission_language, lang)
  location<-strsplit(submission_name, "_")[[1]][5]
  submission_location<-c(submission_location, location)
}

submissions<-cbind(submissions,submission_number)
submissions<-cbind(submissions,submission_person)
submissions<-cbind(submissions,submission_code)
submissions<-cbind(submissions,submission_language)
submissions<-cbind(submissions,submission_location)


submissions<-submissions[order(submissions$submission_number, decreasing = FALSE),]

This is just the organizational aspect of my code. I am looking to hopefully exclude all of the French data before this point (but if it comes afterward, I would also be more than happy with that).

televised-god asked Jan 18 '19 23:01




1 Answer

The functionality you are after can be found in the list.files() function. Its documentation is available from within R via ?list.files.

In short, your code will likely end up looking something like this:

setwd("c:/path/to/your/data/here")
files <- list.files()
non_french_files <- files[!grepl("FR", files)]
lapply(non_french_files, function(x) {
  f <- read.csv(x)
  #do stuff with f
})

Note - you could directly leverage the pattern parameter found in list.files(), but I chose to do that in two steps in case you wanted to do something else with the French files. This also simplifies what each line of code is doing...
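As a rough sketch of how this might be adapted to the readtext() workflow in the question: grepl("FR", files) also matches names that merely contain those letters (say, a person called "FRANK"), so it may be safer to test only the language field. The helper name is_french is hypothetical, and the code assumes file names follow the underscore-separated layout implied by the question's parsing loop (number_person_code_language_location.txt), which may not hold for every file.

```r
# Hypothetical helper: TRUE for files whose 4th underscore-separated
# field is "FR". Assumes names like "12_person_code_FR_location.txt".
is_french <- function(paths) {
  # Drop the directory and the extension, then split on underscores
  stems  <- tools::file_path_sans_ext(basename(paths))
  fields <- strsplit(stems, "_", fixed = TRUE)
  # Take the 4th field where it exists, otherwise NA
  lang <- vapply(
    fields,
    function(p) if (length(p) >= 4) p[4] else NA_character_,
    character(1)
  )
  !is.na(lang) & toupper(lang) == "FR"
}

folder <- "C:/[pathway]"  # placeholder path, as in the question
all_files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
non_french_files <- all_files[!is_french(all_files)]

# Assumption: readtext() accepts a character vector of file paths.
# submissions <- readtext::readtext(non_french_files)
```

If filtering after import is more convenient, subsetting on the parsed column should give a similar result, for example submissions[!(submissions$submission_language %in% "FR"), ].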

...good luck and welcome to R!

Chase answered Oct 12 '22 22:10