I'm using R and I have a data.frame with nearly 2,000 entries that looks as follows:
> head(PVs,15)
LogFreq Word PhonCV FreqDev
1593 140 was CVC 5.480774
482 139 had CVC 5.438114
1681 138 zou CVVC 5.395454
1662 137 zei CVV 5.352794
1619 136 werd CVCC 5.310134
1592 135 waren CVV-CV 5.267474
620 134 kon CVC 5.224814
646 133 kwam CCVC 5.182154
483 132 hadden CVC-CV 5.139494
436 131 ging CVC 5.096834
734 130 moest CVVCC 5.054174
1171 129 stond CCVCC 5.011514
1654 128 zag CVC 4.968854
1620 127 werden CVC-CV 4.926194
1683 126 zouden CVV-CV 4.883534
What I want to do is create a new data.frame that is identical to PVs, except that every row whose "Word" entry does NOT end in either "te" or "de" is removed. In other words, only words ending in "de" or "te" should remain in the data.frame.
I know how to selectively remove entries from data.frames using logical operators, but I have only used those with numeric criteria. I think I need regular expressions for this, but sadly R is the only programming language I "know", so I'm far from knowing what kind of code to use here.
I appreciate your help. Thanks in advance.
Using the df[rows, columns] approach, you can select rows of an R data frame by row name: put the rows you want in the rows position.
Using bracket notation on an R data frame, you can select rows by column value, by index, by name, by condition, etc. The base R function subset() gives the same results, and the dplyr package provides dplyr::filter() for the same purpose.
Select rows by a list of column values. With the same notation you can use the %in% operator to select data frame rows based on a list of values. For instance, an expression such as df[df$state %in% c('CA','AZ','PH'), ] (with df standing for your data frame) returns all rows whose state value is present in the vector c('CA','AZ','PH').
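As a minimal sketch of the %in% selection described above, assuming a small made-up data frame (the name df and all of its values are hypothetical and only serve as an illustration):

```r
# Hypothetical data, for illustration only
df <- data.frame(
  id    = 1:5,
  state = c("CA", "NY", "AZ", "TX", "PH"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose state is one of the listed values
wanted   <- c("CA", "AZ", "PH")
selected <- df[df$state %in% wanted, ]
print(selected)
```

The comma after the condition matters: the condition goes in the rows position, and the empty columns position keeps every column.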
The last n rows of a data frame can be accessed with the built-in tail() function. If N is the total number of rows in the data frame, then any n <= N last rows can be extracted.
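A short sketch of tail() on a data frame, assuming a made-up ten-row data frame (the name df_tail and its contents are hypothetical):

```r
# Hypothetical ten-row data frame, for illustration only
df_tail <- data.frame(x = 1:10, y = letters[1:10], stringsAsFactors = FALSE)

# Extract the last three rows; the original row names are kept
last3 <- tail(df_tail, n = 3)
print(last3)
```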
Method 1
You can use grepl with an appropriate regular expression. Consider the following:
x <- c("blank","wade","waste","rubbish","dedekind","bated")
grepl("^.+(de|te)$",x)
[1] FALSE TRUE TRUE FALSE FALSE FALSE
The regular expression says: begin (^) with any character one or more times (.+), then find either de or te ((de|te)), then end ($).
So for your data.frame try,
subset(PVs,grepl("^.+(de|te)$",Word))
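If the regular expression feels opaque, base R (3.3.0 and later) also offers endsWith(), which tests a fixed suffix. The following is only a sketch on the same test vector; ends_in_de_or_te is a hypothetical helper name, not something from the answer above. Note one small difference: the pattern "^.+(de|te)$" needs at least one character before the suffix, whereas this helper also accepts the bare strings "de" and "te".

```r
x <- c("blank", "wade", "waste", "rubbish", "dedekind", "bated")

# Hypothetical helper: TRUE where a word ends in "de" or "te".
# as.character() guards against the column being stored as a factor,
# which endsWith() does not accept.
ends_in_de_or_te <- function(words) {
  words <- as.character(words)
  endsWith(words, "de") | endsWith(words, "te")
}

print(ends_in_de_or_te(x))
```

Assuming PVs is laid out as in the question, the same helper could be used as PVs[ends_in_de_or_te(PVs$Word), ].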
Method 2
To avoid regular expressions, you can use substr instead.
# substr the last two characters and test
substr(x,nchar(x)-1,nchar(x)) %in% c("de","te")
[1] FALSE TRUE TRUE FALSE FALSE FALSE
So try:
subset(PVs,substr(Word,nchar(Word)-1,nchar(Word)) %in% c("de","te"))
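To check that both methods select the same rows, here is a small sketch on a stand-in data frame. The name demo and its words are made up for illustration, and Word is assumed to be a character column (nchar() does not accept a factor, so a factor column would first need as.character()).

```r
# A few rows in the shape of PVs; the values are made up for illustration
demo <- data.frame(
  LogFreq = c(140, 139, 138, 137),
  Word    = c("maakte", "had", "hoorde", "zei"),
  stringsAsFactors = FALSE
)

# Method 1: regular expression anchored at the end of the string
by_regex <- subset(demo, grepl("(de|te)$", Word))

# Method 2: compare the last two characters against the allowed endings
by_substr <- subset(demo, substr(Word, nchar(Word) - 1, nchar(Word)) %in% c("de", "te"))

# Both approaches should keep exactly the same rows
print(identical(by_regex, by_substr))
print(by_regex)
```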
I modified the data a bit so that there were words that ended in te or de.
> PV
LogFreq Word PhonCV FreqDev
1593 140 blahte CVC 5.480774
482 139 had CVC 5.438114
1681 138 aaaade CVVC 5.395454
1662 137 zei CVV 5.352794
1619 136 werd CVCC 5.310134
1592 135 waren CVV-CV 5.267474
620 134 kon CVC 5.224814
646 133 kwamde CCVC 5.182154
483 132 hadden CVC-CV 5.139494
436 131 ging CVC 5.096834
734 130 moeste CVVCC 5.054174
1171 129 stond CCVCC 5.011514
1654 128 zagde CVC 4.968854
1620 127 werden CVC-CV 4.926194
1683 126 zouden CVV-CV 4.883534
# Add a column to PV so you can visually check which words the regular expression matches.
PV$Match <- grepl(pattern = "(de|te)$", PV$Word)
# Subset the PV data frame to keep only the TRUE matches
PV <- PV[PV$Match, ]
The result is shown below
LogFreq Word PhonCV FreqDev Match
1593 140 blahte CVC 5.480774 TRUE
1681 138 aaaade CVVC 5.395454 TRUE
646 133 kwamde CCVC 5.182154 TRUE
734 130 moeste CVVCC 5.054174 TRUE
1654 128 zagde CVC 4.968854 TRUE
Using grep
grep -xvE '.{17}(de|te).*' file.txt