R - gsub a specific character of a specific position

I would like to delete the last character of a variable. I was wondering if it is possible to select the position with gsub and delete the character at this particular position.

In this example, I want to delete the last digit, the one after the E, from each of my 4 strings.

variables = c('B10243E1', 'B10243E2', 'B10243E3', 'B10243E4')
gsub(pattern = '[[:xdigit:]]{8}.', replacement = '', x = variables)

I thought we could use the {} construct in order to select a specific position.

giac asked May 10 '15 12:05


3 Answers

You can do it by capturing all the characters but the last:

variables = c('B10243E1', 'B10243E2', 'B10243E3', 'B10243E4')
gsub('^(.*).$', '\\1', variables)

Explanation:

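  • ^ - Start of the string
  • (.*) - Any characters, as many as possible (captured as group 1), up to
  • .$ - The last character (matched by ., but not captured) right before the end of the string ($).

The replacement \\1 puts back only the captured part, so this regex is a good fit if you plan to remove the final character. If the string may contain a newline, check how . treats it in the regex engine you use (with perl = TRUE, . does not match a newline by default).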

Output:

[1] "B10243E" "B10243E" "B10243E" "B10243E"  
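As a quick cross-check of the capture-group approach above, here is a minimal sketch that compares it with a position-based version built on substr() and nchar(). The names by_regex and by_position are only illustrative and are not part of the original answer.

```r
variables <- c("B10243E1", "B10243E2", "B10243E3", "B10243E4")

# Regex approach from the answer: keep everything except the final character.
by_regex <- gsub("^(.*).$", "\\1", variables)

# Position-based equivalent: keep characters 1 .. (length - 1) of each string.
by_position <- substr(variables, 1, nchar(variables) - 1)

print(by_regex)
# [1] "B10243E" "B10243E" "B10243E" "B10243E"

# Both approaches should agree on this input.
stopifnot(identical(by_regex, by_position))
```

The substr() version avoids regular expressions entirely, which can be easier to read when the goal is purely positional.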

To remove only the 8th character (here is a sample where I added T at the end of each item):

variables = c('B10247E1T', 'B10243E2T', 'B10243E3T', 'B10243E4T')
gsub('^(.{7}).', '\\1', variables)

Output of the sample program (note the ET at the end of each item: only the digit was removed):

[1] "B10247ET" "B10243ET" "B10243ET" "B10243ET" 
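The same idea can be generalized to an arbitrary position by building the pattern with sprintf(). The sketch below assumes a hypothetical helper named remove_char_at (not part of the original answer) and assumes n is a single position counted from 1.

```r
# Hypothetical helper: remove the character at position n from each string.
remove_char_at <- function(x, n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  # Build "^(.{n-1})." : capture the first n - 1 characters, then match
  # (and drop) the n-th one.
  pattern <- sprintf("^(.{%d}).", as.integer(n) - 1L)
  sub(pattern, "\\1", x)
}

variables <- c("B10247E1T", "B10243E2T", "B10243E3T", "B10243E4T")
print(remove_char_at(variables, 8))
# [1] "B10247ET" "B10243ET" "B10243ET" "B10243ET"
```

Strings shorter than n do not match the pattern, so sub() returns them unchanged.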
Wiktor Stribiżew answered Oct 30 '22 15:10


Try any of these. The first removes the last character; the second replaces E and anything after it with E; the third removes the 8th character, which leaves the first 7 characters when there are exactly 8; the fourth, fifth and sixth each return the first 7 characters; the last strips trailing digits. All are vectorized, i.e. variables may be a vector of character strings as in the question.

sub(".$", "", variables)

sub("E.*", "E", variables)

sub("^(.{7}).", "\\1", variables)

sub("^(.{7}).*", "\\1", variables)

substr(variables, 1, 7)

substring(variables, 1, 7)

trimws(variables, "right", "\\d") # strips trailing digits; the whitespace argument requires R >= 3.6.0
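These alternatives agree on the 8-character strings from the question but diverge on longer input. The following sketch uses a made-up two-element vector x (an assumption for illustration only) to show where the third and fourth patterns differ.

```r
# One string with exactly 8 characters, one with 9.
x <- c("B10243E1", "B10243E1T")

third  <- sub("^(.{7}).",  "\\1", x)  # drops only the 8th character
fourth <- sub("^(.{7}).*", "\\1", x)  # drops the 8th character and everything after it
first7 <- substr(x, 1, 7)             # keeps the first 7 characters

print(third)
# [1] "B10243E"  "B10243ET"
print(fourth)
# [1] "B10243E" "B10243E"
print(first7)
# [1] "B10243E" "B10243E"
```

So the third solution is the one to pick when only a single position should be deleted, while the fourth behaves like substr(x, 1, 7).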

Here is the regular expression in the third solution:

^(.{7}).

[Regular expression visualization (Debuggex)]

and here is the regular expression in the fourth solution:

^(.{7}).*

[Regular expression visualization (Debuggex)]

G. Grothendieck answered Oct 30 '22 14:10


If you always want to remove everything after E, you can match E together with everything that follows it and replace the match with E:

sub("E(.*)", 'E', variables)
## [1] "B10243E" "B10243E" "B10243E" "B10243E"
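One caveat with matching on E: sub() finds the leftmost match, so the cut happens at the first E in the string. The sketch below uses a made-up identifier with two E's (an assumption, not data from the question) and shows one possible way to anchor on the last E instead.

```r
# Hypothetical input containing an E before the one we care about.
x <- c("B10243E1", "BE0243E1")

at_first_e <- sub("E.*", "E", x)        # cuts at the first E
at_last_e  <- sub("E[^E]*$", "E", x)    # cuts at the last E in the string

print(at_first_e)
# [1] "B10243E" "BE"
print(at_last_e)
# [1] "B10243E" "BE0243E"
```

For the strings in the question both patterns give the same result, since each contains a single E.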

Alternatively, you can count 7 characters using a positive lookbehind and remove the character that follows them:

sub("(?<=.{7})(.)", "", variables, perl = TRUE)
## [1] "B10243E" "B10243E" "B10243E" "B10243E"
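With the lookbehind pattern, the choice between sub() and gsub() matters once strings are longer than 8 characters. A small sketch, using a made-up vector x for illustration:

```r
x <- c("B10243E1", "B10243E1T")

# sub() replaces only the first match: the single character after position 7.
only_8th <- sub("(?<=.{7}).", "", x, perl = TRUE)

# gsub() replaces every match: each character preceded by at least
# 7 characters, i.e. everything after position 7.
after_7th <- gsub("(?<=.{7}).", "", x, perl = TRUE)

print(only_8th)
# [1] "B10243E"  "B10243ET"
print(after_7th)
# [1] "B10243E" "B10243E"
```

Note that perl = TRUE is required here, because lookbehind assertions are not supported by R's default regex engine.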
David Arenburg answered Oct 30 '22 14:10