I want to delete everything in this column that comes after the 3 characters following '18':
MGL18JUNFUT
NATIONALUM18JUNFUT
NTPC18JUNFUT
ONGC18JUNFUT
PCJEWELLER18JUNFUT
PEL18JUNFUT
PFC18JUNFUT
PIDILITIND18JUNFUT
POWERGRID18JULFUT
PTC18JULFUT
RAYMOND18JULFUT
RBLBANK18JULFUT
RECLTD18JULFUT
RPOWER18JULFUT
MGL18JUN800PE
I want my output to look like this:
MGL18JUN
NATIONALUM18JUN
NTPC18JUN
ONGC18JUN
PCJEWELLER18JUN
PEL18JUN
PFC18JUN
PIDILITIND18JUN
POWERGRID18JUL
PTC18JUL
RAYMOND18JUL
RBLBANK18JUL
RECLTD18JUL
RPOWER18JUL
MGL18JUN
I have tried the following code.
output <- sub('(^.*?)18???.*', '' , df$column)
But the output I get is:
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUNFUT
8JUN800PE
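Two things in that pattern work against the goal. In a regular expression, ? is a quantifier (it makes the previous token optional, or makes another quantifier lazy); it is not a one-character wildcard as in Excel or shell globs, where the regex equivalent is a dot. And replacing the match with '' deletes the matched text instead of keeping it. A minimal sketch of the same idea with both points changed (perl = TRUE is my own choice here, so the lazy .*? is handled by PCRE):

```r
x <- c("MGL18JUNFUT", "PIDILITIND18JUNFUT", "MGL18JUN800PE")
# Each '.' matches exactly one character (the role '?' plays as a glob/Excel wildcard).
# The part to keep is captured in a group and put back with \\1 instead of ''.
fixed_attempt <- sub("^(.*?18...).*", "\\1", x, perl = TRUE)
print(fixed_attempt)
```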
The Excel equivalent of this is:
=LEFT(A1, FIND("18",A1,1) +4)
I have tried many other options, such as sub, gregexpr, and substr, but nothing seems to work.
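For comparison, a minimal base-R sketch that mirrors the Excel formula step by step, using regexpr() with fixed = TRUE in place of FIND and substr() in place of LEFT. How to treat strings that contain no "18" is an assumption: here they are returned unchanged, whereas the Excel formula would return #VALUE!.

```r
x <- c("MGL18JUNFUT", "NATIONALUM18JUNFUT", "MGL18JUN800PE", "NOEIGHTEEN")
# Like Excel's FIND("18", A1): position of the first literal "18" (-1 if absent).
# as.vector() strips the match.length attributes that regexpr() attaches.
pos <- as.vector(regexpr("18", x, fixed = TRUE))
# Like Excel's LEFT(A1, FIND(...) + 4): keep characters 1 .. pos + 4.
# Assumption: strings without "18" are left unchanged.
excel_like <- ifelse(pos > 0, substr(x, 1, pos + 4), x)
print(excel_like)
```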
We could change the sub pattern by capturing any characters (.*) followed by 18 and then zero to three characters (.{0,3}), or exactly 3 characters (.{3}), in a group ((...)), and replacing the whole string with the backreference (\\1) to the captured group:
sub("^(.*18.{0,3}).*", "\\1", df$column)
or
sub("^(.*18.{3}).*", "\\1", df$column)
#[1] "MGL18JUN" "NATIONALUM18JUN" "NTPC18JUN" "ONGC18JUN"
#[5] "PCJEWELLER18JUN" "PEL18JUN" "PFC18JUN" "PIDILITIND18JUN"
#[9] "POWERGRID18JUL" "PTC18JUL" "RAYMOND18JUL" "RBLBANK18JUL"
#[13] "RECLTD18JUL" "RPOWER18JUL" "MGL18JUN"
Based on the OP's comments, if there are multiple instances of 18, use the non-greedy .*? so the match stops at the first 18:
v1 <- "PIDILITIND18JUN1180CE"
sub("^(.*?18.{3}).*", "\\1", v1)
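To see what the ? changes, here is a small comparison on v1 (perl = TRUE is added as an assumption so both quantifiers are interpreted by PCRE). The greedy .* backtracks only as far as the last 18 that still has three characters after it, which is the one inside 1180, so nothing gets trimmed.

```r
v1 <- "PIDILITIND18JUN1180CE"
# Greedy .* runs to the LAST "18" (inside "1180"); .{3} then takes "0CE",
# so the captured group is the whole string.
greedy <- sub("^(.*18.{3}).*", "\\1", v1, perl = TRUE)
# Lazy .*? stops at the FIRST "18", so the capture is "PIDILITIND18JUN".
lazy <- sub("^(.*?18.{3}).*", "\\1", v1, perl = TRUE)
print(c(greedy = greedy, lazy = lazy))
```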
It also works on the initial data:
sub("^(.*?18.{3}).*", "\\1", df$column)
#[1] "MGL18JUN" "NATIONALUM18JUN" "NTPC18JUN" "ONGC18JUN"
#[5] "PCJEWELLER18JUN" "PEL18JUN" "PFC18JUN" "PIDILITIND18JUN"
#[9] "POWERGRID18JUL" "PTC18JUL" "RAYMOND18JUL" "RBLBANK18JUL"
#[13] "RECLTD18JUL" "RPOWER18JUL" "MGL18JUN"
df <- structure(list(column = c("MGL18JUNFUT", "NATIONALUM18JUNFUT",
"NTPC18JUNFUT", "ONGC18JUNFUT", "PCJEWELLER18JUNFUT", "PEL18JUNFUT",
"PFC18JUNFUT", "PIDILITIND18JUNFUT", "POWERGRID18JULFUT", "PTC18JULFUT",
"RAYMOND18JULFUT", "RBLBANK18JULFUT", "RECLTD18JULFUT", "RPOWER18JULFUT",
"MGL18JUN800PE")), .Names = "column", class = "data.frame",
row.names = c(NA, -15L))
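If the same trimming is needed in several places, it could be wrapped in a small function. trim_after(), its marker and n arguments, and df_demo are hypothetical names for illustration. The function uses the non-greedy form so it stops at the first marker, and it assumes the marker contains no regex metacharacters.

```r
# Hypothetical helper: keep everything up to the first occurrence of `marker`
# plus the next `n` characters; strings that do not match are returned unchanged.
# Assumes `marker` has no regex metacharacters (true for "18").
trim_after <- function(x, marker = "18", n = 3) {
  pattern <- paste0("^(.*?", marker, ".{", n, "}).*$")
  sub(pattern, "\\1", x, perl = TRUE)
}

df_demo <- data.frame(
  column = c("MGL18JUNFUT", "RPOWER18JULFUT", "MGL18JUN800PE"),
  stringsAsFactors = FALSE
)
df_demo$short <- trim_after(df_demo$column)
print(df_demo)
```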
You can also use stringr::str_extract:
stringr::str_extract(string, "(.*)18\\w{3}")
Logic:
str_extract extracts the first match of the regular expression. Here I match everything up to 18 with .* (. means any character and * means zero or more of them; because * is greedy, this runs to the last 18 that fits), and then 3 word characters with \w{3} (\w matches letters, digits, and underscore). If you want between m and n characters instead of exactly 3, use {m,n}, where m is the minimum and n the maximum number of repetitions: for example, \w{2,3} matches 2 or 3 word characters. See help(regex) for details. Thanks, I hope this is helpful.
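A quick way to check the {m,n} behaviour and what \w accepts, using base R grepl() on made-up strings (perl = TRUE assumed):

```r
x <- c("ABC18JUN", "ABC18JU", "ABC18_9A")  # made-up strings for illustration
# \\w{3}: exactly three word characters (letters, digits, underscore) after 18
exact3 <- grepl("18\\w{3}", x, perl = TRUE)
# \\w{2,3}: two or three word characters after 18
two_to_three <- grepl("18\\w{2,3}", x, perl = TRUE)
print(rbind(exact3, two_to_three))
```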
Output:
#> stringr::str_extract(string, "(.*)18\\w{3}")
# [1] "MGL18JUN" "NATIONALUM18JUN" "NTPC18JUN" "ONGC18JUN"
# [5] "PCJEWELLER18JUN" "PEL18JUN" "PFC18JUN" "PIDILITIND18JUN"
# [9] "POWERGRID18JUL" "PTC18JUL" "RAYMOND18JUL" "RBLBANK18JUL"
# [13] "RECLTD18JUL" "RPOWER18JUL" "MGL18JUN"
Input:
string <- c("MGL18JUNFUT",
"NATIONALUM18JUNFUT",
"NTPC18JUNFUT",
"ONGC18JUNFUT",
"PCJEWELLER18JUNFUT",
"PEL18JUNFUT",
"PFC18JUNFUT",
"PIDILITIND18JUNFUT",
"POWERGRID18JULFUT",
"PTC18JULFUT",
"RAYMOND18JULFUT",
"RBLBANK18JULFUT",
"RECLTD18JULFUT",
"RPOWER18JULFUT",
"MGL18JUN800PE")
EDIT:
If your data has multiple 18s and you want to match only up to the first one, you can stop the greediness of the regex quantifier * by adding ?, as below:
stringr::str_extract(string, "(.*?)18\\w{3}")
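If you would rather not depend on stringr, a base-R sketch of the same extraction pairs regexpr() with regmatches(). One difference to watch: str_extract() returns NA for strings with no match, whereas regmatches() returns only the successful matches, so the sketch writes them back by position to keep NA placeholders. The codes vector is made up for illustration.

```r
codes <- c("MGL18JUNFUT", "RPOWER18JULFUT", "PIDILITIND18JUN1180CE", "NO_MATCH")
# Lazy (.*?) stops at the first 18 followed by three word characters
m <- regexpr("(.*?)18\\w{3}", codes, perl = TRUE)
extracted <- rep(NA_character_, length(codes))
# regmatches() returns only the successful matches, so write them back by position
extracted[m != -1] <- regmatches(codes, m)
print(extracted)
```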