I am working with NCBI Reference Sequence accession numbers like those in variable a:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
To get information from the biomart package I need to remove the .1, .2, etc. after the accession numbers. I normally do this with this code:
b <- sub("..*", "", a) # [1] "" "" "" "" "" ""
But as you can see, this isn't the correct way for this variable. Can anyone help me with this?
In PHP, the substr() and strpos() functions are used together to remove the portion of a string after a certain character. strpos() finds the position of the first occurrence of one string inside another and returns that position as an integer (or false if the string is not found).
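A position-based approach like this can also be sketched in R, the language of the question. This is only an illustration, not the accepted fix: the sample vector is shortened, and the variable names `pos` and `prefix` are made up for the example. `regexpr()` with `fixed = TRUE` looks for a literal period rather than a regular expression.

```r
a <- c("NM_020506.1", "NM_001030297.2", "NM_011419.3")

# Position of the first literal "." in each element (-1 if there is none).
pos <- as.integer(regexpr(".", a, fixed = TRUE))

# Keep everything before the period; leave elements without a period unchanged.
prefix <- ifelse(pos > 0, substr(a, 1, pos - 1), a)
prefix
# [1] "NM_020506"    "NM_001030297" "NM_011419"
```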
In .NET, you can also remove a specified character or substring from a string by calling the String.Replace(String, String) method and specifying an empty string (String.Empty) as the replacement. For example, replacing "," with String.Empty removes all commas from a string.
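In R, a comparable replace-with-empty-string operation might look like the sketch below. The input vector `x` is invented for illustration; `fixed = TRUE` makes `gsub()` treat the pattern as a literal string instead of a regular expression.

```r
# Hypothetical sample data, not taken from the question.
x <- c("1,234,567", "no commas here", "a,b")

# Replace every literal comma with the empty string.
no_commas <- gsub(",", "", x, fixed = TRUE)
no_commas
# [1] "1234567"        "no commas here" "ab"
```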
To get the substring after a specific character, call the substring() method, passing it the index after the character's index as a parameter. The substring method will return the part of the string after the specified character.
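For the accession numbers in the question, taking the part after the period would return the version suffix rather than the accession itself. A minimal R sketch, assuming every element contains a period (the names `pos` and `version` are illustrative):

```r
a <- c("NM_020506.1", "NM_001030297.2", "NM_011419.3")

# Index of the first literal "." in each element.
pos <- as.integer(regexpr(".", a, fixed = TRUE))

# substring() without a stop argument runs to the end of the string.
version <- substring(a, pos + 1)
version
# [1] "1" "2" "3"
```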
You just need to escape the period, because an unescaped . in a regular expression matches any character:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
gsub("\\..*","",a)
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
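To see why the original pattern returned empty strings, and as a possible variation on the escaped pattern, the sketch below compares three forms on a shortened sample. The anchored pattern `\\.[0-9]+$` is a suggestion, not part of the answer above; it assumes the version suffix is always a period followed by digits at the end of the string.

```r
a <- c("NM_020506.1", "NM_001030297.2", "NM_011419.3")

# Unescaped: "." matches any character, so "..*" consumes the whole string.
unescaped <- sub("..*", "", a)
unescaped
# [1] "" "" ""

# Escaped and anchored: remove only a trailing ".<digits>" suffix.
stripped <- sub("\\.[0-9]+$", "", a)
stripped
# [1] "NM_020506"    "NM_001030297" "NM_011419"

# A character class is another way to match a literal period.
stripped_class <- sub("[.].*", "", a)
identical(stripped, stripped_class)
# [1] TRUE
```

Since each accession contains only one period, `sub()` and `gsub()` give the same result here.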