I have a file containing a certain number of lines. Each line looks like this:
TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1
I would like to remove everything before the ":" character so that only PKMYT1, which is a gene name, is retained. Since I'm not an expert in regular expressions, can anyone help me do this with Unix tools (sed or awk) or in R?
In a spreadsheet application such as Excel, press Ctrl + H to open the Find and Replace dialog. In the 'Find what' box, enter one of the following combinations: to remove text before a given character, type an asterisk followed by the character (*char); to remove text after a given character, type the character followed by an asterisk (char*). For the example above, enter *: (an asterisk followed by a colon), leave the 'Replace with' field empty, and click the Replace All button.
Here are two ways of doing it in R:
foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
# Remove everything up to and including ":":
gsub(".*:", "", foo)
# Extract everything after ":":
regmatches(foo, gregexpr("(?<=:).*", foo, perl=TRUE))
A simple regular expression used with gsub():
x <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
gsub(".*:", "", x)
# "PKMYT1"
See ?regex or ?gsub for more help.
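Since the question also asks about sed and awk, here is a minimal sketch of the same idea on the command line. It assumes every line contains at least one ":" and that the gene name is whatever follows the last one. The variable names are only for illustration.

```shell
# Sample line taken from the question.
line='TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1'

# sed: delete everything up to and including the last ":" on the line.
result_sed=$(printf '%s\n' "$line" | sed 's/.*://')

# awk: split each line on ":" and print the last field.
result_awk=$(printf '%s\n' "$line" | awk -F: '{print $NF}')

echo "$result_sed"
echo "$result_awk"
```

To process a whole file, pass the file name to either command instead of piping a single line, for example sed 's/.*://' genes.txt, where genes.txt is a hypothetical file name.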