 

Extract string before "|" [duplicate]

Tags: r, substr, extract

I have a data set wherein a column looks like this:

ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL

...and so on

I need to extract the characters that appear before the first | symbol.

In Excel, we would use a combination of MID and SEARCH, or LEFT and SEARCH. In R, there is substr().

The syntax is substr(x, start, stop).

In my case, start will always be 1. For stop, we need to find the position of the first |. How can we achieve this? Are there alternative ways to do this?
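The substr() route described in the question can work directly: regexpr() returns the position of the first match, so that position minus one is the stop argument. A minimal sketch, assuming the values are in a character vector; before_first_pipe() is a hypothetical helper name, not a base R function:

```r
# before_first_pipe() is a hypothetical helper, not part of base R.
before_first_pipe <- function(x) {
  # Position of the first "|" in each string, or -1 if there is none.
  # fixed = TRUE matches "|" literally (in a regex, "|" means "or").
  # as.integer() drops the match.length etc. attributes regexpr() attaches.
  stop_pos <- as.integer(regexpr("|", x, fixed = TRUE))
  # substr() is vectorised over stop, so each string gets its own cut point;
  # strings that contain no "|" are returned unchanged.
  ifelse(stop_pos > 0, substr(x, 1, stop_pos - 1), x)
}

before_first_pipe(c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS"))
#[1] "ABC"  "ABCD" "GHI"
```

Returning strings without a | unchanged is a choice made in this sketch; without the ifelse(), substr(x, 1, -2) would give an empty string for them.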

asked Jul 10 '16 at 12:07 by Shounak Chakraborty





2 Answers

We can use sub():

sub("\\|.*", "", str1)
#[1] "ABC"

Or with strsplit():

strsplit(str1, "[|]")[[1]][1]
#[1] "ABC"
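strsplit(str1, "[|]")[[1]][1] only looks at the first element of the list that strsplit() returns. For a whole column, one way (a sketch on made-up values, with x as a hypothetical name) is to take the first piece of every split with vapply():

```r
# Hypothetical example values for the column
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS", "NOPIPE")

# strsplit() returns a list with one character vector per input string;
# fixed = TRUE splits on a literal "|" without needing a regex escape.
parts <- strsplit(x, "|", fixed = TRUE)

# Take the first piece of every split; vapply() guarantees a character result.
first_field <- vapply(parts, `[`, character(1), 1)
first_field
```

vapply() is used instead of sapply() so that the result is always a plain character vector of the same length as x.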

Update

If we use the data from @hrbrmstr:

sub("\\|.*", "", df$V1)
#[1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE"
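To keep the result alongside the original column, the same sub() call can be assigned to a new column. A small sketch; the data frame df and the column name before_pipe here are hypothetical:

```r
# Hypothetical data frame; the column name V1 mirrors the call above.
df <- data.frame(V1 = c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS", "NOPIPE"),
                 stringsAsFactors = FALSE)

# "\\|" is a literal pipe (unescaped, "|" means alternation in a regex);
# ".*" then matches everything after it, so sub() deletes from the first
# "|" to the end of the string. Values without a "|" are left unchanged.
df$before_pipe <- sub("\\|.*", "", df$V1)
df
```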

These are all base R methods; no external packages are needed.

data

str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL" 
answered Oct 22 '22 at 08:10 by akrun



Another option is the word() function from the stringr package:

library(stringr)
word(df1$V1, 1, sep = "\\|")

Data

df1 <- read.table(text = "ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL")
answered Oct 22 '22 at 08:10 by user2100721
