
remove all delimiters at beginning and end of string

Tags:

regex

r

After I collapse my rows and separate using a semicolon, I'd like to delete the semicolons at the front and back of my string. Multiple semicolons represent blanks in a cell. For example an observation may look as follows after the collapse:

;TX;PA;CA;;;;;;;

I'd like the cell to look like this:

TX;PA;CA

Here is my collapse code:

new_df <- group_by(old_df, unique_id) %>% summarize_each(funs(paste(., collapse = ';')))

If I gsub for the semicolon, it removes all of them. If I remove the last character, it removes only one of the semicolons. Any ideas on how to remove all of them at the beginning and end while leaving the ones between the observations? Thanks.

DCRubyHound asked Oct 20 '16 19:10



2 Answers

Use the regular expression ^;+|;+$ with gsub:

x <- ";TX;PA;CA;;;;;;;"
gsub("^;+|;+$", "", x)

The ^ anchors the match at the start of the string, the + means "one or more" of the preceding character, and the $ anchors the match at the end of the string. The | means "OR". Combined, the pattern matches one or more ; at the start of the string OR one or more ; at the end of the string, and gsub replaces each match with an empty string. The semicolons between the values are left alone.
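Since gsub is vectorized, the same call should work on a whole column after the collapse. Here is a minimal sketch; only the first value comes from the question, the other three are hypothetical cells made up to show the edge cases. The trimws alternative assumes R 3.6.0 or later, where the whitespace argument was added.

```r
# Only the first value is from the question; the rest are hypothetical.
x <- c(";TX;PA;CA;;;;;;;", "NY;;NJ;", ";;;", "FL")

# Strip runs of semicolons at either end; inner semicolons are untouched.
trimmed <- gsub("^;+|;+$", "", x)
trimmed
# [1] "TX;PA;CA" "NY;;NJ"   ""         "FL"

# Alternative (assumes R >= 3.6.0): treat ";" as the character to trim.
trimmed2 <- trimws(x, which = "both", whitespace = "[;]")
identical(trimmed, trimmed2)
# [1] TRUE
```

Note that a cell made up only of semicolons becomes an empty string, and the doubled semicolon inside "NY;;NJ" is kept because it is not at either end.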

Benjamin answered Sep 28 '22 00:09



The stringi package lets you specify the character class you want to preserve; it trims everything before the first and after the last character that matches it. If your values contain only letters (you could specify another class otherwise), you can simply do

stringi::stri_trim_both(";TX;PA;CA;;;;;;;", "\\p{L}")
## [1] "TX;PA;CA"
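One caveat with preserving letters is that values starting or ending with digits get trimmed as well. A small sketch of that, where the second value is a hypothetical cell (not from the question) and the "[^;]" class is my own suggestion rather than part of the original answer:

```r
library(stringi)

# The second value is a hypothetical cell that starts with digits.
x <- c(";TX;PA;CA;;;;;;;", ";12;TX;;")

# Keep everything from the first letter to the last letter.
letters_only <- stri_trim_both(x, "\\p{L}")
letters_only
# [1] "TX;PA;CA" "TX"

# Keep everything from the first to the last non-semicolon character.
non_semicolon <- stri_trim_both(x, "[^;]")
non_semicolon
# [1] "TX;PA;CA" "12;TX"
```

If the cells can hold anything other than letters, the "[^;]" form is probably the safer choice, since it only ever trims semicolons.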
David Arenburg answered Sep 28 '22 01:09
