Using gsub to extract character string before white space in R

Tags:

r

I have a list of birthdays that look something like this:

dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

I want to grab just the calendar date from this variable (i.e., drop everything after the first occurrence of whitespace).

Here's what I have tried so far:

> dob.abridged <- substring(dob,1,8)
> dob.abridged
[1] "9/9/43 1" "9/17/88 " "11/21/48"
> dob.abridged <- gsub(" $","", dob.abridged, perl=T)
> dob.abridged
[1] "9/9/43 1" "9/17/88"  "11/21/48"

So my code works for calendar dates of length 7 or 8, but not length 6 (the first element still ends in " 1"). Any pointers on a more effective regex to use with gsub that can handle calendar dates of length 6, 7 or 8?

Thank you.

401

asked Apr 09 '13 06:04

Anupa Fabian

2 Answers

No need for substring, just use gsub:

gsub(" .*$", "", dob)
# [1] "9/9/43"   "9/17/88"  "11/21/48"

A space (" "), then any character (.) any number of times (*) until the end of the string ($). See ?regex to learn regular expressions.
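As a minimal sketch of the same idea (the names first_space and any_ws are only illustrative, and the tab-separated value is made up for the example): the pattern can match at most once per string, so sub() should give the same result as gsub(). If "white-space" is meant to cover tabs as well, the POSIX class [[:space:]] can stand in for the literal space.

```r
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# Remove everything from the first space to the end of each string.
first_space <- sub(" .*$", "", dob)

# Same idea, but any whitespace character (space, tab, ...) starts the cut.
# The extra tab-separated element is a hypothetical input, not from the question.
any_ws <- sub("[[:space:]].*$", "", c(dob, "1/2/34\t12:00 AM/PM"))

print(first_space)
print(any_ws)
```

The match starts at the first space because the regex engine reports the leftmost match, and .* then consumes the rest of the string.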

145

answered Sep 18 '22 19:09

Romain Francois

I often use strsplit for these sorts of problems, but I liked how simple Romain's answer was, so I thought it would be interesting to compare the two approaches.

Here's a strsplit solution:

sapply(strsplit(dob, "\\s+"), "[", 1)
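To unpack that one-liner, here is a rough step-by-step sketch (pieces and dates are illustrative names, and swapping sapply for vapply is my own variation, not part of the answer): strsplit() returns a list with one vector of pieces per input string, and "[" is simply the subsetting function called with index 1.

```r
dob <- c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM")

# strsplit() returns a list with one character vector of pieces per input string.
pieces <- strsplit(dob, "\\s+")
print(pieces[[1]])

# `[` is the subsetting function; passing 1 as its argument keeps the first piece.
# vapply() behaves like sapply() here but also checks that each result is one string.
dates <- vapply(pieces, `[`, character(1), 1)
print(dates)
```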

Using the microbenchmark package and dob <- rep(dob, 1000) with the original data:

Unit: milliseconds
                                    expr       min        lq    median        uq       max neval
                   gsub(" .*$", "", dob)  4.228843  4.247969  4.258232  4.268029  5.081608  1000
 sapply(strsplit(dob, "\\\\s+"), "[", 1) 14.438241 14.558832 14.634638 14.756628 53.344984  1000

The clear winner on a Win 7 machine is the gsub regex from Romain, with a median of about 4.26 ms versus 14.63 ms for strsplit (roughly 3.4 times faster). Thanks for the answer and explanation, Romain.
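For anyone who wants to rerun the comparison, something along these lines should work. It is only a sketch: same_result is an illustrative name, times = 100L is an arbitrary choice, and the timings will differ by machine and R version. The timing step is skipped if the microbenchmark package is not installed.

```r
dob <- rep(c("9/9/43 12:00 AM/PM", "9/17/88 12:00 AM/PM", "11/21/48 12:00 AM/PM"), 1000)

# Both approaches should agree before their speed is compared.
same_result <- identical(
  gsub(" .*$", "", dob),
  sapply(strsplit(dob, "\\s+"), "[", 1)
)
print(same_result)

# Run the timing only if the microbenchmark package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    gsub(" .*$", "", dob),
    sapply(strsplit(dob, "\\s+"), "[", 1),
    times = 100L
  ))
}
```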

answered Sep 18 '22 19:09

Tyler Rinker
