Convert string data into data frame

Tags: string, regex, r

I am new to R, any suggestions would be appreciated.

This is the data:

coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"

I would like this to become:

Latitude           Longitude
-79.43591570873059 43.68015339477487
-79.43491506339724 43.68036886994886
-79.43394727223847 43.680578504490335
-79.43388162422195 43.68058996121469
-79.43281544978878 43.680808044458765
-79.4326971769691  43.68079658822322

asked Dec 04 '19 04:12

Johnny Tang

4 Answers

You can use scan with a little gsub:

matrix(scan(text = gsub("[()]", "", coordinates), sep = ","), 
       ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))
# Read 12 items
#            Lat     Long
# [1,] -79.43592 43.68015
# [2,] -79.43492 43.68037
# [3,] -79.43395 43.68058
# [4,] -79.43388 43.68059
# [5,] -79.43282 43.68081
# [6,] -79.43270 43.68080

The values are stored at full double precision; only the printed matrix is rounded (to 7 significant digits by default).
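To check that the stored values keep their digits, one can print with more digits or compare against the original literal. This is a minimal sketch, assuming the coordinates string from the question (shortened to two pairs here); m is only an illustrative name, and quiet = TRUE just suppresses the "Read N items" message.

```r
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

m <- matrix(scan(text = gsub("[()]", "", coordinates), sep = ",", quiet = TRUE),
            ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))

# Default printing shows a rounded view of the values
print(m)

# Asking for more digits shows what is actually stored
print(m, digits = 15)

# The stored value agrees with the original literal to well beyond 7 digits
abs(m[1, "Lat"] - (-79.43591570873059)) < 1e-12
```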

Two clear advantages:

Fast.
Handles a multi-element "coordinates" vector (e.g. coordinates <- rep(coordinates, 10) as input).
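As a quick illustration of the vector case, here is a minimal sketch that assumes a shortened two-pair version of the question's string; coords_vec and m_vec are hypothetical names. Each element of the vector is read as a separate line, so all pairs end up in one matrix.

```r
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# Hypothetical multi-element input: the same string repeated three times
coords_vec <- rep(coordinates, 3)

m_vec <- matrix(scan(text = gsub("[()]", "", coords_vec), sep = ",", quiet = TRUE),
                ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))

# 2 pairs per element x 3 elements = 6 rows, 2 columns
dim(m_vec)
```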

Here's another option:

library(data.table)
fread(gsub("[()]", "", gsub("), (", "\n", toString(coordinates), fixed = TRUE)), header = FALSE)

The toString(coordinates) call is there for cases where length(coordinates) > 1. You could also use fread(text = gsub(...), ...) and skip toString. I'm not sure of the advantages or limitations of either approach.

answered Oct 18 '22 18:10

A5C1D2H2I1M1N2O1R2T1

We can use str_extract_all from stringr:

library(stringr)

df <- data.frame(Latitude = str_extract_all(coordinates, "(?<=\\()-\\d+\\.\\d+")[[1]], 
      Longitude = str_extract_all(coordinates, "(?<=,\\s)\\d+\\.\\d+(?=\\))")[[1]])
df
#            Latitude          Longitude
#1 -79.43591570873059  43.68015339477487
#2 -79.43491506339724  43.68036886994886
#3 -79.43394727223847 43.680578504490335
#4 -79.43388162422195  43.68058996121469
#5 -79.43281544978878 43.680808044458765
#6  -79.4326971769691  43.68079658822322

The Latitude pattern matches the negative decimal number that follows an opening parenthesis, whereas the Longitude pattern matches the decimal number that sits between a comma plus whitespace and a closing parenthesis.
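Note that these patterns assume the first number is always negative and the second always positive. If that might not hold, the minus sign can be made optional in both patterns. The sketch below uses base R's regmatches and gregexpr with perl = TRUE instead of stringr, purely to avoid a package dependency; first and second are illustrative names, and the input is a shortened two-pair version of the question's string.

```r
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# "-?" makes the minus sign optional in both positions
first  <- regmatches(coordinates,
                     gregexpr("(?<=\\()-?\\d+\\.\\d+", coordinates, perl = TRUE))[[1]]
second <- regmatches(coordinates,
                     gregexpr("(?<=,\\s)-?\\d+\\.\\d+(?=\\))", coordinates, perl = TRUE))[[1]]

# Both columns are still character at this point
df_signed <- data.frame(Latitude = first, Longitude = second, stringsAsFactors = FALSE)
df_signed
```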

Alternatively, skip the lookbehind and lookahead and capture both numbers in one pass with str_match_all:

df <- data.frame(str_match_all(coordinates, 
                        "\\((-\\d+\\.\\d+),\\s(\\d+\\.\\d+)\\)")[[1]][, c(2, 3)])

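To convert the columns to their proper types, you could use type.convert (as.is = TRUE keeps any non-numeric columns as character rather than factor, and avoids a warning in recent R versions):

df <- type.convert(df, as.is = TRUE)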
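To see what the conversion does, here is a small self-contained sketch. It assumes a data frame with character columns like the one the extraction steps produce (built by hand here from two of the question's pairs); df_chr and df_num are illustrative names.

```r
# Character columns, as produced by the string-extraction approaches
df_chr <- data.frame(Latitude  = c("-79.43591570873059", "-79.43491506339724"),
                     Longitude = c("43.68015339477487", "43.68036886994886"),
                     stringsAsFactors = FALSE)
sapply(df_chr, class)

# Columns that look numeric become numeric; others would stay character
df_num <- type.convert(df_chr, as.is = TRUE)
sapply(df_num, class)
```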

answered Oct 18 '22 19:10

Ronak Shah

Here is a base R option:

coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"
coordinates <- gsub("^\\(|\\)$", "", coordinates)
x <- strsplit(coordinates, "\\), \\(")[[1]]
df <- data.frame(lat=sub(",.*$", "", x), lng=sub("^.*, ", "", x), stringsAsFactors=FALSE)
df

The strategy here is to first strip the leading and trailing parentheses, then split the string on \), \( to get a character vector with one latitude/longitude pair per element. Finally, we split each pair at the comma to build the data frame.

                 lat                lng
1 -79.43591570873059  43.68015339477487
2 -79.43491506339724  43.68036886994886
3 -79.43394727223847 43.680578504490335
4 -79.43388162422195  43.68058996121469
5 -79.43281544978878 43.680808044458765
6  -79.4326971769691  43.68079658822322
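The columns in this output are character strings. If numeric columns are wanted, one option is to wrap each half in as.numeric while building the data frame. This is a minimal sketch that follows the same strip-and-split steps, assuming a shortened two-pair version of the question's string; df_num is an illustrative name.

```r
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# Same steps as above: strip the outer parentheses, then split into pairs
x <- strsplit(gsub("^\\(|\\)$", "", coordinates), "\\), \\(")[[1]]

# Convert each half to numeric while building the data frame
df_num <- data.frame(lat = as.numeric(sub(",.*$", "", x)),
                     lng = as.numeric(sub("^.*, ", "", x)))
str(df_num)
```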

answered Oct 18 '22 19:10

Tim Biegeleisen

Yet another base R version with a bit of regex. It relies on the fact that replacing the punctuation with newlines leaves blank lines between the pairs, and read.csv skips blank lines on import.

read.csv(text=gsub(")|(, |^)\\(", "\n", coordinates), col.names=c("lat","long"), header=FALSE)
#        lat     long
#1 -79.43592 43.68015
#2 -79.43492 43.68037
#3 -79.43395 43.68058
#4 -79.43388 43.68059
#5 -79.43282 43.68081
#6 -79.43270 43.68080
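To make the blank-line trick visible, the sketch below prints the intermediate text before it is parsed. It assumes a shortened two-pair version of the question's string; txt and res are illustrative names, and the closing parenthesis is escaped in the pattern here only to make the intent explicit.

```r
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# Each ")" and each "(" (with a preceding ", " if present) becomes a newline
txt <- gsub("\\)|(, |^)\\(", "\n", coordinates)
cat(txt)

# The empty lines are dropped because blank.lines.skip = TRUE is the default
res <- read.csv(text = txt, header = FALSE, col.names = c("lat", "long"))
str(res)
```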

Advantages:

Handles vector input as well, like the scan answer.
Converts to the correct numeric types in the output.

Disadvantages:

Not super fast

answered Oct 18 '22 18:10

thelatemail
