I am new to R, any suggestions would be appreciated.
This is the data:
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"
I would like this to become:
Latitude           Longitude
-79.43591570873059 43.68015339477487
-79.43491506339724 43.68036886994886
-79.43394727223847 43.680578504490335
-79.43388162422195 43.68058996121469
-79.43281544978878 43.680808044458765
-79.4326971769691  43.68079658822322
                Example 3: Convert an Entire DataFrame to Strings Lastly, we can convert every column in a DataFrame to strings by using the following syntax: #convert every column to strings df = df.astype (str) #check data type of each column df.dtypes player object points object assists object dtype: object
- GeeksforGeeks How to Convert String to Integer in Pandas DataFrame? Let’s see methods to convert string to an integer in Pandas DataFrame: Method 1: Use of Series.astype () method. dtype: Data type to convert the series into. (for example str, float, int). copy: Makes a copy of dataframe /series.
As you can see, the data type of all the columns across the DataFrame is object: You can then add the following syntax to convert all the values into floats under the entire DataFrame: df = df.astype (float) So the complete Python code to perform the conversion would be: import pandas as pd data = {'Price_1': ['300','750','600','770','920'], ...
To read data given in string form as a DataFrame, use read_csv (~) along with StringIO like so: Did you find this page useful? Ask a question or leave a feedback...
You can use scan with a little gsub:
matrix(scan(text = gsub("[()]", "", coordinates), sep = ","), 
       ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))
# Read 12 items
#            Lat     Long
# [1,] -79.43592 43.68015
# [2,] -79.43492 43.68037
# [3,] -79.43395 43.68058
# [4,] -79.43388 43.68059
# [5,] -79.43282 43.68081
# [6,] -79.43270 43.68080
The precision is still there--just truncated in the matrix display.
Two clear advantages:
coordinates <- rep(coordinates, 10) as an input).Here's another option:
library(data.table)
fread(gsub("[()]", "", gsub("), (", "\n", toString(coordinates), fixed = TRUE)), header = FALSE)
The toString(coordinates) is for cases when length(coordinates) > 1. You could also use fread(text = gsub(...), ...) and skip using toString. I'm not sure of the advantages or limitations of either approach.
We can use str_extract_all from stringr
library(stringr)
df <- data.frame(Latitude = str_extract_all(coordinates, "(?<=\\()-\\d+\\.\\d+")[[1]], 
      Longitude = str_extract_all(coordinates, "(?<=,\\s)\\d+\\.\\d+(?=\\))")[[1]])
df
#            Latitude          Longitude
#1 -79.43591570873059  43.68015339477487
#2 -79.43491506339724  43.68036886994886
#3 -79.43394727223847 43.680578504490335
#4 -79.43388162422195  43.68058996121469
#5 -79.43281544978878 43.680808044458765
#6  -79.4326971769691  43.68079658822322
Latitude captures the negative decimal number from opening round brackets (() whereas Longitude captures it from comma (,) to closing round brackets ()).
Or without regex lookahead and behind and capturing it together using str_match_all
df <- data.frame(str_match_all(coordinates, 
                        "\\((-\\d+\\.\\d+),\\s(\\d+\\.\\d+)\\)")[[1]][, c(2, 3)])
To convert data into their respective types, you could use type.convert
df <- type.convert(df)
                        Here is a base R option:
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"
coordinates <- gsub("^\\(|\\)$", "", coordinates)
x <- strsplit(coordinates, "\\), \\(")[[1]]
df <- data.frame(lat=sub(",.*$", "", x), lng=sub("^.*, ", "", x), stringsAsFactors=FALSE)
df
The strategy here is to first strip the leading trailing parentheses, then string split on \), \( to generate a single character vector with each latitude/longitude pair.  Finally, we generate a data frame output.
                 lat                lng
1 -79.43591570873059  43.68015339477487
2 -79.43491506339724  43.68036886994886
3 -79.43394727223847 43.680578504490335
4 -79.43388162422195  43.68058996121469
5 -79.43281544978878 43.680808044458765
6  -79.4326971769691 43.68079658822322
                        Yet another base R version with a bit of regex, relying on the fact that replacing the punctuation with blank lines will mean they get skipped on import.
read.csv(text=gsub(")|(, |^)\\(", "\n", coordinates), col.names=c("lat","long"), header=FALSE)
#        lat     long
#1 -79.43592 43.68015
#2 -79.43492 43.68037
#3 -79.43395 43.68058
#4 -79.43388 43.68059
#5 -79.43282 43.68081
#6 -79.43270 43.68080
Advantages:
scan answer.Disadvantages:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With