I am new to R, any suggestions would be appreciated.
This is the data:
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"
I would like this to become:
Latitude Longitude
-79.43591570873059 43.68015339477487
-79.43491506339724 43.68036886994886
-79.43394727223847 43.680578504490335
-79.43388162422195 43.68058996121469
-79.43281544978878 43.680808044458765
-79.4326971769691 43.68079658822322
You can use scan with a little gsub:
matrix(scan(text = gsub("[()]", "", coordinates), sep = ","),
ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))
# Read 12 items
# Lat Long
# [1,] -79.43592 43.68015
# [2,] -79.43492 43.68037
# [3,] -79.43395 43.68058
# [4,] -79.43388 43.68059
# [5,] -79.43282 43.68081
# [6,] -79.43270 43.68080
The precision is still there; the matrix display simply rounds to 7 significant digits by default.
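To check that nothing is lost, here is a minimal sketch that prints the same kind of matrix with more digits. It uses a shortened two-pair input for brevity, and quiet = TRUE only suppresses the "Read N items" message.

```r
# Shortened input: the first two pairs from the question
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886)"

# Same approach as above: drop the parentheses, then scan the comma-separated numbers
m <- matrix(scan(text = gsub("[()]", "", coordinates), sep = ",", quiet = TRUE),
            ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Lat", "Long")))

# Ask print() for more significant digits than the default of 7
print(m, digits = 15)
```

Setting options(digits = 15) would have a similar effect for the rest of the session.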
Among its advantages, this approach also handles a vector of strings (for example, try coordinates <- rep(coordinates, 10) as an input).

Here's another option:
library(data.table)
fread(gsub("[()]", "", gsub("), (", "\n", toString(coordinates), fixed = TRUE)), header = FALSE)
The toString(coordinates) is for cases when length(coordinates) > 1. You could also use fread(text = gsub(...), ...) and skip using toString. I'm not sure of the advantages or limitations of either approach.
We can use str_extract_all from stringr:
library(stringr)
df <- data.frame(Latitude = str_extract_all(coordinates, "(?<=\\()-\\d+\\.\\d+")[[1]],
Longitude = str_extract_all(coordinates, "(?<=,\\s)\\d+\\.\\d+(?=\\))")[[1]])
df
# Latitude Longitude
#1 -79.43591570873059 43.68015339477487
#2 -79.43491506339724 43.68036886994886
#3 -79.43394727223847 43.680578504490335
#4 -79.43388162422195 43.68058996121469
#5 -79.43281544978878 43.680808044458765
#6 -79.4326971769691 43.68079658822322
Latitude captures the negative decimal number that follows an opening round bracket, (, whereas Longitude captures the number that sits between a comma plus space and a closing round bracket, ).
Or, without regex lookahead and lookbehind, capture both numbers together using str_match_all:
df <- data.frame(str_match_all(coordinates,
"\\((-\\d+\\.\\d+),\\s(\\d+\\.\\d+)\\)")[[1]][, c(2, 3)])
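Both versions return character columns. To convert them to their respective types, you could use type.convert (as.is = TRUE keeps any non-numeric columns as character and avoids the warning recent R versions give when it is omitted):
df <- type.convert(df, as.is = TRUE)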
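As a quick check of what type.convert does here, the following base R sketch builds a small character-only data frame by hand (values copied from the question, so no stringr is needed) and compares the column classes before and after. The names classes_before and classes_after are just for illustration.

```r
# A small stand-in for the extracted result: both columns are character
df <- data.frame(Latitude  = c("-79.43591570873059", "-79.43491506339724"),
                 Longitude = c("43.68015339477487", "43.68036886994886"),
                 stringsAsFactors = FALSE)
classes_before <- sapply(df, class)

# as.is = TRUE: convert what looks numeric, leave other strings as character
df <- type.convert(df, as.is = TRUE)
classes_after <- sapply(df, class)

print(classes_before)
print(classes_after)
```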
Here is a base R option:
coordinates <- "(-79.43591570873059, 43.68015339477487), (-79.43491506339724, 43.68036886994886), (-79.43394727223847, 43.680578504490335), (-79.43388162422195, 43.68058996121469), (-79.43281544978878, 43.680808044458765), (-79.4326971769691, 43.68079658822322)"
coordinates <- gsub("^\\(|\\)$", "", coordinates)
x <- strsplit(coordinates, "\\), \\(")[[1]]
df <- data.frame(lat=sub(",.*$", "", x), lng=sub("^.*, ", "", x), stringsAsFactors=FALSE)
df
The strategy here is to first strip the leading and trailing parentheses, then split the string on \), \( to generate a character vector in which each element is one latitude/longitude pair. Finally, we split each pair at the comma to build the data frame. Note that both columns are still character at this point.
lat lng
1 -79.43591570873059 43.68015339477487
2 -79.43491506339724 43.68036886994886
3 -79.43394727223847 43.680578504490335
4 -79.43388162422195 43.68058996121469
5 -79.43281544978878 43.680808044458765
6 -79.4326971769691 43.68079658822322
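The same steps can be wrapped in a small function. This is only a sketch: parse_coords is a hypothetical helper name, and it assumes every pair is written exactly as "(x, y)" with ", " between pairs. It also converts to numeric and accepts a vector of strings.

```r
# Hypothetical helper: split "(x, y), (x, y)" strings into a numeric data frame
parse_coords <- function(coordinates) {
  # Strip the leading "(" and trailing ")" of each string
  stripped <- gsub("^\\(|\\)$", "", coordinates)
  # Split on "), (" and flatten, so each element is one "x, y" pair
  pairs <- unlist(strsplit(stripped, "\\), \\("))
  data.frame(lat = as.numeric(sub(",.*$", "", pairs)),
             lng = as.numeric(sub("^.*, ", "", pairs)))
}

# Works on a vector of strings as well as on a single string
res <- parse_coords(c("(-79.5, 43.5), (-79.25, 43.75)", "(-79.125, 43.875)"))
print(res)
```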
Yet another base R version with a bit of regex. It replaces the parentheses and the separators between pairs with newlines, relying on the fact that the resulting blank lines get skipped on import.
read.csv(text=gsub(")|(, |^)\\(", "\n", coordinates), col.names=c("lat","long"), header=FALSE)
# lat long
#1 -79.43592 43.68015
#2 -79.43492 43.68037
#3 -79.43395 43.68058
#4 -79.43388 43.68059
#5 -79.43282 43.68081
#6 -79.43270 43.68080
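To see why the blank lines do not matter, this sketch shows the intermediate text on a shortened input. One assumption to flag: the pattern here escapes the closing parenthesis as \\), which should match the same text as the pattern above but is more explicit.

```r
x <- "(-79.5, 43.5), (-79.25, 43.75)"

# Replace ")", ", (" and the leading "(" with newlines
txt <- gsub("\\)|(, |^)\\(", "\n", x)
# Show the raw lines; the empty ones are what read.csv skips
print(strsplit(txt, "\n")[[1]])

# read.csv drops blank lines by default (blank.lines.skip = TRUE)
res <- read.csv(text = txt, col.names = c("lat", "long"), header = FALSE)
print(res)
```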
Advantages: like the scan answer, this handles vector input and returns numeric columns.
Disadvantages: