I have a list of weather stations and their locations by latitude and longitude. Because of a formatting issue, some coordinates have only degrees and minutes while others have degrees, minutes and seconds. I can find the pattern using a regex, but I'm having trouble extracting the individual pieces.
Here's the data:
> head(wthrStat1 )
Station lat lon
1940 K01R 31-08N 092-34W
1941 K01T 28-08N 094-24W
1942 K03Y 48-47N 096-57W
1943 K04V 38-05-50N 106-10-07W
1944 K05F 31-25-16N 097-47-49W
1945 K06D 48-53-04N 099-37-15W
I'd like something like this:
Station latHr latMin latSec latDir lonHr lonMin lonSec lonDir
1940 K01R 31 08 00 N 092 34 00 W
1941 K01T 28 08 00 N 094 24 00 W
1942 K03Y 48 47 00 N 096 57 00 W
1943 K04V 38 05 50 N 106 10 07 W
1944 K05F 31 25 16 N 097 47 49 W
1945 K06D 48 53 04 N 099 37 15 W
I can match the entries with this regex:
data.format <- "\\d{1,3}-\\d{1,3}(?:-\\d{1,3})?[NSWE]{1}"
grep(data.format, wthrStat1$lat)
But I'm unsure how to get the individual parts into columns. I've tried a few things like:
wthrStat1$latHr <- ifelse(grepl(data.format, wthrStat1$lat), gsub(????), NA)
but with no luck.
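For reference, the gsub(????) placeholder can be filled in by turning every field of the pattern into a capturing group and substituting a backreference. This is a minimal sketch on a small made-up sample (the third row deliberately does not match). The names coord.format and ok are hypothetical, and it assumes the seconds group may be empty:

```r
wthrStat1 <- data.frame(
  Station = c("K01R", "K04V", "XXXX"),
  lat = c("31-08N", "38-05-50N", "unknown"),
  stringsAsFactors = FALSE
)

# Every field is a capturing group:
# 1 = degrees, 2 = minutes, 3 = seconds (empty when absent), 4 = direction
coord.format <- "^(\\d{1,3})-(\\d{1,3})-?(\\d{0,3})([NSWE])$"

ok <- grepl(coord.format, wthrStat1$lat, perl = TRUE)
wthrStat1$latHr  <- ifelse(ok, sub(coord.format, "\\1", wthrStat1$lat, perl = TRUE), NA)
wthrStat1$latMin <- ifelse(ok, sub(coord.format, "\\2", wthrStat1$lat, perl = TRUE), NA)
wthrStat1$latSec <- ifelse(ok, sub(coord.format, "\\3", wthrStat1$lat, perl = TRUE), NA)
wthrStat1$latSec[wthrStat1$latSec %in% ""] <- "00"   # pad missing seconds
wthrStat1$latDir <- ifelse(ok, sub(coord.format, "\\4", wthrStat1$lat, perl = TRUE), NA)
print(wthrStat1)
```

Rows that don't match the pattern get NA rather than the unchanged input string that sub() would otherwise return.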
Here's a dput():
> dput(wthrStat1[1:10,] )
structure(list(Station = c("K01R", "K01T", "K03Y", "K04V", "K05F",
"K06D", "K07G", "K07S", "K08D", "K0B9"), lat = c("31-08N", "28-08N",
"48-47N", "38-05-50N", "31-25-16N", "48-53-04N", "42-34-28N",
"47-58-27N", "48-18-03N", "43-20N"), lon = c("092-34W", "094-24W",
"096-57W", "106-10-07W", "097-47-49W", "099-37-15W", "084-48-41W",
"117-25-42W", "102-24-23W", "070-24W")), .Names = c("Station",
"lat", "lon"), row.names = 1940:1949, class = "data.frame")
Any suggestions?
strapplyc in the gsubfn package will extract each group in the regular expression surrounded with parentheses:
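library(gsubfn)
# four groups: degrees, minutes, optional seconds, direction
data.format <- "(\\d{1,3})-(\\d{1,3})-?(\\d{1,3})?([NSWE]{1})"
# one row per station, one column per group
parts <- strapplyc(wthrStat1$lat, data.format, simplify = rbind)
# a missing seconds group comes back as "", so pad it
parts[parts == ""] <- "00"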
which gives:
> parts
[,1] [,2] [,3] [,4]
[1,] "31" "08" "00" "N"
[2,] "28" "08" "00" "N"
[3,] "48" "47" "00" "N"
[4,] "38" "05" "50" "N"
[5,] "31" "25" "16" "N"
[6,] "48" "53" "04" "N"
[7,] "42" "34" "28" "N"
[8,] "47" "58" "27" "N"
[9,] "48" "18" "03" "N"
[10,] "43" "20" "00" "N"
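If you'd rather not depend on gsubfn, base R's regexec() together with regmatches() returns the same capture groups. This is a sketch, not the answerer's method: it uses perl = TRUE and writes the seconds group as \\d{0,3} so that the group always participates and comes back as "" when absent. The lat vector is copied from the dput() in the question:

```r
lat <- c("31-08N", "28-08N", "48-47N", "38-05-50N", "31-25-16N",
         "48-53-04N", "42-34-28N", "47-58-27N", "48-18-03N", "43-20N")

data.format <- "(\\d{1,3})-(\\d{1,3})-?(\\d{0,3})([NSWE])"

# each list element holds the full match followed by the four groups
m <- regmatches(lat, regexec(data.format, lat, perl = TRUE))
parts <- do.call(rbind, m)[, -1]   # drop the full-match column
parts[parts == ""] <- "00"         # pad missing seconds
print(parts)
```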
This works, but it is clunky, since it runs one regex substitution per field. I hope someone else has a better solution:
dat <- read.table(text =' Station lat lon
1940 K01R 31-08N 092-34W
1941 K01T 28-08N 094-24W
1942 K03Y 48-47N 096-57W
1943 K04V 38-05-50N 106-10-07W
1944 K05F 31-25-16N 097-47-49W
1945 K06D 48-53-04N 099-37-15W', header = TRUE)
# 1 = degrees, 2 = minutes, 3 = seconds (empty if absent), 4 = direction
pattern <- '([0-9]+)-([0-9]+)-?([0-9]*)([NSEW])'
dat$latHr <- sub(pattern, '\\1', dat$lat, perl = TRUE)
dat$latMin <- sub(pattern, '\\2', dat$lat, perl = TRUE)
latSec <- sub(pattern, '\\3', dat$lat, perl = TRUE)
latSec[nchar(latSec) == 0] <- '00'   # rows without seconds
dat$latSec <- latSec
# the direction is always group 4, so no need to borrow it from another row
dat$latDir <- sub(pattern, '\\4', dat$lat, perl = TRUE)
dat
Station lat lon latHr latMin latSec latDir
1940 K01R 31-08N 092-34W 31 08 00 N
1941 K01T 28-08N 094-24W 28 08 00 N
1942 K03Y 48-47N 096-57W 48 47 00 N
1943 K04V 38-05-50N 106-10-07W 38 05 50 N
1944 K05F 31-25-16N 097-47-49W 31 25 16 N
1945 K06D 48-53-04N 099-37-15W 48 53 04 N
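To get the full layout asked for in the question, the same extraction can be wrapped in a small helper and applied to both lat and lon. split_coord() is a hypothetical name, and the sketch assumes every value looks like DD-MM or DD-MM-SS followed by N, S, E or W; anything else becomes NA:

```r
dat <- data.frame(
  Station = c("K01R", "K01T", "K03Y", "K04V", "K05F", "K06D"),
  lat = c("31-08N", "28-08N", "48-47N", "38-05-50N", "31-25-16N", "48-53-04N"),
  lon = c("092-34W", "094-24W", "096-57W", "106-10-07W", "097-47-49W", "099-37-15W"),
  row.names = 1940:1945,
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "DD-MM" / "DD-MM-SS" plus direction into four columns
split_coord <- function(x, prefix) {
  # 1 = degrees, 2 = minutes, 3 = seconds (empty if absent), 4 = direction
  pattern <- "^(\\d{1,3})-(\\d{1,3})-?(\\d{0,3})([NSWE])$"
  parts <- do.call(cbind, lapply(1:4, function(i)
    sub(pattern, paste0("\\", i), x, perl = TRUE)))
  parts[!grepl(pattern, x, perl = TRUE), ] <- NA   # malformed values
  parts[parts[, 3] %in% "", 3] <- "00"              # no seconds given
  colnames(parts) <- paste0(prefix, c("Hr", "Min", "Sec", "Dir"))
  as.data.frame(parts, stringsAsFactors = FALSE)
}

res <- cbind(dat["Station"], split_coord(dat$lat, "lat"), split_coord(dat$lon, "lon"))
print(res)
```

Every field stays character, so leading zeros such as 092 survive; wrap a column in as.integer() if you need numbers.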