I have a data set with some locations:
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
and then I have climate data:
clim <- structure(list(lat = c(55.047, 55.097, 55.146, 55.004, 55.054,
55.103, 55.153, 55.202, 55.252, 55.301), long = c(6.029, 6.0171,
6.0051, 6.1269, 6.1151, 6.1032, 6.0913, 6.0794, 6.0675, 6.0555
), alt = c(0.033335, 0.033335, 0.033335, 0.033335, 0.033335,
0.033335, 0.033335, 0.033335, 0.033335, 0.033335), x = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), y = c(1914, 1907.3, 1901.8, 1921.1,
1914.1, 1908.3, 1902.4, 1896, 1889.8, 1884)), row.names = c(NA,
10L), class = "data.frame", .Names = c("lat", "long", "alt",
"x", "y"))
lat long alt x y
1 55.047 6.0290 0.033335 0 1914.0
2 55.097 6.0171 0.033335 0 1907.3
3 55.146 6.0051 0.033335 0 1901.8
4 55.004 6.1269 0.033335 0 1921.1
5 55.054 6.1151 0.033335 0 1914.1
6 55.103 6.1032 0.033335 0 1908.3
7 55.153 6.0913 0.033335 0 1902.4
8 55.202 6.0794 0.033335 0 1896.0
9 55.252 6.0675 0.033335 0 1889.8
10 55.301 6.0555 0.033335 0 1884.0
What I want to do is to "merge" both data sets so that the climate data ends up in ex. The values of lat and long in ex differ from those in clim, so the two data sets cannot be merged directly.
For each row in ex, I need to find the best match: the closest point in clim, considering both lat and long.
The expected output for the example is:
lat long alt x y
1 55 6 0.033335 0 1914.0
2 60 6 0.033335 0 1884.0
3 40 10 0.033335 0 1921.1
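To make the matching rule concrete, here is a small check for the first row of ex (lat 55, long 6). It is a minimal sketch that assumes plain Euclidean distance on the raw (lat, long) degrees is an acceptable notion of "closest". It keeps only the lat and long columns of clim because those are all the distance needs.

```r
# Data from the question (only the coordinate columns of clim are needed here)
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat  = c(55.047, 55.097, 55.146, 55.004, 55.054,
           55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555)
)

# Euclidean distance (in degrees) from the first row of ex to every clim point
d1 <- sqrt((clim$lat - ex$lat[1])^2 + (clim$long - ex$long[1])^2)

# Position of the closest clim point
which.min(d1)
#> [1] 1
```

Row 1 of clim has y = 1914.0, which is the value in the first row of the expected output.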
The function dist can be used to calculate Euclidean (or other) distances between all points in a matrix or data frame, so one way of finding the points in clim that are closest to those in ex is:
# Distance between all points in ex and clim combined,
# with distances between points in same matrix filtered out
n <- nrow(ex)
tmp <- as.matrix(dist(rbind(ex, clim[, 1:2])))[-(1:n), 1:n]
# Indices in clim corresponding to the closest points to those in ex
idx <- apply(tmp, 2, which.min)
# Points from ex with additional info from closest points in clim
cbind(ex, clim[idx, -(1:2)])
#> lat long alt x y
#> 1 55 6 0.033335 0 1914.0
#> 10 60 6 0.033335 0 1884.0
#> 4 40 10 0.033335 0 1921.1
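The combined dist() matrix has (nrow(ex) + nrow(clim))^2 entries, and the ex-to-ex and clim-to-clim blocks are discarded. A possible base-R variant computes only the ex-by-clim distances with outer(). It is a sketch that still uses plain Euclidean distance on raw degrees, as above, and it compares squared distances because squaring does not change which point is closest. It also resets the row names so the result has the 1, 2, 3 layout of the expected output rather than the clim row names (1, 10, 4).

```r
# Data from the question
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat  = c(55.047, 55.097, 55.146, 55.004, 55.054,
           55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt  = rep(0.033335, 10),
  x    = rep(0, 10),
  y    = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
           1908.3, 1902.4, 1896, 1889.8, 1884)
)

# nrow(ex) x nrow(clim) matrix of squared Euclidean distances (in degrees);
# entry [i, j] compares row i of ex with row j of clim
d2 <- outer(ex$lat, clim$lat, "-")^2 + outer(ex$long, clim$long, "-")^2

# For each row of ex (each row of d2), the index of the closest clim point
idx <- apply(d2, 1, which.min)

# Attach the climate columns, then reset row names to 1..nrow(ex)
result <- cbind(ex, clim[idx, c("alt", "x", "y")])
rownames(result) <- NULL
result
```

Note that Euclidean distance on degrees treats one degree of longitude like one degree of latitude, although away from the equator a degree of longitude covers less ground. If that matters, a great-circle distance would be a more accurate choice of metric.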