
Find the best-matching row in a data frame

I have a data set with some locations:

ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))

and then I have climate data:

clim <- structure(list(lat = c(55.047, 55.097, 55.146, 55.004, 55.054, 
55.103, 55.153, 55.202, 55.252, 55.301), long = c(6.029, 6.0171, 
6.0051, 6.1269, 6.1151, 6.1032, 6.0913, 6.0794, 6.0675, 6.0555
), alt = c(0.033335, 0.033335, 0.033335, 0.033335, 0.033335, 
0.033335, 0.033335, 0.033335, 0.033335, 0.033335), x = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), y = c(1914, 1907.3, 1901.8, 1921.1, 
1914.1, 1908.3, 1902.4, 1896, 1889.8, 1884)), row.names = c(NA, 
10L), class = "data.frame", .Names = c("lat", "long", "alt", 
"x", "y"))

      lat   long      alt x      y
1  55.047 6.0290 0.033335 0 1914.0
2  55.097 6.0171 0.033335 0 1907.3
3  55.146 6.0051 0.033335 0 1901.8
4  55.004 6.1269 0.033335 0 1921.1
5  55.054 6.1151 0.033335 0 1914.1
6  55.103 6.1032 0.033335 0 1908.3
7  55.153 6.0913 0.033335 0 1902.4
8  55.202 6.0794 0.033335 0 1896.0
9  55.252 6.0675 0.033335 0 1889.8
10 55.301 6.0555 0.033335 0 1884.0

What I want to do is "merge" both data sets so that ex contains the climate data. The values of lat and long in ex differ from the values of lat and long in clim, so the two cannot be merged directly. For each row in ex, I need to find the best match, i.e. the closest point in clim considering both lat and long.

The expected output for the example is:

  lat long      alt x      y
1  55    6 0.033335 0 1914.0
2  60    6 0.033335 0 1884.0
3  40   10 0.033335 0 1921.1
Mateusz1981 asked May 24 '18 06:05

1 Answer

The function dist calculates Euclidean (or other) distances between all pairs of rows in a matrix or data frame, so one way to find the points in clim that are closest to those in ex is:

# Distance between all points in ex and clim combined,
# with distances between points in same matrix filtered out
n <- nrow(ex)
tmp <- as.matrix(dist(rbind(ex, clim[, 1:2])))[-(1:n), 1:n]

# Indices in clim corresponding to the closest points to those in ex
idx <- apply(tmp, 2, which.min)

# Points from ex with additional info from closest points in clim
cbind(ex, clim[idx, -(1:2)])
#>    lat long      alt x      y
#> 1   55    6 0.033335 0 1914.0
#> 10  60    6 0.033335 0 1884.0
#> 4   40   10 0.033335 0 1921.1
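
A possible variation, not part of the answer above: dist on the combined rows also computes every clim-to-clim distance, which grows quadratically with the size of clim. The sketch below computes only the ex-to-clim distances, and it can swap the degree-based Euclidean distance for a haversine great-circle distance, since one degree of longitude covers less ground at higher latitudes. The helper names euclid_deg, haversine_km and nearest_row are hypothetical, and the Earth radius of 6371 km is an assumed spherical approximation.

```r
# Sample data from the question
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat  = c(55.047, 55.097, 55.146, 55.004, 55.054,
           55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt  = rep(0.033335, 10),
  x    = rep(0, 10),
  y    = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
           1908.3, 1902.4, 1896, 1889.8, 1884)
)

# Euclidean distance in degree units (same metric as dist's default)
euclid_deg <- function(lat1, lon1, lat2, lon2) {
  sqrt((lat2 - lat1)^2 + (lon2 - lon1)^2)
}

# Haversine great-circle distance in km.
# Assumption: spherical Earth with radius r = 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each row of ex, index of the closest row in clim under dist_fun.
# Only the nrow(ex) * nrow(clim) cross distances are computed.
nearest_row <- function(ex, clim, dist_fun) {
  vapply(seq_len(nrow(ex)), function(i) {
    which.min(dist_fun(ex$lat[i], ex$long[i], clim$lat, clim$long))
  }, integer(1))
}

idx_euclid <- nearest_row(ex, clim, euclid_deg)
idx_geo    <- nearest_row(ex, clim, haversine_km)

# Attach the climate columns of the closest clim row to each row of ex
res <- cbind(ex, clim[idx_geo, c("alt", "x", "y")])
rownames(res) <- NULL
print(res)
```

On the sample data both metrics select rows 1, 10 and 4 of clim, matching the result above; on data spread over a wider area the two may disagree, in which case the great-circle version is usually the safer choice for geographic coordinates.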
janusvm answered Oct 06 '22 04:10
