 

How to properly organize a search for a person?

Let's say I have a list of persons in my datastore. Each person may have the following fields:

  • last name (*)
  • first name
  • middle name
  • id (*)
  • driving licence id (*)
  • another id (*)
  • date of birth
  • region
  • place of birth

At least one of the fields marked with (*) must exist.

Now the user provides me with the same set of fields (and again, at least one of the fields marked with (*) must be provided). I should search for the person the user described. Not all fields have to match, though, so I should show the user how confident I am in the search result. Something like:

  • if the person matched by id and last name (and the user provided just these 2 fields for the search), then I am sure that the result is correct (100%);
  • if the person matched by id and last name (and the user provided other fields, which were found in the database but did not match), then I am only about 60% sure that the result is correct;
  • etc.

(the numbers are just examples)

How can I organize such a search? Is there a standard algorithm? I would also like to minimize the number of requests to the database.

P.S. I cannot show the user the actual field values from the database.

LA_ asked Nov 13 '22 02:11



1 Answer

It sounds like your logic for determining the quality of a match will be too complex to handle at the database layer. I think you'll get the best performance by retrieving all of the records that match at least one of the mandatory keys, calculating the match score for each of them in memory, and returning the best score. For example, if the user provides you with an id, last name and place of birth, your query would look something like:

SELECT * FROM users WHERE id = :the_id OR last_name = :the_last_name;
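
As a rough sketch of how such a query could be built from whichever mandatory fields the user actually supplied, here is one possibility in Python. It uses the standard `sqlite3` module purely as a stand-in for the real datastore. The column names in `MANDATORY` and the helper names `build_candidate_query` and `fetch_candidates` are assumptions made for illustration, not part of the original question.

```python
import sqlite3

# Hypothetical column names for the fields marked (*) in the question.
MANDATORY = ("id", "last_name", "driving_licence_id", "another_id")


def build_candidate_query(search):
    """Return (sql, params) selecting rows that match ANY supplied mandatory field."""
    keys = [k for k in MANDATORY if search.get(k) is not None]
    if not keys:
        raise ValueError("at least one mandatory field is required")
    # Column names come from the fixed tuple above; values are bound as parameters.
    where = " OR ".join(f"{k} = ?" for k in keys)
    return f"SELECT * FROM users WHERE {where}", [search[k] for k in keys]


def fetch_candidates(conn, search):
    """Run the candidate query once and return the rows as plain dicts."""
    sql, params = build_candidate_query(search)
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(sql, params)]
```

This keeps the search to a single request: optional fields such as place of birth are ignored by the query and only come into play later, when the candidates are compared in memory.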

This could be a performance problem if you have a VERY large dataset with many common last names, but otherwise I would not expect too many collisions. You can check this against your own dataset outside of GAE. If all mandatory fields MUST match, you could also get better performance by changing the OR to an AND.
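
The in-memory scoring step could look something like the sketch below. It is only one possible scheme, not a standard algorithm: each field gets a weight, a field is compared only when both the search and the stored record have a value for it, and the score is the matched weight as a percentage of the compared weight. The `WEIGHTS` values, the field names, and the functions `normalize`, `match_score` and `best_match` are all assumptions chosen for illustration.

```python
# Hypothetical weights; the question gives its percentages only as examples.
WEIGHTS = {
    "id": 3, "driving_licence_id": 3, "another_id": 3, "last_name": 3,
    "date_of_birth": 2, "first_name": 1, "middle_name": 1,
    "region": 1, "place_of_birth": 1,
}


def normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().casefold()


def match_score(search, record):
    """Return matched weight as a percentage of the weight that could be compared."""
    compared = matched = 0
    for field, weight in WEIGHTS.items():
        wanted, stored = search.get(field), record.get(field)
        if wanted is None or stored is None:
            continue  # field missing on one side, so it cannot count for or against
        compared += weight
        if normalize(wanted) == normalize(stored):
            matched += weight
    return 100.0 * matched / compared if compared else 0.0


def best_match(search, candidates):
    """Return (score, record) for the highest-scoring candidate, or (0.0, None)."""
    scored = [(match_score(search, rec), rec) for rec in candidates]
    return max(scored, key=lambda pair: pair[0], default=(0.0, None))
```

With these weights, a search on id and last name alone that matches both scores 100, while the same two matches alongside a mismatched first name, date of birth and place of birth score 60. Since the stored field values must not be shown to the user, only the score would be returned to them, not the record itself.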

narced133 answered Dec 22 '22 06:12
