 

Should I prefer integers or strings in my Solr schema if a field will fit either one?

Tags:

solr

lucene

Say that I have a field in my Solr schema that holds one of the values 1, 2, 3, or 4. I do no arithmetic on this field. The field is the status of the record, so it could just as easily be A, B, C, or D. Each of the 11,000,000 records has one of these statuses.

In a related question, an answer says that ints are "more memory-efficient", so that's a start. Are there other factors to consider? Does one match faster than the other?

This field is not going to be sorted. The values are arbitrary, and we'll never do a sort. It's only going to be used in filter queries.
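As a rough mental model, an exact-match filter on a low-cardinality field amounts to looking up a single term and returning the documents that carry it, whether that term is the int 3 or the string "3". The sketch below is a deliberately simplified toy inverted index in Python, not Lucene's actual data structures; `build_index` and `filter_docs` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def build_index(statuses):
    """Toy inverted index: map each distinct status to the doc ids that hold it."""
    index = defaultdict(list)
    for doc_id, status in enumerate(statuses):
        index[status].append(doc_id)
    return index


def filter_docs(index, value):
    """Simplified filter query: one exact-term lookup, regardless of the term's type."""
    return index.get(value, [])


int_statuses = [1, 3, 2, 1, 4, 3]
str_statuses = [str(s) for s in int_statuses]

int_index = build_index(int_statuses)
str_index = build_index(str_statuses)

print(filter_docs(int_index, 3))    # [1, 5]
print(filter_docs(str_index, "3"))  # [1, 5]
```

In this model, the type of the term changes nothing about the work done per filter: it is one lookup among four keys either way.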

Andy Lester asked Jan 14 '23 12:01


2 Answers

Will you ever query on a range? If your 1...4 really marks statuses from, say, Bad to Great, would you ever query for records with status 1-2? That is the only case where you might need them to be ints (and since you only have 4 values, even that is not a big deal).
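A minimal Python sketch of why the range case is "not that big of a deal" here: a string field orders its terms lexicographically, while an int field orders them numerically. For single-digit codes like 1..4 the two orderings agree, so a range such as 1 to 2 selects the same records either way; they only diverge once codes reach two digits. This assumes ASCII digit codes and uses Python's built-in comparison as a stand-in for Lucene's term ordering; `in_range` is a hypothetical helper.

```python
def in_range(values, lo, hi):
    """Inclusive range filter using each value's own ordering
    (numeric for ints, lexicographic for strings)."""
    return [v for v in values if lo <= v <= hi]


# Single-digit status codes: both orderings agree.
print(in_range([1, 2, 3, 4], 1, 2))              # [1, 2]
print(in_range(["1", "2", "3", "4"], "1", "2"))  # ['1', '2']

# Multi-digit codes: lexicographically "1" <= "10" <= "2", so "10" slips in.
print(in_range([1, 2, 10], 1, 2))                # [1, 2]
print(in_range(["1", "2", "10"], "1", "2"))      # ['1', '2', '10']
```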

My rule for data storage: if a number will never be used as a number, store it as a string. It may take more space, but you can do more string manipulation with it. And with 11M records, whether that one field is a string or an int probably won't make a meaningful difference in memory (11M is a lot of records, but not a heavy load for Solr/Lucene).
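A back-of-the-envelope check on the memory point, under an idealized assumption: each document stores only a bit-packed code identifying which of the 4 statuses it has, with no term dictionary, postings, or per-segment overhead counted. This is a lower bound for illustration, not a measurement of Solr's actual footprint; the helper names are hypothetical.

```python
import math


def min_bits_per_doc(distinct_values):
    """Bits needed to distinguish `distinct_values` codes (at least 1 bit)."""
    return max(1, math.ceil(math.log2(distinct_values)))


def ideal_bytes(num_docs, distinct_values):
    """Idealized lower bound: one packed code per document, nothing else."""
    return num_docs * min_bits_per_doc(distinct_values) / 8


print(min_bits_per_doc(4))         # 2
print(ideal_bytes(11_000_000, 4))  # 2750000.0  (about 2.75 MB)
```

Under this model, 4 distinct values need 2 bits per document whether the values are spelled 1..4 or A..D, which is consistent with the point that the choice of type matters little at 11M records.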

MikeHoss answered Jan 17 '23 01:01


With only 4 distinct values, int and String will perform very similarly for filter queries, sorting and even range queries.

jpountz answered Jan 17 '23 00:01