Say that I have a field in my Solr schema that either has the value 1, 2, 3 or 4. I do no arithmetic on this field. The field is a status of the record. It could just as easily be A, B, C or D. Each of the 11,000,000 records has one of these statuses.
In this question an answer says that ints are "more memory-efficient", so that's a start. Are there other factors to consider? Does one match faster than the other?
This field will never be sorted; the values are arbitrary. It's only going to be used in filter queries.
Will you ever query on a range? If your 1...4 is really marking statuses from, say, Bad to Great, would you ever query for records with status 1-2? That is the only case where you might need them to be ints (and, since you only have 4 values, it's not that big a deal).
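A range filter also works on a string field, but string terms are compared lexicographically rather than numerically. With single-digit codes like 1...4 the two orderings agree; they only diverge once codes have different lengths. The minimal Python sketch below models that ordering with plain Python string comparison (an assumption that matches byte order for ASCII values). It does not talk to Solr, and the sample code lists are made up for illustration.

```python
# Toy model of how a range filter like [1 TO 2] selects values when the field
# is indexed as a string (lexicographic order) versus an int (numeric order).

def in_range_str(value: str, lo: str, hi: str) -> bool:
    # String terms are compared character by character.
    return lo <= value <= hi


def in_range_int(value: int, lo: int, hi: int) -> bool:
    # Numeric fields compare by value.
    return lo <= value <= hi


# Single-digit status codes: both orderings pick the same records.
statuses = ["1", "2", "3", "4"]
str_hits = [s for s in statuses if in_range_str(s, "1", "2")]
int_hits = [s for s in statuses if in_range_int(int(s), 1, 2)]

# Hypothetical multi-digit codes: lexicographic order lets "10" fall between
# "1" and "2", while numeric order does not.
wider = ["1", "2", "10", "20"]
str_wide = [s for s in wider if in_range_str(s, "1", "2")]
int_wide = [s for s in wider if in_range_int(int(s), 1, 2)]

print(str_hits, int_hits)  # ['1', '2'] ['1', '2']
print(str_wide, int_wide)  # ['1', '2', '10'] ['1', '2']
```

So as long as the status codes stay single characters, storing them as strings does not rule out range queries.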
My rule in data storage is that if an int will never be used as an int, store it as a string. It may take more space, but you get more flexibility for string manipulation. And for memory, whether that one field is a string or an int may not matter much across 11m records (11m is a lot of records, but not a heavy load for Solr/Lucene).
With only 4 distinct values, int and String will perform very similarly for filter queries, sorting and even range queries.
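One way to see why the type barely matters here is a toy inverted index: a filter on a single value is a lookup of that value's document set, and with 4 distinct values both an int-keyed and a string-keyed index hold exactly 4 entries. This Python sketch is a deliberate simplification; Lucene's real term dictionaries, and the point-based structures it uses for numeric fields, are more elaborate. The sample data is hypothetical.

```python
from collections import defaultdict

# Toy inverted index: each distinct field value maps to the set of document
# IDs that hold it. Not how Lucene actually stores terms or numeric points.


def build_index(values):
    index = defaultdict(set)
    for doc_id, value in enumerate(values):
        index[value].add(doc_id)
    return dict(index)


def filter_docs(index, value):
    # A single-value filter is one lookup returning that value's doc set;
    # an unknown value simply matches nothing.
    return index.get(value, set())


# Hypothetical sample: ten documents cycling through four statuses,
# encoded once as ints (1..4) and once as strings (A..D).
int_statuses = [1 + (i % 4) for i in range(10)]
str_statuses = ["ABCD"[i % 4] for i in range(10)]

int_index = build_index(int_statuses)
str_index = build_index(str_statuses)

print(len(int_index), len(str_index))                          # 4 4
print(filter_docs(int_index, 2) == filter_docs(str_index, "B"))  # True
```

Either way the filter cost is driven by how many documents carry the value, not by whether the key is 2 or "B".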