Elasticsearch bulk upload error with PHP - Limit of total fields [1000] in index has been exceeded

We are planning to use Elasticsearch in one of our projects, and we are currently testing Elasticsearch 5.0.1 with our data. One issue we are facing: when we do a bulk upload from our MySQL tables to Elasticsearch, we get the following error...

java.lang.IllegalArgumentException: Limit of total fields [1000] in index [shopfront] has been exceeded
at org.elasticsearch.index.mapper.MapperService.checkTotalFieldsLimit(MapperService.java:482) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:343) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:277) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:323) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:241) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.service.ClusterService.runTasksForExecutor(ClusterService.java:555) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:896) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:451) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201) ~[elasticsearch-5.0.1.jar:5.0.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

We are using PHP as the Elasticsearch client to do the bulk upload from MySQL to Elasticsearch. After some googling I found this piece of info - https://discuss.elastic.co/t/es-2-3-5-x-metricbeat-index-field-limit/66821

I also read somewhere that using "index.mapping.total_fields.limit" will fix this, but I can't understand how to use it in my PHP code. Here is my PHP code.

$params = ['body' => []];

$i = 1;
foreach ($productsList as $key => $value) {

    $params['body'][] = [
        'index' => [
            '_index' => 'shopfront',
            '_type' => 'products'
        ],
        'settings' => ['index.mapping.total_fields.limit' => 3000]
    ];

    $params['body'][] = [
        'product_displayname' => $value['product_displayname'],
        'product_price' => $value['product_price'],
        'popularity' => $value['popularity'],
        'lowestcomp_price' => $value['lowestcomp_price']
    ];

    // Every 1000 documents stop and send the bulk request
    if ($i % 1000 == 0) {
        $responses = $client->bulk($params);

        // erase the old bulk request
        $params = ['body' => []];

        // unset the bulk response when you are done to save memory
        unset($responses);
    }

    $i++;
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}

NOTE - I've used the same code with Elasticsearch 2.4.1 and it works fine there.

Asked Dec 01 '22 by Suresh

2 Answers

In ES 5, the ES folks decided to limit the number of fields that a mapping type can contain to prevent a mapping explosion. As you've noticed, that limit has been set at 1000 fields per mapping, but you can lift that limit to suit your needs by specifying the index.mapping.total_fields.limit setting either at index creation time or by updating the index settings, like this:

curl -XPUT 'localhost:9200/shopfront/_settings' -d '
{
    "index.mapping.total_fields.limit": 3000
}'
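Since the question uses the official elasticsearch-php client, the same settings update can be made from PHP. This is a sketch, assuming `$client` is the same client instance used for the bulk upload (built via `ClientBuilder::create()->build()`) and a cluster is running; it is a one-off request against the index, not something that goes inside the bulk body:

```php
<?php
// Raise the field limit on the existing "shopfront" index.
// Assumes $client is an Elasticsearch\Client built with
// Elasticsearch\ClientBuilder::create()->build() against a running cluster.
$client->indices()->putSettings([
    'index' => 'shopfront',
    'body'  => [
        'index.mapping.total_fields.limit' => 3000
    ]
]);
```

Run this once before the bulk upload; the bulk loop itself then stays exactly as it was (without the `'settings'` key in the action metadata, which the bulk API does not understand).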

Note that you should also ask yourself whether having that many fields is a good thing. Do you need them all? Can you combine some of them?

Answered Dec 06 '22 by Val

This limit was decided in this github issue. There are two ways to solve the problem:

You can specify a greater value when creating the index:

PUT shopfront
{
  "settings": {
    "index.mapping.total_fields.limit": 2000,
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    ...
  }
}

Or if you would like to increase the limit for an existing index:

PUT shopfront/_settings
{
  "index.mapping.total_fields.limit": 2000
}
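For completeness, the index-creation variant can also be done with the elasticsearch-php client. A sketch, assuming `$client` is a client instance against a running cluster and that the index does not yet exist:

```php
<?php
// Create the "shopfront" index with a higher field limit up front.
// Assumes $client is an Elasticsearch\Client built with
// Elasticsearch\ClientBuilder::create()->build() against a running cluster.
$client->indices()->create([
    'index' => 'shopfront',
    'body'  => [
        'settings' => [
            'index.mapping.total_fields.limit' => 2000,
            'number_of_shards'                 => 5,
            'number_of_replicas'               => 2
        ]
    ]
]);
```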
Answered Dec 06 '22 by Hyder B.