How to index a .PDF file in ElasticSearch

Tags:

elasticsearch

I am new to ElasticSearch. I have gone through a very basic tutorial on creating indexes, and I understand the concept of indexing. I want ElasticSearch to search inside a .PDF file. Based on my understanding of creating indexes, it seems I need to read the .PDF file and extract all the keywords for indexing, but I do not understand what steps I need to follow. How do I read a .PDF file to extract keywords?


asked Jan 18 '16 14:01

KurioZ7

1 Answer

It seems that the mapper-attachments plugin (elasticsearch-mapper-attachments) was deprecated in 5.0.0 (released October 26, 2016). The documentation recommends the Ingest Attachment Processor plugin as its replacement.

To install:

sudo bin/elasticsearch-plugin install ingest-attachment
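After installing the plugin, the processor is used through an ingest pipeline. Below is a minimal sketch, using only the Python standard library, of how such a pipeline could be registered. It assumes a local, unsecured cluster at http://localhost:9200, a hypothetical pipeline name "attachment", and a hypothetical source field "data". None of these names come from the answer. The attachment processor reads a base64-encoded file from that field and, by default, writes the extracted text to attachment.content.

```python
import json
import urllib.request

# Assumptions for illustration only: local cluster without authentication,
# hypothetical pipeline name "attachment".
ES_URL = "http://localhost:9200"
PIPELINE_ID = "attachment"


def pipeline_body(source_field: str = "data") -> dict:
    """Pipeline definition with a single attachment processor.

    The processor reads a base64-encoded file from `source_field` and stores
    the extracted text under attachment.content (the default target field).
    """
    return {
        "description": "Extract text from base64-encoded files",
        "processors": [{"attachment": {"field": source_field}}],
    }


def build_put_pipeline_request(
    es_url: str, pipeline_id: str, body: dict
) -> urllib.request.Request:
    """Build (but do not send) PUT /_ingest/pipeline/<pipeline_id>."""
    return urllib.request.Request(
        f"{es_url}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode Elasticsearch's JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a running cluster, send(build_put_pipeline_request(ES_URL, PIPELINE_ID, pipeline_body())) should return an acknowledgement. Building the request separately from sending it keeps the request shape easy to inspect without a live cluster.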

See the question "How to index a pdf file in Elasticsearch 5.0.0 with ingest-attachment plugin?" for information on how to use the Ingest Attachment plugin.
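This also answers the original question about extracting keywords: you do not need to parse the PDF yourself. You send the raw file, base64-encoded, through the pipeline, and the extracted text becomes searchable. The sketch below uses only the Python standard library and rests on several assumptions that are not from the answer. It expects a local, unsecured cluster at http://localhost:9200, an existing pipeline named "attachment" whose processor reads the field "data", and a hypothetical index name "docs". It also uses the typeless _doc endpoint of Elasticsearch 7 and later, whereas 5.x and 6.x put a mapping type name in that position.

```python
import base64
import json
import urllib.request

# Assumptions for illustration only: local cluster without authentication,
# hypothetical index "docs", existing pipeline "attachment" reading "data".
ES_URL = "http://localhost:9200"
INDEX = "docs"
PIPELINE_ID = "attachment"


def encode_file(raw: bytes) -> str:
    """The attachment processor expects the file content as a base64 string."""
    return base64.b64encode(raw).decode("ascii")


def build_index_request(
    es_url: str, index: str, doc_id: str, raw_pdf: bytes, pipeline_id: str
) -> urllib.request.Request:
    """Build PUT /<index>/_doc/<id>?pipeline=<id> with the file in "data"."""
    body = {"data": encode_file(raw_pdf)}
    return urllib.request.Request(
        f"{es_url}/{index}/_doc/{doc_id}?pipeline={pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(es_url: str, index: str, text: str) -> urllib.request.Request:
    """Build a match query against the text the processor extracted."""
    query = {"query": {"match": {"attachment.content": text}}}
    return urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode Elasticsearch's JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, read a file with open("example.pdf", "rb") and pass its bytes to build_index_request(ES_URL, INDEX, "1", raw, PIPELINE_ID). Send that request, then send build_search_request(ES_URL, INDEX, "some keyword") and inspect ["hits"]["hits"] in the response. A newly indexed document becomes searchable only after the index refreshes, which by default happens about once per second.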

answered Oct 05 '22 17:10

Ben.12
