I have thousands of PDF articles from which I need to extract only the author's name and related details such as address, email ID, and whatever else is provided inside the PDF (I mean the content itself). I don't want to do this by reading the PDF's metadata. I tried that and ended up with only a few details, such as the author name, title, and other standard fields that I don't need at all.
I have gone through all the APIs I could find on the internet, but I still did not get a solution. I need to do this in Java.
I don't think you can get this directly from any library. Use the iText library to read the PDF. Once you are able to read the text, find the author details using regular expressions.
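As a rough illustration of the regular-expression step, here is a minimal sketch. It assumes the page text has already been extracted into a String by whatever PDF library you use; the class name AuthorDetailsSketch, the method extractEmails, and the sample text are all made up for this example, and the email pattern is deliberately simple rather than complete. Names and addresses have no fixed shape, so they would need patterns or heuristics tuned to how your articles are laid out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthorDetailsSketch {

    // Deliberately simple email pattern (an assumption, not a full RFC 5322 matcher).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns every email-like token found in the text, in order of appearance.
    public static List<String> extractEmails(String text) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            emails.add(matcher.group());
        }
        return emails;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for text read from the first page of a PDF.
        String firstPage = "Jane Doe\n"
                + "Dept. of Physics, Example University\n"
                + "jane.doe@example.edu\n"
                + "John Roe <j.roe@example.org>";
        System.out.println(extractEmails(firstPage));
    }
}
```

Author details usually sit on the first page, so running the pattern on that page only should cut down on false matches from the references section.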