
Android: Get text from PDF

Tags: android, pdf

I want to read text from a PDF file stored on the SD card. How can I extract the text?

Here is what I tried:

public class MainActivity extends ActionBarActivity implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;
    private String line = null;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tts = new TextToSpeech(getApplicationContext(), this);

        final TextView text1 = (TextView) findViewById(R.id.textView1);

        findViewById(R.id.button1).setOnClickListener(new OnClickListener() {

            private String[] arr;

            @Override
            public void onClick(View v) {
                File sdcard = Environment.getExternalStorageDirectory();

                // Get the text file

                File file = new File(sdcard, "test.pdf");

                // ob.pathh
                // Read text from file

                StringBuilder text = new StringBuilder();
                try {
                    BufferedReader br = new BufferedReader(new FileReader(file));

                    // int i=0;
                    List<String> lines = new ArrayList<String>();

                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                        // arr[i]=line;
                        // i++;
                        text.append(line);
                        text.append('\n');
                    }
                    for (String string : lines) {
                        // QUEUE_ADD queues each line; QUEUE_FLUSH would cut off the previous one.
                        tts.speak(string, TextToSpeech.QUEUE_ADD, null);
                    }
                    arr = lines.toArray(new String[lines.size()]);
                    System.out.println(arr.length);
                    text1.setText(text);

                } catch (Exception e) {
                    e.printStackTrace();
                }

            }
        });

    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = tts.setLanguage(Locale.US);
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "This Language is not supported");
            } else {
                // speakOut();
            }

        } else {
            Log.e("TTS", "Initialization Failed!");
        }
    }

}

Note: This works fine when the file is a plain text file (test.txt), but not when it is a PDF (test.pdf).

With the PDF, the output is not the document's text; it looks like raw bytes. How can I get the actual text?

Thanks in advance.

asked Apr 21 '15 05:04 by Shailendra Madda




2 Answers

I solved this with iText.

Gradle (on current Android Gradle plugin versions, use implementation instead of compile):

compile 'com.itextpdf:itextg:5.5.10'

Java:

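// Imports needed at the top of the file:
// import com.itextpdf.text.pdf.PdfReader;
// import com.itextpdf.text.pdf.parser.PdfTextExtractor;

try {
    StringBuilder parsedText = new StringBuilder();
    PdfReader reader = new PdfReader(yourPdfPath);
    int pageCount = reader.getNumberOfPages();
    // iText page numbers start at 1, not 0.
    for (int page = 1; page <= pageCount; page++) {
        // Extract the text of each page and separate pages with a newline.
        parsedText.append(PdfTextExtractor.getTextFromPage(reader, page).trim()).append("\n");
    }
    reader.close();
    System.out.println(parsedText);
} catch (Exception e) {
    // Covers I/O failures and malformed PDFs.
    e.printStackTrace();
}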
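
Since the question passes the text to TextToSpeech one line at a time, the extracted string can be split the same way. This is only a minimal sketch, assuming parsedText is the string built above; SpeechChunker and toUtterances are hypothetical names, not part of iText or Android.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns extracted PDF text into a list of speakable lines.
final class SpeechChunker {

    // Splits on Unix or Windows line breaks, trims each line, and drops blank ones.
    static List<String> toUtterances(String parsedText) {
        List<String> utterances = new ArrayList<>();
        if (parsedText == null) {
            return utterances;
        }
        for (String line : parsedText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                utterances.add(trimmed);
            }
        }
        return utterances;
    }
}
```

Each entry could then be passed to tts.speak(entry, TextToSpeech.QUEUE_ADD, null), so that the lines queue up instead of interrupting each other.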
answered Oct 10 '22 01:10 by REMITH


A PDF is not a plain text file, so reading it line by line will not give you its text. You need a PDF parsing library. The best starting point is this related question: How to read pdf in my android application?
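
To see why a plain reader returns unreadable output, it can help to check the file header first: a PDF begins with the marker %PDF-, and its page content is usually stored in compressed streams rather than as readable lines. Below is a minimal sketch, assuming the marker sits at the very first byte of the file; PdfSniffer and looksLikePdf are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: detects a PDF by its header instead of its file extension.
final class PdfSniffer {

    // A PDF file starts with the ASCII marker "%PDF-" followed by a version number.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the first bytes of the stream match the PDF marker.
    // Assumption: the marker is at offset 0; files with leading junk are reported as non-PDF.
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // Stream ended before a full marker could be read.
            }
            read += n;
        }
        return Arrays.equals(head, PDF_MAGIC);
    }
}
```

If looksLikePdf returns true for a stream opened on the file, the BufferedReader approach from the question is the wrong tool and the file should go to a PDF library instead.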

answered Oct 10 '22 00:10 by BSathvik