 

Convert a PDF file to text in C# [closed]

Tags: c#, text-files, pdf

I need to convert a .pdf file to a .txt file.

How can I do this in C#?

asked Dec 22 '09 at 06:12 by aharon



2 Answers

I've needed this myself, and I used this article to get started: http://www.codeproject.com/KB/string/pdf2text.aspx

answered Sep 29 '22 at 22:09 by Don



Ghostscript could do what you need. The command below extracts the text of a PDF file into a .txt file (run it from a command prompt first to check that it works for your files):

gswin32c.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "test.pdf" -c quit >"test.txt"
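A minimal sketch of running the same command from C# with System.Diagnostics.Process. It assumes gswin32c.exe can be found (for example, it is on the PATH) and that Ghostscript can locate ps2ascii.ps. It also assumes .NET Core 2.1 or later, which provides ProcessStartInfo.ArgumentList. The names GhostscriptText, BuildArgs and PdfToText are hypothetical. Process does not interpret the shell's `>` redirection, so the sketch captures standard output itself and writes it to the .txt file.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; not part of any library.
static class GhostscriptText
{
    // Same arguments as the command line above, minus the shell redirection (> "test.txt").
    public static List<string> BuildArgs(string pdfPath) => new List<string>
    {
        "-q", "-dNODISPLAY", "-dSAFER", "-dDELAYBIND", "-dWRITESYSTEMDICT", "-dSIMPLE",
        "-c", "save", "-f", "ps2ascii.ps", pdfPath, "-c", "quit"
    };

    // Runs Ghostscript and writes its standard output (the extracted text) to txtPath.
    public static void PdfToText(string gsExe, string pdfPath, string txtPath)
    {
        var psi = new ProcessStartInfo(gsExe)
        {
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true,
        };
        // ArgumentList quotes each argument for us, so paths with spaces are safe.
        foreach (var arg in BuildArgs(pdfPath)) psi.ArgumentList.Add(arg);

        using var process = Process.Start(psi)!;
        // Read stderr asynchronously so neither pipe can fill up and deadlock the child.
        var stderrTask = process.StandardError.ReadToEndAsync();
        string text = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        string stderr = stderrTask.Result;

        if (process.ExitCode != 0)
            throw new InvalidOperationException(
                $"Ghostscript exited with code {process.ExitCode}: {stderr}");

        File.WriteAllText(txtPath, text);
    }
}
```

Usage would look like `GhostscriptText.PdfToText("gswin32c.exe", "test.pdf", "test.txt");`. Launching the executable keeps Ghostscript out of your own process. The trade-off is that you depend on it being installed on the machine. If the extracted text contains non-ASCII characters, you may also need to set ProcessStartInfo.StandardOutputEncoding to match what Ghostscript emits.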

See the CodeProject article "Convert PDF to Image Using Ghostscript API" for details on how to use Ghostscript from C#.

answered Sep 29 '22 at 20:09 by serge_gubenko
