How to read a table from a PDF using iTextSharp?

Tags: pdf, itext

I am having a problem reading a table from a PDF file. It's a very simple PDF with some text and a table. The tool I am using is iTextSharp. I know there is no table concept in PDF. After some googling, I found a suggestion that it might be possible with iTextSharp plus a custom ITextExtractionStrategy, but I have no idea how to start. Can someone please give me some hints, or a small piece of sample code?

Cheers


asked Mar 28 '13 10:03

Victor

1 Answer

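This code reads the table content. In the page's content stream, each text value is written as a string in parentheses followed by the Tj operator, i.e. (value)Tj, so we look for all of those values. You can then do whatever you need with the resulting strings. Note that this only returns the text; it does not tell you which row or column a value belongs to.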

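    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using iTextSharp.text.pdf;

    // The class name is illustrative; the members are what matter.
    public class PdfTextReader
    {
        // PdfReader does not expand "~"; point this at a real file system path.
        private readonly string _filePath = @"MyPDF.pdf";

        // Returns one string per page holding the operands of that page's Tj operators.
        public List<string> Read()
        {
            var pdfReader = new PdfReader(_filePath);
            var pages = new List<string>();
            try
            {
                // iTextSharp page numbers are 1-based.
                for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                {
                    // GetPageContent returns the page's content stream as bytes.
                    string textFromPage = Encoding.Default.GetString(pdfReader.GetPageContent(page));
                    pages.Add(GetDataConvertedData(textFromPage));
                }
            }
            finally
            {
                pdfReader.Close();
            }
            return pages;
        }

        // Keeps the text between the first "(" and the last ")" of every line ending in Tj.
        // Assumes each Tj operator sits on its own line; values are joined with a space.
        private static string GetDataConvertedData(string textFromPage)
        {
            var values = textFromPage.Split('\n')
                .Select(line => line.Trim())
                .Where(line => line.EndsWith("Tj", StringComparison.Ordinal))
                .Select(line => new { Line = line, Start = line.IndexOf('('), End = line.LastIndexOf(')') })
                .Where(x => x.Start >= 0 && x.End > x.Start)
                .Select(x => x.Line.Substring(x.Start + 1, x.End - x.Start - 1));
            return string.Join(" ", values);
        }
    }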
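
Since PDF has no table concept, the strings alone are not enough to rebuild rows and columns; you also need to know where each piece of text sits on the page. Here is a rough sketch of the grouping step only, not a complete solution. It assumes you have already captured each text chunk together with the x/y start of its baseline, for instance inside the RenderText callback of a custom ITextExtractionStrategy as mentioned in the question. TextChunk, TableRows, Group and the tolerance default are hypothetical names and values made up for this illustration; they are not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container: one piece of text and where its baseline starts.
public sealed class TextChunk
{
    public string Text { get; }
    public float X { get; }
    public float Y { get; }

    public TextChunk(string text, float x, float y)
    {
        Text = text;
        X = x;
        Y = y;
    }
}

public static class TableRows
{
    // Groups chunks into rows. A chunk joins the current row when its Y is
    // within `tolerance` of the Y of the first chunk placed in that row.
    public static List<List<string>> Group(IEnumerable<TextChunk> chunks, float tolerance = 2f)
    {
        var rows = new List<List<TextChunk>>();
        float rowY = 0f;

        // By default the PDF origin is at the bottom-left, so a larger Y is
        // higher on the page. Sorting by descending Y walks top to bottom.
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            if (rows.Count == 0 || Math.Abs(rowY - chunk.Y) > tolerance)
            {
                rows.Add(new List<TextChunk>());
                rowY = chunk.Y;
            }
            rows[rows.Count - 1].Add(chunk);
        }

        // Within each row, order the cells left to right.
        return rows
            .Select(row => row.OrderBy(c => c.X).Select(c => c.Text).ToList())
            .ToList();
    }
}
```

Calling TableRows.Group(chunks) returns one list of cell strings per row, top to bottom. The tolerance probably needs tuning per document, and cells that wrap onto several lines or pages that are rotated would need extra handling that this sketch does not attempt.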


answered Sep 28 '22 08:09

gustavo.a.hansen
