
iTextSharp parsing HTML with images in it: it parses correctly but won't show images

I am trying to generate a .pdf from HTML using the iTextSharp library. I am able to create the PDF, with the HTML text converted to PDF text/paragraphs.

My problem: the PDF does not show my images (the img elements from the HTML). None of the img elements in my HTML are displayed in the PDF. Is it possible for iTextSharp to parse HTML and display images? I really hope so, otherwise I am stuffed :(

I am pointing to the correct directory where the images are (using IMG_BASEURL), but they are just not showing.

My code:

// mainContents variable is a string containing my HTML
var document = new Document(PageSize.A4, 50, 50, 80, 100);
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);
document.Open();

Hashtable providers = new Hashtable();
providers.Add("img_baseurl","C:/users/xx/VisualStudio/Projects/myproject/");
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(mainContents), null, providers);
foreach (var htmlElement in parsedHtmlElements)
   document.Add(htmlElement as IElement);

document.Close();
sazr asked Feb 02 '23 07:02



1 Answer

Every time I've encountered this, the problem was that the image was too large for the canvas. More specifically, even a naked IMG tag gets wrapped internally in a Chunk, which in turn gets wrapped in a Paragraph. I think the image is overflowing the Paragraph, but I'm not 100% sure.

The two easy fixes are to either enlarge the canvas or specify image dimensions on the HTML IMG tag. The third, more complex route is to use an additional provider, IMG_PROVIDER. To do this you need to implement the IImageProvider interface. Below is a very simple version of one:

    public class ImageThing : IImageProvider {
        //Store a reference to the main document so that we can access the page size and margins
        private Document MainDoc;
        //Constructor
        public  ImageThing(Document doc) {
            this.MainDoc = doc;
        }
        Image IImageProvider.GetImage(string src, IDictionary<string, string> attrs, ChainedProperties chain, IDocListener doc) {
            //Prepend the src tag with our path. NOTE, when using HTMLWorker.IMG_PROVIDER, HTMLWorker.IMG_BASEURL gets ignored unless you choose to implement it on your own
            src = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\" + src;
            //Get the image. NOTE, this will attempt to download/copy the image, you'd really want to sanity check here
            Image img = Image.GetInstance(src);
            //Make sure we got something
            if (img == null) return null;
            //Determine the usable area of the canvas. NOTE, this doesn't take into account the current "cursor" position so this might create a new blank page just for the image
            float usableW = this.MainDoc.PageSize.Width - (this.MainDoc.LeftMargin + this.MainDoc.RightMargin);
            float usableH = this.MainDoc.PageSize.Height - (this.MainDoc.TopMargin + this.MainDoc.BottomMargin);
            //If the downloaded image is bigger than either width and/or height then shrink it
            if (img.Width > usableW || img.Height > usableH) {
                img.ScaleToFit(usableW, usableH);
            }
            //return our image
            return img;
        }
    }
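
The size check in the provider comes down to a little arithmetic, sketched here in plain C# without any iTextSharp types. `FitMath` and `ShrinkToFit` are hypothetical names used only for illustration. The sketch assumes that ScaleToFit keeps the aspect ratio by applying the smaller of the two width/height ratios, and that A4 is 595 x 842 points, so the margins used in this answer (50, 50, 80, 100) leave a usable area of 495 x 662.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: mirrors the "shrink only if too big" check.
public static class FitMath {
    // Returns { width, height } after shrinking an image to fit inside maxW x maxH.
    // Images that already fit are returned unchanged (they are never enlarged).
    public static float[] ShrinkToFit(float w, float h, float maxW, float maxH) {
        if (w <= maxW && h <= maxH) {
            return new[] { w, h };
        }
        // Use the smaller ratio so that both dimensions end up inside the box
        float scale = Math.Min(maxW / w, maxH / h);
        return new[] { w * scale, h * scale };
    }
}
```

With these numbers, a 1000 x 500 image is limited by its width (495 / 1000 is smaller than 662 / 500), so `FitMath.ShrinkToFit(1000f, 500f, 495f, 662f)` comes out at roughly 495 x 247.5.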

To use this provider just add it to the provider collection like you did with HTMLWorker.IMG_BASEURL:

providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));

Note that if you use HTMLWorker.IMG_PROVIDER, you are responsible for figuring out everything about the image. The code above assumes that every image path needs to be prepended with a constant string; you'll probably want to change this and check for HTTP at the start of the path. Also, because the provider takes over the entire image processing pipeline, the HTMLWorker.IMG_BASEURL provider is no longer needed.
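
As a rough sketch of that check, the helper below decides whether a src value should be left alone or combined with a base folder. `ImageSourceResolver`, `Resolve` and `baseDir` are hypothetical names that are not part of iTextSharp. It assumes only two cases matter, absolute http/https URLs and local paths; anything else (data URIs, protocol-relative URLs and so on) would need extra handling.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of iTextSharp: turns an IMG src into a loadable path or URL.
public static class ImageSourceResolver {
    public static string Resolve(string src, string baseDir) {
        if (string.IsNullOrEmpty(src)) {
            throw new ArgumentException("src must not be empty", "src");
        }
        // Absolute web URLs are returned untouched
        if (src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            src.StartsWith("https://", StringComparison.OrdinalIgnoreCase)) {
            return src;
        }
        // Paths that are already rooted are returned untouched
        if (Path.IsPathRooted(src)) {
            return src;
        }
        // Everything else is treated as relative to the base folder
        return Path.Combine(baseDir, src);
    }
}
```

Inside a GetImage implementation, the resolved value could then be passed to Image.GetInstance in place of the hard-coded Desktop prefix.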

The main code loop would now look something like this:

        string html = @"<img src=""Untitled-1.png"" />";
        string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
        using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
            using (Document doc = new Document(PageSize.A4, 50, 50, 80, 100)) {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                    doc.Open();
                    using (StringReader sr = new StringReader(html)) {
                        System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
                        providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));

                        var parsedHtmlElements = HTMLWorker.ParseToList(sr, null,  providers);
                        foreach (var htmlElement in parsedHtmlElements) {
                            doc.Add(htmlElement as IElement);
                        }
                    }
                    doc.Close();
                }
            }
        }

One last thing: make sure to specify which version of iTextSharp you are targeting when posting here. The code above targets iTextSharp 5.1.2.0, but I think you might be using the 4.X series.

Chris Haas answered Feb 05 '23 20:02
