
Unable to merge 2 PDFs using MemoryStream

Tags:

c#

wkhtmltopdf

I have a C# class that takes an HTML string and converts it to PDF using wkhtmltopdf.
As you will see below, I am generating 3 PDFs: landscape, portrait, and a combination of the two.

The properties object contains the HTML as a string, and the argument for landscape/portrait.

System.IO.MemoryStream PDF = new WkHtmlToPdfConverter().GetPdfStream(properties);
System.IO.FileStream file = new System.IO.FileStream("abc_landscape.pdf", System.IO.FileMode.Create);
PDF.Position = 0;

properties.IsHorizontalOrientation = false;
System.IO.MemoryStream PDF_portrait = new WkHtmlToPdfConverter().GetPdfStream(properties);
System.IO.FileStream file_portrait = new System.IO.FileStream("abc_portrait.pdf", System.IO.FileMode.Create);
PDF_portrait.Position = 0;

System.IO.MemoryStream finalStream = new System.IO.MemoryStream();
PDF.CopyTo(finalStream);
PDF_portrait.CopyTo(finalStream);
System.IO.FileStream file_combined = new System.IO.FileStream("abc_combined.pdf", System.IO.FileMode.Create);

try
{
    PDF.WriteTo(file);
    PDF.Flush();

    PDF_portrait.WriteTo(file_portrait);
    PDF_portrait.Flush();

    finalStream.WriteTo(file_combined);
    finalStream.Flush();
}
catch (Exception)
{
    throw;
}
finally
{
    PDF.Close();
    file.Close();

    PDF_portrait.Close();
    file_portrait.Close();

    finalStream.Close();
    file_combined.Close();
}

The PDFs "abc_landscape.pdf" and "abc_portrait.pdf" generate correctly, as expected, but the operation fails when I try to combine the two into a third PDF (abc_combined.pdf).

I am using MemoryStream to perform the merge, and while debugging I can see that finalStream.Length is equal to the sum of the lengths of the previous two PDFs. But when I open the combined PDF, I see the content of only one of the two PDFs.
(Screenshot: PDF sizes)

Additionally, when I try to close "abc_combined.pdf", I am prompted to save it, which does not happen with the other 2 PDFs.
(Screenshot: saving prompt)

Below are a few things that I have tried out already, to no avail:

  1. Change CopyTo() to WriteTo()
  2. Merge the same PDF (either Landscape or Portrait one) with itself

In case it is required, below is the GetPdfStream() method in more detail.
var htmlStream = new MemoryStream();
var writer = new StreamWriter(htmlStream);
writer.Write(htmlString);
writer.Flush();
htmlStream.Position = 0;
return htmlStream;

Process process = Process.Start(psi);
process.EnableRaisingEvents = true;
try
{
    process.Start();
    process.BeginErrorReadLine();

    var inputTask = Task.Run(() =>
    {
        htmlStream.CopyTo(process.StandardInput.BaseStream);
        process.StandardInput.Close();
    });

    // Copy the output to a memorystream
    MemoryStream pdf = new MemoryStream();
    var outputTask = Task.Run(() =>
    {
        process.StandardOutput.BaseStream.CopyTo(pdf);
    });

    Task.WaitAll(inputTask, outputTask);

    process.WaitForExit();

    // Reset memorystream read position
    pdf.Position = 0;

    return pdf;
}
catch (Exception ex)
{
    throw ex;
}
finally
{
    process.Dispose();
}
Sanketh. K. Jain asked Aug 23 '19 10:08


2 Answers

This answer from Stack Overflow ("Combine two (or more) PDF's") by Andrew Burns, which uses the PDFsharp library, works for me:

        using (PdfDocument one = PdfReader.Open("pdf 1.pdf", PdfDocumentOpenMode.Import))
        using (PdfDocument two = PdfReader.Open("pdf 2.pdf", PdfDocumentOpenMode.Import))
        using (PdfDocument outPdf = new PdfDocument())
        {
            CopyPages(one, outPdf);
            CopyPages(two, outPdf);

            outPdf.Save("file1and2.pdf");
        }

        void CopyPages(PdfDocument from, PdfDocument to)
        {
            for (int i = 0; i < from.PageCount; i++)
            {
                to.AddPage(from.Pages[i]);
            }
        }
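
Since the question works with MemoryStream rather than files, the same approach can be sketched with streams. This is a minimal sketch, assuming the PDFsharp NuGet package is installed; the class name PdfStreamMerger and the method name Merge are hypothetical and not part of the original answer. Both input streams are assumed to be positioned at 0 before the call.

```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper, not from the original answer.
public static class PdfStreamMerger
{
    // Imports the pages of both PDFs into a new document and
    // returns it as a MemoryStream positioned at the start.
    public static MemoryStream Merge(Stream first, Stream second)
    {
        var result = new MemoryStream();
        using (PdfDocument one = PdfReader.Open(first, PdfDocumentOpenMode.Import))
        using (PdfDocument two = PdfReader.Open(second, PdfDocumentOpenMode.Import))
        using (PdfDocument outPdf = new PdfDocument())
        {
            CopyPages(one, outPdf);
            CopyPages(two, outPdf);

            // false: leave the result stream open so the caller can read it.
            outPdf.Save(result, false);
        }
        result.Position = 0;
        return result;
    }

    // Appends every page of 'from' to 'to', preserving order.
    private static void CopyPages(PdfDocument from, PdfDocument to)
    {
        for (int i = 0; i < from.PageCount; i++)
        {
            to.AddPage(from.Pages[i]);
        }
    }
}
```

With the variables from the question, this would be called as PdfStreamMerger.Merge(PDF, PDF_portrait) in place of the two CopyTo() calls, and the returned stream written to "abc_combined.pdf".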
Alexander Bruun answered Oct 26 '22 11:10


Merging PDFs in C# or any other language is not straightforward without using a third-party library.
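
To illustrate why appending the bytes of one PDF to another is not a merge, here is a minimal sketch using only the .NET base class library. The byte arrays are placeholders, not valid PDFs; they only carry the "%PDF-" header and "%%EOF" trailer markers that every PDF file has. The class name ConcatDemo and its methods are hypothetical. The result has the expected total length but still contains two separate headers and two separate trailers, which is probably why a viewer shows only one document and offers to save a repaired copy.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, for illustration only.
public static class ConcatDemo
{
    // Counts non-overlapping occurrences of an ASCII marker in a byte buffer.
    public static int CountMarker(byte[] data, string marker)
    {
        byte[] needle = Encoding.ASCII.GetBytes(marker);
        int count = 0;
        for (int i = 0; i + needle.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < needle.Length; j++)
            {
                if (data[i + j] != needle[j]) { match = false; break; }
            }
            if (match) { count++; i += needle.Length - 1; }
        }
        return count;
    }

    // Appends both buffers to one MemoryStream, like the CopyTo() calls in the question.
    public static byte[] Concatenate(byte[] first, byte[] second)
    {
        using (var combined = new MemoryStream())
        {
            combined.Write(first, 0, first.Length);
            combined.Write(second, 0, second.Length);
            return combined.ToArray();
        }
    }

    public static void Main()
    {
        // Placeholder content, NOT valid PDFs: only the markers matter here.
        byte[] a = Encoding.ASCII.GetBytes("%PDF-1.4\n(landscape body)\n%%EOF\n");
        byte[] b = Encoding.ASCII.GetBytes("%PDF-1.4\n(portrait body)\n%%EOF\n");
        byte[] joined = Concatenate(a, b);

        Console.WriteLine(joined.Length == a.Length + b.Length); // True
        Console.WriteLine(CountMarker(joined, "%PDF-"));          // 2
        Console.WriteLine(CountMarker(joined, "%%EOF"));          // 2
    }
}
```

A real merge has to produce a single document whose page tree and cross-reference table cover the pages of both inputs, which is the work a PDF library does.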

I assume you want to avoid a library because most free libraries and NuGet packages have limitations and/or cost money for commercial use.

I did some research and found an open-source library called PdfClown, which has a NuGet package and is also available for Java. It is free without limitations (donate if you like). The library has a lot of features, one of which is merging 2 or more documents into one document.

Here is my example, which takes a folder with multiple PDF files, merges them, and saves the result to the same or another folder. It is also possible to use MemoryStream, but I do not find it necessary in this case.

The code is self-explanatory; the key point here is using SerializationModeEnum.Incremental:

// Requires: using System.IO; using System.Linq; and the PdfClown NuGet package.
public static void MergePdf(string srcPath, string destFile)
{
    // Validate the arguments before touching the file system;
    // Path.GetFullPath throws on a null or empty path.
    if (string.IsNullOrWhiteSpace(srcPath) || string.IsNullOrWhiteSpace(destFile))
        return;
    var list = Directory.GetFiles(Path.GetFullPath(srcPath));
    if (list.Length <= 1)
        return;
    // Load every file in the source folder into memory.
    var files = list.Select(File.ReadAllBytes).ToList();
    // The first document becomes the destination; the others are appended to it.
    using (var dest = new org.pdfclown.files.File(new org.pdfclown.bytes.Buffer(files[0])))
    {
        var document = dest.Document;
        var builder = new org.pdfclown.tools.PageManager(document);
        foreach (var file in files.Skip(1))
        {
            using (var src = new org.pdfclown.files.File(new org.pdfclown.bytes.Buffer(file)))
            { builder.Add(src.Document); }
        }

        dest.Save(destFile, SerializationModeEnum.Incremental);
    }
}

To test it

var srcPath = @"C:\temp\pdf\input";
var destFile = @"c:\temp\pdf\output\merged.pdf";
MergePdf(srcPath, destFile);

Input examples: PDF doc A and PDF doc B (screenshots)

Output example: Merged (screenshot)

Links to my research:

  • https://csharp-source.net/open-source/pdf-libraries
  • https://sourceforge.net/projects/clown/
  • https://www.oipapio.com/question-3526089

Disclaimer: Part of this answer is taken from my personal web site https://itbackyard.com/merge-multiple-pdf-files-to-one-pdf-file-in-c/ with the source code on GitHub.

Maytham answered Oct 26 '22 12:10