Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

wkhtmltopdf relative paths in HTML with redirected in/out streams won't work

I am using wkhtmltopdf.exe (version 0.12.0 final) to generate pdf files from html files, I do this with .NET C#

My problem is getting javascript, stylesheets and images to work by only specifying relative paths in the html. Right now I have it working if I use absolute paths. But it doesn't work with relative paths, which makes the whole html generation a bit to complicated. I have boiled what I do down to the following example:

string CMDPATH = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe";
string HTML = string.Format(
    "<div><img src=\"{0}\" /></div><div><img src=\"{1}\" /></div><div>{2}</div>",
    "./sohlogo.png",
    "./ACLASS.jpg",
    DateTime.Now.ToString());

WriteFile(HTML, "test.html");

Process p;
ProcessStartInfo psi = new ProcessStartInfo();

psi.FileName = CMDPATH;
psi.UseShellExecute = false;
psi.WorkingDirectory = AppDomain.CurrentDomain.BaseDirectory;
psi.CreateNoWindow = true;
psi.RedirectStandardInput = true;
psi.RedirectStandardOutput = true;
psi.RedirectStandardError = true;

psi.Arguments = "-q - -";

p = Process.Start(psi);

StreamWriter stdin = p.StandardInput;
stdin.AutoFlush = true;
stdin.Write(HTML);
stdin.Dispose();

MemoryStream pdfstream = new MemoryStream();
CopyStream(p.StandardOutput.BaseStream, pdfstream);
p.StandardOutput.Close();
pdfstream.Position = 0;

WriteFile(pdfstream, "test.pdf");

p.WaitForExit(10000);
int test = p.ExitCode;

p.Dispose();

I have tried relative paths like: "./sohlogo.png" and simply "sohlogo.png" both displays correctly in the browser via the html file. But none of them work in the pdf file. There is no data in the error stream.

The following commandline works like a charm with the relative paths:

"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" test.html test.pdf

I could really need some input at this stage. So any help is much appreciated!

Just for reference the WriteFile and CopyStream methods looks like this:

public static void WriteFile(MemoryStream stream, string path)
{
    using (FileStream writer = new FileStream(path, FileMode.Create))
    {
        byte[] bytes = stream.ToArray();
        writer.Write(bytes, 0, bytes.Length);
        writer.Flush();
    }
}

public static void WriteFile(string text, string path)
{
    using (StreamWriter writer = new StreamWriter(path))
    {
        writer.WriteLine(text);
        writer.Flush();
    }
}

public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, read);
    }
}

EDIT: My Workaround for Neo Nguyen.

I could not get this to work with relative paths. So what I did instead was a method that prepends all paths with a root path. It solves my problem so maybe it will solve yours:

/// <summary>
/// Prepends the basedir x in src="x" or href="x" to the input html text
/// </summary>
/// <param name="html">the initial html</param>
/// <param name="basedir">the basedir to prepend</param>
/// <returns>the new html</returns>
public static string MakeRelativePathsAbsolute(string html, string basedir)
{
    string pathpattern = "(?:href=[\"']|src=[\"'])(.*?)[\"']";

    // SM20140214: tested that both chrome and wkhtmltopdf.exe understands "C:\Dir\..\image.png" and "C:\Dir\.\image.png"
    //             Path.Combine("C:/
    html = Regex.Replace(html, pathpattern, new MatchEvaluator((match) =>
        {
            string newpath = UrlEncode(Path.Combine(basedir, match.Groups[1].Value));
            if (!string.IsNullOrEmpty(match.Groups[1].Value))
            {
                string result = match.Groups[0].Value.Replace(match.Groups[1].Value, newpath);
                return result;
            }
            else
            {
                return UrlEncode(match.Groups[0].Value);
            }
        }));

    return html;
}

private static string UrlEncode(string url)
{
    url = url.Replace(" ", "%20").Replace("#", "%23");
    return url;
}

I tried different System.Uri.Escape*** methods like System.Uri.EscapeDataString(). But they ended up doing to severe url encoding for wkhtmltopdf to understand it. Because of lack of time I just did the quick and dirty UrlEncode above.

like image 310
smerlung Avatar asked Feb 14 '14 09:02

smerlung


1 Answers

as per official docs of the command line , there is an option called --cache-dir.

seems like they meant the working directory. I use it and it works with v0.12.3

wkhtmltopdf /my/path/to/index.html test.pdf --cache-dir /my/path/to
like image 99
Abdennour TOUMI Avatar answered Sep 19 '22 02:09

Abdennour TOUMI