I want to use Google Translate in my project. I have completed all the formalities with Google and I have an API key. With this key I can easily translate any word from JavaScript. But how do I translate a PDF file, as can be done on the Google Translate site? I found this:
http://translate.google.com/translate?hl=fr&sl=auto&tl=en&u=http://www.example.com/PDF.pdf
But here I cannot use my key, and as a result it takes a long time to translate. So I want to use my key to translate a PDF file. My approach is like this:
1. I have one HTML page.
2. It has a browse button for the PDF.
3. The user uploads the file.
4. Translate the PDF with the Google API and show the result in the HTML page.
I searched for a way to translate a PDF like this but did not find anything. Please help me out.
TL;DR: Use a headless browser to render a PDF from Google's PDF translation service.
PDF is a complex format and can include many components that contain text. I will describe the solutions from the easiest to the more advanced.
If you only need the translation without the visual output, you can extract the text and give it to Google Translate.
Since you did not provide information on your project (language, environment, ...), I will redirect you to this thread on how to extract text.
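Once the text is extracted, sending it to the API with your key could look like the sketch below. It is only a sketch under a few assumptions: it targets the Cloud Translation v2 REST endpoint, relies on a global fetch (modern browsers or Node 18+), and the helper names chunkText, buildRequest and translateText, as well as the 4000-character chunk size, are made up for illustration.

```javascript
// Assumption: the Cloud Translation "basic" (v2) REST endpoint.
const ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

// Split text into chunks of at most maxLen characters, breaking on whitespace.
// A single word longer than maxLen is kept whole, so it yields an oversized chunk.
function chunkText(text, maxLen) {
  const chunks = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current && current.length + 1 + word.length > maxLen) {
      chunks.push(current);
      current = word;
    } else {
      current = current ? current + " " + word : word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Build the URL and fetch options for one request carrying several text chunks.
function buildRequest(chunks, target, apiKey) {
  return {
    url: ENDPOINT + "?key=" + encodeURIComponent(apiKey),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ q: chunks, target: target, format: "text" }),
    },
  };
}

// Translate extracted PDF text and return the translated chunks joined together.
async function translateText(text, target, apiKey) {
  const chunks = chunkText(text, 4000); // chunk size chosen arbitrarily
  if (chunks.length === 0) return "";   // nothing to translate
  const req = buildRequest(chunks, target, apiKey);
  const res = await fetch(req.url, req.options);
  if (!res.ok) throw new Error("Translation request failed: " + res.status);
  const json = await res.json();
  return json.data.translations.map((t) => t.translatedText).join(" ");
}
```

A call would then be something like translateText(extractedText, "en", "YOUR_API_KEY"). Keep in mind that calling this from the browser exposes the key to anyone who opens the page, so doing it server-side is probably safer.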
If you need to get the text from everything in your PDF, well, that's pretty hard. To (partially) avoid the headache, you can convert the PDF to an image (using ImageMagick tools or similar), and then you have the following options:
1. OCR the text while saving its position (some libraries can do that; again, since you did not specify your project information, see these links: #1, #2, #3, #4). Then translate it with the Google API and write the result back onto the image. For good results you need to take the text font, text color and background color into account. Pretty difficult, but feasible.
2. Translate the image using Google Translate's image service. Unfortunately this feature is not available in the public API, so short of doing some reverse engineering, this is not possible.
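For the OCR option, one of the fiddly parts is making the translated string fit where the original text was. Here is a rough sketch of that single step. Everything in it is an assumption for illustration: the box shape ({ width, height } in pixels, as an OCR library might report it), the function name fitFontSize, and especially the crude guess that an average glyph is about half as wide as the font size.

```javascript
// Estimate a font size (in pixels) so that `translated` fits inside the
// bounding box the OCR step reported for the original text.
// charWidthRatio is a crude assumption: average glyph width / font size.
function fitFontSize(box, translated, charWidthRatio) {
  var ratio = charWidthRatio || 0.5;
  // Start from the box height: the text cannot be taller than its box.
  var fontSize = box.height;
  var neededWidth = translated.length * fontSize * ratio;
  if (neededWidth > box.width) {
    // Shrink the font until the estimated width matches the box width.
    fontSize = box.width / (translated.length * ratio);
  }
  return fontSize;
}
```

A real implementation would measure the text with the actual font instead of guessing, but this shows why a longer translation forces smaller text in the same box.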
The solution you mention, using the translate site, can be automated quite easily. The reason it is slow is that it is a heavy process, and you probably won't beat Google.
Using a headless browser, you can load the translation page for your PDF, observe that the translated content sits in an iframe, open that iframe's URL, and finally print it to PDF.
Here is a short example using SlimerJS (it should also be compatible with PhantomJS):
var page = require("webpage").create();
// here you may want to set up the page size and options

// get the page
page.open('https://translate.google.fr/translate?hl=fr&sl=en&u=http://example.com/pdf-sample.pdf', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        // exit here, otherwise the script hangs forever on failure
        phantom.exit(1);
    } else {
        // find the iframe with querySelector
        var iframe_src = page.evaluate(function() {
            return document.querySelector('#contentframe').querySelector('iframe').src;
        });
        console.log('Found iframe: ' + iframe_src);
        // render the iframe
        page.open(iframe_src, function(status) {
            // wait a bit for JavaScript to translate
            // this can be optimized to be triggered in JavaScript when translation is done
            setTimeout(function() {
                // print the page into PDF
                page.render('/tmp/test.pdf', { format: 'pdf' });
                phantom.exit(0);
            }, 2000);
        });
    }
});
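As noted in the comments of the script, the fixed 2 second wait could be replaced by something that fires as soon as the translation is done. One way to sketch that is a small polling helper. The name waitFor and its parameters are my own, not part of SlimerJS or PhantomJS, and the condition you poll for is left to you, since which DOM change signals a finished translation has to be found by inspecting the page.

```javascript
// Poll `test` every intervalMs milliseconds. Call onReady the first time it
// returns true, or onTimeout if timeoutMs elapses before that happens.
function waitFor(test, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (test()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// Hypothetical usage inside the second page.open callback:
// waitFor(function () {
//   return page.evaluate(function () {
//     return false; // placeholder: return true once the page looks translated
//   });
// }, function () {
//   page.render('/tmp/test.pdf', { format: 'pdf' });
//   phantom.exit(0);
// }, function () {
//   console.log('Timed out waiting for the translation');
//   phantom.exit(1);
// }, 10000, 250);
```

The timeout matters here: without it, a page that never finishes translating would keep the headless browser running forever.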
Given this file: http://www.cbu.edu.zm/downloads/pdf-sample.pdf
It produces this result (translated into French): (I posted a screenshot since I cannot embed a PDF ;) )