
How to copy Google Translate's Chinese transliteration using Selenium?

Tags: java, selenium

I'm trying to extract Google Translate's pinyin transliteration of a Chinese word using Selenium but am having some trouble finding its WebElement.

For example, the word I look up is "事". My code would be as follows:

String word = "事";
WebDriver driver = new HtmlUnitDriver();
driver.get("http://translate.google.com/#zh-CN/zh-CN/" + word); 

When I open the actual page in my browser, I can see that the pinyin is "Shì" and that, according to Inspect Element, the element's id is src-translit. However, when I view the page source, id="src-translit" is present but nothing resembling "Shì" appears near it. The element is simply empty.

Thinking that the page had not had time to load properly, I added a wait of up to 30 seconds (a long wait, I know, but I just wanted to see whether it would work).

int timeoutInSeconds = 30;
WebDriverWait wait = new WebDriverWait(driver, timeoutInSeconds); 
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("src-translit")));

Unfortunately, even with the wait, the text of the transliteration element still comes back empty.

WebElement transliteration = driver.findElement(By.id("src-translit"));
String pinyin = transliteration.getText();

My question, then, is: what has happened to the contents of src-translit? Why doesn't the text appear in the HTML source, and how can I find it and copy it from Google Translate?

user2323030 asked Dec 06 '25 05:12


1 Answer

It sounds like JavaScript isn't being executed. According to the docs, you can enable JavaScript in HtmlUnitDriver like this:

HtmlUnitDriver driver = new HtmlUnitDriver();
driver.setJavascriptEnabled(true);

or

HtmlUnitDriver driver = new HtmlUnitDriver(true);

See if that makes a difference.

EDIT:

I still think the problem is related to JavaScript. When I run it using FirefoxDriver, it works fine: the AJAX request is made, and the src-translit element is updated with Shi.

Workaround:

In any case, if you monitor the network traffic, you can see that translating 事 triggers an AJAX call to

http://translate.google.com/translate_a/t?client=t&sl=zh-CN&tl=zh-CN&hl=en&sc=2&ie=UTF-8&oe=UTF-8&pc=1&oc=1&otf=1&rom=1&srcrom=1&ssel=0&tsel=0&q=%E4%BA%8B

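which returns a JSON-like array (the empty slots between consecutive commas are not valid in strict JSON, so a strict parser may reject it):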

[[["事","事","Shì","Shì"]],,"zh-CN",,[["事",,false,false,0,0,0,0]],,,,[],10]

Maybe you could parse that instead for now.
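
As a rough sketch of that workaround, the snippet below builds the request URL for a word and pulls the quoted strings out of the first inner array of a response shaped like the one above. It uses only the standard library and a small regex rather than a JSON parser, because the response contains empty slots. The class name TranslitSketch and both helper methods are made up for illustration. Treating the fourth field as the source transliteration is an assumption based on this single sample, and the endpoint is undocumented, so its parameters and response shape may change.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Selenium or any Google API.
class TranslitSketch {

    // Matches one double-quoted string, allowing backslash escapes inside it.
    private static final Pattern QUOTED =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Builds the URL seen in the network traffic, percent-encoding the word as UTF-8.
    static String buildUrl(String word) {
        try {
            return "http://translate.google.com/translate_a/t?client=t&sl=zh-CN&tl=zh-CN"
                    + "&hl=en&sc=2&ie=UTF-8&oe=UTF-8&pc=1&oc=1&otf=1&rom=1&srcrom=1"
                    + "&ssel=0&tsel=0&q=" + URLEncoder.encode(word, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Returns the consecutive quoted strings that directly follow the leading "[[[".
    // Escape sequences are left undecoded, and empty slots inside this inner
    // array are not handled; this is deliberately naive.
    static List<String> firstSentenceFields(String response) {
        int start = response.indexOf("[[[");
        if (start < 0) {
            throw new IllegalArgumentException("unexpected response shape");
        }
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED.matcher(response);
        int pos = start + 3;
        while (pos < response.length() && m.find(pos) && m.start() == pos) {
            fields.add(m.group(1));
            pos = m.end();
            if (pos < response.length() && response.charAt(pos) == ',') {
                pos++;   // another field follows
            } else {
                break;   // reached the closing bracket of the inner array
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // \u4e8b is the word from the question; Sh\u00ec is its pinyin.
        String sample = "[[[\"\u4e8b\",\"\u4e8b\",\"Sh\u00ec\",\"Sh\u00ec\"]],,"
                + "\"zh-CN\",,[[\"\u4e8b\",,false,false,0,0,0,0]],,,,[],10]";
        System.out.println(buildUrl("\u4e8b"));
        List<String> fields = firstSentenceFields(sample);
        // Assumption: index 3 holds the source-side transliteration.
        System.out.println(fields.get(3));
    }
}
```

To use it against the live service you would still need to fetch the URL yourself (for example with java.net.HttpURLConnection) and pass the response body to firstSentenceFields. The sketch stops short of that because the endpoint may refuse or rate-limit automated requests.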

MxLDevs answered Dec 08 '25 18:12