I have a lot of text data and want to translate it to different languages.
Possible ways I know:
The problem is that all these services limit text length, number of calls, etc., which makes them inconvenient to use.
What services or approaches would you advise in this case?
If you paste text longer than 5000 characters, you'll get an error message ("maximum characters exceeded: X characters over 5000 maximum") and a "translate more" option that lets you translate the rest of the text.
On your computer, open a document in Google Docs. In the menu, click Tools > Translate document. Enter a name for the translated document and select a language. Click Translate.
I had to solve the same problem when integrating language translation with an XMPP chat server. I partitioned my payload (the text I needed to translate) into smaller subsets of complete sentences.
I can’t recall the exact limit, but with Google's REST-based translation URL I sent batches of complete sentences totaling at most 1024 characters, so a large paragraph resulted in multiple calls to the translation service.
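The batching described above can be sketched in VB.NET, the language used in the answer below. This is a minimal sketch, not the code used with the XMPP server: the cap is passed in as `maxChars` (the 1024 figure was only approximately recalled, so check your service's current limits), and sentence detection is a simple regex on `.`, `!` and `?`, so abbreviations such as "e.g." will be split in the wrong place. `SentenceBatcher` and `BatchSentences` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions

Public Module SentenceBatcher
    ' Splits text into sentences (ending in . ! or ?) and packs consecutive
    ' sentences into batches of at most maxChars characters each.
    ' A single sentence longer than maxChars still becomes its own (oversized)
    ' batch; a real implementation would need to split it further.
    Public Function BatchSentences(text As String, maxChars As Integer) As List(Of String)
        Dim batches As New List(Of String)()
        Dim current As New StringBuilder()

        ' Either a run of non-terminators followed by terminators,
        ' or a trailing fragment that has no terminator at all.
        For Each m As Match In Regex.Matches(text, "[^.!?]+[.!?]+|[^.!?]+$")
            Dim sentence As String = m.Value.Trim()
            If sentence.Length = 0 Then Continue For

            ' The +1 accounts for the space that joins sentences in a batch.
            If current.Length > 0 AndAlso current.Length + 1 + sentence.Length > maxChars Then
                batches.Add(current.ToString())
                current.Clear()
            End If

            If current.Length > 0 Then current.Append(" "c)
            current.Append(sentence)
        Next

        If current.Length > 0 Then batches.Add(current.ToString())
        Return batches
    End Function
End Module
```

Each string returned by `BatchSentences(text, 1024)` can then be sent in a single request. Packing whole sentences, rather than cutting at exactly `maxChars`, avoids handing the translator half a sentence.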
Break your text into smaller chunks (for example, complete sentences), then pass each chunk through the translator in a loop. Store each translated chunk in an array; once every chunk has been translated, join them back together in order and you will have the fully translated document.
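A minimal sketch of that loop, assuming the text has already been split into chunks. The actual service call is left as a `translate` delegate that you supply (a placeholder, not any specific library's API), and the optional `delayMs` pause between calls is an assumed way to stay under per-minute call limits, not a documented requirement of any particular service. `ChunkedTranslation` and `TranslateAll` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Public Module ChunkedTranslation
    ' Translates each chunk in order with the supplied delegate and joins the
    ' results back into one document, separated by single spaces.
    ' delayMs optionally pauses between calls to respect rate limits.
    Public Function TranslateAll(chunks As IEnumerable(Of String),
                                 translate As Func(Of String, String),
                                 Optional delayMs As Integer = 0) As String
        Dim translated As New List(Of String)()
        For Each chunk As String In chunks
            ' Pause before every call except the first.
            If delayMs > 0 AndAlso translated.Count > 0 Then Thread.Sleep(delayMs)
            ' One service call per chunk keeps each request under the length limit.
            translated.Add(translate(chunk))
        Next
        ' Reassemble in the original order.
        Return String.Join(" ", translated)
    End Function
End Module
```

For a dry run without any service, `TranslateAll({"a.", "b."}, Function(s) s.ToUpper())` returns `"A. B."`. Passing the service call in as a delegate keeps the chunking and reassembly logic independent of whichever translation API you end up using.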
Just to prove a point, I threw this together :) It is rough around the edges, but it handles a large amount of text, and because it uses the Google API its translations are just as accurate as Google Translate's. I processed Apple's entire 2005 SEC 10-K filing with this code at the click of one button (it took about 45 minutes).
The result was basically identical to what you would get by pasting one sentence at a time into Google Translate. It isn't perfect (every sentence is written out ending in a period, even if the original ended in ! or ?, and I didn't write the output file line by line), but it does show a proof of concept. Punctuation could be handled better with some more work on the regular expressions.
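One way to do that with a regex (a sketch, not part of the program below): split on whitespace that follows `.`, `!` or `?` using a lookbehind, so each sentence keeps its own ending punctuation. Joining the translated sentences with a space then reproduces the original terminators instead of forcing `". "` after each one, assuming the translation service passes the punctuation through. `PunctuationAwareSplit` and `SplitSentences` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module PunctuationAwareSplit
    ' Splits text into sentences while keeping each sentence's own ending
    ' punctuation (including runs such as "?!").
    Public Function SplitSentences(text As String) As List(Of String)
        Dim result As New List(Of String)()
        ' The lookbehind (?<=[.!?]) makes the split consume only the whitespace
        ' after a terminator, so the terminator stays attached to its sentence.
        For Each part As String In Regex.Split(text, "(?<=[.!?])\s+")
            Dim sentence As String = part.Trim()
            If sentence.Length > 0 Then result.Add(sentence)
        Next
        Return result
    End Function
End Module
```

For example, `SplitSentences("Hi! How are you? Fine.")` returns `Hi!`, `How are you?` and `Fine.` as three separate strings.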
```vbnet
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO

Public Class Form1
    ' Input and output files, relative to the application's working directory.
    Private ReadOnly inputPath As String = "Translate Me.txt"
    Private ReadOnly outputPath As String = "Translated Text.txt"

    ' Sentence-ending characters used to split each line into sentences.
    Private Shared ReadOnly Delimiters() As Char = {"."c, "!"c, "?"c}

    Private Sub translateText()
        If Not File.Exists(inputPath) Then
            MsgBox(inputPath & " cannot be found anywhere!", MsgBoxStyle.OkOnly, "Oops!")
            Return
        End If

        ' Translated sentences, in the order they appear in the input file.
        Dim translated As New List(Of String)()

        ' Create the client once rather than once per sentence.
        Dim trans As New Google.API.Translate.TranslateClient("")

        For Each currentLine As String In File.ReadLines(inputPath)
            For Each fragment As String In currentLine.Split(Delimiters)
                ' Splitting "A. B." also yields an empty trailing fragment; skip it.
                Dim sentence As String = fragment.Trim()
                If sentence.Length = 0 Then Continue For
                Try
                    translated.Add(trans.Translate(sentence,
                                                   Google.API.Translate.Language.English,
                                                   Google.API.Translate.Language.German,
                                                   Google.API.Translate.TranslateFormat.Text))
                Catch ex As Exception
                    ' Skip sentences the service rejects, but record why.
                    Debug.WriteLine("Translation failed: " & ex.Message)
                End Try
            Next
        Next

        ' Write only the sentences that were actually translated.
        ' Every sentence is terminated with ". ", so ! and ? are not preserved.
        Using output As New StreamWriter(outputPath)
            For Each translatedSentence As String In translated
                output.Write(translatedSentence & ". ")
            Next
        End Using
    End Sub

    Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click
        translateText()
    End Sub
End Class
```