 

Translating source code into a foreign language

I'm running an educational website that teaches programming to kids (12-15 years old).

Since not all of them speak English, the source code of our solutions uses French variable and function names. However, we are planning to translate the content into other languages (German, Spanish, English). To do so, I would like to translate the source code as quickly as possible. Most of our code is C/C++.

The solution I'm planning to use:

  1. extract all variable/function names from the source code, with their positions in the file (where they are declared, used, called...)
  2. remove all language keywords and library functions
  3. ask the translator to provide translations for the remaining names
  4. replace the names in the file
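Step 4 becomes mechanical once step 1 records where each name occurs. A minimal sketch, assuming step 1 produces a list of occurrences given as byte offsets into the file (the `Occurrence` struct and `apply_translations` function are hypothetical names, not an existing tool): the replacements are applied from the end of the file backwards, so that substituting a longer or shorter name does not shift the offsets still to be processed.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One use of an identifier found in step 1 (hypothetical structure).
struct Occurrence {
    std::size_t offset;  // byte offset of the name's first character
    std::string name;    // the name as written in the original source
};

// Step 4: rewrite every recorded occurrence using the translators' table.
std::string apply_translations(std::string source,
                               std::vector<Occurrence> occurrences,
                               const std::map<std::string, std::string>& table)
{
    // Process the last occurrence first so earlier offsets stay valid.
    std::sort(occurrences.begin(), occurrences.end(),
              [](const Occurrence& a, const Occurrence& b) { return a.offset > b.offset; });
    for (const Occurrence& occ : occurrences) {
        auto it = table.find(occ.name);
        // Skip names without a translation, and stale offsets where the
        // text no longer matches the recorded name.
        if (it != table.end() && occ.offset <= source.size() &&
            source.compare(occ.offset, occ.name.size(), occ.name) == 0)
            source.replace(occ.offset, occ.name.size(), it->second);
    }
    return source;
}
```

Names missing from the table, and occurrences whose recorded offset no longer points at the expected name, are left untouched.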

Is there already some open-source code or project that can do this (for points 1, 2 and 4)?

If there isn't, the most difficult part is the first one: using a C/C++ parser to build a syntax tree and then extracting the variables with their positions seems the way to go. Do you have other ideas?

Thank you for any advice.

Edit: As noted in a comment, I will also need to take care of the comments, but there are only a few of them: the complete solution is already explained in plain text, and then we show the source code with self-explanatory variable/function names. The source code is rarely more than 30-40 lines long, and good names must make it understandable without comments if you already know what the code does.
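Since the few comments also need translating, one option is to pull them out as plain text, with their line numbers, and send them to the translators alongside the names. A minimal sketch (the `Comment` struct and `extract_comments` function are hypothetical names), assuming ordinary `//` and `/* */` comments; string and character literals are skipped so that `"//"` inside a string is not mistaken for a comment, but raw string literals and backslash-newline continuations are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Comment {
    int line;          // 1-based line where the comment starts
    std::string text;  // comment text without the // or /* */ markers
};

std::vector<Comment> extract_comments(const std::string& src)
{
    std::vector<Comment> comments;
    int line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (c == '\n') {
            ++line;
            ++i;
        } else if (c == '"' || c == '\'') {
            // Skip a string or character literal, honouring backslash escapes.
            char quote = c;
            ++i;
            while (i < src.size() && src[i] != quote && src[i] != '\n') {
                if (src[i] == '\\')
                    ++i;  // also skip the escaped character
                ++i;
            }
            if (i < src.size() && src[i] == quote)
                ++i;  // closing quote
        } else if (src.compare(i, 2, "//") == 0) {
            std::size_t end = src.find('\n', i);
            if (end == std::string::npos)
                end = src.size();
            comments.push_back({line, src.substr(i + 2, end - i - 2)});
            i = end;  // the newline is counted on the next iteration
        } else if (src.compare(i, 2, "/*") == 0) {
            std::size_t end = src.find("*/", i + 2);
            if (end == std::string::npos)
                end = src.size();
            comments.push_back({line, src.substr(i + 2, end - i - 2)});
            for (std::size_t k = i; k < end; ++k)
                if (src[k] == '\n')
                    ++line;  // keep line numbers in step
            i = end + 2;
        } else {
            ++i;
        }
    }
    return comments;
}
```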

Additional info: for those interested, the website is a training platform for the International Olympiad in Informatics, and C/C++ (at least the minimum needed for programming contests) is not so difficult for a 12-year-old to learn.

Loïc Février, asked Aug 27 '11 15:08




3 Answers

Are you sure you need a full syntax tree for this? I think it would be enough to do lexical analysis to find the identifiers, which is much easier. Then exclude keywords and identifiers that also appear in the header files being included.

In principle, you may want different variables that share the same name in the source language to be translated to different words in the target languages, but for educational use the risk of this arising is probably small enough to ignore at first. You could sidestep the issue by writing the original sources with some disambiguating quasi-Hungarian prefixes, and then removing these with the same translation mechanism when displaying the code to end users who read the source language.
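One possible reading of the prefix idea, purely as an illustration: whenever one source name must map to two different translations, the master sources use a prefix such as `p1_total` and `p2_total`. The translation table for each target language then has one entry per prefixed name, while the display in the source language just strips the prefix. The `pN_` format and the `strip_prefix` helper are assumptions, not something the answer specifies.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Removes a disambiguating prefix of the hypothetical form 'p' + digits + '_',
// e.g. "p2_total" -> "total". Any other name is returned unchanged.
std::string strip_prefix(const std::string& name)
{
    if (name.size() < 3 || name[0] != 'p')
        return name;
    std::size_t i = 1;
    while (i < name.size() && std::isdigit(static_cast<unsigned char>(name[i])))
        ++i;
    // Require at least one digit, then '_', then a non-empty remainder.
    if (i == 1 || i + 1 >= name.size() || name[i] != '_')
        return name;
    return name.substr(i + 1);
}
```

For the source-language display, this function can play the role of the per-name translation; for the other languages, the table is keyed on the prefixed names, so `p1_total` and `p2_total` can receive different words.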

Be sure to let translators see the name they are translating with full context before they choose a translation.
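A minimal sketch of one way to provide that context: for each candidate name, list every source line where it occurs, so the translator sees how it is used. `usage_context` is a hypothetical helper; it matches whole identifiers only (so `n` does not match inside `nbEleves`) but, for brevity, does not skip comments or string literals.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

static bool is_ident_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Returns "line N: <source line>" for every line that contains `name` as a whole identifier.
std::vector<std::string> usage_context(const std::string& source, const std::string& name)
{
    std::vector<std::string> result;
    std::istringstream lines(source);
    std::string line;
    int number = 0;
    while (std::getline(lines, line)) {
        ++number;
        for (std::size_t pos = line.find(name); pos != std::string::npos;
             pos = line.find(name, pos + 1)) {
            bool starts = pos == 0 || !is_ident_char(line[pos - 1]);
            std::size_t after = pos + name.size();
            bool ends = after == line.size() || !is_ident_char(line[after]);
            if (starts && ends) {
                result.push_back("line " + std::to_string(number) + ": " + line);
                break;  // one entry per line is enough
            }
        }
    }
    return result;
}
```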

hmakholm left over Monica, answered Oct 07 '22 17:10


I really think you can use clang (libclang) to parse your sources and do what you want (see here for more information). The good news is that it has Python bindings, which will make your life easier if you want to access a translation service or something like that.

Tarantula, answered Oct 07 '22 17:10


You don't really need a C/C++ parser, just a simple lexer that gives you the elements of the code one by one. Most of them ({, [, 213, ) and so on) you simply copy unchanged to the result file. You translate every name (except keywords) and put the translation in the output.

Now that I think about it, it's as simple as this:

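#include <cctype>
#include <iostream>
#include <map>
#include <set>
#include <string>
using namespace std;

// A name starts with a letter or '_' and continues with letters, digits or '_'.
// c is an int returned by istream::get(), so EOF is checked before using <cctype>.
bool is_name_start(int c)
{
    return c != EOF && (isalpha(c) || c == '_');
}
bool is_name_char(int c)
{
    return c != EOF && (isalnum(c) || c == '_');
}
bool is_keyword(const string &s)
{
    static const set<string> keywords = {"if", "else", "void", "int", "for", "while", "return" /* rest of them */};
    return keywords.count(s) > 0;
}
// Translations provided by the translators (filled in elsewhere);
// names without an entry are output unchanged.
map<string, string> translations;
string translate(const string &name)
{
    auto it = translations.find(name);
    return it != translations.end() ? it->second : name;
}
void translateCode(istream &in, ostream &out)
{
    int c = in.get();  // int, not char, so that EOF is detected reliably
    while (c != EOF)
    {
        if (is_name_start(c))
        {
            string name;
            do
            {
                name += static_cast<char>(c);
                c = in.get();
            } while (is_name_char(c));
            out << (is_keyword(name) ? name : translate(name));
            // c now holds the first character after the name (or EOF); it is
            // handled by the next iteration instead of being written here.
        }
        else
        {
            out << static_cast<char>(c);
            c = in.get();
        }
    }
}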

I wrote the code in the editor, so there may be minor errors. Tell me if there are any and I'll fix them.

Edit: Explanation:

The code simply reads the input character by character and outputs every character that is not part of a name (including spaces, tabs and newlines). When it sees the start of a name, it collects that character and the following ones into one string until it reaches a character that cannot belong to the name. If the string is a keyword, it outputs the keyword unchanged; otherwise, it translates it and outputs the translation.

The output would have the exact same format as the input.

Shahbaz, answered Oct 07 '22 15:10