Translating source code into a foreign language

I run an educational website that teaches programming to kids (12-15 years old).

As they don't all speak English, we use French variable and function names in the source code of the solutions. However, we are planning to translate the content into other languages (German, Spanish, English). To do so, I would like to translate the source code as quickly as possible. We mostly have C/C++ code.

The solution I'm planning to use:

1. Extract all variable/function names from the source code, with their positions in the file (where they are declared, used, called...).
2. Remove all language keywords and library functions.
3. Ask the translator to provide translations for the remaining names.
4. Replace the names in the file.

Is there already an open-source project that can do this (for points 1, 2 and 4)?

If there isn't, the most difficult point is the first one: using a C/C++ parser to build a syntax tree and then extracting the variables with their positions seems the way to go. Do you have other ideas?

Thank you for any advice.

Edit: As noted in a comment, I will also need to take care of the comments, but there are only a few of them: the complete solution is already explained in plain text, and then we show the source code with self-explanatory variable/function names. The source code is rarely more than 30-40 lines long, and good names should make it understandable without comments if you already know what the code is doing.

Additional info: for those interested, the website is a training platform for the International Olympiad in Informatics, and C/C++ (at least the minimum needed for programming contests) is not so difficult for a 12-year-old to learn.

asked Aug 27 '11 15:08

Loïc Février

3 Answers

Are you sure you need a full syntax tree for this? I think it would be enough to do lexical analysis to find the identifiers, which is much easier. Then exclude keywords and identifiers that also appear in the header files being included.
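A minimal sketch of that lexical pass, assuming the exclusion set (keywords plus library names) has been collected separately, for example from the included headers. The names `Occurrence` and `find_identifiers` are made up for illustration, and the scan ignores comments and string literals, which a real tool would have to skip.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// One occurrence of an identifier: its text and 1-based position.
struct Occurrence {
    std::string name;
    int line;
    int column;
};

// Returns every identifier in `source` that is not listed in `excluded`.
// `excluded` is assumed to hold the keywords and the library names.
std::vector<Occurrence> find_identifiers(const std::string &source,
                                         const std::set<std::string> &excluded)
{
    std::vector<Occurrence> found;
    int line = 1, column = 1;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isalpha(c) || c == '_') {
            // Start of an identifier: consume letters, digits and underscores.
            std::size_t start = i;
            int start_column = column;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
                ++column;
            }
            std::string name = source.substr(start, i - start);
            if (excluded.count(name) == 0)
                found.push_back({name, line, start_column});
        } else if (std::isdigit(c)) {
            // Skip a numeric literal, including suffix letters such as 10u,
            // so that the suffix is not mistaken for a name.
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '.')) {
                ++i;
                ++column;
            }
        } else {
            // Any other character only moves the position forward.
            if (c == '\n') {
                ++line;
                column = 1;
            } else {
                ++column;
            }
            ++i;
        }
    }
    return found;
}
```

The recorded line and column could also be used to show each name to the translator together with the line of code it appears on.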

In principle it is possible that you want different variables with the same English name to be translated to different words in French/German -- but for educational use the risk of this arising is probably small enough to ignore at first. You could sidestep the issue by writing the original sources with some disambiguating quasi-Hungarian prefixes and then remove these with the same translation mechanism for display to English-speaking end users.

Be sure to let translators see the name they are translating with full context before they choose a translation.

answered Oct 07 '22 17:10

hmakholm left over Monica

I really think you can use clang (libclang) to parse your sources and do what you want (see here for more information). The good news is that it has Python bindings, which will make your life easier if you want to access a translation service or something like that.

answered Oct 07 '22 17:10

Tarantula

You don't really need a C/C++ parser, just a simple lexer that gives you the elements of the code one by one. You will get a lot of {, [, 213, ) etc. that you simply ignore and write to the result file unchanged. You translate whatever consists of only letters (except keywords) and put the translation in the output.

Now that I think about it, it's as simple as this:

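#include <istream>
#include <ostream>
#include <string>
using namespace std;

// Looks up the translation of a name; to be defined elsewhere.
string translate(const string &name);

// Only plain letters count here, so a name containing digits or
// underscores would be split; extend this test if you use such names.
bool is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
bool is_keyword(const string &s)
{
    return s == "if" || s == "else" || s == "void" /* rest of them */;
}
void translateCode(istream &in, ostream &out)
{
    char c;
    while (in.get(c))  // stops cleanly at end of input, no stray EOF character
    {
        if (is_letter(c))
        {
            string name;
            bool more;
            do
            {
                name += c;
                more = static_cast<bool>(in.get(c));
            } while (more && is_letter(c));
            if (is_keyword(name))
                out << name;
            else
                out << translate(name);
            if (!more)
                break;  // input ended right after the name
        }
        out << c;  // even if is_letter(c) was true, there is a new c from the
                   // inner loop that was read (and is not a letter) but not
                   // yet written, so it is written here.
    }
}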

I wrote the code in the editor, so there may be minor errors. Tell me if there are any and I'll fix it.

Edit: Explanation:

The code simply reads the input character by character, outputting whatever non-letter characters it reads (including spaces, tabs and newlines). When it sees a letter, it collects all the following letters into one string (until it reaches a non-letter). If that string is a keyword, it outputs the keyword itself. If it is not, it translates it and outputs the translation.

The output would have the exact same format as the input.
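One thing the simple version would also do is translate words inside string literals and comments. Below is a hedged variant, not the code above: it works on a whole string, takes the translations from a `std::map` (names missing from the map, such as keywords and library functions, are left untouched), and copies string literals, character literals and comments verbatim so they can be handled separately. The function name `translate_source` is an assumption made for this sketch, and raw string literals are not handled.

```cpp
#include <cctype>
#include <map>
#include <string>

// Returns `source` with every identifier found in `dictionary` replaced by
// its translation. All other text is copied unchanged.
std::string translate_source(const std::string &source,
                             const std::map<std::string, std::string> &dictionary)
{
    std::string out;
    const std::size_t n = source.size();
    std::size_t i = 0;
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (c == '"' || c == '\'') {
            // String or character literal: copy verbatim, honoring escapes.
            char quote = source[i];
            out += source[i++];
            while (i < n && source[i] != quote) {
                if (source[i] == '\\' && i + 1 < n)
                    out += source[i++];  // keep the backslash with its character
                out += source[i++];
            }
            if (i < n)
                out += source[i++];  // closing quote
        } else if (c == '/' && i + 1 < n && source[i + 1] == '/') {
            // Line comment: copy up to (not including) the newline.
            while (i < n && source[i] != '\n')
                out += source[i++];
        } else if (c == '/' && i + 1 < n && source[i + 1] == '*') {
            // Block comment: copy through the closing */ (or to the end).
            std::size_t end = source.find("*/", i + 2);
            end = (end == std::string::npos) ? n : end + 2;
            out.append(source, i, end - i);
            i = end;
        } else if (std::isalpha(c) || c == '_') {
            // Identifier: translate it if the dictionary knows it.
            std::size_t start = i;
            while (i < n && (std::isalnum(static_cast<unsigned char>(source[i])) ||
                             source[i] == '_'))
                ++i;
            std::string name = source.substr(start, i - start);
            std::map<std::string, std::string>::const_iterator it =
                dictionary.find(name);
            out += (it != dictionary.end()) ? it->second : name;
        } else if (std::isdigit(c)) {
            // Numeric literal: copy it whole so suffix letters are not
            // treated as a name.
            while (i < n && (std::isalnum(static_cast<unsigned char>(source[i])) ||
                             source[i] == '.'))
                out += source[i++];
        } else {
            out += source[i++];  // punctuation and whitespace
        }
    }
    return out;
}
```

Because only names present in the map are replaced, no separate keyword list is needed: the map itself is the list of names the translator was asked about.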

answered Oct 07 '22 15:10

Shahbaz
