Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How hard would it be to translate a programming language to another human language?

Let me explain. Suppose I want to teach Python to someone who only speaks Spanish. As you know, in most programming languages all keywords are in English. How complex would it be to create a program that will find all keywords in a given source code and translate them? Would I need to use a parser and stuff, or will a couple of regexes and string functions be enough?

If it depends on the source programming language, then Python and Javascript would be the most important.

What I mean by "how complex would it be" is that would it be enough to have a list of keywords, and parse the source code to find keywords not in quotes? Or are there enough syntactical weirdnesses that something more complicated is required?

like image 498
Javier Avatar asked Oct 31 '09 02:10

Javier


People also ask

Can you translate one programming language to another?

Yes! Compilers convert one programming language into another. Usually, compilers are used to convert code so the machine can understand it. If we want it to be human-readable, we need a subset of compilers called transpilers.

Why is it difficult for a computer system to translate between two human languages?

Because cultural nuances abound. And there is frequently no one “right” way to translate something. Interpretation is highly subjective and contextual. Because computers cannot interpret language in the same way that humans do, this makes machine translation an even trickier proposition.

Which language is the most difficult for humans to write a program in?

C++ C++ is considered to be one of the most powerful, fastest, and toughest programming languages.


1 Answers

If all you want is to translate keywords, then (while you definitely DO need a proper parser, as otherwise avoiding any change in strings, comments &c becomes a nightmare) the task is quite simple. For example, since you mentioned Python:

import cStringIO
import keyword
import token
import tokenize

samp = '''\
for x in range(8):
  if x%2:
    y = x
    while y>0:
      print y,
      y -= 3
    print
'''

translate = {'for': 'per', 'if': 'se', 'while': 'mentre', 'print': 'stampa'}

def toks(tokens):
  for tt, ts, src, erc, ll in tokens:
    if tt == token.NAME and keyword.iskeyword(ts):
      ts = translate.get(ts, ts)
    yield tt, ts

def main():
  rl = cStringIO.StringIO(samp).readline
  toki = toks(tokenize.generate_tokens(rl))
  print tokenize.untokenize(toki)

main()

I hope it's obvious how to generalize this to "translate" any Python source and in any language (I'm supplying only a very partial Italian keyword translation dict). This emits:

per x in range (8 ):
  se x %2 :
    y =x 
    mentre y >0 :
      stampa y ,
      y -=3 
    stampa 

(strange though correct whitespace, but that could be easily enough remedied). As an Italian speaker I can tell you this is terrible to read, but that's par for the course for any "programming language translation" as you desire. Worse, NON-keywords such as range remain un-translated (as per your specs) -- of course, you don't have to constrain your translation to keywords-only (it's easy enough to remove the if that does that above;-).

like image 140
Alex Martelli Avatar answered Oct 05 '22 04:10

Alex Martelli