I want to understand how Python works at a base level, and this will hopefully help me understand a bit more about the inner workings of other compiled/interpreted languages. Unfortunately, the compilers class is a bit away for now. From what I read on this site and elsewhere, people answering "What base language is Python written in" seem to convey that there's a difference between talking about the "rules" of a language versus how the language rules are implemented for usage. So, is it correct to say that Python (and other high-level languages) are all essentially just sets of rules "written" in any natural language? And then the matter of how they're actually used (where used means compiled/interpreted to actually create things) can vary, with various languages being used to implement compilers? So in this case, CPython, IronPython, and Jython would be syntactically equal languages which all follow the same set of rules, just that those rules are implemented themselves in their respective languages.
Please let me know if my understanding of this is correct, if you have anything to add that might further solidify my understanding, or if I'm blatantly wrong.
Code written in Python should be able to run on any Python interpreter. Python is essentially a specification for a programming language with a reference implementation (CPython). Whenever the Python specifications and PEPs are ambiguous, the other interpreters usually choose to implement the same behavior, unless they have reason not to.
That being said, it's entirely possible that a program written in Python will behave differently on different implementations. This is because many programmers venture into "undefined behavior." For example, CPython has a "Global Interpreter Lock" that means only one thread is actually executing at a time (modulo some conditions), but other interpreters do not have that behavior. So, for example, there is different behaviors about atomicity (e.g., each bytecode instruction is atomic in CPython) as other interpreters.
You can consider it like C. C is a language specification, but there are many compilers implementing it: GCC, LLVM, Borland, MSVC++, ICC, etc. There are programming languages and implementations of those programming languages.
You are correct when you make the distinction between what a language means and how it does what it means.
The first step to compiling a language is to parse its code to generate an Abstract Syntax Tree. That is a tree that defines what the code you wrote means, what it is supposed to do. By example if you have the following code
a = 1
if a:
print('not zero')
It would generate a tree that looks more or less like this.
code
___________|______
| |
declaration if
__|__ ___|____
| | | |
a 1 a print
|
'not zero'
This represents what the code means, but tells us nothing about how it executes it.
Edit: of course the above is far from what Python's parsers would actually generate, I made plenty of oversimplification for the purpose of readability. Luckily for us, if you are curious about what is actually generated you can import ast
that provides a Python parser.
import ast
code = """
a = 1
if a:
print('not zero')
"""
my_ast = ast.parse(code)
Enjoy inspecting my_ast
.
Once you have an AST, you can convert it back to whatver you want. It can be C, it can be machine code, you can even convert it back to Python if you wish. The most used implementation of Python is CPython which is written in C.
What is going on under the hood is thus pretty close to your understanding. First, a language is a set of rules that defines a behaviour, and only then is there an implementation to that languages that defines how it does it. And yes of course, you can have different implementations of a same language with slight difference of behaviours.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With