[DiveIn] Deciphering Python: How to use Abstract Syntax Trees (AST) to understand code
Let’s get a little “meta” about programming.
How does the Python program (better known as the interpreter) “know” how to run your code? If you’re new to programming, it may seem like magic. In fact, it still seems like magic to me after being a professional for more than a decade.
The Python interpreter is not magic (sorry to disappoint you). It follows a predictable set of steps to translate your code into instructions that a machine can run.
At a fairly high level, here’s what happens to your code:
- The code is parsed (i.e., split up) into a list of pieces usually called tokens. These tokens are based on a set of rules for things that should be treated differently. For instance, the keyword `if` is a different token than a numeric value like `42`.
- The raw list of tokens is transformed to build an Abstract Syntax Tree (AST), which is the subject we will explore more in this post. An AST is a collection of nodes which are linked together based on the grammar of the Python language. Don’t worry if that makes no sense yet, since we’ll shine more light on it momentarily.
- From an abstract syntax tree, the interpreter can produce a lower-level form of instructions called bytecode. These instructions are things like `BINARY_ADD` (`BINARY_OP` in Python 3.11 and later) and are meant to be very generic so that a computer can run them.
- With the bytecode instructions available, the interpreter can finally run your code. The bytecode is used to call functions in your operating system which will ultimately interact with a CPU and memory to run the program.
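To make those four steps concrete, here is a minimal sketch that walks one line of code through them using only the standard library (`tokenize`, `ast`, `dis`) plus the built-ins `compile` and `exec`. The sample source string and the variable names are my own illustration, not something from the interpreter itself, and the real interpreter does this work internally rather than through these modules. The exact opcode names also depend on your Python version.

```python
import ast
import dis
import io
import tokenize

# Illustrative one-line program; `x` is supplied at run time in step 4.
source = "answer = x + 2\n"

# Step 1: split the source text into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Drop the NEWLINE and ENDMARKER tokens, which carry no visible text.
token_strings = [tok.string for tok in tokens if tok.string.strip()]
print(token_strings)

# Step 2: build an Abstract Syntax Tree from the source.
tree = ast.parse(source)
print(ast.dump(tree))

# Step 3: compile the tree into a code object that holds bytecode.
code = compile(tree, filename="<example>", mode="exec")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)  # includes BINARY_ADD or BINARY_OP, depending on the version

# Step 4: run the bytecode in a namespace that defines `x`.
namespace = {"x": 40}
exec(code, namespace)
print(namespace["answer"])
```

Swapping in a different `source` string is an easy way to see how the tokens, the tree, and the bytecode change together.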
Many more details could fit into that description, but that’s the rough sketch of how typed characters are executed by computer CPUs.
Read more from this blog: https://www.mattlayman.com/blog/2018/decipher-python-ast/