Xiangxiang's Personal Site

Machine Learning & Security Engineer
As long as I live, I keep tinkering, and leave behind a small record of having lived.

26 September 2021

How Python Runs Code

by xiangxiang

Study notes on CPython internals

0 The big picture

File Input  -----------+                           +-------------------------+                        
(python <file>)        |                           |                         |
                       |                           |                         |
                       |                      AST  |          CFG            |  Bytecode
IO Stream   -----------|--> Reader --> Parser ---> | Compiler ----> Assembler| ---------> Execution
(cat <file> | python)  |                           |                         |           (Evaluation Loop)
                       |                           |                         |
                       |                           |    Compilation part     |           
String Input-----------+                           +-------------------------+
(python -c <str>)                                   
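
The stages in the diagram can be walked through from Python itself. The following is a minimal sketch using only the standard library: `ast.parse` stands in for the reader and parser, `compile` for the compiler and assembler, and `exec` for the execution step. The source string and the `<demo>` filename are made up for illustration.

```python
import ast
import dis

source = "result = 6 * 7"

# Reader + parser: source text -> AST
tree = ast.parse(source, filename="<demo>", mode="exec")

# Compiler + assembler: AST -> code object that holds the bytecode
code = compile(tree, filename="<demo>", mode="exec")

# Execution: the evaluation loop runs the bytecode in a namespace
namespace = {}
exec(code, namespace)

print(type(tree).__name__)   # Module
print(type(code).__name__)   # code
print(namespace["result"])   # 42
dis.dis(code)                # bytecode listing; the exact opcodes vary by Python version
```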

1 The Python Language Specification

1.1 The Grammar File Grammar/python.gram

1.2 The Parser Generator

1.3 Tokens

2 Input and configuration

To execute any Python code, the interpreter needs three elements in place:

  1. A module to execute
  2. A state to hold information such as variables
  3. A configuration, such as which options are enabled (see PEP 587)

With these three components, the interpreter can execute code and provide an output:

         +-----> Configuration -----+
         |                          |
         |                          |    
Input ---|-----> State         -----|--> Runtime --> Output
         |                          |
         |                          |
         +-----> Modules       -----+
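
The three inputs can be observed from a running interpreter. This is a rough sketch rather than how CPython wires things up internally: `sys.flags` exposes part of the configuration, a module created with `types.ModuleType` plays the role of the module to execute, and that module's `__dict__` serves as the state. The name `demo_module` is hypothetical.

```python
import sys
import types

# 1. Configuration: a few of the options the interpreter was started with
config = {
    "optimize": sys.flags.optimize,
    "verbose": sys.flags.verbose,
    "dont_write_bytecode": sys.flags.dont_write_bytecode,
}

# 2. A module to execute (hypothetical name, created in memory)
module = types.ModuleType("demo_module")
code = compile("answer = 40 + 2", "<demo_module>", "exec")

# 3. State: the module namespace holds the variables created while running
exec(code, module.__dict__)

print(config)
print(module.answer)  # 42
```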

2.1 Configuration State

2.2 Build Configuration

2.3 Code inputs

3 Lexing and Parsing with Syntax Trees

3.1 Concrete Syntax Trees (CST)

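# Note: the symbol and parser modules were deprecated in Python 3.9 and
# removed in Python 3.10, so this snippet needs Python 3.9 or earlier.
import symbol
import token
import parser

def lex(expression):
    """Return the concrete syntax tree of an expression as nested lists,
    with numeric symbol and token IDs replaced by their names."""
    # Map numeric IDs to names for grammar symbols and for tokens.
    symbols = {v: k for k, v in symbol.__dict__.items()
               if isinstance(v, int)}
    tokens = {v: k for k, v in token.__dict__.items()
              if isinstance(v, int)}
    lexicon = {**symbols, **tokens}
    st = parser.expr(expression)
    st_list = parser.st2list(st)

    def replace(nodes: list):
        # Recursively substitute names for IDs; leave other leaves as-is.
        r = []
        for i in nodes:
            if isinstance(i, list):
                r.append(replace(i))
            elif i in lexicon:
                r.append(lexicon[i])
            else:
                r.append(i)
        return r
    return replace(st_list)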
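
On interpreters where `parser` and `symbol` are not available, the lexing half of this step can still be explored with the `tokenize` module. The sketch below is an alternative, not an equivalent: it produces a flat token stream rather than a nested concrete syntax tree, and the helper name `lex_tokens` is made up here.

```python
import io
import token
import tokenize

def lex_tokens(expression):
    """Return (token name, token text) pairs for a source string."""
    readline = io.StringIO(expression).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]

# The stream starts with NAME 'a', OP '+', NUMBER '1' and ends with ENDMARKER.
for name, text in lex_tokens("a + 1"):
    print(name, repr(text))
```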

3.2 Abstract Syntax Trees (AST)

import ast
ast.dump(ast.parse('[ord(c) for line in file for c in line]', mode='eval'))
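
Beyond dumping the tree as text, the nodes can be inspected directly. A small sketch on the same expression: in `eval` mode the root is an `ast.Expression` whose `body` is the `ListComp` node, and `ast.walk` visits every node beneath it.

```python
import ast

tree = ast.parse("[ord(c) for line in file for c in line]", mode="eval")

# The body of the Expression root is the list comprehension itself.
list_comp = tree.body
print(type(list_comp).__name__)      # ListComp

# Each "for ... in ..." clause is one comprehension entry in generators.
print(len(list_comp.generators))     # 2

# Collect every identifier that appears anywhere in the tree.
names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
print(names)                         # ['c', 'file', 'line', 'ord']
```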

4 Compiler

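import dis
co = compile("b+1", "test.py", mode="eval")
# Pass the code object itself rather than the raw co.co_code bytes so that
# dis can resolve names and constants in the listing.
dis.dis(co)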
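
The compiled code object carries more than raw bytecode. The sketch below looks at a few of its attributes and then evaluates it; the value `41` bound to `b` is an arbitrary choice for illustration, and the opcode names printed depend on the Python version.

```python
import dis

co = compile("b+1", "test.py", mode="eval")

# The bytecode itself is an immutable bytes string.
print(type(co.co_code).__name__)   # bytes

# Names referenced by the code are stored separately from the bytecode.
print(co.co_names)                 # ('b',)

# Instruction objects pair each opcode with its resolved argument.
opnames = [ins.opname for ins in dis.get_instructions(co)]
print(opnames)

# Running the code object needs a namespace that supplies b.
value = eval(co, {"b": 41})
print(value)                       # 42
```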

4.1 Instantiating a Compiler

4.2 Compiler flags

4.3 Symbol Tables

import symtable
table = symtable.symtable("def some_func(): pass", "string", "exec")
table.get_symbols()
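
Symbol tables are nested: the module-level table has one child table per function or class scope. The following sketch uses a slightly larger made-up source string so that the function scope contains a parameter, a local, and a global reference.

```python
import symtable

source = "x = 1\ndef some_func(a):\n    y = a + x\n    return y\n"
top = symtable.symtable(source, "string", "exec")

# The function body has its own symbol table, a child of the module table.
func = top.get_children()[0]
print(func.get_name())  # some_func

# Each symbol records how the compiler classified the name in this scope.
for sym in func.get_symbols():
    print(sym.get_name(), sym.is_parameter(), sym.is_local(), sym.is_global())
```

Here `a` is reported as a parameter, `y` as a local, and `x` as a global, which is the information the compiler later uses to pick the right load and store instructions.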

4.4 Core Compilation Process (PyAST_CompileObject)

4.5 Assembly

5 The evaluation loop: Execution of code

           interpreter
                |
                |
                |
    thread0             thread1      ...
(thread_state0)    (thread_state1)   ...
        |                  | 
        |                  |
        |                  |
frame object[s]     frame object[s]


  FRAME 0 --- code object                 
    | 
    |
    | f_back
    | (previous)
    |
    +---FRAME 1 --- code object
           |
           |
           | f_back
           | (previous)
           |
           +---FRAME 2 --- code object 
                  |
                  |
thread_state------+


+----------------------------------+
|           Frame Object           |
+----------------------------------+
|              +---------------+   |
|  Builtins    |  Code Object  |   |
|              +---------------+   |
|  Globals     |               |   |
|              |  Instructions |   |
|  Locals      |               |   |
|              |  Names        |   |
|  Values      |               |   |
|              |  Constants    |   |
|              +---------------+   |
+----------------------------------+
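
The frame chain and the parts of a frame object shown above are visible from Python. This sketch relies on `sys._getframe`, which is a CPython implementation detail, and the function names `outer`, `inner`, `call_chain`, and `frame_parts` are made up for the example.

```python
import sys

def call_chain():
    """Return function names from the current frame outward, following f_back."""
    names = []
    frame = sys._getframe()
    while frame is not None:
        names.append(frame.f_code.co_name)  # each frame points at its code object
        frame = frame.f_back                # previous frame, or None at the bottom
    return names

def inner():
    return call_chain()

def outer():
    return inner()

def frame_parts():
    """Check that a frame exposes its locals, globals, and builtins."""
    marker = "local"
    frame = sys._getframe()
    return (
        "marker" in frame.f_locals,
        frame.f_globals is globals(),
        "len" in frame.f_builtins,
    )

chain = outer()
print(chain[:3])      # ['call_chain', 'inner', 'outer']
print(frame_parts())  # (True, True, True)
```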

5.1 Thread State

// Include/pystate.h
/* struct _ts is defined in cpython/pystate.h */
typedef struct _ts PyThreadState;

// Include/cpython/pystate.h
// The PyThreadState typedef is in Include/pystate.h.
struct _ts {
    /* Unique thread state id. */
    uint64_t id;

    /* The current contextvars context (PEP 567), stored as a PyObject */
    PyObject *context;
    uint64_t context_ver;

    /* A linked list to the other thread states */
    struct _ts *prev;
    struct _ts *next;

    /* The interpreter state it was spawned by */
    PyInterpreterState *interp;

    /* The currently executing frame */
    /* Borrowed reference to the current frame (it can be NULL) */
    PyFrameObject *frame;

    /* The current recursion depth */
    int recursion_depth;

    char overflowed; /* The stack has overflowed. Allow 50 more calls
                        to handle the runtime error. */
    char recursion_critical; /* The current calls must not cause
                                a stack overflow. */
    int stackcheck_counter;

    /* Optional tracing functions */
    /* 'tracing' keeps track of the execution depth when tracing/profiling.
       This is to prevent the actual trace/profile code from being recorded in
       the trace/profile. */
    int tracing;
    int use_tracing;
    Py_tracefunc c_profilefunc;
    Py_tracefunc c_tracefunc;
    PyObject *c_profileobj;
    PyObject *c_traceobj;

    /* The exception currently being raised */
    PyObject *curexc_type;
    PyObject *curexc_value;
    PyObject *curexc_traceback;

    /* The exception currently being handled, if no coroutines/generators
     * are present. Always the last element on the stack referred to by
     * exc_info.
     */
    _PyErr_StackItem exc_state;

    /* A stack of exceptions raised when multiple exceptions have been raised (within an except block, for example) */
    /* Pointer to the top of the stack of the exceptions currently
     * being handled */
    _PyErr_StackItem *exc_info;

    PyObject *dict;  /* Stores per-thread state */

    /* A GIL counter */
    int gilstate_counter;

    PyObject *async_exc; /* Asynchronous exception to raise */
    unsigned long thread_id; /* Thread id where this tstate was created */

    int trash_delete_nesting;
    PyObject *trash_delete_later;

    /* Called when a thread state is deleted normally, but not when it is destroyed after fork(). */
    void (*on_delete)(void *);
    void *on_delete_data;

    int coroutine_origin_tracking_depth;

    PyObject *async_gen_firstiter;
    PyObject *async_gen_finalizer;
};
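
The `frame` field of each thread state can be observed from Python through `sys._current_frames()`, which maps thread identifiers to the topmost frame of each thread. This is a sketch that depends on a CPython-specific function; the `worker` function and the two events are scaffolding to keep a second thread alive while it is inspected.

```python
import sys
import threading

ready = threading.Event()
release = threading.Event()

def worker():
    ready.set()      # tell the main thread we are running
    release.wait()   # stay alive until the main thread has looked at us

def stack_names(frame):
    """Follow f_back from a thread's current frame and collect function names."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

t = threading.Thread(target=worker, name="demo-worker")
t.start()
ready.wait()

# One entry per thread: thread id -> currently executing frame
frames = sys._current_frames()
worker_stack = stack_names(frames[t.ident])
main_is_listed = threading.get_ident() in frames

release.set()
t.join()

print(worker_stack)    # contains 'worker'; inner frames depend on the version
print(main_is_listed)  # True
```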

5.3 Frame Object

5.4 Frame Execution

5.5 The Value Stack
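
To make the idea of a value stack concrete, here is a toy stack machine. It is only a sketch and not CPython's evaluation loop: the instruction names mirror the ones CPython 3.9 emits for `b+1` (newer versions use different opcodes such as `BINARY_OP`), and the `run` function and its tuple-based instruction format are invented for illustration.

```python
def run(instructions, names):
    """Execute (opcode, argument) pairs against a value stack."""
    stack = []
    for op, arg in instructions:
        if op == "LOAD_NAME":
            stack.append(names[arg])      # push the value bound to a name
        elif op == "LOAD_CONST":
            stack.append(arg)             # push a constant
        elif op == "BINARY_ADD":
            right = stack.pop()           # operands come off in reverse order
            left = stack.pop()
            stack.append(left + right)    # push the result back
        elif op == "RETURN_VALUE":
            return stack.pop()            # top of stack is the result
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise RuntimeError("program ended without RETURN_VALUE")

program = [
    ("LOAD_NAME", "b"),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

print(run(program, {"b": 41}))  # 42
```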

tags: python cpython