What is an Abstract Syntax Tree? Visualizing Code Like a Compiler
Understand the basics of an AST and how it relates to real-world code
Get the project source code below, and follow along with the lesson material.
Download Project Source CodeTo set up the project on your local machine, please follow the directions provided in the README.md
file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.
- |
Lesson Transcript
[00:00 - 01:56] What is an AST? This course is designed to provide a concrete understanding of the theory and practical uses of abstracts and text trees, or ASTs. Before we explore the practical uses, we need to gain a conceptual understanding of ASTs. At a high level, an abstract syntax tree is an intermediate representation of source code as a tree structure. But what does that actually mean? Let's break each part of the statement down. To start, let's look at the tree data structure. A tree data structure starts with a root node. The root can then point to other values, and those values to others, and so on. This begins to create an implicit hierarchy, and also happens to be a great way to represent source code in a way computers can easily interpret. Each one of these values, or circles in the tree, are referred to as nodes. The relationships between nodes are often described with terms like child node, parent node, sibling node, and so on. By convention, the root node is shown at the top. However, if it's flipped, with the root node at the bottom, and the rest of the tree heading upwards, it starts to look like an actual tree, with all its branches forking out. Tree data structures are common in computer science, and have many practical applications, such as searching and sorting data. There are also many different types of trees with different constraints. For example, a binary tree is a tree with at most two child nodes. For the purpose of working with ASTs, the important aspect is understanding how a tree can be used to represent data in the relationships between nodes. Some of the most prominent uses of ASTs are in compilers. A compiler accepts source code as input, and then outputs another language. This is often from a high-level programming language to something low-level, like machine code. In the front-end web ecosystem, this includes tools like Webpack or Parcel. These tools compile many modules into a bundle and perform other optimizations such as transpiling from modern JavaScript to an older version, or minifying the code by renaming variables and functions to shorter names.
[01:57 - 04:25] Although they are a little different to conventional compilers, they follow many of the same fundamental steps. Compilers are commonly broken down into two parts, the front-end and back-end. The front-end is responsible for scanning and parsing the source code, while the back-end is responsible for producing the output. One benefit of making this distinction is the ability to combine a different front-end and back-end, depending on the input language being compiled in the desired output language. Additionally, breaking this process down into distinct steps makes it easier to reason about. In order for this to work, the front-end and back-end need some form of protocol or an intermediate representation of the input. Typically, the output of the front-end is an abstract syntax tree. The AST represents the source code in a tree structure, hence a syntax tree. It's considered abstract because at this point it has abstracted a way syntax that is irrelevant when represented by a tree, since it can imply things like hierarchy. While ASTs can seem complex, most of the complexity is in generating them from source code. Fortunately, there are many great tools in the JavaScript ecosystem that can handle generating ASTs. For the remainder of this course, it won't be critical to know the nuances of compilers or all the different types of tree data structures or the complexities of generating ASTs. However, understanding what ASTs are will unlock a whole new set of practical skills. For example, some common applications of ASTs include counting how many times a function variable component or prop is used in source code or transforming code from one syntax to another or enforcing rules for syntax or other static analysis, for example, disallowing unused variables. Once familiar with abstracts and text trees, many of these concepts can be carried from one tool to another as they will be built on many of the same fundamentals. Now that we have a base understanding of what ASTs are and how they work, what does a basic AST look like in practice? Let's take the following line of JavaScript as an example input. This code can then be represented by the following abstracts and text tree on the right. You'll notice there is a single root node. Here that's the addition operator. In practice, the root node represents the entire file, but for the purpose of this example, unnecessary nodes have been omitted. Each child node, for example, the numeric literal 2, has one unique parent. This can be said for all other nodes in the tree except the root.
[04:26 - 05:55] Finally, the tree structure implies hierarchy, which means syntax such as parentheses can be omitted. For example, to evaluate this tree, the multiplication must occur before the addition. Looking at a visual AST like this can be a helpful way to understand the structure. However, this isn't the true representation of what an AST will be when working with it in later lessons. A more common format is to represent this abstract syntax tree in a JSON format. Using the same example code, a simplified JSON structure representing the AST can be seen on the right, most tooling that relies on ASTs operate on a JSON structure like this. As the tree becomes larger and more complex, it can become harder to visualize the structure. Take a moment to relate the visual representation and this JSON format of the same tree. Visualizing source code in a tree structure like this is one technique that can be helpful when reasoning about ASTs. Here, the entire statement is considered a binary expression node with three primary properties, left, operator, and right. The left node is then a numeric literal node representing the value 2. The right node however is yet another binary expression node again with the properties left, operator, and right. However, this time, this nested binary expressions left and right nodes are both numeric literals representing the values 4 and 10. Now that we can sexually understand ASTs, the next lesson will cover what they look like in more depth.