Day 2 with Esprima
Yesterday, we looked at how Esprima parses
2+2
. Today,
we're going to go one step further and see how Esprima parses sum
.function sum(a,b) {
return a + b;
}
Like yesterday, this post is just a polished version of my notes.
The transitions will be a bit rough and there are probably some
incomplete thoughts, but hopefully you'll be able to see how
Esprima goes about the work of parsing a function.
How do you add two numbers?
So lets start from the beginning.
esprima.parse("function sum(a,b) {return a + b;}")
When you parse the
sum
function you get an output like this:{
"type": "Program",
"body": [
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "sum"
},
"params": [
{
"type": "Identifier",
"name": "a"
},
{
"type": "Identifier",
"name": "b"
}
],
"defaults": [],
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ReturnStatement",
"argument": {
"type": "BinaryExpression",
"operator": "+",
"left": {
"type": "Identifier",
"name": "a"
},
"right": {
"type": "Identifier",
"name": "b"
}
}
}
]
},
"generator": false,
"expression": false
}
]
}
Notes:
FunctionDeclaration
- declarations take the formfunction foo() {}
type.name
is set tosum
params
are an array of identifiers(a, b)
body
is aReturnStatement
with an argumentBinaryExpression
.
Call Stack
Alright, now that we see the output, lets take a look at how it was generated.
In this case, if we set a breakpoint at
parseFunctionDeclaration
the call stack is pretty simple.- parseFunctionDeclaration
- parseStatementListItem
- parseScriptBody
- parseProgram
- parse
This shows us that, the function declaration is just a
StatementItem
found by parseStatementListItem
.function parseStatementListItem() {
if (lookahead.type === Token.Keyword) {
switch (lookahead.value) {
case 'export': return parseExportDeclaration();
case 'import': return parseImportDeclaration();
case 'let': return parseLexicalDeclaration({inFor: false});
case 'function': return parseFunctionDeclaration(new Node());
case 'class': return parseClassDeclaration();
}
}
return parseStatement();
}
Looking at
parseStatementListItem
,
all it's doing is comparing the lookahead's value to different types.
It's not surprising that it finds the function declaration.How is a function parsed?
Here's a simplified version of
parseFunctionDeclaration
We'll look at how four things are handled: (names, params, strict mode, body).function parseFunctionDeclaration(node) {
if (!match('(')) {
token = lookahead;
var id = parseVariableIdentifier();
}
var params = parseParams();
var previousStrict = strict;
var body = parseFunctionSourceElements();
strict = previousStrict;
return node.finishFunctionDeclaration(id, params, body);
}
How is the function name parsed?
Function names are parsed the same way other variable identifiers are.
There are a couple edge cases to watch out for like
yield
and strict mode,
but basically Esprima lexes, finds the next token and returns the node.function parseVariableIdentifier() {
var token = lex();
if (token.type !== Token.Identifier) throwUnexpectedToken(token);
return node.finishIdentifier(token.value);
}
How are function params parsed?
When we're parsing params
function sum (a,b) {return a + b}
, we're looking
for the a
and b
identifiers inside of the parens. We also know that they're separated by commas.Parse params order of operations is as follows:
- first check for a '('
- check for the second ')' if it finds it return early
- parse params
function parseParams() {
options = { params: [] };
expect('(');
if (match(')')) return options;
while (parseParam(options));
return options;
}
Are you surprised that
parseParams
delegates responsiblity to parseParam
?
You shouldn't be, parsers will always break the problem into the smallest possible unit!Findings:
- Params are just variable identifiers. Remember function names? Same thing!
options
is a variable that is shared betweenparseParams
andparseParam
. Interesting choice sir...parseParam
checks for the last paren)
.
function parseParam(param) {
token = lookahead;
var param = parseVariableIdentifier();
options.params.push(param);
return !match(')');
}
What's going on with strict modes?
When
parseFunctionDeclaration
parses the body it first saves the current strict mode.
It then, parses the function body and resets the strict mode.function parseFunctionDeclaration(node) {
if (!match('(')) {
token = lookahead;
var id = parseVariableIdentifier();
}
var params = parseParams();
var previousStrict = strict;
var body = parseFunctionSourceElements();
strict = previousStrict;
return node.finishFunctionDeclaration(id, params, body);
}
The reason for this is pretty cool. Remember that each function block can define it's own strict mode
preference, well, by keeping a reference to the wrapping
block's strict mode preference we can make sure it doesn't get clobbered.
function() { 'use strict' }
How is the function body parsed?
parseFunctionSourceElements
will do a couple things to set specific function
body fields, but the gist of it is parsing statements, which is a lot like parsing a JS script body.Order of operations:
- check for a left curly
{
- start parsing statements and checking for the right curly
}
function parseFunctionSourceElements() {
expect('{');
var body = [];
while (startIndex < length) {
token = lookahead;
if (match('}')) break;
body.push(parseStatementListItem());
}
return node.finishBlockStatement(body);
}
You might expect us to jump into how Esprima parses a statement, but
I promise you that if we do that then this will be the blog post that never ends.
In the spirit of the parser, if you're curious, I point you to yesterday's post
on how
2+2
is parsed. If you've seen one statement parsed, you've seen them all :)
Not really, but ya know we got to save something for tomorrow.Overview
Alright, so that finishes today's journey into how
sum
is parsed.function sum(a,b) {return a + b;}
We looked at how sum's name, params, and body were parsed. Along the way,
we saw the importance of breaking a big problem down into small pieces and
the value of always checking for small grammatical hints ("(", ")", ",", ", ").