calmjs.parse
============

A collection of parsers and helper libraries for understanding
ECMAScript; a near feature-complete fork of |slimit|_.  A CLI front-end
for this package is shipped separately as |crimp|_.

.. image:: https://travis-ci.org/calmjs/calmjs.parse.svg?branch=1.2.5
    :target: https://travis-ci.org/calmjs/calmjs.parse
.. image:: https://ci.appveyor.com/api/projects/status/5dj8dnu9gmj02msu/branch/1.2.5?svg=true
    :target: https://ci.appveyor.com/project/metatoaster/calmjs-parse/branch/1.2.5
.. image:: https://coveralls.io/repos/github/calmjs/calmjs.parse/badge.svg?branch=1.2.5
    :target: https://coveralls.io/github/calmjs/calmjs.parse?branch=1.2.5

.. |calmjs.parse| replace:: ``calmjs.parse``
.. |crimp| replace:: ``crimp``
.. |ply| replace:: ``ply``
.. |slimit| replace:: ``slimit``
.. _crimp: https://pypi.python.org/pypi/crimp
.. _ply: https://pypi.python.org/pypi/ply
.. _slimit: https://pypi.python.org/pypi/slimit

Introduction
------------

For any kind of build system that operates with JavaScript code in
conjunction with a module system, the ability to understand what modules
a given set of sources requires or provides is paramount.  As the Calmjs
project provides a framework that produces and consumes these module
definitions, the ability to have a comprehensive understanding of
given JavaScript sources is essential.  This goal was originally achieved
using |slimit|_, a JavaScript minifier library that also provided a
comprehensive parser class that was built using Python Lex-Yacc (i.e.
|ply|_).

However, as of mid-2017, it was noted that |slimit| had remained in a
minimal state of maintenance for more than four years (its most recent
release, 0.8.1, was made on 2013-03-26), with a number of serious
outstanding issues left unattended and unresolved for the duration
of that time span.  As the development of the Calmjs framework required
those issues to be rectified as soon as possible, a decision to fork the
parser portion of |slimit| was made.  This was done in order to cater to
the interests of the Calmjs project at that moment in time.

The fork was initially cut from another fork of |slimit| (specifically
`lelit/slimit <https://github.com/lelit/slimit>`_), as it introduced and
aggregated a number of bug fixes from various sources.  To ensure
better quality control and assurance, a number of problematic changes
introduced by that fork were removed.  Also, new tests were created to
bring coverage to full, and issues reported on the |slimit| tracker were
noted and formalized into test cases where applicable.  Finally, grammar
rules were updated to ensure better conformance with the ECMA-262 (ES5)
specification.

The goal of |calmjs.parse| is to provide an API similar to the one that
|slimit| had provided, except done in a much more extensible manner with
more correctness checks in place.  This, however, resulted in some
operations that might take longer than what |slimit| had achieved, such
as the pretty printing of output.

A CLI front-end that makes use of this package is provided through
|crimp|_.


Installation
------------

The following command may be executed to source the latest stable
version of the |calmjs.parse| wheel from PyPI for installation into the
current Python environment.

.. code:: console

    $ pip install calmjs.parse

As this package uses |ply|, it requires the generation of optimization
modules for its lexer.  The wheel distribution of |calmjs.parse| does
not require this extra step as it contains these pre-generated modules
for |ply| up to version 3.11 (the latest version available at the time
of the previous release).  However, if the package is installed from the
source tarball, or if the version of |ply| that is installed lies
outside of the supported versions, the following caveats will apply.

If a more recent release of |ply| becomes available and the environment
upgrades to that version, those pre-generated modules may become
incompatible, which may result in decreased performance and/or errors.
This can be corrected through a `manual optimization`_ step if a newer
version of |calmjs.parse| is not available, or |ply| may be downgraded
back to version 3.11 if possible.
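
Whether the installed |ply| is within the supported range can be checked
before deciding on either action.  The following is a minimal sketch that
assumes only the upper bound of 3.11 stated above; the helper names are
illustrative and are not part of |calmjs.parse|.

.. code:: python

    import re
    from importlib import metadata

    # Upper bound taken from the text above; no lower bound is assumed.
    SUPPORTED_PLY_MAX = (3, 11)


    def version_tuple(version):
        """Reduce a version string such as '3.11' to (major, minor)."""
        return tuple(int(part) for part in re.findall(r'\d+', version)[:2])


    def installed_ply_version():
        """Return the installed ply version string, or None if absent."""
        try:
            return metadata.version('ply')
        except metadata.PackageNotFoundError:
            return None


    def pregenerated_modules_apply(version, supported_max=SUPPORTED_PLY_MAX):
        """True if ``version`` is within the range covered by the wheel."""
        return version is not None and version_tuple(version) <= supported_max


    found = installed_ply_version()
    print('ply version:', found)
    print('pre-generated modules apply:', pregenerated_modules_apply(found))

If the second line reports ``False``, the manual optimization step or a
downgrade of |ply| as described above would be the next thing to try.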

Once the package is installed, the installation may be `tested`_ or be
`used directly`_.

Alternative installation methods (for developers, advanced users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Development is still ongoing with |calmjs.parse|; for the latest
features and bug fixes, the development version may be installed through
git like so:

.. code:: console

    $ pip install git+https://github.com/calmjs/calmjs.parse.git#egg=calmjs.parse

Alternatively, the git repository can be cloned directly, with
``python setup.py develop`` then executed from the root of the source
directory.

A manual optimization step may need to be performed for platforms and
systems that do not have utf8 as their default encoding.

Manual optimization
~~~~~~~~~~~~~~~~~~~

As lex and yacc require the generation of symbol tables, a way to
optimize the performance is to cache the results.  For |ply|, this is
done using an auto-generated module.  However, the generated file is
marked with a version number, as the results may be specific to the
installed version of |ply|.  In |calmjs.parse| this is handled by giving
them a name specific to the version of |ply| and the major Python
version, as both together do result in subtle differences in the
outputs and expectations of the auto-generated modules.
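
As a rough illustration of such a naming scheme, the sketch below derives
a module name from the |ply| version and the major Python version.  The
exact format used by |calmjs.parse| is not described in this document, so
both the function and the name format are hypothetical.

.. code:: python

    import sys


    def table_module_name(kind, ply_version, python_major=None):
        """Build a cache module name keyed on the ply and Python versions.

        Hypothetical format, for illustration only.
        """
        if python_major is None:
            python_major = sys.version_info[0]
        return '%s_py%d_ply%s' % (
            kind, python_major, ply_version.replace('.', '_'))


    print(table_module_name('lextab', '3.11', 3))   # lextab_py3_ply3_11
    print(table_module_name('yacctab', '3.10', 2))  # yacctab_py2_ply3_10

Keying the name on both versions means that a table generated for one
combination is never picked up by another.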

Typically, the process for this optimization is automatic and a correct
symbol table will be generated.  However, there are cases where this
will fail, so |calmjs.parse| provides a helper module and executable
that can be optionally invoked to ensure that the correct encoding is
used to generate that file.  This may also be necessary to allow system
administrators to generate the file on behalf of their end users, who
may not have write privileges at that level.

To execute the optimizer from the shell, the provided helper script may
be used like so:

.. code:: console

    $ python -m calmjs.parse.parsers.optimize

If warnings appear stating that tokens are defined but not used, they
may be safely ignored.

This step is generally optional for users who installed this package
from PyPI via a Python wheel, provided the caveats as outlined in the
installation section are addressed.

.. _tested:

Testing the installation
~~~~~~~~~~~~~~~~~~~~~~~~

To ensure that the |calmjs.parse| installation is functioning correctly,
the built-in testsuite can be executed by the following:

.. code:: console

    $ python -m unittest calmjs.parse.tests.make_suite

If there are failures, please file an issue on the `issue tracker
<https://github.com/calmjs/calmjs.parse/issues>`_ with the full
traceback, and/or the method of installation.  Please also include
applicable information about the environment, such as the version of
this software, Python version, operating system environments, the
version of |ply| that was installed, plus other information related to
the issue at hand.
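
The environment details requested above can be gathered with a short
script.  This is only a convenience sketch using the standard library;
the function names are illustrative and the report format is not
something the project prescribes.

.. code:: python

    import platform
    import sys
    from importlib import metadata


    def package_version(name):
        """Return the installed version of a distribution, if any."""
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return 'not installed'


    def environment_report():
        """Collect the versions that are useful in a bug report."""
        return '\n'.join([
            'Python: ' + sys.version.split()[0],
            'Platform: ' + platform.platform(),
            'calmjs.parse: ' + package_version('calmjs.parse'),
            'ply: ' + package_version('ply'),
        ])


    print(environment_report())

The output can be pasted into the issue alongside the traceback and the
method of installation.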


Usage
-----

.. _used directly:

As this is a parser library, no executable shell commands are provided.
There is however a helper callable object provided at the top level for
immediate access to the parsing feature.  It may be used like so:

.. code:: pycon

    >>> from calmjs.parse import es5
    >>> program_source = u'''
    ... // simple program
    ... var main = function(greet) {
    ...     var hello = "hello " + greet;
    ...     return hello;
    ... };
    ... console.log(main('world'));
    ... '''
    >>> program = es5(program_source)
    >>> # for a simple repr-like nested view of the ast
    >>> program  # equivalent to repr(program)
    <ES5Program @3:1 ?children=[
      <VarStatement @3:1 ?children=[
        <VarDecl @3:5 identifier=<Identifier ...>, initializer=<FuncExpr ...>>
      ]>,
      <ExprStatement @7:1 expr=<FunctionCall @7:1 args=<Arguments ...>,
        identifier=<DotAccessor ...>>>
    ]>
    >>> # automatic reconstruction of ast into source, without having to
    >>> # call something like `.to_ecma()`
    >>> print(program)  # equivalent to str(program)
    var main = function(greet) {
      var hello = "hello " + greet;
      return hello;
    };
    console.log(main('world'));

    >>>

Please note the change in indentation as the default printer has its own
indentation scheme.  If comments are needed, the parser can be called
using ``with_comments=True``:

.. code:: pycon

    >>> program_wc = es5(program_source, with_comments=True)
    >>> print(program_wc)
    // simple program
    var main = function(greet) {
      var hello = "hello " + greet;
      return hello;
    };
    console.log(main('world'));

    >>>

Also note that there are limitations with the capturing of comments as
documented in the `Limitations`_ section.

The parser classes are organized under the ``calmjs.parse.parsers``
module, with each language being under its own module.  A
corresponding lexer class with the same name is also provided under the
``calmjs.parse.lexers`` module.  For the moment, only ES5 support is
implemented.

Pretty/minified printing
~~~~~~~~~~~~~~~~~~~~~~~~

There is also a set of pretty printing helpers for turning the AST back
into a string.  These are available as functions or class constructors,
and are produced by composing various lower level classes available in
the ``calmjs.parse.unparsers`` and related modules.

There is a default short-hand helper for turning the previously produced
AST back into a string, which can be manually invoked with certain
parameters, such as what characters to use for indentation (note that
the ``__str__`` call implicitly invoked through ``print`` shown
previously is implemented through this):

.. code:: pycon

    >>> from calmjs.parse.unparsers.es5 import pretty_print
    >>> print(pretty_print(program, indent_str='    '))
    var main = function(greet) {
        var hello = "hello " + greet;
        return hello;
    };
    console.log(main('world'));

    >>>

There is also one for printing without any unneeded whitespace, which
works as a source minifier:

.. code:: pycon

    >>> from calmjs.parse.unparsers.es5 import minify_print
    >>> print(minify_print(program))
    var main=function(greet){var hello="hello "+greet;return hello;};...
    >>> print(minify_print(program, obfuscate=True, obfuscate_globals=True))
    var a=function(b){var a="hello "+b;return a;};console.log(a('world'));

Note that in the second example, the ``obfuscate_globals`` option was
only enabled to demonstrate the source obfuscation on the global scope,
and this is generally not an option that should be enabled on production
library code that is meant to be reused by other packages (other sources
referencing the original unobfuscated names will be unable to do so).

Alternatively, direct invocation on a raw string can be done using the
attributes provided under the same name as the above base objects that
were imported initially.  Relevant keyword arguments will be passed on
to the appropriate underlying functions, for example:

.. code:: pycon

    >>> # pretty print without comments being parsed
    >>> print(es5.pretty_print(program_source))
    var main = function(greet) {
      var hello = "hello " + greet;
      return hello;
    };
    console.log(main('world'));

    >>> # pretty print with comments parsed
    >>> print(es5.pretty_print(program_source, with_comments=True))
    // simple program
    var main = function(greet) {
      var hello = "hello " + greet;
      return hello;
    };
    console.log(main('world'));

    >>> # minify print
    >>> print(es5.minify_print(program_source, obfuscate=True))
    var main=function(b){var a="hello "+b;return a;};console.log(main('world'));

Source map generation
~~~~~~~~~~~~~~~~~~~~~

For the generation of source maps, a lower level unparser instance can
be constructed through one of the printer factory functions.  Passing
in an AST node will produce a generator which produces tuples containing
the yielded text fragment, plus other information which will aid in the
generation of source maps.  Helper functions from the
``calmjs.parse.sourcemap`` module can be used like so to write the
regenerated source code to some stream, along with processing the
results into a sourcemap file.  An example:

.. code:: pycon

    >>> import json
    >>> from io import StringIO
    >>> from calmjs.parse.unparsers.es5 import pretty_printer
    >>> from calmjs.parse.sourcemap import encode_sourcemap, write
    >>> stream_p = StringIO()
    >>> print_p = pretty_printer()
    >>> rawmap_p, _, names_p = write(print_p(program), stream_p)
    >>> sourcemap_p = encode_sourcemap(
    ...     'demo.min.js', rawmap_p, ['custom_name.js'], names_p)
    >>> print(json.dumps(sourcemap_p, indent=2, sort_keys=True))
    {
      "file": "demo.min.js",
      "mappings": "AAEA;IACI;IACA;AACJ;AACA;",
      "names": [],
      "sources": [
        "custom_name.js"
      ],
      "version": 3
    }
    >>> print(stream_p.getvalue())
    var main = function(greet) {
    ...

Likewise, this works similarly for the minify printer, which provides
the ability to create a minified output with unneeded whitespace
removed and identifiers obfuscated with the shortest possible value.
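
The ``"mappings"`` value ``AAEA;IACI;IACA;AACJ;AACA;`` printed in the
pretty printer example above is a sequence of Base64 VLQ segments, as
used by version 3 source maps.  The following standalone sketch decodes
such a string, which can help when checking a generated map by hand.  It
is not part of |calmjs.parse|, and the function names are illustrative.

.. code:: python

    B64 = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
           'abcdefghijklmnopqrstuvwxyz'
           '0123456789+/')


    def decode_vlq_segment(segment):
        """Decode one Base64 VLQ segment into a list of signed integers."""
        values = []
        value = 0
        shift = 0
        for char in segment:
            digit = B64.index(char)
            value |= (digit & 31) << shift  # low five bits carry data
            if digit & 32:                  # sixth bit: more digits follow
                shift += 5
            else:
                sign = -1 if value & 1 else 1  # lowest bit is the sign
                values.append(sign * (value >> 1))
                value = 0
                shift = 0
        return values


    def decode_mappings(mappings):
        """Decode a full mappings string; one list per generated line."""
        return [
            [decode_vlq_segment(seg) for seg in line.split(',') if seg]
            for line in mappings.split(';')
        ]


    for line in decode_mappings('AAEA;IACI;IACA;AACJ;AACA;'):
        print(line)

Each four-value segment holds the generated column, source index, source
line and source column, every one of them relative to the previous
segment (the generated column starts over on each line).  The first
segment decodes to ``[0, 0, 2, 0]``, which points at the third line of
the original source, consistent with the ``@3:1`` position shown in the
``repr`` output earlier.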

Note that in the ``pretty_printer`` example above, the second return
value of the ``write`` function was not used and a custom value was
passed in instead.  This is
simply due to how the ``program`` was generated from a string and thus
the ``sourcepath`` attribute was not assigned with a usable value for
populating the ``"sources"`` list in the resulting source map.  For the
following example, assign a value to that attribute on the program
directly.

.. code:: pycon

    >>> from calmjs.parse.unparsers.es5 import minify_printer
    >>> program.sourcepath = 'demo.js'  # say this was opened there
    >>> stream_m = StringIO()
    >>> print_m = minify_printer(obfuscate=True, obfuscate_globals=True)
    >>> sourcemap_m = encode_sourcemap(
    ...     'demo.min.js', *write(print_m(program), stream_m))
    >>> print(json.dumps(sourcemap_m, indent=2, sort_keys=True))
    {
      "file": "demo.min.js",
      "mappings": "AAEA,IAAIA,CAAK,CAAE,SAASC,CAAK,CAAE,CACvB,...,YAAYF,CAAI",
      "names": [
        "main",
        "greet",
        "hello"
      ],
      "sources": [
        "demo.js"
      ],
      "version": 3
    }
    >>> print(stream_m.getvalue())
    var a=function(b){var a="hello "+b;return a;};console.log(a('world'));

A high level API for working with named streams (i.e. opened files, or
stream objects like ``io.StringIO`` assigned with a name attribute) is
provided by the ``read`` and ``write`` functions from the
``calmjs.parse.io`` module.  The following example shows how to use
these functions to read from a stream and write the relevant items back
out to the output streams:

.. code:: pycon

    >>> from calmjs.parse import io
    >>> h4_program_src = open('/tmp/html4.js')
    >>> h4_program_min = open('/tmp/html4.min.js', 'w+')
    >>> h4_program_map = open('/tmp/html4.min.js.map', 'w+')
    >>> h4_program = io.read(es5, h4_program_src)
    >>> print(h4_program)
    var bold = function(s) {
      return '<b>' + s + '</b>';
    };
    var italics = function(s) {
      return '<i>' + s + '</i>';
    };
    >>> io.write(print_m, h4_program, h4_program_min, h4_program_map)
    >>> pos = h4_program_map.seek(0)
    >>> print(h4_program_map.read())
    {"file": "html4.min.js", "mappings": ..., "version": 3}
    >>> pos = h4_program_min.seek(0)
    >>> print(h4_program_min.read())
    var b=function(a){return'<b>'+a+'</b>';};var a=function(a){...};
    //# sourceMappingURL=html4.min.js.map

For a simple concatenation of multiple sources into one file, along with
an inline source map (i.e. where the sourceMappingURL is a ``data:`` URL
of the base64 encoding of the JSON string), the following may be done:

.. code:: pycon

    >>> files = [open('/tmp/html4.js'), open('/tmp/legacy.js')]
    >>> combined = open('/tmp/combined.js', 'w+')
    >>> io.write(print_p, (io.read(es5, f) for f in files), combined, combined)
    >>> pos = combined.seek(0)
    >>> print(combined.read())
    var bold = function(s) {
        return '<b>' + s + '</b>';
    };
    var italics = function(s) {
        return '<i>' + s + '</i>';
    };
    var marquee = function(s) {
        return '<marquee>' + s + '</marquee>';
    };
    var blink = function(s) {
        return '<blink>' + s + '</blink>';
    };
    //# sourceMappingURL=data:application/json;base64;...

In this example, the ``io.write`` function was provided with the pretty
unparser, a generator expression that will produce the two ASTs from
the two source files, and identical target and sourcemap arguments,
which forces the source map generator to generate the base64 encoding.
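
The general shape of such an inline source map comment can be sketched
with the standard library alone.  This illustrates the ``data:`` URL
technique and is not the code used by ``io.write``; in particular, the
exact prefix emitted by |calmjs.parse| may differ from the generic one
assumed here, and the function name is made up for this example.

.. code:: python

    import base64
    import json

    # Generic data URL prefix; assumed for illustration.
    PREFIX = '//# sourceMappingURL=data:application/json;base64,'


    def inline_sourcemap_comment(sourcemap):
        """Encode a source map mapping as a trailing comment line."""
        payload = json.dumps(sourcemap, sort_keys=True).encode('utf-8')
        return PREFIX + base64.b64encode(payload).decode('ascii')


    demo_map = {
        'version': 3,
        'file': 'combined.js',
        'sources': ['html4.js', 'legacy.js'],
        'names': [],
        'mappings': '',
    }
    comment = inline_sourcemap_comment(demo_map)
    print(comment)

    # Decoding the payload again yields the original mapping.
    decoded = json.loads(base64.b64decode(comment[len(PREFIX):]))
    print(decoded == demo_map)

Embedding the map this way keeps the output to a single file, at the cost
of a larger script.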

Do note that if multiple ASTs were supplied to a minifying printer with
globals being obfuscated, the resulting script will have the earlier
obfuscated global names mangled by later ones, as the unparsing is done
separately by the ``io.write`` function.
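
The reason can be seen with a deliberately simplified model of
obfuscation, in which each AST is renamed independently and short names
are handed out from the start every time.  The renaming function below is
a toy and does not reflect how |calmjs.parse| actually selects names.

.. code:: python

    import string


    def obfuscate_globals(names):
        """Map each global name to a short name, starting over at 'a'."""
        return dict(zip(names, string.ascii_lowercase))


    first = obfuscate_globals(['bold', 'italics'])
    second = obfuscate_globals(['marquee', 'blink'])
    clashes = sorted(set(first.values()) & set(second.values()))

    print(first)    # {'bold': 'a', 'italics': 'b'}
    print(second)   # {'marquee': 'a', 'blink': 'b'}
    print(clashes)  # ['a', 'b']

Since both scripts end up defining ``a`` and ``b`` in the same global
scope once concatenated, the later definitions replace the earlier ones.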


Advanced usage
--------------

Lower level unparsing API
~~~~~~~~~~~~~~~~~~~~~~~~~

Naturally, the printers demonstrated previously are constructed using
the underlying Unparser class, which in turn bridges together the walk
function and the Dispatcher class found in the walker module.  The walk
function walks through the AST node with an instance of the Dispatcher
class, which provides a description of all node types for the particular
type of AST node provided, along with the relevant handlers.  These
handlers can be set up using existing rule provider functions.  For
instance, a printer for obfuscating identifier names while maintaining
indentation for the output of an ES5 AST can be constructed like so:

.. code:: pycon

    >>> from calmjs.parse.unparsers.es5 import Unparser
    >>> from calmjs.parse.rules import indent
    >>> from calmjs.parse.rules import obfuscate
    >>> pretty_obfuscate = Unparser(rules=(
    ...     # note that indent must come after, so that the whitespace
    ...     # handling rules by indent will shadow over the minimum set
    ...     # provided by obfuscate.
    ...     obfuscate(obfuscate_globals=False),
    ...     indent(indent_str='    '),
    ... ))
    >>> math_module = es5(u'''
    ... (function(root) {
    ...   var fibonacci = function(count) {
    ...     if (count < 2)
    ...       return count;
    ...     else
    ...       return fibonacci(count - 1) + fibonacci(count - 2);
    ...   };
    ...
    ...   var factorial = function(n) {
    ...     if (n < 1)
    ...       throw new Error('factorial where n < 1 not supported');
    ...     else if (n == 1)
    ...       return 1;
    ...     else
    ...       return n * factorial(n - 1);
    ...   }
    ...
    ...   root.fibonacci = fibonacci;
    ...   root.factorial = factorial;
    ... })(window);
    ...
    ... var value = window.factorial(5) / window.fibonacci(5);
    ... console.log('the value is ' + value);
    ... ''')
    >>> print(''.join(c.text for c in pretty_obfuscate(math_module)))
    (function(b) {
        var a = function(b) {
            if (b < 2) return b;
            else return a(b - 1) + a(b - 2);
        };
        var c = function(a) {
            if (a < 1) throw new Error('factorial where n < 1 not supported');
            else if (a == 1) return 1;
            else return a * c(a - 1);
        };
        b.fibonacci = a;
        b.factorial = c;
    })(window);
    var value = window.factorial(5) / window.fibonacci(5);
    console.log('the value is ' + value);

Each of the rules (functions) has specific options that are set using
specific keyword arguments; details are documented in their respective
docstrings.
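
The ordering requirement noted in the comment within the example above
can be pictured as later rule sets overriding the entries supplied by
earlier ones.  The sketch below models this with plain dictionaries; it
is a conceptual illustration only, and the function and keys shown are
made up rather than taken from ``calmjs.parse.rules``.

.. code:: python

    def combine_rules(*rule_sets):
        """Merge rule sets in order; later sets override earlier ones."""
        combined = {}
        for rules in rule_sets:
            combined.update(rules)
        return combined


    # Hypothetical layout tables: a minimal one, then an indenting one.
    minimal_layout = {'space': '', 'newline': '', 'required_space': ' '}
    indented_layout = {'space': ' ', 'newline': '\n'}

    layout = combine_rules(minimal_layout, indented_layout)
    print(layout)

Swapping the two arguments would leave the minimal whitespace handling in
effect, which is why the order matters.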

Tree walking
~~~~~~~~~~~~

Generic AST (Abstract Syntax Tree) walker classes are defined under the
``calmjs.parse.walkers`` module.  Two default walker
classes are supplied.  One of them is the ``ReprWalker`` class which was
previously demonstrated.  The other is the ``Walker`` class, which
supplies a collection of generic tree walking methods for a tree of AST
nodes.  The following is an example usage on how one might extract all
Object assignments from a given script file:

.. code:: pycon

    >>> from calmjs.parse import es5
    >>> from calmjs.parse.asttypes import Object, VarDecl, FunctionCall
    >>> from calmjs.parse.walkers import Walker
    >>> walker = Walker()
    >>> declarations = es5(u'''
    ... var i = 1;
    ... var s = {
    ...     a: "test",
    ...     o: {
    ...         v: "value"
    ...     }
    ... };
    ... foo({foo: "bar"});
    ... function bar() {
    ...     var t = {
    ...         foo: "bar",
    ...     };
    ...     return t;
    ... }
    ... foo.bar = bar;
    ... foo.bar();
    ... ''')
    >>> # print out the object nodes that were part of some assignments
    >>> for node in walker.filter(declarations, lambda node: (
    ...         isinstance(node, VarDecl) and
    ...         isinstance(node.initializer, Object))):
    ...     print(node.initializer)
    ...
    {
      a: "test",
      o: {
        v: "value"
      }
    }
    {
      foo: "bar"
    }
    >>> # print out all function calls
    >>> for node in walker.filter(declarations, lambda node: (
    ...         isinstance(node, FunctionCall))):
    ...     print(node.identifier)
    ...
    foo
    foo.bar

Further details and example usage can be found in the various
docstrings within the module.
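
For readers unfamiliar with this style of traversal, the idea behind a
filtering walk can be sketched independently of the library.  The
``Node`` class and ``filter_nodes`` function below are hypothetical and
far simpler than what ``calmjs.parse.walkers`` provides.

.. code:: python

    class Node(object):
        """Hypothetical stand-in for an AST node with child nodes."""

        def __init__(self, kind, children=()):
            self.kind = kind
            self.children = list(children)


    def filter_nodes(node, condition):
        """Yield, depth first, every node that satisfies ``condition``."""
        if condition(node):
            yield node
        for child in node.children:
            for match in filter_nodes(child, condition):
                yield match


    tree = Node('Program', [
        Node('VarDecl', [Node('Object')]),
        Node('FunctionCall', [Node('Arguments', [Node('Object')])]),
    ])
    kinds = [n.kind for n in filter_nodes(tree, lambda n: n.kind == 'Object')]
    print(kinds)  # ['Object', 'Object']

Because the walk is a generator, matches are produced lazily and the
caller can stop early without visiting the rest of the tree.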

Limitations
-----------

Comments currently may be incomplete
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Due to the implementation of the lexer/parser along with how the AST
node types have been implemented, there are restrictions on where the
comments may be exposed if enabled.  Currently, such limitations exist
for nodes that are created by production rules that consume multiple
lexer tokens at once: only comments preceding the first token will be
captured, with all remaining comments discarded.

For example, this limitation means that any comments before the ``else``
token will be omitted (as the comment will be provided by the ``if``
token), as the production rule for an ``If`` node consumes both these
tokens and the node as implemented only provides a single slot for
comments.  Likewise, any comments before the ``:`` token in a ternary
expression will also be discarded as that is the second token consumed
by the production rule that produces a ``Conditional`` node.
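
A small input that exercises the ``if``/``else`` case described above is
sketched below.  It relies only on the ``es5`` callable and the
``with_comments=True`` option shown in the Usage section; the import is
guarded so the snippet is harmless where the package is not installed,
and no particular output is claimed here.

.. code:: python

    source = u'''
    // precedes the `if` token, so it should be captured
    if (ready) {
        start();
    }
    // precedes the `else` token, so it is expected to be dropped
    else {
        wait();
    }
    '''

    try:
        from calmjs.parse import es5
    except ImportError:  # the library is optional for this sketch
        es5 = None

    if es5 is not None:
        # Compare the printed result against the two comments above.
        print(es5(source, with_comments=True))

Comparing the printed result with the input shows which of the two
comments survived the round trip.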

Troubleshooting
---------------

Instantiation of parser classes fails with ``UnicodeEncodeError``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For platforms or systems that do not have utf8 configured as the default
encoding, the automatic table generation may fail when constructing a
parser instance.  An example:

.. code:: pycon

    >>> from calmjs.parse.parsers import es5
    >>> parser = es5.Parser()
    Traceback (most recent call last):
      ...
      File "c:\python35\....\ply\lex.py", line 1043, in lex
        lexobj.writetab(lextab, outputdir)
      File "c:\python35\....\ply\lex.py", line 195, in writetab
        tf.write('_lexstatere   = %s\n' % repr(tabre))
      File "c:\python35\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u02c1' ...

A workaround helper script is provided; it may be executed like so:

.. code:: console

    $ python -m calmjs.parse.parsers.optimize

Further details on this topic may be found in the `manual optimization`_
section of this document.
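
To check ahead of time whether a system is likely to be affected, the
default encoding can be inspected with the standard library.  This sketch
assumes that the generated table is written using the locale's preferred
encoding, as the ``cp1252`` frame in the traceback above suggests; the
helper names are illustrative.

.. code:: python

    import codecs
    import locale


    def default_encoding():
        """Return the normalized name of the locale's preferred encoding."""
        name = locale.getpreferredencoding(False)
        try:
            return codecs.lookup(name).name
        except LookupError:
            return name


    def can_encode(text, encoding):
        """True if ``text`` can be written out using ``encoding``."""
        try:
            text.encode(encoding)
        except UnicodeEncodeError:
            return False
        return True


    print('default encoding:', default_encoding())
    # The character reported in the traceback above.
    print(can_encode(u'\u02c1', 'cp1252'))  # False
    print(can_encode(u'\u02c1', 'utf-8'))   # True

If the default encoding reported is not ``utf-8``, running the helper
script above is the suggested remedy.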

Slow performance
~~~~~~~~~~~~~~~~

As this program is basically fully decomposed into very small functions,
this results in massive performance penalties as compared to other
implementations, due to function calls being one of the most expensive
operations in Python.  It may be possible to further optimize the
definitions within the description in the Dispatcher by combining all
the resolved generator functions for each asttype Node type; however,
this may require both the token and layout functions not having
arguments with name collisions, and the new function will take in all
of those arguments in one go.
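
The cost of function calls referred to above can be observed with a small
timing comparison.  The two functions below are generic stand-ins written
for this illustration and are unrelated to the unparser code; actual
timings will vary between machines and Python versions.

.. code:: python

    import timeit


    def add_inline(values):
        """Sum the values with the addition written inline."""
        total = 0
        for value in values:
            total += value
        return total


    def _add(a, b):
        return a + b


    def add_with_calls(values):
        """Sum the values, paying for one function call per item."""
        total = 0
        for value in values:
            total = _add(total, value)
        return total


    values = list(range(10000))
    inline_time = timeit.timeit(lambda: add_inline(values), number=50)
    calls_time = timeit.timeit(lambda: add_with_calls(values), number=50)
    print('inline: %.4fs, with calls: %.4fs' % (inline_time, calls_time))

Both functions compute the same result; only the per-item call differs,
which is the kind of overhead that accumulates when a program is
decomposed into many small functions.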


Contribute
----------

- Issue Tracker: https://github.com/calmjs/calmjs.parse/issues
- Source Code: https://github.com/calmjs/calmjs.parse


Legal
-----

The |calmjs.parse| package is copyright (c) 2017 Auckland Bioengineering
Institute, University of Auckland.  The |calmjs.parse| package is
licensed under the MIT license (specifically, the Expat License), which
is also the same license that the package |slimit| was released under.

The lexer, parser and the other type definition portions were
originally imported from the |slimit| package; |slimit| is copyright (c)
Ruslan Spivak.

The Calmjs project is copyright (c) 2017 Auckland Bioengineering
Institute, University of Auckland.