Commit 0beb2b80 authored by SVN-Git Migration's avatar SVN-Git Migration

Imported Upstream version 0.7.0

Copyright (c) Alex Gaynor and individual contributors.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of rply nor the names of its contributors may be used
to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Metadata-Version: 1.0
Name: rply
Version: 0.7.0
Summary: A pure Python Lex/Yacc that works with RPython
Home-page: UNKNOWN
Author: Alex Gaynor
Author-email: alex.gaynor@gmail.com
License: UNKNOWN
Description: RPLY
====
.. image:: https://secure.travis-ci.org/alex/rply.png
:target: http://travis-ci.org/alex/rply
Welcome to RPLY! A pure Python parser generator that also works with RPython.
It is a more-or-less direct port of David Beazley's awesome PLY, with a new
public API and RPython support.
Basic API:
.. code:: python

    from rply import ParserGenerator, LexerGenerator
    from rply.token import BaseBox

    lg = LexerGenerator()
    # Add takes a rule name, and a regular expression that defines the rule.
    lg.add("PLUS", r"\+")
    lg.add("MINUS", r"-")
    lg.add("NUMBER", r"\d+")

    lg.ignore(r"\s+")

    # This is a list of the token names. precedence is an optional list of
    # tuples which specifies order of operation for avoiding ambiguity.
    # precedence must be one of "left", "right", "nonassoc".
    # cache_id is an optional string which specifies an ID to use for
    # caching. It should *always* be safe to use caching; RPly will
    # automatically detect when your grammar is changed and refresh the
    # cache for you.
    pg = ParserGenerator(["NUMBER", "PLUS", "MINUS"],
            precedence=[("left", ['PLUS', 'MINUS'])], cache_id="myparser")

    @pg.production("main : expr")
    def main(p):
        # p is a list of each of the pieces on the right-hand side of the
        # grammar rule
        return p[0]

    @pg.production("expr : expr PLUS expr")
    @pg.production("expr : expr MINUS expr")
    def expr_op(p):
        lhs = p[0].getint()
        rhs = p[2].getint()
        if p[1].gettokentype() == "PLUS":
            return BoxInt(lhs + rhs)
        elif p[1].gettokentype() == "MINUS":
            return BoxInt(lhs - rhs)
        else:
            raise AssertionError("This is impossible, abort the time machine!")

    @pg.production("expr : NUMBER")
    def expr_num(p):
        return BoxInt(int(p[0].getstr()))

    lexer = lg.build()
    parser = pg.build()

    class BoxInt(BaseBox):
        def __init__(self, value):
            self.value = value

        def getint(self):
            return self.value
Then you can do:
.. code:: python

    parser.parse(lexer.lex("1 + 3 - 2+12-32"))
You can also substitute your own lexer. A lexer is an object with a ``next()``
method that returns either the next token in sequence, or ``None`` if the token
stream has been exhausted.
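For illustration, here is a sketch of such a substitute lexer that feeds out a pre-built list of tokens. The ``Token`` and ``ListLexer`` classes below are simplified stand-ins invented for this example (the real token class is ``rply.token.Token``); the sketch follows the iterator protocol that ``LexerStream`` itself uses:

```python
# A hand-rolled lexer sketch: any object with a next() / __next__ method
# producing tokens in sequence can stand in for RPly's built-in lexer.
# Token here is a simplified stand-in for rply.token.Token (an assumption,
# so the example runs without rply installed).
class Token(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def gettokentype(self):
        return self.name

    def getstr(self):
        return self.value


class ListLexer(object):
    """Feeds out a pre-built list of tokens, then signals exhaustion."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.idx = 0

    def __iter__(self):
        return self

    def next(self):
        if self.idx >= len(self.tokens):
            raise StopIteration
        token = self.tokens[self.idx]
        self.idx += 1
        return token

    __next__ = next


stream = ListLexer([Token("NUMBER", "1"), Token("PLUS", "+")])
print([t.gettokentype() for t in stream])  # ['NUMBER', 'PLUS']
```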
Why do we have the boxes?
-------------------------
In RPython, as in other statically typed languages, a variable must have a
single specific type; RPly takes advantage of polymorphism to keep values in
boxes so that everything is statically typed. You can write whatever boxes you
need for your project.
If you don't intend to use your parser from RPython, and just want a cool pure
Python parser you can ignore all the box stuff and just return whatever you
like from each production method.
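As a sketch of the idea (``BaseBox`` is stood in by a plain base class here, an assumption so the snippet runs without rply installed), every production returns the same static type, while different box subclasses carry different kinds of values:

```python
# Boxes give every production a single static return type (BaseBox) while
# still carrying different kinds of values underneath.
class BaseBox(object):
    pass


class BoxInt(BaseBox):
    def __init__(self, value):
        self.value = value

    def getint(self):
        return self.value


class BoxStr(BaseBox):
    def __init__(self, value):
        self.value = value

    def getstr(self):
        return self.value


def add(a, b):
    # Statically, both arguments are just BaseBox; the accessor methods
    # unbox the concrete values.
    return BoxInt(a.getint() + b.getint())


print(add(BoxInt(1), BoxInt(2)).getint())  # 3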
Error handling
--------------
By default, when a parsing error is encountered, an ``rply.ParsingError`` is
raised. It has a ``getsourcepos()`` method, which returns an
``rply.token.SourcePosition`` object.
You may also provide an error handler, which, at the moment, must raise an
exception. It receives the ``Token`` object that the parser errored on.
.. code:: python

    pg = ParserGenerator(...)

    @pg.error
    def error_handler(token):
        raise ValueError("Ran into a %s where it wasn't expected" % token.gettokentype())
Python compatibility
--------------------
RPly is tested and known to work under Python 2.6, 2.7, 3.1, and 3.2. It is
also valid RPython for PyPy checkouts from ``6c642ae7a0ea`` onwards.
Links
-----
* `Source code and issue tracker <https://github.com/alex/rply/>`_
* `PyPI releases <https://pypi.python.org/pypi/rply>`_
* `Talk at PyCon US 2013: So you want to write an interpreter? <http://pyvideo.org/video/1694/so-you-want-to-write-an-interpreter>`_
Platform: UNKNOWN
from rply.errors import ParsingError
from rply.lexergenerator import LexerGenerator
from rply.parsergenerator import ParserGenerator
from rply.token import Token
__all__ = [
    "LexerGenerator", "ParserGenerator", "ParsingError", "Token"
]
class ParserGeneratorError(Exception):
    pass


class LexingError(Exception):
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        return self.source_pos


class ParsingError(Exception):
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        return self.source_pos


class ParserGeneratorWarning(Warning):
    pass
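Both ``LexingError`` and ``ParsingError`` expose the same ``getsourcepos()`` accessor, so error-reporting code can treat them uniformly. A minimal, self-contained sketch of how a caller consumes them (the ``ParsingError`` definition is repeated from above; ``SourcePosition`` is a namedtuple stand-in for ``rply.token.SourcePosition``, an assumption so the snippet runs without rply):

```python
from collections import namedtuple

# Stand-in for rply.token.SourcePosition (an assumption for this sketch).
SourcePosition = namedtuple("SourcePosition", ["idx", "lineno", "colno"])


class ParsingError(Exception):
    def __init__(self, message, source_pos):
        self.message = message
        self.source_pos = source_pos

    def getsourcepos(self):
        return self.source_pos


try:
    # A parser would raise this when it hits an unexpected token.
    raise ParsingError(None, SourcePosition(5, 1, 6))
except ParsingError as e:
    pos = e.getsourcepos()
    print("error at line %d, column %d" % (pos.lineno, pos.colno))
    # error at line 1, column 6
```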
from rply.errors import ParserGeneratorError
from rply.utils import iteritems


def rightmost_terminal(symbols, terminals):
    for sym in reversed(symbols):
        if sym in terminals:
            return sym
    return None
class Grammar(object):
    def __init__(self, terminals):
        # A list of all the productions
        self.productions = [None]
        # A dictionary mapping the names of non-terminals to a list of all
        # productions of that nonterminal
        self.prod_names = {}
        # A dictionary mapping the names of terminals to a list of the rules
        # where they are used
        self.terminals = dict((t, []) for t in terminals)
        self.terminals["error"] = []
        # A dictionary mapping names of nonterminals to a list of rule numbers
        # where they are used
        self.nonterminals = {}
        self.first = {}
        self.follow = {}
        self.precedence = {}
        self.start = None
    def add_production(self, prod_name, syms, func, precedence):
        if prod_name in self.terminals:
            raise ParserGeneratorError("Illegal rule name %r" % prod_name)

        if precedence is None:
            precname = rightmost_terminal(syms, self.terminals)
            prod_prec = self.precedence.get(precname, ("right", 0))
        else:
            try:
                prod_prec = self.precedence[precedence]
            except KeyError:
                raise ParserGeneratorError(
                    "Precedence %r doesn't exist" % precedence
                )

        pnumber = len(self.productions)
        self.nonterminals.setdefault(prod_name, [])

        for t in syms:
            if t in self.terminals:
                self.terminals[t].append(pnumber)
            else:
                self.nonterminals.setdefault(t, []).append(pnumber)

        p = Production(pnumber, prod_name, syms, prod_prec, func)
        self.productions.append(p)
        self.prod_names.setdefault(prod_name, []).append(p)
    def set_precedence(self, term, assoc, level):
        if term in self.precedence:
            raise ParserGeneratorError(
                "Precedence already specified for %s" % term
            )
        if assoc not in ["left", "right", "nonassoc"]:
            raise ParserGeneratorError(
                "Precedence must be one of left, right, nonassoc; not %s" % (
                    assoc
                )
            )
        self.precedence[term] = (assoc, level)

    def set_start(self):
        start = self.productions[1].name
        self.productions[0] = Production(0, "S'", [start], ("right", 0), None)
        self.nonterminals[start].append(0)
        self.start = start
    def unused_terminals(self):
        return [
            t
            for t, prods in iteritems(self.terminals)
            if not prods and t != "error"
        ]

    def unused_productions(self):
        return [p for p, prods in iteritems(self.nonterminals) if not prods]
    def build_lritems(self):
        """
        Walks the list of productions and builds a complete set of the LR
        items.
        """
        for p in self.productions:
            lastlri = p
            i = 0
            lr_items = []
            while True:
                if i > p.getlength():
                    lri = None
                else:
                    try:
                        before = p.prod[i - 1]
                    except IndexError:
                        before = None
                    try:
                        after = self.prod_names[p.prod[i]]
                    except (IndexError, KeyError):
                        after = []
                    lri = LRItem(p, i, before, after)
                lastlri.lr_next = lri
                if lri is None:
                    break
                lr_items.append(lri)
                lastlri = lri
                i += 1
            p.lr_items = lr_items
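# An LR item is a production with a "." marker at one position in its
# right-hand side; build_lritems constructs one LRItem per marker position,
# from 0 through the production's length. A self-contained toy illustration
# for the rule expr -> expr PLUS expr:

```python
# Enumerate every dot position of a production's right-hand side, the way
# build_lritems walks i from 0 to getlength().
prod = ["expr", "PLUS", "expr"]
items = []
for i in range(len(prod) + 1):
    with_dot = prod[:i] + ["."] + prod[i:]
    items.append("expr -> " + " ".join(with_dot))

for item in items:
    print(item)
# expr -> . expr PLUS expr
# expr -> expr . PLUS expr
# expr -> expr PLUS . expr
# expr -> expr PLUS expr .
```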
    def _first(self, beta):
        result = []
        for x in beta:
            x_produces_empty = False
            for f in self.first[x]:
                if f == "<empty>":
                    x_produces_empty = True
                else:
                    if f not in result:
                        result.append(f)
            if not x_produces_empty:
                break
        else:
            result.append("<empty>")
        return result
    def compute_first(self):
        for t in self.terminals:
            self.first[t] = [t]

        self.first["$end"] = ["$end"]

        for n in self.nonterminals:
            self.first[n] = []

        changed = True
        while changed:
            changed = False
            for n in self.nonterminals:
                for p in self.prod_names[n]:
                    for f in self._first(p.prod):
                        if f not in self.first[n]:
                            self.first[n].append(f)
                            changed = True
    def compute_follow(self):
        for k in self.nonterminals:
            self.follow[k] = []

        start = self.start
        self.follow[start] = ["$end"]

        added = True
        while added:
            added = False
            for p in self.productions[1:]:
                for i, B in enumerate(p.prod):
                    if B in self.nonterminals:
                        fst = self._first(p.prod[i + 1:])
                        has_empty = False
                        for f in fst:
                            if f != "<empty>" and f not in self.follow[B]:
                                self.follow[B].append(f)
                                added = True
                            if f == "<empty>":
                                has_empty = True
                        if has_empty or i == (len(p.prod) - 1):
                            for f in self.follow[p.name]:
                                if f not in self.follow[B]:
                                    self.follow[B].append(f)
                                    added = True
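# compute_first and compute_follow are fixpoint iterations: FIRST(n) and
# FOLLOW(n) sets grow until a full pass changes nothing. A self-contained
# toy run of the FIRST computation on a hypothetical grammar invented for
# illustration (expr -> NUMBER | LPAREN expr RPAREN), mirroring the loop
# structure above but without empty productions:

```python
# Hypothetical grammar for illustration only.
terminals = ["NUMBER", "LPAREN", "RPAREN"]
productions = {"expr": [["NUMBER"], ["LPAREN", "expr", "RPAREN"]]}

# FIRST of a terminal is itself; nonterminals start empty.
first = dict((t, [t]) for t in terminals)
for n in productions:
    first[n] = []

# Iterate until no FIRST set changes (the fixpoint).
changed = True
while changed:
    changed = False
    for n, prods in productions.items():
        for prod in prods:
            # With no empty productions, only the leading symbol matters.
            for f in first[prod[0]]:
                if f not in first[n]:
                    first[n].append(f)
                    changed = True

print(sorted(first["expr"]))  # ['LPAREN', 'NUMBER']
```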
class Production(object):
    def __init__(self, num, name, prod, precedence, func):
        self.name = name
        self.prod = prod
        self.number = num
        self.func = func
        self.prec = precedence

        self.unique_syms = []
        for s in self.prod:
            if s not in self.unique_syms:
                self.unique_syms.append(s)

        self.lr_items = []
        self.lr_next = None
        self.lr0_added = 0
        self.reduced = 0

    def __repr__(self):
        return "Production(%s -> %s)" % (self.name, " ".join(self.prod))

    def getlength(self):
        return len(self.prod)
class LRItem(object):
    def __init__(self, p, n, before, after):
        self.name = p.name
        self.prod = p.prod[:]
        self.prod.insert(n, ".")
        self.number = p.number
        self.lr_index = n
        self.lookaheads = {}
        self.unique_syms = p.unique_syms
        self.lr_before = before
        self.lr_after = after

    def __repr__(self):
        return "LRItem(%s -> %s)" % (self.name, " ".join(self.prod))

    def getlength(self):
        return len(self.prod)
from rply.errors import LexingError
from rply.token import SourcePosition, Token


class Lexer(object):
    def __init__(self, rules, ignore_rules):
        self.rules = rules
        self.ignore_rules = ignore_rules

    def lex(self, s):
        return LexerStream(self, s)


class LexerStream(object):
    def __init__(self, lexer, s):
        self.lexer = lexer
        self.s = s
        self.idx = 0
        self._lineno = 1

    def __iter__(self):
        return self

    def _update_pos(self, match):
        self.idx = match.end
        self._lineno += self.s.count("\n", match.start, match.end)
        last_nl = self.s.rfind("\n", 0, match.start)
        if last_nl < 0:
            return match.start + 1
        else:
            return match.start - last_nl

    def next(self):
        if self.idx >= len(self.s):
            raise StopIteration
        for rule in self.lexer.ignore_rules:
            match = rule.matches(self.s, self.idx)
            if match:
                self._update_pos(match)
                return self.next()
        for rule in self.lexer.rules:
            match = rule.matches(self.s, self.idx)
            if match:
                colno = self._update_pos(match)
                source_pos = SourcePosition(match.start, self._lineno, colno)
                token = Token(
                    rule.name, self.s[match.start:match.end], source_pos
                )
                return token
        else:
            raise LexingError(None, SourcePosition(self.idx, -1, -1))

    def __next__(self):
        return self.next()
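# The column arithmetic in _update_pos is worth seeing in isolation: the
# column of a match is its offset past the last newline before it, falling
# back to a 1-based offset from the start of the string when the match is
# on the first line. A self-contained sketch of that calculation:

```python
# Mirror of the column computation in LexerStream._update_pos.
def column_of(s, start):
    last_nl = s.rfind("\n", 0, start)
    if last_nl < 0:
        # No newline before the match: column is the 1-based offset.
        return start + 1
    else:
        # Otherwise: distance past the preceding newline.
        return start - last_nl


text = "1 + 2\n+ 3"
print(column_of(text, 0))  # 1  (first char of line 1)
print(column_of(text, 6))  # 1  (first char of line 2)
print(column_of(text, 8))  # 3  ("3" on line 2)
```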
import re

try:
    import rpython
    from rpython.annotator import model
    from rpython.annotator.bookkeeper import getbookkeeper
    from rpython.rlib.objectmodel import instantiate, hlinvoke
    from rpython.rlib.rsre import rsre_core
    from rpython.rlib.rsre.rpy import get_code
    from rpython.rtyper.annlowlevel import llstr, hlstr
    from rpython.rtyper.extregistry import ExtRegistryEntry
    from rpython.rtyper.lltypesystem import lltype
    from rpython.rtyper.lltypesystem.rlist import FixedSizeListRepr
    from rpython.rtyper.lltypesystem.rstr import STR, string_repr
    from rpython.rtyper.rmodel import Repr
    from rpython.tool.pairtype import pairtype
except ImportError:
    rpython = None

from rply.lexer import Lexer


class Rule(object):
    def __init__(self, name, pattern):
        self.name = name
        self.re = re.compile(pattern)

    def _freeze_(self):
        return True

    def matches(self, s, pos):
        m = self.re.match(s, pos)
        return Match(*m.span(0)) if m is not None else None


class Match(object):
    _attrs_ = ["start", "end"]

    def __init__(self, start, end):
        self.start = start
        self.end = end


class LexerGenerator(object):
    def __init__(self):
        self.rules = []
        self.ignore_rules = []

    def add(self, name, pattern):
        self.rules.append(Rule(name, pattern))

    def ignore(self, pattern):
        self.ignore_rules.append(Rule("", pattern))

    def build(self):
        return Lexer(self.rules, self.ignore_rules)
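# Rule.matches anchors re.match at an explicit position and collapses the
# result to a plain (start, end) pair, which keeps the interface small
# enough for the RPython bindings below. The same behavior demonstrated
# standalone (Rule and Match repeated from above so this runs by itself):

```python
import re


class Match(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end


class Rule(object):
    def __init__(self, name, pattern):
        self.name = name
        self.re = re.compile(pattern)

    def matches(self, s, pos):
        # Pattern.match(s, pos) anchors the match at pos, not at 0.
        m = self.re.match(s, pos)
        return Match(*m.span(0)) if m is not None else None


number = Rule("NUMBER", r"\d+")
m = number.matches("12 + 3", 0)
print((m.start, m.end))            # (0, 2)
print(number.matches("12 + 3", 2))  # None -- pos 2 is a space
```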
if rpython:
    class RuleEntry(ExtRegistryEntry):
        _type_ = Rule

        def compute_annotation(self, *args):
            return SomeRule()

    class SomeRule(model.SomeObject):
        def rtyper_makekey(self):
            return (type(self),)

        def rtyper_makerepr(self, rtyper):
            return RuleRepr(rtyper)

        def method_matches(self, s_s, s_pos):
            assert model.SomeString().contains(s_s)
            assert model.SomeInteger(nonneg=True).contains(s_pos)

            bk = getbookkeeper()
            init_pbc = bk.immutablevalue(Match.__init__)
            bk.emulate_pbc_call((self, "match_init"), init_pbc, [
                model.SomeInstance(bk.getuniqueclassdef(Match)),
                model.SomeInteger(nonneg=True),
                model.SomeInteger(nonneg=True)
            ])
            init_pbc = bk.immutablevalue(rsre_core.StrMatchContext.__init__)
            bk.emulate_pbc_call((self, "str_match_context_init"), init_pbc, [
                model.SomeInstance(bk.getuniqueclassdef(rsre_core.StrMatchContext)),
                bk.newlist(model.SomeInteger(nonneg=True)),
                model.SomeString(),
                model.SomeInteger(nonneg=True),
                model.SomeInteger(nonneg=True),
                model.SomeInteger(nonneg=True),
            ])
            match_context_pbc = bk.immutablevalue(rsre_core.match_context)
            bk.emulate_pbc_call((self, "match_context"), match_context_pbc, [
                model.SomeInstance(bk.getuniqueclassdef(rsre_core.StrMatchContext)),
            ])

            return model.SomeInstance(getbookkeeper().getuniqueclassdef(Match), can_be_None=True)

        def getattr(self, s_attr):
            if s_attr.is_constant() and s_attr.const == "name":
                return model.SomeString()
            return super(SomeRule, self).getattr(s_attr)

    class __extend__(pairtype(SomeRule, SomeRule)):
        def union(self):
            return SomeRule()

    class RuleRepr(Repr):
        def __init__(self, rtyper):
            super(RuleRepr, self).__init__()

            self.ll_rule_cache = {}

            self.match_init_repr = rtyper.getrepr(
                rtyper.annotator.bookkeeper.immutablevalue(Match.__init__)
            )
            self.match_context_init_repr = rtyper.getrepr(
                rtyper.annotator.bookkeeper.immutablevalue(rsre_core.StrMatchContext.__init__)
            )
            self.match_context_repr = rtyper.getrepr(
                rtyper.annotator.bookkeeper.immutablevalue(rsre_core.match_context)
            )

            list_repr = FixedSizeListRepr(rtyper, rtyper.getrepr(model.SomeInteger(nonneg=True)))
            list_repr._setup_repr()
            self.lowleveltype = lltype.Ptr(lltype.GcStruct(
                "RULE",
                ("name", lltype.Ptr(STR)),
                ("code", list_repr.lowleveltype),
            ))

        def convert_const(self, rule):
            if rule not in self.ll_rule_cache:
                ll_rule = lltype.malloc(self.lowleveltype.TO)
                ll_rule.name = llstr(rule.name)
                code = get_code(rule.re.pattern)
                ll_rule.code = lltype.malloc(self.lowleveltype.TO.code.TO, len(code))
                for i, c in enumerate(code):
                    ll_rule.code[i] = c
                self.ll_rule_cache[rule] = ll_rule
            return self.ll_rule_cache[rule]

        def rtype_getattr(self, hop):
            s_attr = hop.args_s[1]
            if s_attr.is_constant() and s_attr.const == "name":
                v_rule = hop.inputarg(self, arg=0)
                return hop.gendirectcall(LLRule.ll_get_name, v_rule)
            return super(RuleRepr, self).rtype_getattr(hop)

        def rtype_method_matches(self, hop):
            [v_rule, v_s, v_pos] = hop.inputargs(self, string_repr, lltype.Signed)
            c_MATCHTYPE = hop.inputconst(lltype.Void, Match)
            c_MATCH_INIT = hop.inputconst(lltype.Void, self.match_init_repr)
            c_MATCH_CONTEXTTYPE = hop.inputconst(lltype.Void, rsre_core.StrMatchContext)
            c_MATCH_CONTEXT_INIT = hop.inputconst(lltype.Void, self.match_context_init_repr)
            c_MATCH_CONTEXT = hop.inputconst(lltype.Void, self.match_context_repr)