Commit 61b63f4d authored by Tony Mancill

Imported Upstream version 2.9.0

parent 579bb0b9
......@@ -23,3 +23,4 @@ grammar
writer
generator
next
rmshift
......@@ -49,5 +49,6 @@ void setLocations()
DOC = BASE + "/share/doc/bisonc++";
COMPILER = "g++";
// COMPILER = "g++-4.5";
}
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
......@@ -69,7 +68,7 @@ patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
TERMS AND CONDITIONS
0. Definitions.
......@@ -77,7 +76,7 @@ modification follow.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
......@@ -510,7 +509,7 @@ actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
......@@ -619,9 +618,9 @@ an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
......@@ -673,4 +672,3 @@ may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
......@@ -36,27 +36,27 @@ the classes used by Rules; RmReduction does not depend on any other class).
| | | | |
+--------+---------------+ LookaheadSet Grammar
| | |
| +--+--+ |
| | | |
| RmReduction RRData | |
| | | | |
| +-----+-----+ | |
| | | |
| StateType SRSolution StateItem | |
| | | | | |
| +-----+-----+--------+ | |
| | | | |
| Next | | |
| | | | |
| SrConflict RRConflict Item |
Parser | | | |
| +--------------+-----------+ |
| | |
| State |
| +---+---+ |
| | | |
| RmShift RmReduction RRData Item |
| | | | | |
| +------------+-----+----+-------+ |
| | |
| Writer |
| | |
+-------------+---------------+ |
| StateType EnumSolution StateItem |
| | | | |
| +-----------+----------+ |
| | | |
| Next | |
| | | |
| SrConflict RRConflict |
Parser | | |
| +-----+----+ |
| | |
| State |
| | |
| Writer |
| | |
+-------------+----------+ |
| |
Generator |
| |
......@@ -65,3 +65,5 @@ the classes used by Rules; RmReduction does not depend on any other class).
bisonc++
Output written to the xxx.output file by the --verbose or --construction flags
is mostly handled by the State::allStates() function. This function calls
State's operator<<, which is initialized with a pointer to either
insertStd,
insertExt
or
skipInsertion.
Inspect state/insertstd.cc for the standard insertion method and
state/insertext.cc for the extensive insertion method.
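The dispatch described above can be sketched as follows. This is a minimal illustration, not the actual bisonc++ code: the class layout, the Mode enum, the setMode() member, and the text each insertion function writes are assumptions made for the example. Only the names insertStd, insertExt, and skipInsertion come from the description.

```cpp
#include <iostream>
#include <sstream>

// Hypothetical, pared-down State: only the dispatch mechanism is shown.
class State
{
    typedef std::ostream &(State::*Inserter)(std::ostream &) const;

    int d_nr;                       // state number
    static Inserter s_insert;       // currently selected insertion method

    std::ostream &insertStd(std::ostream &out) const
    {
        return out << "State " << d_nr;
    }
    std::ostream &insertExt(std::ostream &out) const
    {
        return out << "State " << d_nr << " (extensive)";
    }
    std::ostream &skipInsertion(std::ostream &out) const
    {
        return out;                 // writes nothing
    }

public:
    enum Mode { STD, EXT, SKIP };   // assumed selector, not from the source

    explicit State(int nr) : d_nr(nr) {}

    static void setMode(Mode mode)
    {
        s_insert = mode == STD ? &State::insertStd :
                   mode == EXT ? &State::insertExt :
                                 &State::skipInsertion;
    }

    // operator<< merely forwards to whichever member s_insert points at
    friend std::ostream &operator<<(std::ostream &out, State const &state)
    {
        return (state.*s_insert)(out);
    }
};

// Nothing is inserted until a mode has been selected.
State::Inserter State::s_insert = &State::skipInsertion;
```

Selecting the insertion method once, through a pointer to member, keeps operator<< free of per-call flag checks.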
The information in this file is closely related to what's happening in
state/define.cc. Refer to define.cc for the implementation of the process
described below. All functions mentioned below are defined by the class State.
Defining states proceeds as follows:
0. The initial state is constructed. It contains the augmented grammar's
production rule. This part is realized by the static member
initialState();
1. From the state's kernel item(s) all implied rules are added as
additional state items. This results in a vector of (kernel/non-kernel)
items, as well as per item the numbers of the items that are affected
by this item. This information is used later on to propagate the
LA's. This part is realized by the member
setItems()
This fills the StateItem::Vector vector. A StateItem contains
1. an item (containing a production rule, dot position, and LA set)
2. a LA-enlarged flag, raised when an item's LA set is enlarged
3. a size_t vector of indices of `dependent' items, indicating which
items have LA sets that depend on the current item
(StateItem::d_child).
4. The size_t field `next' holds the index in d_nextVector,
allowing quick access of the d_nextVector element defining the
state having the current item as its kernel. A next value 'npos'
indicates that the item does not belong to a next-kernel.
E.g.,
StateItem:
---------------------------------------------------------------
item LA-enlarged LA-set dependent next next
stateitems state LA-enlarged
---------------------------------------------------------------
S* -> . S, false, EOF, (1, 2) 0 true/false
...
---------------------------------------------------------------
Also, State::d_nextVector vector is filled.
A Next element contains
0. The symbol on which the transition takes place
1. The number of the next state
2. The indices of the StateItem::Vector defining the next
state's kernel
E.g.,
Next:
-------------------------------
On next next kernel
Symbol state from items
-------------------------------
S ? (0, 1)
...
-------------------------------
Previously, nextOnSymbol() was called here. It simply removed the
production rules where dot appeared at the rule's end, since they will not
become part of the next state's kernels.
Empty production rules don't require special handling as they won't appear
in the Next table, since there's no transition on them. Thus, the
previously mentioned nextOnSymbol() function is now no longer required.
Next, from these facilities all states are constructed. LA propagation is
performed after the state construction since LA propagation is an
inherently recursive process, and state construction isn't. State
construction takes place in the while loop in State::define.cc, following
the initial state construction.
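The StateItem and Next records described in step 1 can be pictured with the following sketch. The struct layouts, the member names, and the helper transitionOf() are hypothetical simplifications chosen for illustration; the real declarations live in the bisonc++ headers and differ from this.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an item: production rule, dot position, LA set.
struct Item
{
    std::string production;
    std::size_t dot;
    std::set<std::string> lookahead;
};

struct StateItem
{
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    Item item;
    bool laEnlarged = false;            // raised when the LA set is enlarged
    std::vector<std::size_t> child;     // indices of dependent items
    std::size_t next = npos;            // index into the Next vector, or npos
};

struct Next
{
    std::string symbol;                 // symbol on which the transition occurs
    std::size_t nextState;              // number of the next state
    std::vector<std::size_t> kernel;    // StateItem indices forming the next
                                        // state's kernel
};

// Returns the Next element reached from items[idx], or nullptr if that item
// does not belong to a next-kernel.
Next const *transitionOf(std::vector<StateItem> const &items,
                         std::vector<Next> const &nexts, std::size_t idx)
{
    assert(idx < items.size());
    std::size_t const next = items[idx].next;
    return next == StateItem::npos ? nullptr : &nexts[next];
}
```

Storing an index rather than a pointer in `next` keeps the records valid when the vectors grow during state construction.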
2. Following the state construction the lookaheads (LAs) will be propagated
over the items in the current state. This is where previous bisonc++
versions erred. LAs are distributed over and determined for each
individual item, and are then inherited by the next states. Also, LAs can
be determined during the construction of a state, instead of during a
separate cycle. LAs are propagated from the initial state over the
dependent StateItems. Lookahead propagation is performed by the member
propagateLA()
3. Then, from the Next::Vector constructed at (1) the next states
are constructed. This is realized by the member
constructNext()
A next state is only constructed if it wasn't constructed yet. For a new
state, the construct() member is called. Construct() calls setItems()
and propagateLA(). Otherwise, propagateLA will also be called for all
states having kernels whose next LA-enlarged flag is set.
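The recursive nature of LA propagation within one state can be illustrated with a small sketch. It rests on a strong simplification that is not taken from the source: every dependent item is assumed to inherit the complete LA set of the item it depends on, whereas a real parser generator also has to consider which symbols can follow the dot. The struct and both functions are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, reduced StateItem: only what propagation needs.
struct StateItem
{
    std::set<std::string> la;           // the item's LA set
    std::vector<std::size_t> child;     // indices of dependent items
    bool laEnlarged = false;            // raised when the LA set grows
};

// Merges `from` into `to`; returns true if `to` grew.
bool enlarge(std::set<std::string> &to, std::set<std::string> const &from)
{
    if (&to == &from)                   // an item depending on itself
        return false;
    std::size_t const before = to.size();
    to.insert(from.begin(), from.end());
    return to.size() != before;
}

// Pushes the LA set of items[idx] to its dependent items, recursing only
// when a dependent item's set was actually enlarged. Since sets can only
// grow and the number of symbols is finite, the recursion terminates even
// if items depend on each other.
void propagateLA(std::vector<StateItem> &items, std::size_t idx)
{
    for (std::size_t child : items[idx].child)
    {
        if (enlarge(items[child].la, items[idx].la))
        {
            items[child].laEnlarged = true;
            propagateLA(items, child);
        }
    }
}
```

In this sketch the laEnlarged flag records which items changed, which is the information a caller would need to decide whether propagation must continue into a next state.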
4. Once all states have been constructed, conflicts are located and
solved. If a state contains conflicts, they are resolved and
information about these conflicts is stored in an SRConflict::Vector
and/or RRConflict::Vector. Conflicts are identified and resolved by the
member:
(static)checkConflicts();
5. S/R conflicts are handled by the d_srConflict object. At construction time
this object receives a context consisting of the state's d_itemVector and
d_nextVector vectors, as well as d_reducible, which contains the indices of
all reducible items. Each reducible item index, together with that context,
is passed to Next's checkShiftReduceConflict() member, where observed
shift-reduce conflicts are solved. Here is how this is done:
Assume a state's itemVector holds the following StateItems:
0: [P11 3] expression -> expression '-' expression .
{ EOLN '+' '-' '*' '/' ')' } 0, 1, () -1
1: [P10 1] expression -> expression . '+' expression
{ EOLN '+' '-' '*' '/' ')' } 0, 0, () 0
2: [P11 1] expression -> expression . '-' expression
{ EOLN '+' '-' '*' '/' ')' } 0, 0, () 1
3: [P12 1] expression -> expression . '*' expression
{ EOLN '+' '-' '*' '/' ')' } 0, 0, () 2
4: [P13 1] expression -> expression . '/' expression
{ EOLN '+' '-' '*' '/' ')' } 0, 0, () 3
and the associated nextVector is:
0: On '+' to state 15 with (1 )
1: On '-' to state 16 with (2 )
2: On '*' to state 17 with (3 )
3: On '/' to state 18 with (4 )
Conflicts are inspected for all reducible elements. Here this is the
element having index 0. Inspection involves (but see below for an
extension of this process when the LHS of a reducible item differs from
the LHS of a non-reducible item):
1. The nextVector's symbols are searched for in the LA set of the
reduction item (so, subsequently '+', '-', '*' and '/' are searched
for in the LA set of itemVector[0]).
2. In this case, all are found and depending on the token's priority
and the rule's priority either a shift or a reduce is selected.
Production rules received their priority setting either explicitly (using
%prec) or from their first terminal token. See also
rules/updateprecedences.cc
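The inspection in step 5 can be sketched as follows. The sketch makes assumptions beyond the text: the types and function names are hypothetical, symbols are represented as plain characters, and equal priorities are settled by associativity in the conventional yacc manner (left-associative: reduce, right-associative: shift), which the description above does not spell out.

```cpp
#include <map>
#include <set>
#include <vector>

enum class Assoc { Left, Right };
enum class Action { Shift, Reduce };

struct Token                    // hypothetical token description
{
    int priority;
    Assoc assoc;
};

// Decides a single conflict between a reducible rule and a shiftable token.
Action solve(int rulePriority, Token const &token)
{
    if (rulePriority != token.priority)
        return rulePriority > token.priority ? Action::Reduce : Action::Shift;

    // equal priorities: assumed yacc-style tie-break on associativity
    return token.assoc == Assoc::Left ? Action::Reduce : Action::Shift;
}

// For every transition symbol that also occurs in the reducible item's LA
// set a conflict exists; the returned map holds the action chosen for it.
std::map<char, Action> inspect(int rulePriority,
                               std::set<char> const &laSet,
                               std::vector<char> const &nextSymbols,
                               std::map<char, Token> const &tokens)
{
    std::map<char, Action> result;
    for (char symbol : nextSymbols)
    {
        if (laSet.count(symbol))
            result[symbol] = solve(rulePriority, tokens.at(symbol));
    }
    return result;
}
```

Applied to the example state, and assuming '+' and '-' were declared left-associative with a lower priority than the left-associative '*' and '/', the rule ending in `expression '-' expression .` would be reduced on '+' and '-', while '*' and '/' would be shifted.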
Different LHS elements of items:
As pointed out by Ramanand Mandayam, S/R conflicts may be observed when
reducible rules merely consist of non-terminals. Here is an example:
%left '*'
%token ID
%%
expr:
term
;
term:
term '*' primary
|
ID
;
primary:
'-' expr
|
ID
;
This grammar contains the following state
State 2:
0: [P1 1] expr -> term . { <EOF> } 1, () -1
1: [P2 1] term -> term . '*' primary { '*' <EOF> } 0, () 0
0: On '*' to state 4 with (1 )
Reduce item(s): 0
Here, item 0 reduces to N 'expr' and item 1 requires a shift in a
production rule of the N 'term'.
In these cases the rule 'expr -> term .' has no precedence that can be
derived from either %prec or an initial terminal. Such reductions
automatically receive the highest possible precedence and 'reduce' is
used, rather than 'shift'. Since there is no explicit basis for this
choice the choice between shift and reduce is flagged as a conflict.
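The special case just described can be captured in a few lines. This is a sketch under assumptions: the types and the resolve() function are hypothetical, ties between equal priorities are simply treated as 'reduce', and only conflicts settled without an explicit basis are marked as flagged.

```cpp
#include <limits>

// Priority given to rules that have neither %prec nor a terminal token.
int const highestPriority = std::numeric_limits<int>::max();

struct Rule                 // hypothetical rule description
{
    bool hasPriority;       // set by %prec or derived from a terminal
    int priority;           // only meaningful if hasPriority is true
};

struct Resolution
{
    bool reduce;            // true: reduce, false: shift
    bool flagged;           // true: the choice is reported as a conflict
};

Resolution resolve(Rule const &rule, int tokenPriority)
{
    int const rulePriority =
        rule.hasPriority ? rule.priority : highestPriority;

    Resolution result;
    result.reduce = rulePriority >= tokenPriority;
    result.flagged = !rule.hasPriority;     // no explicit basis for the choice
    return result;
}
```

For State 2 above, `expr -> term .` has no priority of its own, so resolve() selects a reduce on '*' and flags the decision.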
#define VERSION "2.7.0"
#define VERSION "2.9.0"
#define YEARS "2005-2010"
......@@ -144,7 +144,7 @@ try
}
catch(Errno const &err)
{
cerr << err.what() << endl;
cerr << err.why() << '\n';
return err.which();
}
catch(int x)
......
......@@ -12,7 +12,10 @@ class Block: private std::string
std::string d_source; // the source in which the block
// was found. The block's text itself
// is in the Block's base class
int d_count;
int d_count; // curly braces nesting count, handled
// by clear(), close(), and open()
public:
typedef std::pair<size_t, size_t> Range;
......
bisonc++ (2.9.0)
* Changed Errno::what() call to Errno::why()
* Removed dependencies on Msg, using Mstreams and Errno::open
instead. Consequently, bisonc++ depends on at least Bobcat 2.9.0
-- Frank B. Brokken <f.b.brokken@rug.nl> Sat, 30 Oct 2010 22:05:30 +0200
bisonc++ (2.8.0)
* Grammars having states consisting of items in which a reduction from a
(series of) non-terminals is indicated have automatically a higher
precedence than items in which a shift is required. Therefore, in these
cases the shift/reduce conflict is solved by a reduce, rather than a
shift. See README.states-and-conflicts, srconflict/visitreduction.cc and
the Bisonc++ manual, section 'Rule precedence' for examples and further
information. These grammars are now showing S/R conflicts, which remained
undetected in earlier versions of Bisonc++. The example was brought to my
attention by Ramanand Mandayam (thanks, Ramanand!).
* To the set of regression tests another test was added containing a grammar
having two S/R conflicts resulting from automatically selecting reductions
rather than shifts. This test was named 'mandayam'.
* Output generated by --verbose and --construction now shows in more detail
how S/R conflicts are handled. The Bisonc++ manual also received an
additional section explaining when reduces are used with certain S/R
conflicts.
* Previously the documentation stated that --construction writes the
construction process to stdout, whereas it is written to the same file as
used by --verbose. This is now repaired.
* The meaning/use of the data members of all classes are now described at
the data members in all the classes' header files.
-- Frank B. Brokken <f.b.brokken@rug.nl> Sun, 08 Aug 2010 15:15:46 +0200
bisonc++ (2.7.0)
* $-characters appearing in strings or character constants in action blocks
......
......@@ -250,10 +250,13 @@ files already exist.
is not inspected anymore.
it() loption(construction)nl()
This option may be specified to write details about the
construction of the parsing tables to the standard output
stream. This information is primarily useful for developers, and
augments the information written to the verbose grammar output
file, produced by the tt(--verbose) option.
construction of the parsing tables to the same file as written by
the tt(--verbose) option (i.e., tt(<parse>.output), where
tt(<parse>) is the filename (without the tt(.cc) extension) of the
file containing tt(parse)'s implementation). This information is
primarily useful for developers. It augments the information
written to the verbose grammar output file, generated by the
tt(--verbose) option.
it() loption(debug)nl()
Provide tt(parse) and its support functions with debugging code,
showing the actual parsing process on the standard output
......@@ -421,15 +424,14 @@ files already exist.
Write basic usage information to the standard output stream and
terminate.
it() loption(verbose) (soption(V))nl()
Write a file containing verbose descriptions of
the parser states and what is done for each type of look-ahead
token in that state. This file also describes all conflicts
detected in the grammar, both those resolved by operator
precedence and those that remain unresolved. By default it will
not be created, but if requested it will receive the filename
tt(<parse>.output), where tt(<parse>) is the filename (without the
tt(.cc) extension) of the file containing tt(parse)'s
implementation.
Write a file containing verbose descriptions of the parser states
and what is done for each type of look-ahead token in that state.
This file also describes all conflicts detected in the grammar,
both those resolved by operator precedence and those that remain
unresolved. By default it will not be created, but if requested
it will receive the filename tt(<parse>.output), where tt(<parse>)
is the filename (without the tt(.cc) extension) of the file
containing tt(parse)'s implementation.
it() loption(version) (soption(v))nl()
Display bic()'s version number and terminate.
)
......
......@@ -22,7 +22,6 @@ sect(Analyzing A Grammar)
subsubsect(Preamble)
includefile(algorithm/pstates.yo)
includefile(algorithm/transition.yo)
lsubsect(PARSING)(Processing Input)
......@@ -46,6 +45,9 @@ includefile(algorithm/precedence.yo)
subsect(How Precedence Works)
includefile(algorithm/howprec.yo)
subsect(Rule precedence)
includefile(algorithm/ruleprec.yo)
lsect(CONDEP)(Context-Dependent Precedence)
includefile(algorithm/condep.yo)
......
......@@ -58,10 +58,34 @@ conflicts, use the tt(%expect n) directive. There will be no warning as long
as the number of shift/reduce conflicts is exactly tt(n). See section
ref(EXPECT).
The definition of if_stmt above is solely to blame for the conflict, but the
plain tt(stmnt) rule, consisting of two recursive alternatives will of course
never be able to match actual input, since there's no way for the grammar to
eventually derive a sentence this way. Adding one non-recursive alternative is
enough to convert the grammar into one that em(does) derive sentences. Here is
a complete b() input file that actually manifests the conflict:
The definition of tt(if_stmt) above is solely to blame for the conflict, but
the plain tt(stmnt) rule, consisting of two recursive alternatives will of
course never be able to match actual input, since there's no way for the
grammar to eventually derive a sentence this way. Adding one non-recursive
alternative is enough to convert the grammar into one that em(does) derive
sentences. Here is a complete b() input file that actually manifests the
conflict:
verbinclude(algorithm/examples/dangling)
Looking again at the dangling else problem, note that there are multiple ways
to handle tt(stmnt) productions. Depending on the particular input that is
provided, it could either be reduced to a tt(stmt), or the parser could
continue to consume input by processing an tt(ELSE) token, eventually
resulting in the recognition of tt(IF '(' VAR ')' stmt ELSE stmt) as a
tt(stmt).
There is little we can do but resort to tt(%expect) to handle the dangling
else problem. The default handling is what most people intuitively expect, and
so in this case using tt(%expect 1) is an easy way to prevent b() from
reporting a shift/reduce conflict. But shift/reduce conflicts are most often
solved by specifying disambiguating rules that define priorities or
associations, usually in the context of arithmetic expressions, as discussed
in the next sections.
However, shift/reduce conflicts can also be observed in grammars where a state
contains items that could be reduced to a certain non-terminal as well as
items in which a shift is possible in a production rule of a completely
different non-terminal. Here is an example of such a grammar:
verbinclude(algorithm/examples/peculiar)
Why these grammars show shift/reduce conflicts and how these are solved is
discussed in the next section.
// Generated by Bisonc++ V2.7.1 on Sat, 07 Aug 2010 21:39:39 +0200
#ifndef ParserBase_h_included
#define ParserBase_h_included
#include <vector>
#include <iostream>
// $insert debugincludes
#include <iostream>
#include <sstream>
#include <string>
#include <map>
#include <iomanip>
namespace // anonymous
{
struct PI;
struct PI__;
}
......@@ -16,60 +25,101 @@ class ParserBase
// $insert tokens
// Symbolic tokens:
enum Tokens
enum Tokens__
{
WORD = 260,
ID = 257,
UNARY,
};
// $insert STYPE
typedef CHARP STYPE;
typedef int STYPE__;
private:
int d_stackIdx;
std::vector<size_t> d_stateStack;
std::vector<STYPE> d_valueStack;
int d_stackIdx__;
std::vector<size_t> d_stateStack__;
std::vector<STYPE__> d_valueStack__;
protected:
enum Return
enum Return__
{
PARSE_ACCEPT = 0, // values used as parse()'s return values
PARSE_ABORT = 1
PARSE_ACCEPT__ = 0, // values used as parse()'s return values
PARSE_ABORT__ = 1
};
enum ErrorRecovery
enum ErrorRecovery__
{
DEFAULT_RECOVERY_MODE,
UNEXPECTED_TOKEN,
DEFAULT_RECOVERY_MODE__,
UNEXPECTED_TOKEN__,
};
bool d_debug;
size_t d_nErrors;
int d_token;
size_t d_state;
STYPE *d_vsp;
STYPE d_val;
bool d_debug__;
size_t d_nErrors__;
size_t d_requiredTokens__;
size_t d_acceptedTokens__;
int d_token__;
int d_nextToken__;
size_t d_state__;
STYPE__ *d_vsp__;
STYPE__ d_val__;
STYPE__ d_nextVal__;
ParserBase();
void ABORT() const throw(Return);
void ACCEPT() const throw(Return);
void ERROR() const throw(ErrorRecovery);
// $insert debugdecl
static std::ostringstream s_out__;
std::string symbol__(int value) const;
std::string stype__(char const *pre, STYPE__ const &semVal,
char const *post = "") const;
static std::ostream &dflush__(std::ostream &out);
void ABORT() const;
void ACCEPT() const;
void ERROR() const;
void clearin();
bool debug() const;
void pop__(size_t count = 1);
void push__(size_t nextState);
void popToken__();
void pushToken__(int token);
void reduce__(PI__ const &productionInfo);
void errorVerbose__();
size_t top__() const;
bool debug() const
{
return d_debug;
}
void pop(size_t count = 1);
void push(size_t nextState);
size_t reduce(PI const &productionInfo);
void setDebug(bool mode)
{
d_debug = mode;
}
size_t top() const;
public:
void setDebug(bool mode);
};
inline bool ParserBase::debug() const
{
return d_debug__;
}
inline void ParserBase::setDebug(bool mode)
{
d_debug__ = mode;
}
inline void ParserBase::ABORT() const
{
// $insert debug
if (d_debug__)
s_out__ << "ABORT(): Parsing unsuccessful" << "\n" << dflush__;
throw PARSE_ABORT__;
}
// class ParserBase ends
};
inline void ParserBase::ACCEPT() const
{
// $insert debug
if (d_debug__)
s_out__ << "ACCEPT(): Parsing successful" << "\n" << dflush__;
throw PARSE_ACCEPT__;
}
inline void ParserBase::ERROR() const
{
// $insert debug
if (d_debug__)
s_out__ << "ERROR(): Forced error condition" << "\n" << dflush__;
throw UNEXPECTED_TOKEN__;
}
// As a convenience, when including ParserBase.h its symbols are available as
......
%debug
%token ID
%left '+' '-'
%left '*' '/'
%right UNARY
%%
expr:
expr '+' term
| expr '-' term
| term
;
term:
term '*' primary
| term '/' primary
| primary
;
primary:
'-' expr %prec UNARY
| '+' expr %prec UNARY
| ID
;