Commit 677808cf authored by Colin Watson

jlex: Create initial directory structure and import upstream version 1.2.5.

# A simulation of Subversion default ignores, generated by reposurgeon.
* JLex README Version 1.2 *
Written by Elliot Berk [edited by A. Appel] [revised by C. Scott Ananian].
Contact with any problems relating to JLex.
The following steps describe the compilation and usage
of JLex.
(1) Choose some directory that is on your CLASSPATH, where
you install Java utilities such as JLex. I will refer
to this directory as "J", for example.
(2) Make a directory "J/JLex" and put the sourcefile
in J/JLex.
(3) Compile as you would any Java source file:
This should produce a number of Java class files, including Main.class,
in the "J/JLex" directory, where "J" is in your CLASSPATH.
(4) To run JLex with a JLex specification file,
the usage is:
java JLex.Main <filename>
where <filename> is the name of the JLex
specification file. If java complains that
it can't find JLex.Main, then the directory
"J" (which contains the subdirectory "JLex"
which contains the class files) isn't in your
CLASSPATH; go back and read steps 1-3 more
carefully, please.
JLex will produce diagnostic output to inform
you of its progress and, upon completion, will
produce a Java source file that contains the
lexical analyzer. The name of the lexical
analyzer file will be the name of the JLex
specification file, with the string ".java"
added to the end. (So if the JLex specification
file is called foo.lex, the lexical analyzer source file
that JLex produces will be called foo.lex.java.)
(5) The resulting lexical analyzer source file
should be compiled with the Java compiler:
javac <filename>
where <filename> is the name of the lexical analyzer
source file. This produces a lexical analyzer class file,
which can then be used in your applications.
If the default settings have not been changed,
the lexical analyzer class will be called Yylex
and the class files will be named Yylex.class and Yytoken.class.
(6) As an example, there is a sample lexical specification
on the JLex web site:
named 'sample.lex'. Transfer this to your system and use
the command:
java JLex.Main sample.lex
to generate a file named 'sample.lex.java'. Compile this with:
javac -d J sample.lex.java
where "J" is the above mentioned path to a directory in
your CLASSPATH. If '.' is in your CLASSPATH, you can
use "-d .". Run the generated lexer with:
java Sample
which expects input on stdin. The lexer parses tokens
that resemble those for a typical programming language;
whitespace is generally ignored. Java buffers input from
stdin a line at a time, so you won't see any output until
you type enter. Try inputting things like:
"a string"
{ /* comment */ a := b & c; }
Look at the sample.lex input file for more information on
the operation of this example scanner.
The file in the bug report below has some errors and tries to reference
some features that are not actually in JavaLex, but it seems to illustrate
a problem with regular expression parsing.
For instance, the following line has some problems.
ONECHAR [^\\"']|(\\.)|(\[0123]?{OCTNUMBER}{0,2})|{UNICODE_CHARACTER}
1) A '\' must precede the '"' in the first [ ... ].
2) A '\' must precede the '\' before the second [ ... ].
3) The {0,2} is not supported. It means the number of occurrences
of a macro (?), a feature not supported by Java-Lex.
ONECHAR [^\\\"']|(\\.)|(\\[0123]?{OCTNUMBER})|{UNICODE_CHARACTER}
But there still appear to be some problems parsing the (very complex)
FLOAT macro. This bug does not result in incorrect lexers being generated
but rather in Java-Lex incorrectly asserting a parse error during generation
of the lexer source file. This makes the bug in a sense easy to recognize,
since the processing and compilation of the lex file will not
go to completion.
-- Elliot
From glunz@zfe.siemens.de Wed Sep 11 14:59:23 1996
Date: Fri, 26 Jul 1996 13:59:56 +0200 (MET DST)
From: Wolfgang Glunz <>
To: ejberk@Princeton.EDU
Subject: Problem with JavaLex
First of all thanks for the effort you put into the development of JavaLex.
Unfortunately I have a problem where I don't know how to proceed.
I tried the following .lex file:
/* this goes into the lexer class */
IDENTIFIER [A-Za-z_$][A-Za-z_$0-9]*
DIGIT [0-9]
HEXDIGIT [A-Fa-f0-9]
ONECHAR [^\\"']|(\\.)|(\[0123]?{OCTNUMBER}{0,2})|{UNICODE_CHARACTER}
CHAR_OP ([-;{},;()[\].&|!~=+*/%<>^?:])
WHITESPACE [ \n\r\t]+
<INCOMMENT>"/*" { }
"//".* { }
{FLOAT} { }
{DOUBLE} { }
I switched debugging on in JavaLex and get the following output:
(tail only)
Entering dodash [lexeme: +] [token: 17]
Lexeme: - Token: 10 Index: 52
Lexeme: ] Token: 5 Index: 53
Lexeme: ? Token: 15 Index: 54
expanded escape: [0-9]
{ }
Lexeme: [ Token: 6 Index: 55
Lexeme: 0 Token: 12 Index: 56
Lexeme: - Token: 10 Index: 57
Lexeme: 9 Token: 12 Index: 58
Lexeme: ] Token: 5 Index: 59
Leaving dodash [lexeme:]] [token:5]
Lexeme: + Token: 17 Index: 60
Leaving term [lexeme:+] [token:17]
Lexeme: ? Token: 15 Index: 61
Leaving factor [lexeme:?] [token:15]
Error: Parse error at line 54.
Error Description: + ? or * must follow an expression or subexpression.
java.lang.Error: Parse error.
at CError.parse_error(
at CMakeNfa.first_in_cat(
at CMakeNfa.cat_expr(
at CMakeNfa.expr(
at CMakeNfa.term(
at CMakeNfa.factor(
at CMakeNfa.cat_expr(
at CMakeNfa.expr(
at CMakeNfa.rule(
at CMakeNfa.machine(
at CMakeNfa.thompson(
at CLexGen.userRules(
at CLexGen.generate(
at JavaLex.main(
Two questions:
1. The error message says that a + ? or * must follow an expression. Is it not possible to
write (expression)|(expression) ?
2. JavaLex seems to expand the macros only partially. Why so ?
Any help is appreciated,
Wolfgang Glunz email:
Siemens AG, ZFE T SE 2 WWW: <URL:>
(Siemens only)
81730 Muenchen / Germany Phone: +49 89 63649492
Otto Hahn Ring 6 Fax: +49 89 63640898
Date: Mon, 28 Apr 1997 17:27:51 +0000
From: Per Velschow <>
Subject: Bug in JavaLex
I have been using JavaLex and CUP for a while with JDK 1.0.2. But then I
tried using it with the new JDK 1.1.1 with some problems. The biggest
problems were in CUP (I will mail my problem to them), but there was
also a little problem in JavaLex.
It has to do with the new API in JDK 1.1.1. SUN has "deprecated" some
constructors and methods so that you get a warning when you try to
compile a program using them. Specifically in the generated code from
JavaLex it uses the following deprecated constructor:
String(byte[], int, int, int)
The solution I have found is very easy (if it works) just let the lexer
use the constructor String(byte[], int, int) instead. It uses the
platform's default character encoding.
You can find out more about this here
I actually have made the change myself by replacing the line
m_outstream.writeBytes("\t\treturn (new java.lang.String(yy_buffer,
0, \n");
m_outstream.writeBytes("\t\treturn (new
Per Velschow
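A minimal sketch of the constructor swap described in the message above, assuming a byte buffer similar to the generated lexer's yy_buffer. The class name DecodeBuffer and both helper methods are hypothetical. The explicit-charset variant (StandardCharsets, Java 7 and later) is an addition for illustration and is not something the message proposes:

```java
import java.nio.charset.StandardCharsets;

public class DecodeBuffer {
    // Hypothetical helper: the replacement suggested in the message.
    // String(byte[], int, int) decodes with the platform's default charset.
    static String decodeDefault(byte[] buffer, int start, int length) {
        return new String(buffer, start, length);
    }

    // Hypothetical helper: same idea with an explicit charset, so the
    // result does not depend on the platform (an assumption of this sketch).
    static String decodeLatin1(byte[] buffer, int start, int length) {
        return new String(buffer, start, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xE4 is a-umlaut in ISO-8859-1; as a Java byte it is negative.
        byte[] buffer = {'a', 'b', (byte) 0xE4, 'c'};
        System.out.println(decodeDefault(buffer, 0, 2));
        System.out.println(decodeLatin1(buffer, 1, 3).length());
    }
}
```

Both constructors take the buffer, a start offset and a byte count, so the call site in the generated code would only need its argument list adjusted.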
Date: Mon, 24 Feb 97 14:25:55 -0500
Message-Id: <9702241925.AA02147@Princeton.EDU>
Received: from [] by PACEVM.DAC.PACE.EDU (IBM VM SMTP V2R3)
with TCP; Mon, 24 Feb 97 12:52:10 EST
From: "Joseph Bergin" <>
Reply-To: "Joseph Bergin" <>
Subject: Java compiler book and tools
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Mailer: POPmail 2.3b7
I recently got copies of your Java and ML compiler books. They are very nice and
I will probably adopt them in future in my compiler courses. Regarding your
tools, however. Some machines can't effectively use a command line for inputting
names of files and other things. Therefore I have developed the following class
that can be used to put an interface on the tools such as JavaLex and CUP. They
should work with any system, but are necessary on something like the Macintosh.
They may be freely distributed, but keep my authorship please.
---------cut here-------------
package AUX;

import java.awt.Frame;
import java.awt.FileDialog;

/**
 * BlowPipe can be used to put an interface on programs that normally accept
 * instructions from the command line. Therefore it can be used to make
 * standard unix tools more friendly to Macintosh and Windows systems. <p>
 * To use this, put calls to these functions into your main(String [] argv)
 * function at the beginning, before you start to decode argv. For example,
 * to decode one argument, that is to be taken as the input file use:
 * <p> <code> <pre>
 * public static void main ( String argv[] ) throws
 * { arg = new String[1];
 * arg[0] = AUX.BlowPipe.getOldFileName(null);
 * . . .
 * </pre> </code><p>
 * as the beginning of your main function.
 * <p>
 * Note that all functions in this class are static. You cannot create an object
 * of type BlowPipe. It is just a code encapsulator. <p>
 * @version 1.0
 * @author <a href= "">Joseph Bergin, Pace University .</a>
 */
public class BlowPipe
{
    /**
     * Return the fully qualified path and file reference for an input file.
     * @param parent The frame of your application object (or null).
     * @return file name string prepended with directory information.
     */
    public static String getOldFileName(Frame parent)
    {
        FileDialog fd = new FileDialog(parent, "Old File Name", FileDialog.LOAD);
        // The dialog is modal: it must be shown before getFile() can
        // return the user's choice.
        fd.setVisible(true);
        System.out.println(fd.getDirectory() + fd.getFile());
        return fd.getDirectory() + fd.getFile();
    }

    /**
     * Return the fully qualified path and file reference for an output file.
     * @param parent The frame of your application object (or null).
     * @return file name string prepended with directory information.
     */
    public static String getNewFileName(Frame parent)
    {
        FileDialog fd = new FileDialog(parent, "New File Name", FileDialog.SAVE);
        // Show the modal save dialog and wait for the user's choice.
        fd.setVisible(true);
        System.out.println(fd.getDirectory() + fd.getFile());
        return fd.getDirectory() + fd.getFile();
    }

    /**
     * Return argument strings such as switches.
     * Not implemented at this time. Returns null.
     */
    public static String getArgString()
    {
        return null; // not yet implemented
    }

    // Private constructor: no BlowPipe objects can be created.
    private BlowPipe() {}
}
---------cut here-------------
Joseph Bergin, Professor
Pace University, Computer Science, One Pace Plaza, NY NY 10038
Date: Wed, 10 Sep 1997 21:47:06 -0400
From: William Uther <>
Subject: JLex Bug
I'm using JLex 1.2 on a macintosh using the Metrowerks compiler and VM.
I've been having problems with the EOLN char (\r and/or \n).
Might I suggest you use java.lang.System.getProperty("line.separator") to
get the line separation string? You could even include an EOLN macro
containing the correct end of line regular expression. (On old DOS
systems, end of line is two characters, \r\n, so a single character for
EOLN won't work if you want to support these).
If you were really tricky you might be able to work the EOLN format back
into the parser (not just the generator) so that it has the
System.getProperty call in it and it will work on any system.
I've solved the problem by having my own EOLN=[\r\n] macro. (note, this is
not \r\n that I seem to get with m_unix set to false).
\x/ill :-}
William Uther "Most people would sooner die than think; In fact, they do so."
Dept. of Computer Science, - Bertrand Russell
Carnegie Mellon University
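The two ideas in the message above, querying the platform's line separator and matching any end-of-line form with one expression, can be sketched in plain Java. This uses java.util.regex rather than JLex macro syntax, and the class name Eoln and the helper splitLines are hypothetical:

```java
import java.util.regex.Pattern;

public class Eoln {
    // Matches DOS (\r\n), old Macintosh (\r) and Unix (\n) line endings.
    // "\r\n" is listed first so a DOS ending counts as one terminator.
    static final Pattern EOLN = Pattern.compile("\r\n|\r|\n");

    // Hypothetical helper: split text into lines, keeping empty lines.
    static String[] splitLines(String text) {
        return EOLN.split(text, -1);
    }

    public static void main(String[] args) {
        // The platform's own separator, as suggested in the message.
        String separator = System.getProperty("line.separator");
        System.out.println("separator length: " + separator.length());

        for (String line : splitLines("one\r\ntwo\rthree\nfour")) {
            System.out.println(line);
        }
    }
}
```

A character class such as [\r\n] treats "\r\n" as two terminators, which is why the alternation puts the two-character form first.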
Date: Thu, 18 Jun 1998 11:22:01 -0700
From: Raimondas Lencevicius <>
Organization: Computer Science Dept., UCSB
Subject: JLex Bug: Yylex.class includes methods longer than 65K
I was successfully using JLex in my research project
for some time, but now I have a problem. I have to run some code under
Java verifier, and it throws an error saying that method
in Yylex is longer than 65K. Also, it appears that
JDK 1.2Beta3 enforces the verification more strictly, so
my program may become unusable. :-(((
Here is the warning produced by JDK 1.2Beta3 "javac":
javac This code requires generating a method with more than
64K bytes. Virtual machines may refuse the resulting class file.
{ -1, -1, -1, -1, -1, -1, -1, -1,
1 warning
Date: Thu, 25 Jun 1998 11:45:37 -0700
From: Raimondas Lencevicius <>
Subject: Re: Fixing JLex Bug: Methods longer than 65K
Dear Dr. Appel,
I have modified the JLex program in the following way.
I have fixed the yy_nxt[][] assignment to take a result of a function
that unpacks a string to an integer array.
To achieve that, I added
"private int [][] unpackFromString(int size1, int size2, String st)"
function and coded the yy_nxt[][] values into a string by printing integers
into a string and representing sequences of the same integer as "value:length"
pairs. This encoding was simpler to implement and, I believe, more compact
than encoding of each integer as a character or Unicode escape sequence.
If someone wants to apply more sophisticated compression scheme,
it's possible to do that. However, the .java file size
was reduced 2 times (104K to 52K) with current encoding, which, I
think, is reasonable. The .class file size was reduced from 193K to 32K
(6 times) for the same grammar.
The rewritten JLex compiles and runs under JDK 1.1.5 and JDK 1.2B3.
I generated .java file and Yylex.class for my grammar and for the sample
grammar available at JLex home page. I have encountered no errors including
no 64K limit error.
There are a couple of possible negatives of the new version. Some
editors and operating systems may not be able to handle the huge one-line
generated string. This could be circumvented by cutting the string into more
manageable parts while keeping in mind .class constant pool size. Also String
unpacking may be slower than a direct array initialization.
I have attached the modified version of JLex to this message.
My comments are added at the beginning of the file. You are welcome to
integrate them into the "release" comments, if you decide to use this
version as an official JLex release.
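The "value:length" encoding described above can be sketched as follows. This is an illustration only, not the code attached to the message: the comma separator between entries, the row-major fill order and the class name UnpackTable are assumptions.

```java
public class UnpackTable {
    // Sketch of a run-length decoder. Each comma-separated entry is either
    // "value" (one cell) or "value:length" (a run of identical cells).
    static int[][] unpackFromString(int size1, int size2, String st) {
        int[][] result = new int[size1][size2];
        int position = 0; // index into the table, counted row by row
        for (String entry : st.split(",")) {
            int colon = entry.indexOf(':');
            int value = Integer.parseInt(colon < 0 ? entry : entry.substring(0, colon));
            int count = colon < 0 ? 1 : Integer.parseInt(entry.substring(colon + 1));
            for (int k = 0; k < count; k++) {
                if (position >= size1 * size2) {
                    throw new IllegalArgumentException("too many values in packed string");
                }
                result[position / size2][position % size2] = value;
                position++;
            }
        }
        if (position != size1 * size2) {
            throw new IllegalArgumentException("too few values in packed string");
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] table = unpackFromString(2, 4, "-1:4,3,7:3");
        System.out.println(java.util.Arrays.deepToString(table));
    }
}
```

Because the table is rebuilt at class-initialization time from a string constant, the initializer no longer needs one bytecode sequence per array element, which is what pushed the method past the size limit.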
From Sun Jul 25 07:54:07 1999
Date: 11 Dec 1997 23:31:02 +0100
From: Torsten Hilbrich <>
Subject: JLex 1.2.2: Problems with 8bit characters still laying around
I previously had 1.1.1 installed and got strange errors with some HTML
files that I parsed. After looking at the home page I recognized that
these were problems with 8bit characters (such as german umlauts) and
are supposed to be fixed in 1.2.1. However, I just installed 1.2.2
and the error is still there. If I compile the example file and run
it using a simple token printer, I get the following message if
entering a character > 127:
at Yylex.yylex(
at TestLex.main(
The code just around line 416 is the following:
if (YYEOF != yy_lookahead) {
yy_next_state = yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];
Thanks for your reply,
I haven't lost my mind -- it's backed up on tape somewhere.
Fortune Cookie
PGP Public key available
From cananian@phoenix.Princeton.EDU Wed Sep 11 14:59:42 1996
Date: Sun, 11 Aug 1996 23:48:42 -0400 (EDT)
From: "C. Scott Ananian" <cananian@phoenix.Princeton.EDU>
To: "Elliot J. Berk" <ejberk@phoenix.Princeton.EDU>
Subject: JavaLex bug reports....
A couple minor cross-platform type JavaLex bugs (I'm developing Java code
on a mac right now, so I notice these types of things).
1) Around line 1600, there's the following code fragment:
if (m_lexGen.AT_BOL == m_spec.m_current_token)
start = CAlloc.newCNfa(m_spec);
start.m_edge = '\n';
anchor = anchor | CSpec.START;
I'm pretty sure that the \n should be expanded to include '\r' as well
if m_unix is true. I'm working around this and the ^ bug by manually
specifying line terminations in my regexps. I don't know your code
quite well enough to offer you a prepackaged bug fix for this one.
2) line 3706:
while ((byte) '\n' != m_buffer[m_buffer_index])
should probably be something like:
while (((byte) '\n' != m_buffer[m_buffer_index] &&
(byte) '\r' != m_buffer[m_buffer_index]) )
there may be other hacks you'd want to make, but I think because
you're throwing away empty lines, "\r\n" should make it through all right
(don't know for sure, Mac's just saying '\r')
Also, I replaced line 3770 (?)
m_line[m_line_read] = (char) m_buffer[m_buffer_index];
with
m_line[m_line_read] = '\n';
just to be safe.
3) Around line 5055,
/* Check for and discard empty line. */
if (0 == m_input.m_line_read
|| '\n' == m_input.m_line[0])
should probably be:
/* Check for and discard empty line. */
if (0 == m_input.m_line_read
|| '\n' == m_input.m_line[0]
|| '\r' == m_input.m_line[0])
I think that's all for now.
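The kind of change proposed in items 2 and 3, treating '\r' as a line terminator alongside '\n', can be sketched outside JLex. The class LineScan and its methods are hypothetical names; the sketch also adds a bounds check that the quoted fragments do not show:

```java
import java.nio.charset.StandardCharsets;

public class LineScan {
    // True for either terminator, so Unix, Macintosh and DOS input all work.
    static boolean isLineTerminator(byte b) {
        return b == (byte) '\n' || b == (byte) '\r';
    }

    // Returns the index of the first line terminator at or after `from`,
    // or buffer.length if the rest of the buffer has none.
    static int endOfLine(byte[] buffer, int from) {
        int index = from;
        while (index < buffer.length && !isLineTerminator(buffer[index])) {
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        byte[] buffer = "ab\rcd".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endOfLine(buffer, 0));
    }
}
```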
If you happen to dust off the code and get a chance to work on some of
your 'Unimplemented' items, I'd appreciate a fix for the
start-of-line-discarding-newline bug.
Also, I *think* there might be a bug in how quoted strings in regexps are
handled. I didn't track this one down, but I was having trouble with a
regexp that looked something like:
The problem went away when I rewrote this as
not quite sure why that was.
Anyway, kudos again for JavaLex; it seems to be clearly the
best-implemented lexer for Java available right now.
@ @
C. Scott Ananian: Declare the Truth boldly and
227 Henry Hall, Princeton University / without hindrance.
Princeton, NJ 08544 /META-PARRESIAS AKOLUTOS:Acts 28:31
-.-. .-.. .. ..-. ..-. --- .-. -.. ... -.-. --- - - .- -. .- -. .. .- -.
PGP key available via finger and from
From root@P-henryv.cs.arizona.edu Wed Sep 11 14:59:33 1996
Date: Tue, 3 Sep 1996 09:29:58 -0700 (MST)
From: root <>
To: ejberk@Princeton.EDU
Subject: JavaLex
Hi, I compiled in a directory that is in my CLASSPATH. I
got this error:
P-henryv:/java/JavaLex# javac Warning: Public class JavaLex.Main must be defined in
a file
called "".
public class Main
1 error
My OS is Linux.
Is there a FAQ or a trouble shooting document that you could point me to?
Thanks, Henry
From Sep 11 14:59:14 1996
Date: Wed, 04 Sep 1996 21:27:13 GMT
From: Nik Shaylor <>
To: ejberk@Princeton.EDU
Subject: Bug of feature?
Hi Elliot, I have just come across what looks to me like small bug with your
otherwise excellent program JavaLex. Macros appear not to be expanded if they
come immediately after a quoted string e.g.
but this:
Hope this helps.
Thanks for your good work,
Nik Shaylor.
From Sep 11 14:59:48 1996
Date: Tue, 10 Sep 1996 00:28:24 GMT
From: Nik Shaylor <>
To: ejberk@Princeton.EDU
Subject: Bug of feature?
Hi Elliot,
I have a few more things for you regarding JavaLex.
1. The "%notunix" directive does not do what I thought it would. I expected it
to cause all '\r' characters to be ignored. It appears to cause '\r' characters
to be counted as newlines like '\n' is.
2. Even with the "%full" directive the output scanner cannot handle characters
with ASCII values over 127. This is because yy_lookahead, yy_advance(), and
YYEOF are all defined as being of type byte. As bytes are signed in Java, the
yy_next_state = yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];
fails with an array bound exception because yy_lookahead is negative. If you
define the three above items as int and "& 0xFF" the characters output by
yy_advance() the problem will be solved.
3. I have not looked very hard at the following, but I think there may be a
problem with the 'start of line' symbol (^). It looks like it is being taken
as a 'just passed end of line' symbol.
4. The yy_getchar() routine returns the number of characters from the start of
the file. It would be far more useful (to me at any rate) for it to return the
number of characters since the last '\n'.
5. There appears to be no way to put comments into the input file?
Hope this helps make your great program even better.
Nik Shaylor.
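The sign problem from item 2 can be reproduced in isolation. A minimal sketch, where the class ByteIndex and the helper toIndex are hypothetical names and 0xE4 stands in for any 8-bit input character:

```java
public class ByteIndex {
    // Masking with 0xFF widens the byte to an int in the range 0..255,
    // which is safe to use as an index into a 256-entry table.
    static int toIndex(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte lookahead = (byte) 0xE4;   // a character above 127
        int[] cmap = new int[256];      // stand-in for a character map

        System.out.println(lookahead);           // prints -28
        System.out.println(toIndex(lookahead));  // prints 228

        // cmap[lookahead] would throw ArrayIndexOutOfBoundsException here;
        // the masked index is valid.
        System.out.println(cmap[toIndex(lookahead)]);  // prints 0
    }
}
```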
From iw@return-online.de Wed Sep 11 15:00:08 1996
Date: Wed, 11 Sep 1996 18:25:47 +0200
From: Ingo Wechsung <>
To: ejberk@Princeton.EDU
Subject: JavaLex
Dear Elliot,
congratulations for writing such a tool as JavaLex!
I downloaded and tried it today and it works great.
However, did you ever try to input a character >127 to the
generated program?
When I do this (by typing a german umlaut for example), I get an
Any idea? As far as I could see you store the input in byte
arrays which would lead to negative bytes for
beyond-ASCII characters. I could imagine you use the byte
values as an index somewhere ...
Unfortunately I couldn't debug it yet, but will do tomorrow.
BTW, yes, I have the %full directive in my source file ...
Email: Tel.: 07121/928624 Fax.:
CAS Nord GmbH, Geschaeftsstelle Reutlingen -- Return Online
Friedrich-Ebert-Str. 3, 72762 Reutlingen, Germany
Please have a look at our Web pages:
The following JavaLex input file is in error;
an undefined macro is referenced in the rules section.
However, JavaLex generates errors into infinity,
which it shouldn't do.
-- Elliot
// (c) Copyright 1996 PrinceNet, Inc.
// All Rights Reserved.
// ------------------------------------------------------------------
%type java_cup.runtime.Symbol
int firstCharinLine;
int tabs = 0;
boolean MacroHeaderDetected = false;
private int stringstart;
private StringBuffer charBuf = null;
private java_cup.runtime.Symbol tok(int kind) {
return new java_cup.runtime.Symbol(kind, yychar, yychar+yylength());
private java_cup.runtime.Symbol tok(int kind, Object o) {
return new java_cup.runtime.Symbol(kind, yychar, yychar+yylength(), o);
private void error(int line, String text) {
System.out.println("Invalid text on line " + line + " " + text);
return tok(sym.EOF);
<YYINITIAL>'{' { return tok(sym.LBRACE); }
<YYINITIAL>\} { return tok(sym.RBRACE); }
<YYINITIAL>" " { }
<YYINITIAL>\n { firstCharinLine = yychar+1; }
error(yyline+1, yytext());
From mromeo@Adobe.COM Fri Nov 1 00:18:10 1996
Date: Wed, 30 Oct 1996 15:41:32 -0500