# C Left-Right Parser (libcleri)
Language parser library for C/C++ programs. Initially created for [SiriDB](https://github.com/SiriDB/siridb-server).

---------------------------------------
  * [Installation](#installation)
  * [Related projects](#related-projects)
  * [Quick usage](#quick-usage)
  * [API](#api)
    * [cleri_t](#cleri_t)
    * [cleri_grammar_t](#cleri_grammar_t)
    * [cleri_parse_t](#cleri_parse_t)
    * [cleri_node_t](#cleri_node_t)
    * [cleri_children_t](#cleri_children_t)
    * [cleri_olist_t](#cleri_olist_t)
  * [Elements](#elements)
    * [cleri_keyword_t](#cleri_keyword_t)
    * [cleri_regex_t](#cleri_regex_t)
    * [cleri_choice_t](#cleri_choice_t)
    * [cleri_sequence_t](#cleri_sequence_t)
    * [cleri_optional_t](#cleri_optional_t)
    * [cleri_prio_t](#cleri_prio_t)
    * [cleri_repeat_t](#cleri_repeat_t)
    * [cleri_list_t](#cleri_list_t)
    * [cleri_token_t](#cleri_token_t)
    * [cleri_tokens_t](#cleri_tokens_t)
    * [Forward reference](#forward-reference)
    * [cleri_dup_t](#cleri_dup_t)
  * [Miscellaneous functions](#miscellaneous-functions)

---------------------------------------

## Installation
>Note: libcleri requires [pcre2](http://www.pcre.org/)
>
>On Ubuntu:
>
>`sudo apt install libpcre2-dev`
>
>On macOS:
>
>`brew install pcre2`
>
Install the release version.
```
$ cd Release
```

Compile libcleri
>Note: On macOS you might need to set environment variables:
>
>`export CFLAGS="-I/usr/local/include" && export LDFLAGS="-L/usr/local/lib"`
>
```
$ make all
```

Install libcleri
```
$ sudo make install
```

> Note: run `sudo make uninstall` for removal.

## Related projects
- [pyleri](https://github.com/transceptor-technology/pyleri): Python parser (can export grammar to pyleri, libcleri, goleri and jsleri)
- [jsleri](https://github.com/transceptor-technology/jsleri): JavaScript parser
- [goleri](https://github.com/transceptor-technology/goleri): Go parser

## Quick usage
>The recommended way to create a grammar is to use [pyleri](https://github.com/transceptor-technology/pyleri) for
>writing the grammar and then export the grammar to libcleri or other languages.

This is a simple example using libcleri:
```c
#include <stdio.h>
#include <cleri/cleri.h>

void test_str(cleri_grammar_t * grammar, const char * str)
{
    cleri_parse_t * pr = cleri_parse(grammar, str);
    printf("Test string '%s': %s\n", str, pr->is_valid ? "true" : "false");
    cleri_parse_free(pr);
}

int main(void)
{
    /* define grammar */
    cleri_t * k_hi = cleri_keyword(0, "hi", 0);
    cleri_t * r_name = cleri_regex(0, "^(?:\"(?:[^\"]*)\")+");
    cleri_t * start = cleri_sequence(0, 2, k_hi, r_name);

    /* compile grammar */
    cleri_grammar_t * my_grammar = cleri_grammar(start, NULL);

    /* test some strings */
    test_str(my_grammar, "hi \"Iris\"");  // true
    test_str(my_grammar, "bye \"Iris\""); // false

    /* cleanup grammar */
    cleri_grammar_free(my_grammar);

    return 0;
}
```

Although libcleri is written for C, it can be used with C++ too:
```c++
#include <iostream>
#include <cleri/cleri.h>

void test_str(cleri_grammar_t * grammar, const char * str)
{
    cleri_parse_t * pr = cleri_parse(grammar, str);
    std::cout << "Test string " << str << ": " <<
            (pr->is_valid ? "true" : "false") << std::endl;
    cleri_parse_free(pr);
}

int main()
{
    /* define grammar */
    cleri_t * k_hi = cleri_keyword(0, "hi", 0);
    cleri_t * r_name = cleri_regex(0, "^(?:\"(?:[^\"]*)\")+");
    cleri_t * start = cleri_sequence(0, 2, k_hi, r_name);

    /* compile grammar */
    cleri_grammar_t * my_grammar = cleri_grammar(start, NULL);

    /* test some strings */
    test_str(my_grammar, "hi \"Iris\"");  // true
    test_str(my_grammar, "bye \"Iris\""); // false

    /* cleanup grammar */
    cleri_grammar_free(my_grammar);

    return 0;
}
```

## API

### `cleri_t`
Cleri type is the base object for each element.

*Public members*
- `uint32_t gid`: Global Identifier for the element. This GID is not required and
as a rule it should be set to 0 if not used. You can use the GID for identifying
an element in a parse result. When exporting a Pyleri grammar, each *named* element
automatically gets a unique GID assigned. (readonly)
- `cleri_tp tp`: Type for the cleri object. (readonly)
    - `CLERI_TP_SEQUENCE`
    - `CLERI_TP_OPTIONAL`
    - `CLERI_TP_CHOICE`
    - `CLERI_TP_LIST`
    - `CLERI_TP_REPEAT`
    - `CLERI_TP_PRIO`
    - `CLERI_TP_RULE`
    - `CLERI_TP_THIS`
    - `CLERI_TP_KEYWORD`
    - `CLERI_TP_TOKEN`
    - `CLERI_TP_TOKENS`
    - `CLERI_TP_REGEX`
    - `CLERI_TP_END_OF_STATEMENT`
- `cleri_via_t via`: Element. (readonly)
    - `cleri_sequence_t * sequence`
    - `cleri_optional_t * optional`
    - `cleri_choice_t * choice`
    - `cleri_list_t * list`
    - `cleri_repeat_t * repeat`
    - `cleri_prio_t * prio`
    - `cleri_rule_t * rule`
    - `cleri_keyword_t * keyword`
    - `cleri_regex_t * regex`
    - `cleri_token_t * token`
    - `cleri_tokens_t * tokens`
    - `void * dummy` (placeholder, this, eof)

#### `cleri_t * cleri_new(uint32_t gid, cleri_tp tp, cleri_free_object_t free_object, cleri_parse_object_t parse_object)`
Create and return a new cleri object. A unique gid is not required but can help
you with identifying the element in a [parse result](#cleri_parse_t). As a rule
you should assign 0 in case no specific gid is required. This function should only
be used in case you want to create your own custom element.

#### `void cleri_incref(cleri_t * cl_object)`
Increment the reference counter for a cleri object. Should only be used in case you
want to write your own custom element.

#### `void cleri_decref(cleri_t * cl_object)`
Decrement the reference counter for a cleri object. If no references are left the
object will be destroyed. Do not use this function after the element has
successfully been added to another element or grammar. Should only be used in
case you want to write your own custom element.
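
The counter semantics described for `cleri_incref()` and `cleri_decref()` can be sketched with a stand-in type. This is only an illustration under assumptions: `demo_obj_t`, `demo_new()`, `demo_incref()`, `demo_decref()` and `demo_destroyed` are hypothetical names that are not part of libcleri, and starting a new object with one reference is an assumption made for the sketch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference counted object (not a libcleri type). */
typedef struct
{
    size_t ref;  /* number of owners holding this object */
} demo_obj_t;

/* Counts destroyed objects, only used to make the effect observable. */
int demo_destroyed = 0;

/* Create an object; assumption: the creator holds the first reference. */
demo_obj_t * demo_new(void)
{
    demo_obj_t * obj = malloc(sizeof(demo_obj_t));
    if (obj != NULL) {
        obj->ref = 1;
    }
    return obj;
}

/* A second owner registers its interest in the object. */
void demo_incref(demo_obj_t * obj)
{
    obj->ref++;
}

/* An owner releases the object; the last release destroys it. */
void demo_decref(demo_obj_t * obj)
{
    if (--obj->ref == 0) {
        demo_destroyed++;
        free(obj);
    }
}
```

In this sketch an object created with `demo_new()` survives one `demo_incref()` followed by one `demo_decref()`, and is destroyed only by the second `demo_decref()`.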

#### `int cleri_free(cleri_t * cl_object)`
Decrement reference counter for a cleri object. When there are no more references
left the object will be destroyed. Use this function to cleanup after errors
have occurred. Do not use this function after the element has successfully been
added to another element or grammar.

Example strict error handling:
```c
cleri_grammar_t * compile_grammar(void)
{
    cleri_t * k_hello = cleri_keyword(0, "hello", 0);
    if (k_hello == NULL) {
        return NULL;
    }
    cleri_t * k_world = cleri_keyword(0, "world", 0);
    if (k_world == NULL) {
        cleri_free(k_hello); // must cleanup k_hello
        return NULL;
    }
    cleri_t * hello_world = cleri_sequence(0, 2, k_hello, k_world);
    if (hello_world == NULL) {
        cleri_free(k_hello);
        cleri_free(k_world);
        return NULL;
    }
    cleri_t * opt = cleri_optional(0, hello_world);
    if (opt == NULL) {
        /* we now must only cleanup hello_world since this sequence will
         * cleanup both keywords too. */
        cleri_free(hello_world);
        return NULL;
    }
    cleri_grammar_t * grammar = cleri_grammar(opt, NULL);
    if (grammar == NULL) {
        cleri_free(opt);
    }
    /* when your program has finished, the grammar including all elements can
     * be destroyed using cleri_grammar_free() */
    return grammar;
}
```
>Note: Usually a grammar is only compiled at the startup of your program so
>memory allocation errors during the grammar creation are unlikely to occur.
>If NULL is passed as an argument instead of an element, then the function
>to which the argument is passed will return NULL. Following this
>chain the final grammar returns NULL in case an error has occurred somewhere.
>In this case you should usually abort the program.

### `cleri_grammar_t`
Compiled libcleri grammar.

*No public members*

#### `cleri_grammar_t * cleri_grammar(cleri_t * start, const char * re_keywords)`
Create and return a compiled grammar. Argument `start` must be the entry element
for the grammar. Argument `re_keywords` should be a regular expression starting
with character `^` for matching keywords in a grammar. When a grammar is created,
each defined [keyword](#cleri_keyword_t) should match this regular expression.
`re_keywords` is allowed to be `NULL` in which case the default
`CLERI_DEFAULT_RE_KEYWORDS` is used.

#### `void cleri_grammar_free(cleri_grammar_t * grammar)`
Cleanup grammar. This will also destroy all elements which are used by the
grammar. Make sure all parse results are destroyed before destroying the grammar
because a [parse result](#cleri_parse_t) depends on elements from the grammar.

### `cleri_parse_t`
Parse result containing the parse tree and other information about the parse
result.

*Public members*
- `int cleri_parse_t.is_valid`: Boolean. Value is 1 (TRUE) in case the parse string is valid or 0 (FALSE) if not. (readonly)
- `size_t cleri_parse_t.pos`: Position in the string up to where the string was successfully parsed. This value is
equal to the length of the string in case `cleri_parse_t.is_valid` is TRUE. (readonly)
- `const char * cleri_parse_t.str`: Pointer to the provided string. (readonly)
- `cleri_node_t * tree`: Parse tree. Even when `is_valid` is FALSE the parse tree is returned, but it will only contain results as far as parsing has succeeded. The tree is the root node which can include several `children` nodes. The structure is further clarified in an example that shows a way of visualizing the parse tree. This example can be found in the "examples/tree_and_expect/tree" folder. Run this code and it will output a parse tree in JSON format. (see also [cleri_node_t](#cleri_node_t) and [cleri_children_t](#cleri_children_t)) (readonly)
- `const cleri_olist_t * expect`: Linked list of possible elements at position `cleri_parse_t.pos` in `cleri_parse_t.str`. Even if `is_valid` is TRUE there might be elements in this list, for example when an `Optional()` element could be added to the string. Expecting is useful if you want to implement things like auto-completion, syntax error handling, auto-syntax-correction etc. An example of this can be found in the "examples/tree_and_expect/expect" folder. (see [cleri_olist_t](#cleri_olist_t) for more information)

#### `cleri_parse_t * cleri_parse(cleri_grammar_t * grammar, const char * str)`
Create and return a parse result. The parse result contains pointers to the
provided string (`str`) so make sure the string is available while using the
parse result.

#### `void cleri_parse_free(cleri_parse_t * pr)`
Cleanup a parse result.

#### `void cleri_parse_expect_start(cleri_parse_t * pr)`
Can be used to reset the expect list to start. Usually you are not required to
use this function since the expect list is already at the start position.

#### `int cleri_parse_strn(char * s, size_t n, cleri_parse_t * pr, cleri_translate_t * translate)`
Can be used to generate a textual parse result. The first argument `s` should be able to hold
the complete message and will be restricted by `n`. The return value is the number of characters which
are (or would be) written to `s`, excluding the terminator char. This behavior is similar to functions like `snprintf`.
One could for example use `NULL` for `s` with `n` equal to `0` to get the size which is required. Then you could
`malloc` the size plus one for the terminator and run the function again. A negative value indicates an error.
Argument `pr` should be a parse result or `NULL` and `translate` a translation function or `NULL`.

Example:
```c
// Variant 1: when a translation function returns an empty string, no text is used
const char * translate_empty(cleri_t * o) {
    (void) o;   // the element is not inspected in this variant
    return "";  // a possible result might be: `error at position x`
}

// Variant 2: text may be returned based on gid
const char * translate_by_gid(cleri_t * o) {
    switch (o->gid) {
        case 1: return "A";  // error at position x, expecting: A
        case 2: return "";   // gid 2 will be ignored
    }
    return NULL;  // normal parsing for everything else
}
```
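
The two-pass sizing pattern described for `cleri_parse_strn` can be sketched with plain `snprintf`, which follows the same return-value convention. The helper name `format_alloc` is hypothetical, and `snprintf` is used as a stand-in so the snippet compiles without libcleri.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "error at position <pos>" into a freshly
 * allocated buffer. snprintf() is a stand-in for cleri_parse_strn() here. */
char * format_alloc(size_t pos)
{
    /* first pass: a NULL buffer with size 0 only reports the required length */
    int n = snprintf(NULL, 0, "error at position %zu", pos);
    if (n < 0) {
        return NULL;  /* a negative value indicates an error */
    }

    /* allocate the reported length plus one for the terminator */
    char * buf = malloc((size_t) n + 1);
    if (buf == NULL) {
        return NULL;
    }

    /* second pass: actually write the message */
    if (snprintf(buf, (size_t) n + 1, "error at position %zu", pos) != n) {
        free(buf);
        return NULL;
    }
    return buf;  /* the caller must free() the buffer */
}
```

With libcleri, the two `snprintf` calls would presumably become two calls to `cleri_parse_strn` with the same `pr` and `translate` arguments.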

### `cleri_node_t`
Node object. A parse result has a parse tree which consists of nodes. Each node
may have children.

*Public members*
- `const char * cleri_node_t.str`: Pointer to the position in the parse string where this node starts. (readonly)
- `size_t cleri_node_t.len`: Length of the string which is applicable for this node. (readonly)
- `cleri_t * cleri_node_t.cl_obj`: Element from the grammar which matches this node. Note that `cl_obj` is `NULL` for the root node; the first element can be found in its children. (readonly)
- `cleri_children_t * cleri_node_t.children`: Optional children for this node. (readonly)

#### `bool cleri_node_has_children(cleri_node_t * node)`
Macro function for checking if a node has children.

### `cleri_children_t`
Children from a node in a linked list.

*Public members*
- `cleri_node_t * cleri_children_t.node`: Child node. (readonly)
- `struct cleri_children_s * cleri_children_t.next`: Next child node or `NULL` if there are no other children. (readonly)

Example looping over all children within a node:
```c
/* we assume having a node (cleri_node_t*) */
if (cleri_node_has_children(node)) {
    cleri_children_t * child = node->children;
    while (child != NULL) {
        // do something with child->node
        child = child->next;
    }
}
```
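
Because every child node can have children of its own, visiting the whole parse tree needs recursion on top of the loop above. The sketch below uses stand-in structs (`demo_node_t` and `demo_children_t`, both hypothetical) that only mirror the `str`, `len`, `children`, `node` and `next` members documented here, so it compiles without libcleri. The same traversal should carry over to `cleri_node_t` and `cleri_children_t`.

```c
#include <stddef.h>

/* Stand-in types mirroring the documented public members only. */
typedef struct demo_children_s demo_children_t;

typedef struct demo_node_s
{
    const char * str;            /* start of the matched text */
    size_t len;                  /* length of the matched text */
    demo_children_t * children;  /* NULL when the node has no children */
} demo_node_t;

struct demo_children_s
{
    demo_node_t * node;
    struct demo_children_s * next;
};

/* Count a node plus all of its descendants (depth-first). */
size_t demo_count_nodes(const demo_node_t * node)
{
    size_t count = 1;  /* the node itself */
    for (const demo_children_t * child = node->children;
         child != NULL;
         child = child->next) {
        count += demo_count_nodes(child->node);
    }
    return count;
}
```

The recursion depth equals the depth of the tree, which is usually small for hand-written grammars.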

### `cleri_olist_t`
Linked list holding libcleri objects. A `cleri_olist_t` type is used for
expected elements in a parse result.

*Public members*
- `cleri_t * cl_obj`: Object holding an element. (readonly)
- `cleri_olist_t * next`: Next object. (readonly)

Example looping over `cleri_parse_t.expect`:
```c
/* we assume having a pr (cleri_parse_t*)
 *
 * Notes:
 *    pr->expect is NULL if nothing is expected and it is safe to
 *    change pr->expect. If required the linked list can be reset to start
 *    using cleri_parse_expect_start(). */
while (pr->expect != NULL) {
    // do something with pr->expect->cl_obj
    pr->expect = pr->expect->next;
}
```
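
If you prefer to leave `pr->expect` untouched, the list can also be walked with a local cursor, in which case no reset is needed afterwards. The sketch below uses stand-in types (`demo_obj_t` and `demo_olist_t`, both hypothetical) that mirror only the `gid`, `cl_obj` and `next` members, and a hypothetical helper `demo_expects_gid()` that checks whether an element with a given gid is expected, for example as a building block for auto-completion.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cleri_t, reduced to the gid member. */
typedef struct
{
    uint32_t gid;
} demo_obj_t;

/* Stand-in for cleri_olist_t. */
typedef struct demo_olist_s
{
    demo_obj_t * cl_obj;
    struct demo_olist_s * next;
} demo_olist_t;

/* Return 1 when an element with the given gid is in the list, else 0.
 * The list head itself is never modified. */
int demo_expects_gid(const demo_olist_t * expect, uint32_t gid)
{
    for (const demo_olist_t * cur = expect; cur != NULL; cur = cur->next) {
        if (cur->cl_obj != NULL && cur->cl_obj->gid == gid) {
            return 1;
        }
    }
    return 0;
}
```

This only works when the elements of interest were given a non-zero gid when the grammar was defined.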

## Elements
Elements are objects used to define a grammar.

### `cleri_keyword_t`
Keyword element. The parser needs a match with the keyword.

*Type (`cleri_t.tp`)*: `CLERI_TP_KEYWORD`

*Public members*
- `const char * cleri_keyword_t.keyword`: Contains the keyword string. (readonly)
- `int cleri_keyword_t.ign_case`: Boolean. (readonly)
- `size_t cleri_keyword_t.len`: Length of the keyword string. (readonly)

#### `cleri_t * cleri_keyword(uint32_t gid, const char * keyword, int ign_case)`
Create and return a new [object](#cleri_t) containing a keyword element.
Argument `ign_case` can be set to 1 for a case insensitive keyword match.

Example:
```c
/* define case insensitive keyword */
cleri_t * k_tictactoe = cleri_keyword(
    0,                  // gid, not used in this example
    "tic-tac-toe",      // keyword
    1);                 // case insensitive

/* create grammar with custom keyword regular expression match */
cleri_grammar_t * grammar = cleri_grammar(k_tictactoe, "^[A-Za-z-]+");

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "Tic-Tac-Toe");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_regex_t`
Regular expression element. The parser needs a match with the regular
expression.

*No public members*

#### `cleri_t * cleri_regex(uint32_t gid, const char * pattern)`
Create and return a new [object](#cleri_t) containing a regular
expression element. Argument `pattern` should contain the regular expression.
Each pattern must start with character `^` and the pattern should be checked
before calling this function.

See [Quick usage](#quick-usage) for a `cleri_regex_t` example.

### `cleri_choice_t`
Choice element. The parser must choose one of the child elements.

*Public members*
- `int cleri_choice_t.most_greedy`: Boolean. (readonly)
- `cleri_olist_t * cleri_choice_t.olist`: Children. (readonly)

#### `cleri_t * cleri_choice(uint32_t gid, int most_greedy, size_t len, ...)`
Create and return a new [object](#cleri_t) containing a choice element.
Argument `most_greedy` can be set to 1 in which case the parser will select the
most greedy match. When 0, the parser will select the first match.

Example:
```c
/* define grammar */
cleri_t * k_hello = cleri_keyword(0, "hello", 0);
cleri_t * k_goodbye = cleri_keyword(0, "goodbye", 0);
cleri_t * choice = cleri_choice(
    0,                      // gid, not used in this example
    0,                      // stop at first match
    2,                      // number of elements
    k_hello, k_goodbye);    // elements

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(choice, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "goodbye");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_sequence_t`
Sequence element. The parser must match each element in the specified order.

*Public members*
- `cleri_olist_t * cleri_sequence_t.olist`: Elements. (readonly)

#### `cleri_t * cleri_sequence(uint32_t gid, size_t len, ...)`
Create and return a new [object](#cleri_t) containing a sequence element.

Example:
```c
cleri_t * sequence = cleri_sequence(
    0,                              // gid, not used in the example
    3,                              // number of elements
    cleri_keyword(0, "Tic", 0),     // first element
    cleri_keyword(0, "Tac", 0),     // second element
    cleri_keyword(0, "Toe", 0));    // third element

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(sequence, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "Tic Tac Toe");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_optional_t`
Optional element. The parser looks for an optional element.

*Public members*
- `cleri_t * cleri_optional_t.cl_obj`: Optional element. (readonly)

#### `cleri_t * cleri_optional(uint32_t gid, cleri_t * cl_obj)`
Create and return a new [object](#cleri_t) containing an optional element.

Example:
```c
/* define grammar */
cleri_t * k_hello = cleri_keyword(0, "hello", 0);
cleri_t * k_there = cleri_keyword(0, "there", 0);
cleri_t * optional = cleri_optional(
    0,                  // gid, not used in this example
    k_there);           // optional element
cleri_t * greet = cleri_sequence(
    0,                  // gid, not used in this example
    2,                  // number of elements
    k_hello, optional); // elements

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(greet, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "hello");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_prio_t`
Prio element. The parser must match one element. Inside the prio element it
is possible to use `CLERI_THIS` which is a reference to itself.

>Note: Use a [forward reference](#forward-reference) when possible.
>A prio is required when the same position in a string is potentially checked
>more than once.

*Public members*
- `cleri_olist_t * cleri_prio_t.olist`: Elements. (readonly)

#### `cleri_t * cleri_prio(uint32_t gid, size_t len, ...)`
Create and return a new [object](#cleri_t) containing a prio element.

Example:
```c
/*
 * define grammar.
 *
 * Note: The third and fourth element are using a reference to the prio
 *       element at the same position in the string as the prio element.
 *       This is why a forward reference cannot be used for this example.
 */
cleri_t * prio = cleri_prio(
    0,                              // gid, not used in the example
    4,                              // number of elements
    cleri_keyword(0, "ni", 0),      // first element
    cleri_sequence(0, 3,            // second element
        cleri_token(0, "("),
        CLERI_THIS,
        cleri_token(0, ")")),
    cleri_sequence(0, 3,            // third element
        CLERI_THIS,
        cleri_keyword(0, "or", 0),
        CLERI_THIS),
    cleri_sequence(0, 3,            // fourth element
        CLERI_THIS,
        cleri_keyword(0, "and", 0),
        CLERI_THIS));

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(prio, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "(ni or ni) and (ni or ni)");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_repeat_t`
Repeat element. The parser must match at least `cleri_repeat_t.min` elements and
at most `cleri_repeat_t.max`. An unlimited amount is allowed in case `cleri_repeat_t.max`
is set to 0 (zero).

*Public members*
- `cleri_t * cleri_repeat_t.cl_obj`: Element to repeat. (readonly)
- `size_t cleri_repeat_t.min`: Minimum times an element is expected. (readonly)
- `size_t cleri_repeat_t.max`: Maximum times an element is expected or 0 for unlimited. (readonly)

#### `cleri_t * cleri_repeat(uint32_t gid, cleri_t * cl_obj, size_t min, size_t max)`
Create and return a new [object](#cleri_t) containing a repeat element.
Argument `max` should be greater than or equal to `min`, or 0 for unlimited.

Example:
```c
/* define grammar */
cleri_t * repeat = cleri_repeat(
    0,                          // gid, not used in this example
    cleri_keyword(0, "ni", 0),  // repeated element
    0,                          // min n times
    0);                         // max n times (0 for unlimited)

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(repeat, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "ni ni ni ni ni");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```
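
The meaning of `min` and `max`, including the special case where a `max` of 0 means unlimited, can be written down as a small predicate. This is only a sketch of the documented rule; `demo_repeat_ok()` is a hypothetical helper and not a libcleri function.

```c
#include <stddef.h>

/* Return 1 when `n` repetitions satisfy the bounds, 0 otherwise.
 * A `max` of 0 means there is no upper limit. */
int demo_repeat_ok(size_t n, size_t min, size_t max)
{
    if (n < min) {
        return 0;  /* too few repetitions */
    }
    if (max != 0 && n > max) {
        return 0;  /* too many repetitions */
    }
    return 1;
}
```

With the bounds from the example above (`min` 0, `max` 0) any number of repetitions is accepted, including none.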

### `cleri_list_t`
List element. Like [repeat](#cleri_repeat_t) but with a delimiter.

*Public members*
- `cleri_t * cleri_list_t.cl_obj`: Element to repeat. (readonly)
- `cleri_t * cleri_list_t.delimiter`: Delimiter between repeating elements. (readonly)
- `size_t cleri_list_t.min`: Minimum times an element is expected. (readonly)
- `size_t cleri_list_t.max`: Maximum times an element is expected or 0 for unlimited. (readonly)
- `int cleri_list_t.opt_closing`: Allow or disallow ending with a delimiter.


#### `cleri_t * cleri_list(uint32_t gid, cleri_t * cl_obj, cleri_t * delimiter, size_t min, size_t max, int opt_closing)`
Create and return a new [object](#cleri_t) containing a list element.
Argument `max` should be greater than or equal to `min`, or 0 for unlimited. Argument `opt_closing`
can be 1 (TRUE) to allow or 0 (FALSE) to disallow a list to end with a delimiter.

Example:
```c
/* define grammar */
cleri_t * list = cleri_list(
    0,                          // gid, not used in this example
    cleri_keyword(0, "ni", 0),  // repeated element
    cleri_token(0, ","),        // delimiter element
    0,                          // min n times
    0,                          // max n times (0 for unlimited)
    0);                         // disallow ending with a delimiter

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(list, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "ni, ni, ni, ni, ni");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_token_t`
Token element. The parser must match a token exactly. A token can be one or more
characters and is usually used to match operators like `+`, `-`, `*` etc.

*Public members*
- `const char * cleri_token_t.token`: Token string. (readonly)
- `size_t cleri_token_t.len`: Length of the token string. (readonly)

#### `cleri_t * cleri_token(uint32_t gid, const char * token)`
Create and return a new [object](#cleri_t) containing a token element.

Example:
```c
/* define grammar */
cleri_t * token = cleri_token(
    0,          // gid, not used in this example
    "-");       // token string (dash)

cleri_t * ni = cleri_keyword(0, "ni", 0);
cleri_t * list = cleri_list(0, ni, token, 0, 0, 0);

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(list, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "ni-ni - ni- ni -ni");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_tokens_t`
Tokens element. Can be used to register multiple tokens at once.

#### `cleri_t * cleri_tokens(uint32_t gid, const char * tokens)`
Create and return a new [object](#cleri_t) containing a tokens element.
Argument `tokens` must be a string with tokens separated by spaces. If the given
tokens differ in length, the parser will try to match the longest tokens
first.

Example:
```c
/* define grammar */
cleri_t * tokens = cleri_tokens(
    0,              // gid, not used in this example
    "+ - -=");      // tokens string '+', '-' and '-='

cleri_t * ni = cleri_keyword(0, "ni", 0);
cleri_t * list = cleri_list(0, ni, tokens, 0, 0, 0);

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(list, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "ni + ni -= ni - ni");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```
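
Why the longest token has to be tried first can be seen in a small stand-alone sketch: with the tokens `-` and `-=`, the input `-= ni` would otherwise be matched as `-` and leave a stray `=` behind. The function `demo_longest_token()` below is hypothetical and does not reflect how libcleri implements the match internally; it only illustrates the rule.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the longest token from the space-separated `tokens`
 * string that is a prefix of `input`, or 0 when no token matches. */
size_t demo_longest_token(const char * tokens, const char * input)
{
    size_t best = 0;
    const char * p = tokens;

    while (*p != '\0') {
        /* skip separating spaces */
        while (*p == ' ') {
            p++;
        }
        /* measure the current token (0 at the end of the string) */
        size_t len = strcspn(p, " ");
        if (len > best && strncmp(p, input, len) == 0) {
            best = len;  /* a longer matching token wins */
        }
        p += len;
    }
    return best;
}
```

For the tokens string `"+ - -="` this returns 2 for an input starting with `-=` and 1 for an input starting with `- `.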

### `Forward reference`
Forward reference to a libcleri object. There is no specific type for a
reference.

>Warning: A reference is not protected against testing the same position
>in a string. This could potentially lead to an infinite loop.
>For example:
>```c
>cleri_ref_set(ref, cleri_optional(0, ref)); // DON'T DO THIS
>```
>Use [prio](#cleri_prio_t) if such recursive construction is required.

#### `cleri_t * cleri_ref(void)`
Create and return a new [object](#cleri_t) as reference element.
Once the reference is created, it can be used as element in your grammar. Do not
forget to actually set the reference using `cleri_ref_set()`.

#### `void cleri_ref_set(cleri_t * ref, cleri_t * cl_obj)`
Set a reference. For every created forward reference, this function must be
called exactly once. Argument `ref` must be created with `cleri_ref()`. Argument
`cl_obj` cannot be used outside the reference. Since the reference becomes
the `cl_obj`, it is the reference you should use.

Example
```c
/* define grammar */
cleri_t * ref = cleri_ref();
cleri_t * choice = cleri_choice(
    0, 0, 2, cleri_keyword(0, "ni", 0), ref);

cleri_ref_set(ref, cleri_sequence(
    0,
    3,
    cleri_token(0, "["),
    cleri_list(0, choice, cleri_token(0, ","), 0, 0, 0),
    cleri_token(0, "]")));

/* create grammar */
cleri_grammar_t * grammar = cleri_grammar(ref, NULL);

/* parse some test string */
cleri_parse_t * pr = cleri_parse(grammar, "[ni, ni, [ni, [], [ni, ni]]]");
printf("Valid: %s\n", pr->is_valid ? "true" : "false"); // true

/* cleanup */
cleri_parse_free(pr);
cleri_grammar_free(grammar);
```

### `cleri_dup_t`
Duplicate an object. The type is an extension to `cleri_t`.

#### `cleri_t * cleri_dup(uint32_t gid, cleri_t * cl_obj)`
Duplicate a libcleri object with a different gid but using the same element.

>Note: Only the object is duplicated. The element (`cleri_via_t via`)
>is a pointer to the original object.

The following [pyleri](https://github.com/transceptor-technology/pyleri) code
will use `cleri_dup()` when exported to c:
```python
elem = Repeat(obj, mi=1, ma=1)
```

Use the code below if you want similar behavior without duplication:
```python
elem = Sequence(obj)
```

## Miscellaneous functions
#### `const char * cleri_version(void)`
Returns the version of libcleri.