contextual lexer accept set seems wrong

Trying to parse this AIREP:

```
UAAA01 EGRR 031514
VPFBQ 6600S 06834W 1320 F200 MS32 290/36
VPFBQ 6400S 06903W 1356 F200 MS30 280/33
VPFBQ 6200S 06928W 1433 F200 MS30 310/18
VPFBQ 6000S 06950W 1514 F200 MS30 00/38
```

with this grammar:

```
%import common.WS_INLINE
%import common.NEWLINE
%ignore WS_INLINE
%ignore NEWLINE


?start: airep_tac

// significant newline has higher prio than NEWLINE
_NL.2: /\n/

airep_tac: header_line designator_line? (airep_block | airep_block_snl)


// -------------------- WMO header --------------------
header_line: message_type issuing_office issue_time correction* _NL

message_type: TTAAII
issuing_office: CCCC
issue_time: YYGGGG
correction: BBB

TTAAII: /U[A-Z]{3}[0-9]{2}(?![A-Z0-9])/
CCCC: /[A-Z]{4}(?![A-Z])/ 
YYGGGG: /[0-9]{6}(?![0-9])/ 
BBB: /[A-Z]{3}/

// -------------------- description line --------------------
designator_line: AIREP date? _NL

AIREP.2: "AIREP"

date: DDHH
DDHH: /\d{4}/

// -------------------- airep_blocks --------------------

// airep blocks using ARP/ARS as record separator
airep_block: airep_line+
airep_line: msg_type_designator airplane_id loc_ref REST+

REST: /(?!ARP|ARS)[^\s]+/

msg_type_designator: ARP | ARS
ARP: "ARP"
ARS: "ARS"

airplane_id: /[A-Z0-9]{4,7}/

loc_ref: latlon_ddmm

latlon_ddmm: LAT_DD LAT_MM LAT_HEM LON_DDD LON_MM LON_HEM
LAT_DD.5: /\d{2}(?=\d{2}[NS]\s*\d{5}[EW])/
LAT_MM.5: /\d{2}(?=[NS]\s*\d{5}[EW])/
LAT_HEM.5: /[NS]/
LON_DDD.5: /\d{3}(?=\d{2}[EW])/
LON_MM.5: /\d{2}(?=[EW])/
LON_HEM.5: /[EW]/

// airep blocks using significant newlines as record separator
airep_block_snl: airep_line_snl+
airep_line_snl: airplane_id loc_ref REST_SNL+ _NL

REST_SNL: /[^\n]+/
```

When parsing the AIREP above with this grammar, the lexer still has the `REST` terminal in its accept set, and I don't understand why. Here's the concrete error:

```
E               lark.exceptions.UnexpectedToken: Unexpected token Token('REST', '1320') at line 2, column 20.
E               Expected one of:
E                       * REST_SNL
```

The `REST` token in that error surprises me: if I understand how the contextual lexer works, the parse table should not even consider `REST` here. `REST` should only be a valid choice in `airep_line` derivations, not in `airep_line_snl` derivations. I assumed the parse table would be built that way, but some logging with the interactive parser shows:

```
Parser choices:
        - REST_SNL -> (Reduce, Rule(NonTerminal(Token('RULE', 'latlon_ddmm')), [Terminal('LAT_DD'), Terminal('LAT_MM'), Terminal('LAT_HEM'), Terminal('LON_DDD'), Terminal('LON_MM'), Terminal('LON_HEM')], None, RuleOptions(False, False, None, None)))
        - REST -> (Reduce, Rule(NonTerminal(Token('RULE', 'latlon_ddmm')), [Terminal('LAT_DD'), Terminal('LAT_MM'), Terminal('LAT_HEM'), Terminal('LON_DDD'), Terminal('LON_MM'), Terminal('LON_HEM')], None, RuleOptions(False, False, None, None)))
stack size: 9
EXPECTED: ['REST_SNL']
NEXT: 1320
LAST_OK: W
F
```

After parsing `LON_HEM`, the parser still has both tokens up for grabs. Why?

