Trying to parse this AIREP:
UAAA01 EGRR 031514
VPFBQ 6600S 06834W 1320 F200 MS32 290/36
VPFBQ 6400S 06903W 1356 F200 MS30 280/33
VPFBQ 6200S 06928W 1433 F200 MS30 310/18
VPFBQ 6000S 06950W 1514 F200 MS30 00/38
with this grammar:
%import common.WS_INLINE
%import common.NEWLINE
%ignore WS_INLINE
%ignore NEWLINE
?start: airep_tac
// significant newline has higher prio than NEWLINE
_NL.2: /\n/
airep_tac: header_line designator_line? (airep_block | airep_block_snl)
// -------------------- WMO header --------------------
header_line: message_type issuing_office issue_time correction* _NL
message_type: TTAAII
issuing_office: CCCC
issue_time: YYGGGG
correction: BBB
TTAAII: /U[A-Z]{3}[0-9]{2}(?![A-Z0-9])/
CCCC: /[A-Z]{4}(?![A-Z])/
YYGGGG: /[0-9]{6}(?![0-9])/
BBB: /[A-Z]{3}/
// -------------------- description line --------------------
designator_line: AIREP date? _NL
AIREP.2: "AIREP"
date: DDHH
DDHH: /\d{4}/
// -------------------- airep_blocks --------------------
// airep blocks using ARP/ARS as record separator
airep_block: airep_line+
airep_line: msg_type_designator airplane_id loc_ref REST+
REST: /(?!ARP|ARS)[^\s]+/
msg_type_designator: ARP | ARS
ARP: "ARP"
ARS: "ARS"
airplane_id: /[A-Z0-9]{4,7}/
loc_ref: latlon_ddmm
latlon_ddmm: LAT_DD LAT_MM LAT_HEM LON_DDD LON_MM LON_HEM
LAT_DD.5: /\d{2}(?=\d{2}[NS]\s*\d{5}[EW])/
LAT_MM.5: /\d{2}(?=[NS]\s*\d{5}[EW])/
LAT_HEM.5: /[NS]/
LON_DDD.5: /\d{3}(?=\d{2}[EW])/
LON_MM.5: /\d{2}(?=[EW])/
LON_HEM.5: /[EW]/
// airep blocks using significant newlines as record separator
airep_block_snl: airep_line_snl+
airep_line_snl: airplane_id loc_ref REST_SNL+ _NL
REST_SNL: /[^\n]+/
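The lat/lon terminals can be checked in isolation with Python's re module. This is only a sketch and not how Lark lexes. It matches the six LAT_*/LON_* patterns in the order latlon_ddmm lists them and skips inline whitespace by hand. split_latlon is a hypothetical helper. It confirms that the lookaheads split a position such as 6600S 06834W into the intended pieces.

```python
import re

# Patterns copied from the LAT_*/LON_* terminals, in the order latlon_ddmm uses them
LATLON_TERMINALS = [
    ("LAT_DD", re.compile(r"\d{2}(?=\d{2}[NS]\s*\d{5}[EW])")),
    ("LAT_MM", re.compile(r"\d{2}(?=[NS]\s*\d{5}[EW])")),
    ("LAT_HEM", re.compile(r"[NS]")),
    ("LON_DDD", re.compile(r"\d{3}(?=\d{2}[EW])")),
    ("LON_MM", re.compile(r"\d{2}(?=[EW])")),
    ("LON_HEM", re.compile(r"[EW]")),
]

def split_latlon(text):
    """Hypothetical helper: match the six terminals one after another,
    skipping spaces/tabs between them the way %ignore WS_INLINE would."""
    pos = 0
    tokens = []
    for name, pattern in LATLON_TERMINALS:
        while pos < len(text) and text[pos] in " \t":
            pos += 1
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"{name} does not match at index {pos}")
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens

print(split_latlon("6600S 06834W"))
# [('LAT_DD', '66'), ('LAT_MM', '00'), ('LAT_HEM', 'S'),
#  ('LON_DDD', '068'), ('LON_MM', '34'), ('LON_HEM', 'W')]
```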
The lexer still has the REST terminal in its accept set, which I don't understand. Here's the concrete error:
E lark.exceptions.UnexpectedToken: Unexpected token Token('REST', '1320') at line 2, column 20.
E Expected one of:
E * REST_SNL
But this seems crazy to me: if I understand the contextual lexer correctly, the parse table should not even consider the REST token at this point. REST should only be a valid choice in airep_line derivations, not in airep_line_snl derivations, and I assumed the parse table would be built accordingly. However, a bit of logging with the interactive parser shows:
Parser choices:
- REST_SNL -> (Reduce, Rule(NonTerminal(Token('RULE', 'latlon_ddmm')), [Terminal('LAT_DD'), Terminal('LAT_MM'), Terminal('LAT_HEM'), Terminal('LON_DDD'), Terminal('LON_MM'), Terminal('LON_HEM')], None, RuleOptions(False, False, None, None)))
- REST -> (Reduce, Rule(NonTerminal(Token('RULE', 'latlon_ddmm')), [Terminal('LAT_DD'), Terminal('LAT_MM'), Terminal('LAT_HEM'), Terminal('LON_DDD'), Terminal('LON_MM'), Terminal('LON_HEM')], None, RuleOptions(False, False, None, None)))
stack size: 9
EXPECTED: ['REST_SNL']
NEXT: 1320
LAST_OK: W
F
After parsing LON_HEM, the parser still has both tokens up for grabs. Why?
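To illustrate why the accept set matters, here is a toy model of a contextual lexer that tries only the terminals allowed in the current parser state. This is a simplification, not Lark's implementation. It assumes the terminals are tried in the listed order and that the first match wins. Lark's real ordering rules are more involved. next_token is a hypothetical helper. Run at column 20 of the failing line, REST stops at the first whitespace and yields '1320', while REST_SNL takes the rest of the line. So as soon as REST is allowed alongside REST_SNL, the lexer can hand the parser a REST token. That is consistent with the UnexpectedToken above.

```python
import re

# The REST and REST_SNL regexes, copied from the grammar
TERMINALS = {
    "REST": re.compile(r"(?!ARP|ARS)[^\s]+"),
    "REST_SNL": re.compile(r"[^\n]+"),
}

def next_token(text, pos, accepts):
    """Toy contextual lexer (hypothetical): try only the terminals in `accepts`,
    in the given order, and return the first match. Not Lark's actual code."""
    for name in accepts:
        m = TERMINALS[name].match(text, pos)
        if m:
            return name, m.group()
    raise ValueError(f"none of {accepts} matches at index {pos}")

line = "VPFBQ 6600S 06834W 1320 F200 MS32 290/36\n"
pos = 19  # 0-based index of column 20, where UnexpectedToken was raised

# Accept set expected after LON_HEM inside airep_line_snl
print(next_token(line, pos, ["REST_SNL"]))          # ('REST_SNL', '1320 F200 MS32 290/36')
# Accept set reported by the interactive parser
print(next_token(line, pos, ["REST", "REST_SNL"]))  # ('REST', '1320')
```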