Add white-space productions to the Selectors grammar implementation
The missing white-space broke parsing of selectors, with the latter not having any tests in place to help uncover the issue. This adds handling of white-space through explicit references in the grammar (parsing procedures don't have to be amended), to match the specified behaviour (including that defined with prose).
README.md (1 addition, 1 deletion)
@@ -71,7 +71,7 @@ Parsing is offered only in the form of Python modules — no "command-line" prog
### Why?
-We wanted a "transparent" CSS parser — one that one could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application, a software _library_ in the classical sense of the term, or a true _API_ if you will.
+We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application, a software _library_ in the classical sense of the term, or a true _API_ if you will.
For instance, the popular [Less](http://lesscss.org) software seems to rather effortlessly parse CSS [3] text, but it invariably re-arranges white-space in the output, without giving the user any control over the latter. Less is not _transparent_ like that — there is no way to use it with recovery of the originally parsed text from the parse tree — parsing with Less is a one-way street for at least _some_ applications (specifically those that "transform" CSS but need to preserve all of the original input as-is).
expand-macros.py (1 addition, 1 deletion)
@@ -2,7 +2,7 @@
Macro processing refers here to eager rewriting/replacement/substitution of Python code constructs decorated with the "syntactic" (no definition available normally, when the containing module is imported) decorator `macro`. The purpose of such processing is to implement the equivalent to what is usually called "pre-processing" for e.g. C/C++ language(s). As `macro`-decorated procedures (only decorating of procedures is currently effectively supported for `macro`) are encountered during processing of Python code, the entire procedure is removed and "unparsed" equivalent of the series of AST statements it returned, are inserted in its place instead.
-This implements powerful and "semantically-aware" code pre-processing mechanism, for situations demanding it. Our immediate need with this was to allow type checkers like MyPy to be able to analyze as much of the project's Python code as possible, which these are normally unable to do in cases of so-called dynamically created types (and consequently object(s) of such types). And so instead of living with effectively uncheckable dynamic types created with the `type` built-in -- for e.g. `Token` subclasses -- we employ _pre-processing_ of Python code into Python code which lends to type-checking, a benefit we deemed to ba a "must-have" for the project.
+This implements powerful and "semantically-aware" code pre-processing mechanism, for situations demanding it. Our immediate need with this was to allow type checkers like MyPy to be able to analyze as much of the project's Python code as possible, which these are normally unable to do in cases of so-called dynamically created types (and consequently object(s) of such types). And so instead of living with effectively uncheckable dynamic types created with the `type` built-in -- for e.g. `Token` subclasses -- we employ _pre-processing_ of Python code into Python code which lends to type-checking, a benefit we deemed to be a "must-have" for the project.
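The macro processing described in this docstring could be sketched roughly as follows. This is a minimal illustration built on the standard `ast` module, not the actual contents of `expand-macros.py`: the names `MacroExpander` and `expand_macros` and the sample `token_classes` macro are hypothetical, and the sketch assumes every macro takes no arguments and returns a list of AST statements.

```python
import ast

class MacroExpander(ast.NodeTransformer):
    """Replace each `@macro`-decorated procedure with the AST statements it returns (hypothetical sketch)."""

    def visit_FunctionDef(self, node: ast.FunctionDef):
        is_macro = any(isinstance(d, ast.Name) and d.id == "macro" for d in node.decorator_list)
        if not is_macro:
            return self.generic_visit(node)
        # The `macro` decorator has no run-time definition, so strip it before compiling
        node.decorator_list = []
        module = ast.fix_missing_locations(ast.Module(body=[node], type_ignores=[]))
        namespace = {"ast": ast}  # Assumption: macros only need the `ast` module
        exec(compile(module, "<macro>", "exec"), namespace)
        # Call the procedure eagerly; the returned statements take the place of its definition
        return namespace[node.name]()

def expand_macros(source: str) -> str:
    """Return `source` with every macro definition replaced by its "unparsed" expansion."""
    tree = MacroExpander().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

SOURCE = '''
@macro
def token_classes():
    return [ast.parse(f"class {name}Token(Token): pass").body[0] for name in ("Comma", "Colon")]
'''

expanded = expand_macros(SOURCE)
print(expanded)
```

The expanded text contains ordinary `class` statements in place of the macro, which is what lets a static type checker see the generated classes.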
"""Variant of `parse` for productions of the ` ` combinator variety (see "juxtaposing components" at https://drafts.csswg.org/css-values-4/#component-combinators)."""
"""Variant of `parse` for productions of the `!` multiplier variety (see https://drafts.csswg.org/css-values-4/#mult-req)."""
-result = cast(Product, parse(production.element, input)) # The element of a non-empty production is concatenation, and the `parse` overload for `ConcatenationProduction` never returns a `Token`, only `Product | None`
+result = cast(Product | None, parse(production.element, input)) # The element of a non-empty production is concatenation, and the `parse` overload for `ConcatenationProduction` never returns a `Token`, only `Product | None`
"""The grammar defining the language of selector list expressions.
Normally a grammar would be defined as a set of rules (for deriving productions), where each rule would feature a component to the left side of the `->` operator (the "rewriting" operator) and a component to the right side of the operator. Owing to relative simplicity of the Selectors grammar -- where the left-hand side component is always a production name _reference_ (an identifying factor of context free grammars), we leverage Python's meta-programming facilities and use class attribute assignment statements to define the rules instead, where the assigned value is the right side of the rule, an arbitrary production (which may be an opaque value). Each attribute of the grammar is assigned the corresponding name automatically, owing to the `__set_name__` dunder method of the common production (super)class (where appropriate).
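The naming mechanism mentioned here can be illustrated with a small sketch. The classes below are simplified stand-ins and not the project's actual definitions (in particular, the `ReferenceProduction` constructor and the `ToyGrammar` rules are assumptions made for the example); only the use of `__set_name__` follows the description above.

```python
class Production:
    """Hypothetical stand-in for the common production superclass."""
    name = None

    def __set_name__(self, owner, name):
        # Python calls this once per class attribute when the owning class is created
        self.name = name

class TokenProduction(Production):
    def __init__(self, token_type):
        self.token_type = token_type

class ReferenceProduction(Production):
    def __init__(self, target):
        self.target = target  # Name of the production being referenced

class ToyGrammar:
    # Each assignment is a rule: the attribute name is the left-hand side,
    # the assigned production is the right-hand side
    ident = TokenProduction("ident")
    type_selector = ReferenceProduction("ident")

print(ToyGrammar.ident.name, ToyGrammar.type_selector.name)
```

Because the name is assigned by the class machinery, a rule never has to repeat its own name as a string argument.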
+NOTE: Some of the productions as defined in the specification, have been rewritten below to eliminate repetition. These rewritten productions are marked accordingly, for clarity.
+
+NOTE: `intersperse` is used to insert white-space productions as required by the specification, which otherwise doesn't include them explicitly, instead describing white-space handling "in prose".
+
+NOTE: There is no notation (defined by the Values & Units spec.) for expressing `RepetitionProduction` productions with a `separator` attribute value other than `None` (the '[ ... ]*' variant) or that of `CommaSeparatedRepetitionProduction` (the '[ ... ]#' variant). Nevertheless, these productions are employed below to eliminate repetition as part of optimizing the grammar.
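The diff does not show how `intersperse` is defined, so the following is only a guess at its general shape: a helper that places a separator between every two adjacent items, which is how optional white-space could be threaded through the components of a concatenation. The signature and the string placeholders are assumptions made for illustration.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def intersperse(separator: T, items: Iterable[T]) -> Iterator[T]:
    """Yield `items` with `separator` inserted between every two adjacent items (hypothetical signature)."""
    iterator = iter(items)
    try:
        yield next(iterator)
    except StopIteration:
        return  # Nothing to separate
    for item in iterator:
        yield separator
        yield item

# Strings stand in for production objects here
components = list(intersperse("OWS", ["compound-selector", "combinator", "compound-selector"]))
print(components)
```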
:param element: The production expressing the repeating part of this production
:param min: The minimum amount of times the parser must accept input, i.e. the minimum number of repetitions of token sequences accepted by the parser
:param max: The maximum amount of times the parser will be called, i.e. the maximum number of repetitions that may be consumed in the input; the value of `None` implies no maximum (i.e. no upper bound on repetition)
+:param separator: A production expressing the "delimiting" part between any two repetitions of the `element` production; if omitted or `None`, there's _no_ delimiting part -- repetitions are _adjacent_
"""
assert min >= 0
assert max is None or max > 0
assert max is None or min <= max
self.min = min
self.max = max
self.element = element
+if separator:
+    self.separator = separator

class OptionalProduction(RepetitionProduction):
"""Class of productions equivalent to `RepetitionProduction` with no lower bound and accepting no repetition of the element, meaning the element is expressed at most once.
whitespace = RepetitionProduction(TokenProduction(WhitespaceToken), min=1) # The white-space production; presence of white-space expressed with this production, is _mandatory_ (`min=1`); the definition was "hoisted" here because a) it depends on `RepetitionProduction` and `TokenProduction` definitions, which must thus precede it, and b) because the `CommaSeparatedRepetitionParser` definition that follows, depends on it, in turn
"""Class of productions that express a non-empty comma-separated repetition (CSR) of a production element.
-Unlike `RepetitionProduction` which permits arbitrary number of the production element, this class does not currently implement arbitrary repetition bounds. The delimiting part (a comma optionally surrounded by white-space) is mandatory, which implies at least one repetition (two expressions of the element). Disregarding the delimiting behaviour, productions of this class thus behave like those of `RepetitionProduction` with `2` for `min` and `None` for `max` property values.
-
Implements the `#` notation as defined at http://drafts.csswg.org/css-values-4/#mult-comma.
"""
-delimiter = ConcatenationProduction(OptionalProduction(AlternativesProduction(whitespace, TokenProduction(CommentToken))), TokenProduction(CommaToken), OptionalProduction(AlternativesProduction(whitespace, TokenProduction(CommentToken)))) # The production expressing the delimiter to use with the repetition, a comma with [optional] white-space around it
-element: Production
-def __init__(self, element: Production):
-    """
-    :param element: A production to use for expressing the repeating part in this production
-    """
-    self.element = element
+separator = ConcatenationProduction(OWS, TokenProduction(CommaToken), OWS) # A comma with [optional] white-space around it
assert min >= 1 # "one or more times" (ref. definition); the spec. does not define whether a minimum of zero is permitted, so we err on the safer side
+super().__init__(element, min, max)
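To show what a repetition with a `separator` means for parsing, here is a toy matcher over a list of string tokens. It is a sketch under simplifying assumptions and does not reproduce the project's `parse` procedures: `token`, `repetition` and the `Matcher` type are hypothetical names, and a matcher simply returns the position after the consumed tokens, or `None` on failure.

```python
from typing import Callable, Optional, Sequence

Matcher = Callable[[Sequence[str], int], Optional[int]]  # Returns the new position, or `None` on failure

def token(value: str) -> Matcher:
    def match(tokens, pos):
        return pos + 1 if pos < len(tokens) and tokens[pos] == value else None
    return match

def repetition(element: Matcher, min: int = 0, max: Optional[int] = None, separator: Optional[Matcher] = None) -> Matcher:
    def match(tokens, pos):
        count = 0
        while max is None or count < max:
            start = pos
            if count and separator:
                # A separator is only expected _between_ two repetitions
                start = separator(tokens, pos)
                if start is None:
                    break
            end = element(tokens, start)
            if end is None or end == pos:
                break  # No element follows (or no progress): leave a dangling separator unconsumed
            pos, count = end, count + 1
        return pos if count >= min else None
    return match

comma_list = repetition(token("a"), min=1, separator=token(","))
print(comma_list(["a", ",", "a"], 0), comma_list(["a", ",", "b"], 0), comma_list(["b"], 0))
```

With `min=1` and a comma separator this mirrors the `#` multiplier: one element is accepted on its own, and a trailing comma that is not followed by another element is left in the input.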

class Formatter:
"""Class of objects that offer procedures for serializing productions into streams of text formatted per the [value definition syntax](https://drafts.csswg.org/css-values-4/#value-defs)."""