SYN-2558: implement compiled storm#4772
Open
invisig0th wants to merge 30 commits into
Open
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4772 +/- ##
==========================================
+ Coverage 97.75% 97.76% +0.01%
==========================================
Files 299 300 +1
Lines 63400 63720 +320
==========================================
+ Hits 61978 62298 +320
Misses 1422 1422
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Remove extraneous f-string prefixes on messages with no placeholders and drop the unused s_stormbin import in test_tools_storm_pkg_gen.py.
Drop the stormlib compile/decompile functions and their tests. The stormbin binary format remains available via the genpkg --compiled option and is still accepted by the cortex storm APIs.
Compiled queries no longer store the source text. The ``source`` metadata key is gone and ``un()`` no longer threads a ``text`` argument through child deserialization.
Adds ext code 2 in s_msgpack for decimal.Decimal so Decimals can be serialized transparently. stormbin no longer needs a tagged-tuple value encoding layer; AST attribute values are written directly into the envelope. decompile() uses use_list=True so list-typed values stay as lists when round-tripped.
Drop the optional addpos parameter from en() and compile(); position metadata is now written for every node and required by decompile(). A missing pos key raises BadArg.
Move the stormbin isCompiled check into _getStormQuery so decompiled queries land in self.querycache and get reused on subsequent calls instead of being decompiled fresh each time. Also make pos optional in un() again: decompile no longer rejects envelopes that lack per-node position info, restoring the prior behavior for hand-built test envelopes.
Decompiled AST nodes must carry per-node position info; a missing pos key raises BadArg.
Exercise the previously uncovered branches in stormbin: en() rejecting an AST class without _bin_id, un() rejecting a non-dict node meta or missing pos info, the attr default-fallback when an attr key is absent from the encoded attrs dict, and _initRegistry raising on duplicate _bin_id. Marks the unreachable seen-set guard with pragma no cover.
stormbinHash had no callers outside its own test; s_common.queryhash already covers the bytes case for log extras. Drop the function, drop the test, and pare down the changelog entry.
The function takes bytes and returns an AST node, not source text, so the load/dump pairing (msgpack/json/pickle idiom) matches the actual semantics better than compile/decompile.
Cisphyx
reviewed
May 28, 2026
Cisphyx
reviewed
May 29, 2026
EmbedQuery / SubQuery / Query each carry a .valu or .text field that
some consumers feed back into getStormQuery() — most notably the
background, batch, and view.exec commands, whose subquery argument is
the inner Query.text returned by ArgvQuery.compute(). When loaded from
a stormbin payload these fields were empty strings, so re-parsing
silently produced an empty query and the command became a no-op.
At load time, populate each of these fields with a }-prefixed base64
encoded stormbin payload of the inner subtree. Consumers then detect
the compiled form via isCompiled() and load it directly, skipping the
parser entirely and getting the intended fast path for free. Drops
EmbedQuery.valu from _bin_attrs since it's now synthesized.
Adds a defensive guard in getAstText: if astinfo.text starts with the
stormbin prefix, return '' instead of slicing into compiled bytes.
The prefix constant moves to synapse.lib.const so ast and stormbin can
share it without a circular import.
New tests: background, batch, view.exec, and ${...} embedded-query
execution all round-trip correctly through compile/load.
Replace the internal _compileChild helper with a public dump(node, ascii=False) routine that mirrors load() — bytes by default, or a }-prefixed base64 string when ascii=True. The un() sites that synthesize compiled-form text on EmbedQuery / SubQuery / Query now call dump(..., ascii=True).
compile() now mirrors dump() — bytes by default, or a }-prefixed base64 string when ascii=True. The genpkg compiled-field helper becomes a single call to s_stormbin.compile(text, ascii=True).
OCBender
reviewed
May 29, 2026
The parser sets reverse=True after construction via reverseLift() when it sees the reverse(...) sigil. Without a _bin_attrs entry, reverse lifts silently became forward lifts after compile/load. Add the attribute to LiftOper._bin_attrs and accept it as a kwarg on __init__ so un()'s generic kwarg passing picks it up. Adds a round-trip test for reverse-lift queries.
OCBender
reviewed
May 29, 2026
Drops the special-case post-init assignment in stormbin.un() — the generic kwargs path now handles hasyield like every other _bin_attrs entry. SubQuery's branch in un() stays only because of the compiled-form text synthesis.
OCBender
reviewed
May 29, 2026
… un() Const.__init__ now defaults valu to novalu, so Bool, VarList, and Cmpr (which inherit Const) all flow through un()'s generic else branch with valu coming in as a kwarg. EmbedQuery keeps its branch because it still synthesizes valu from kids at load time.
OCBender
reviewed
May 29, 2026
Use attrs.get(key, default) instead of the explicit `if key in attrs` branch.
Query/SubQuery/EmbedQuery each override a new hydrate() method that knows which attribute (.text or .valu) to populate with the compiled form of which subtree (self or kids[0]). The base AstNode.hydrate() no-ops. stormbin.un() drops the per-class branch cascade and just calls node.hydrate() after construction. The hydrate overrides do a function-local import of stormbin to avoid the circular import at module scope; sys.modules has the module cached by the time hydrate() is called, so the cost is a dict lookup.
Replace stormbin._initRegistry() and its subclass-walk with an AstRegistry metaclass on AstNode that registers each subclass at class-definition time. classToId / idToClass move to synapse.lib.ast and stormbin re-exports them for back-compat. A duplicate _bin_id is now caught the moment the offending class is defined, so the test simplifies to an in-line class declaration that should raise.
The AST class -> _bin_id direction was only consumed by tests; en() reads cls._bin_id directly and un() uses idToClass. Remove the dict, the AstRegistry's assignment to it, and the stormbin re-export. Tests that needed an id now read it from the class itself (Class._bin_id).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.