This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Commit 76eba98

Merge pull request #664 from github/json-and-the-argowats
Interpret syntax imported from JSON
2 parents e4281d5 + 266b581 commit 76eba98

39 files changed

Lines changed: 318 additions & 259 deletions

.bazelignore

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@ semantic/dist-newstyle
 semantic-analysis/dist-newstyle
 semantic-ast/dist-newstyle
 semantic-codeql/dist-newstyle
-semantic-core/dist-newstyle
 semantic-go/dist-newstyle
 semantic-java/dist-newstyle
 semantic-json/dist-newstyle

.github/workflows/haskell.yml

Lines changed: 3 additions & 3 deletions
@@ -14,14 +14,14 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        ghc: ["8.10.1"]
-        cabal: ["3.2.0.0"]
+        ghc: ["8.10"]
+        cabal: ["3.2"]

     steps:
     - uses: actions/checkout@v2
       if: github.event.action == 'opened' || github.event.action == 'synchronize' || github.event.ref == 'refs/heads/master'

-    - uses: actions/setup-haskell@v1
+    - uses: haskell/actions/setup@v1.2.2
       name: Setup Haskell
       with:
         ghc-version: ${{ matrix.ghc }}

build/common.bzl

Lines changed: 0 additions & 1 deletion
@@ -119,7 +119,6 @@ def semantic_language_library(language, name, srcs, ts_package = "", nodetypes =
 "//:text",
 "//semantic-analysis",
 "//semantic-ast",
-"//semantic-core",
 "//semantic-proto",
 "//semantic-scope-graph",
 "//semantic-source",

cabal.project

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ packages: semantic
 semantic-analysis
 semantic-ast
 semantic-codeql
-semantic-core
 semantic-go
 semantic-java
 semantic-json

cabal.project.ci

Lines changed: 0 additions & 4 deletions
@@ -5,7 +5,6 @@ packages: semantic
 semantic-analysis
 semantic-ast
 semantic-codeql
-semantic-core
 semantic-go
 semantic-java
 semantic-json
@@ -34,9 +33,6 @@ package semantic-ast
 package semantic-codeql
   ghc-options: -Werror

-package semantic-core
-  ghc-options: -Werror
-
 package semantic-go
   ghc-options: -Werror

docs/adding-new-languages.md

Lines changed: 4 additions & 4 deletions
@@ -8,14 +8,14 @@ Note that we recently transitioned the system to auto-generate strongly-typed AS

 1. **Find or write a [tree-sitter](https://tree-sitter.github.io) parser for your language.** The tree-sitter [organization page](https://github.com/tree-sitter) has a number of parsers beyond those we currently support in Semantic; look there first to make sure you're not duplicating work. The tree-sitter [documentation on creating parsers](http://tree-sitter.github.io/tree-sitter/creating-parsers) provides an exhaustive look at the process of developing and debugging tree-sitter parsers. Though we do not support grammars written with other toolkits such as [ANTLR](https://www.antlr.org), translating an ANTLR or other BNF-style grammar into a tree-sitter grammar is usually straightforward.
 2. **Create a Haskell library providing an interface to that C source.** The [`haskell-tree-sitter`](https://github.com/tree-sitter/haskell-tree-sitter) repository provides a Cabal package for each supported language. You can find an example of a pull request to add such a package [here](https://github.com/tree-sitter/haskell-tree-sitter/pull/276/files), and a file providing:
-- A bridged (via the FFI) reference to the toplevel parser in the generated file must be provided ([example](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/tree-sitter-json/TreeSitter/JSON.hs#L11)).
-- A way to retrieve [`tree-sitter` data](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/tree-sitter-json/TreeSitter/JSON.hs#L13-L14) used to auto-generate syntax datatypes using the following steps. During parser generation, tree-sitter produces a `node-types.json` file that captures the structure of a language's grammar. The autogeneration described below in Step 4 derives datatypes based on this structural representation. The `node-types.json` is a data file in `haskell-tree-sitter` that gets installed with the package. The function `getNodeTypesPath :: IO FilePath` is defined to access in the contents of this file, using `getDataFileName :: FilePath -> IO FilePath`, which is defined in the autogenerated `Paths_` module.
+- A bridged (via the FFI) reference to the toplevel parser in the generated file must be provided ([example](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/tree-sitter-json/TreeSitter/JSON.hs#L11)).
+- A way to retrieve [`tree-sitter` data](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/tree-sitter-json/TreeSitter/JSON.hs#L13-L14) used to auto-generate syntax datatypes using the following steps. During parser generation, tree-sitter produces a `node-types.json` file that captures the structure of a language's grammar. The autogeneration described below in Step 4 derives datatypes based on this structural representation. The `node-types.json` is a data file in `haskell-tree-sitter` that gets installed with the package. The function `getNodeTypesPath :: IO FilePath` is defined to access in the contents of this file, using `getDataFileName :: FilePath -> IO FilePath`, which is defined in the autogenerated `Paths_` module.
 3. **Create a Haskell library in Semantic to auto-generate precise ASTs.** Create a `semantic-[LANGUAGE]` package. This is an example of [`semantic-python`](https://github.com/github/semantic/tree/master/semantic-python)). Each package needs to provide the following API surfaces:
 - `Language.[LANGUAGE].AST` - Derives Haskell datatypes from a language and its `node-types.json` file ([example](https://github.com/github/semantic/blob/master/semantic-python/src/Language/Python/AST.hs)).
 - `Language.[LANGUAGE].Grammar` - Provides statically-known rules corresponding to symbols in the grammar for each syntax node, generated with the `mkStaticallyKnownRuleGrammarData` Template Haskell splice ([example](https://github.com/github/semantic/blob/master/semantic-python/src/Language/Python/Grammar.hs)).
 - `Language.[LANGUAGE]` - Semantic functionality for programs in a language ([example](https://github.com/github/semantic/blob/master/semantic-python/src/Language/Python.hs)).
 - `Language.[LANGUAGE].Tags` - Computes tags for code nav definitions and references found in source ([example](https://github.com/github/semantic/blob/master/semantic-python/src/Language/Python/Tags.hs)).
-5. **Add tests for precise ASTs, tagging and graphing, and evaluating code written in that language.** Because tree-sitter grammars often change, we require extensive testing so as to avoid the unhappy situation of bitrotted languages that break as soon as a new grammar comes down the line. Here are examples of tests for [precise ASTs](https://github.com/github/semantic/blob/master/semantic-python/test/PreciseTest.hs), [tagging](https://github.com/github/semantic/blob/master/test/Tags/Spec.hs), and [graphing](https://github.com/github/semantic/blob/master/semantic-python/test-graphing/GraphTest.hs).
+5. **Add tests for precise ASTs, tagging and graphing, and evaluating code written in that language.** Because tree-sitter grammars often change, we require extensive testing so as to avoid the unhappy situation of bitrotted languages that break as soon as a new grammar comes down the line. Here are examples of tests for [precise ASTs](https://github.com/github/semantic/blob/master/semantic-python/test/PreciseTest.hs), [tagging](https://github.com/github/semantic/blob/master/test/Tags/Spec.hs), and [graphing](https://github.com/github/semantic/blob/master/semantic-python/test-graphing/GraphTest.hs).

 To summarize, each interaction made possible by the Semantic CLI corresponds to one (or more) of the above steps:

@@ -30,4 +30,4 @@ To summarize, each interaction made possible by the Semantic CLI corresponds to

 **This sounds hard.** You're right! It is currently a lot of work: just because the Semantic architecture is extensible in the expression-problem manner does not mean that adding new support is trivial.

-**What recent changes have been made?** The Semantic authors have introduced a new architecture for language support and parsing, one that dispenses with the [assignment](https://github.com/github/semantic/blob/master/docs/assignment.md) step altogether. The `semantic-ast` package generates Haskell data types from tree-sitter grammars; these types are then translated into the [Semantic core language](https://github.com/github/semantic/blob/master/semantic-core/src/Data/Core.hs); all evaluators will then be written in terms of the Core language. As compared with the [historic process]() used to add new languages, these changes entire obviate the process of 1) assigning types into an open-union of syntax functors, and 2) implementing `Evaluatable` instances and adding value effects to describe the control flow of your language.
+**What recent changes have been made?** The Semantic authors have introduced a new architecture for language support and parsing, one that dispenses with the [assignment](https://github.com/github/semantic/blob/master/docs/assignment.md) step altogether. The `semantic-ast` package generates Haskell data types from tree-sitter grammars. As compared with the [historic process]() used to add new languages, these changes entire obviate the process of 1) assigning types into an open-union of syntax functors, and 2) implementing `Evaluatable` instances and adding value effects to describe the control flow of your language.
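
Reviewer note on step 2 of docs/adding-new-languages.md: the `getNodeTypesPath` accessor it describes can be sketched as below. This is a minimal sketch, not the code from `haskell-tree-sitter`. In a real package `getDataFileName` is imported from the Cabal-generated `Paths_` module; here it is replaced by a local stand-in so the example runs on its own. The data directory and the relative path to `node-types.json` are both assumptions made for illustration.

```haskell
-- Hypothetical install location of the package's data files.
-- A real Paths_ module computes this from the Cabal installation.
hypotheticalDataDir :: FilePath
hypotheticalDataDir = "/opt/example/share/tree-sitter-json"

-- Stand-in for the Cabal-generated getDataFileName (assumption:
-- the real one resolves names against the installed data directory).
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = pure (hypotheticalDataDir ++ "/" ++ name)

-- Resolve where node-types.json was installed. The relative path
-- is assumed here; it must match the package's data-files entry.
getNodeTypesPath :: IO FilePath
getNodeTypesPath = getDataFileName "vendor/tree-sitter-json/src/node-types.json"

main :: IO ()
main = getNodeTypesPath >>= putStrLn
```

Keeping the lookup in `IO` matches the signature quoted in the docs, since the installed location is only known at run time.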

script/ghci-flags

Lines changed: 0 additions & 1 deletion
@@ -58,7 +58,6 @@ function flags {
 echo "-isemantic-ast/src"
 echo "-isemantic-codeql/src"
 echo "-isemantic-codeql/test"
-echo "-isemantic-core/src"
 echo "-isemantic-go/src"
 echo "-isemantic-java/src"
 echo "-isemantic-json/src"

script/ghci-flags-dependencies

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ echo "cabal.project"
 echo "semantic.cabal"
 echo "semantic-analysis/semantic-analysis.cabal"
 echo "semantic-ast/semantic-ast.cabal"
-echo "semantic-core/semantic-core.cabal"
 echo "semantic-tags/semantic-tags.cabal"
 echo "semantic-go/semantic-go.cabal"
 echo "semantic-java/semantic-java.cabal"

semantic-analysis/BUILD.bazel

Lines changed: 3 additions & 1 deletion
@@ -18,9 +18,10 @@ load(
 haskell_library(
     name = "semantic-analysis",
     srcs = glob(["src/**/*.hs"]),
-    compiler_flags = GHC_FLAGS + ["-XOverloadedStrings"],
+    compiler_flags = GHC_FLAGS,
     deps = [
         "//:base",
+        "//:bytestring",
         "//:containers",
         "//:filepath",
         "//:text",
@@ -30,5 +31,6 @@ haskell_library(
         "@stackage//:fused-effects",
         "@stackage//:hashable",
         "@stackage//:pathtype",
+        "@stackage//:vector",
     ],
 )

semantic-analysis/python.tsg

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+; tree-sitter-graph definitions mapping Python ASTs to an analyzable IR.
+
+(module) @this
+{
+  node @this.node
+  attr (@this.node) type = "module"
+}
+
+(identifier) @id
+{
+  node @id.node
+}
+
+(string) @this
+{
+  node @this.node
+  attr (@this.node) type = "string"
+  attr (@this.node) text = (source-text @this)
+}
+
+(true) @this
+{
+  node @this.node
+  attr (@this.node) type = "true"
+}
+
+(false) @this
+{
+  node @this.node
+  attr (@this.node) type = "false"
+}
+
+(print_statement argument: (_) @arg) @this
+{
+  node @this.node
+  attr (@this.node) type = "print"
+  edge @this.node -> @arg.node
+}
+
+(raise_statement (_) @arg) @this
+{
+  node @this.node
+  attr (@this.node) type = "throw"
+  edge @this.node -> @arg.node
+}
+
+(block (_)* @children) @this
+{
+  node @this.node
+  attr (@this.node) type = "block"
+  for child in @children {
+    edge @this.node -> child.node
+  }
+}
+
+(else_clause body: (_) @body) @this
+{
+  let @this.node = @body.node
+}
+
+(if_statement (_)) @this {
+  node @this.node
+  attr (@this.node) type = "if"
+}
+
+(if_statement condition: (_) @cond consequence: (_) @then) @this {
+  edge @this.node -> @cond.node
+  attr (@this.node -> @cond.node) type = "condition"
+  edge @this.node -> @then.node
+  attr (@this.node -> @then.node) type = "consequence"
+}
+
+(if_statement alternative: (_) @else) @this
+{
+  edge @this.node -> @else.node
+  attr (@this.node -> @else.node) type = "alternative"
+}
+
+(module (_) @child) @this
+{
+  edge @this.node -> @child.node
+}
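
Reviewer note on the `if_statement` rules in python.tsg: they create a node typed `"if"` and attach outgoing edges whose `type` attribute is `"condition"`, `"consequence"`, or `"alternative"`. The sketch below models the resulting shape in Haskell for an `if` with a `True` condition, a block body, and no `else`. It is a hand-built illustration, not output from tree-sitter-graph. The `Node`, `Edge`, and `Graph` types and the `child` helper are hypothetical names that do not come from `semantic-analysis`.

```haskell
import Data.List (find)

-- Attributes are key/value pairs, mirroring the `attr` statements.
type Attrs = [(String, String)]

-- Hypothetical record types, for illustration only.
data Node = Node { nodeId :: Int, nodeAttrs :: Attrs } deriving (Eq, Show)
data Edge = Edge { source :: Int, target :: Int, edgeAttrs :: Attrs } deriving (Eq, Show)
data Graph = Graph { nodes :: [Node], edges :: [Edge] } deriving (Eq, Show)

-- Hand-built graph: an "if" node with a "true" condition and a
-- "block" consequence. No alternative edge, since there is no else.
ifGraph :: Graph
ifGraph = Graph
  { nodes =
      [ Node 0 [("type", "if")]
      , Node 1 [("type", "true")]
      , Node 2 [("type", "block")]
      ]
  , edges =
      [ Edge 0 1 [("type", "condition")]
      , Edge 0 2 [("type", "consequence")]
      ]
  }

-- Follow the outgoing edge of node `from` whose type attribute is `label`.
child :: String -> Int -> Graph -> Maybe Node
child label from g = do
  e <- find matches (edges g)
  find ((== target e) . nodeId) (nodes g)
  where
    matches e = source e == from && lookup "type" (edgeAttrs e) == Just label

main :: IO ()
main = do
  print (child "condition" 0 ifGraph)
  print (child "alternative" 0 ifGraph)
```

Running `main` prints the `true` node for the condition lookup and `Nothing` for the absent alternative. A consumer of such a graph can branch on these labels to tell the condition from the two branches.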
