Commit ded1869

Finish Get started section (#274)
1 parent 94f795a commit ded1869

8 files changed

Lines changed: 569 additions & 26 deletions

File tree

doc/_quarto.yml

Lines changed: 6 additions & 0 deletions
@@ -121,6 +121,12 @@ website:
121121
href: get_started/first_plot.qmd
122122
- text: Grammar of graphics
123123
href: get_started/grammar.qmd
124+
- text: Anatomy of ggsql
125+
href: get_started/anatomy.qmd
126+
- text: Tooling
127+
href: get_started/tooling.qmd
128+
- text: The rest of the owl
129+
href: get_started/the_rest.qmd
124130

125131
format:
126132
html:

doc/assets/how_to_owl.png

206 KB

doc/get_started/anatomy.qmd

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
---
2+
title: The anatomy of ggsql
3+
---
4+
5+
With some knowledge of the grammar of graphics in hand, let's look at how its concepts appear in ggsql, starting with a few key ideas and then moving on to how they are reflected in the syntax.
6+
7+
## Layers
8+
ggsql is composable, allowing you to create arbitrarily complex visualizations. Central to this is the concept of layers. A layer is a single visual encoding of some underlying data, e.g. [points](../syntax/layer/type/point.qmd) for a scatterplot, or [bars](../syntax/layer/type/bar.qmd) for a barplot. You can have multiple layers, in which case they are stacked on top of each other in the order they are declared (i.e. a layer declared last will be on top and overlap any layer declared before it). A scatterplot with a regression line consists of two layers: a [point](../syntax/layer/type/point.qmd) layer and a [smooth](../syntax/layer/type/smooth.qmd) layer.
9+
10+
A layer may show data directly, e.g. a [point](../syntax/layer/type/point.qmd) layer shows each observation as a point, or it may apply a statistical transformation and show the result of that, e.g. a [histogram](../syntax/layer/type/histogram.qmd) layer bins and counts your data before showing the result as bars.
11+
12+
## Aesthetics
13+
You will encounter aesthetics throughout the documentation, and they are arguably one of the most important concepts to get right. Aesthetics are the properties that describe the visual entities that make up a layer, e.g. the [color](../syntax/scale/aesthetic/1_color.qmd) of a point, the [linewidth](../syntax/scale/aesthetic/linewidth.qmd) of a line, and the [opacity](../syntax/scale/aesthetic/2_opacity.qmd) of a polygon.
14+
15+
There are two types of aesthetics: position aesthetics and material aesthetics. The former relate to *where* an entity is *placed* and are deeply connected to the coordinate system of the plot. The latter relate to *how* the entity *looks*.
16+
17+
Aesthetics can either be *mapped* or *set*. You use mapping if you want the aesthetic to be related to values in your data, e.g. have fill color be controlled by a category column from your dataset. You use setting when you wish to fix an aesthetic to a specific value, not related to your data, e.g. you want to set linewidth to 2pt.
18+
19+
## Scales
20+
When you map data to an aesthetic, the data will seldom have values that are meaningful for the aesthetic. Consider mapping `region` to `fill` because you want the fill color to show the geographical region the data pertains to. `region` might contain values such as `Asia`, `Europe`, and `South America`, which are not meaningful color values. How do you translate these values into something the aesthetic understands?
21+
22+
The answer is using a scale. When mapping an aesthetic it will automatically be scaled by a default scale to ensure that the aesthetic receives values it understands, but you can take control of the scaling and e.g. use a different color palette.
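
To make the idea concrete, a scale can be thought of as a function from data values to aesthetic values. The sketch below is a conceptual illustration in plain Python, not ggsql's implementation; the `make_discrete_scale` helper and the three hex colors are assumptions made up for this example.

```python
from itertools import cycle

# Hypothetical palette: three hex colors chosen for illustration only.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]


def make_discrete_scale(values, palette):
    """Return a function that maps each distinct data value to a palette entry."""
    domain = sorted(set(values))
    # cycle() reuses colors if there are more categories than palette entries.
    lookup = dict(zip(domain, cycle(palette)))
    return lambda value: lookup[value]


region = ["Asia", "Europe", "South America", "Europe"]
fill_scale = make_discrete_scale(region, PALETTE)

# Every data value is translated into something the fill aesthetic understands.
fill = [fill_scale(value) for value in region]
print(fill)
```

Swapping the palette only requires building a new scale; the data itself stays untouched, which is the separation a scale provides.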
23+
24+
## The syntax
25+
Before we move on, let's examine how the concepts we have just described are reflected in the ggsql syntax. Often these will be enough for your basic visualization needs.
26+
27+
### `VISUALISE`
28+
Every ggsql query starts with a [`VISUALISE`](../syntax/clause/visualise.qmd) (or `VISUALIZE`) clause. It denotes that we are exiting regular SQL syntax and entering ggsql.
29+
30+
While `VISUALISE` can stand on its own as a demarcation line between the regular and the visual query, you can also pass it a list of aesthetic mappings which will define the default mapping for the layers so that you don't have to repeat it for every layer. Lastly, if you do not have an initial SQL query you can name a data source for your plot.
31+
32+
Bringing all of these things together, a `VISUALISE` clause could look like this:
33+
34+
```ggsql
35+
-- |---------- mapping ----------|--- data source ---|
36+
VISUALISE body_mass AS x, bill_len AS y FROM ggsql:penguins
37+
```
38+
39+
### `DRAW`
40+
Following `VISUALISE` you'd usually provide one or more [`DRAW`](../syntax/clause/draw.qmd) clauses, each of which defines a layer. The `DRAW` clause is arguably the most complex clause, but the basic usage is straightforward: you provide the type of the layer, any additional mapping if needed, and perhaps modify the settings of the layer. To achieve this we employ the `MAPPING` and `SETTING` clauses.
41+
42+
The input to the `MAPPING` clause looks exactly like what we saw above for the `VISUALISE` clause. You can provide mappings and optionally a data source if you want the layer to use a data source different from the global data. The `SETTING` clause allows you to both *set* aesthetics and set parameters specific to the layer (e.g. number of bins in a histogram).
43+
44+
Bringing all of this together, a `DRAW` clause could look like this:
45+
46+
```ggsql
47+
-- |- type --|
48+
DRAW histogram
49+
-- |-- mapping --|
50+
MAPPING bill_len AS x
51+
-- |-- setting ---|- parameter -|
52+
SETTING stroke => null, bins => 20
53+
```
54+
55+
However, if the mappings and data source have already been taken care of, it can be as simple as:
56+
57+
```ggsql
58+
DRAW point
59+
```
60+
61+
### `SCALE`
62+
As [described above](#scales), ggsql automatically creates a default scale for each mapped aesthetic, and if those suit your needs there is no reason to modify them. However, if change is needed you do it with the [`SCALE`](../syntax/clause/scale.qmd) clause.
63+
64+
The clause allows you to set the type of scale, the input range, the output range, the transformation, and lets you control breaks and label formatting. The clause can therefore end up carrying a lot of information, but the syntax has been designed so that it reads naturally. Further, every part is optional and can be left out if the default fits. An example of a rather complex `SCALE` clause could be:
65+
66+
```ggsql
67+
SCALE ORDINAL fill FROM ['Low', 'Mid', 'High'] TO viridis
68+
SETTING breaks => 6
69+
```
70+
71+
But, if you are only interested in changing e.g. the palette it can be as simple as:
72+
73+
```ggsql
74+
SCALE fill TO viridis
75+
```
76+
77+
## Example
78+
Using what we have just learned, we can combine it all into a complete query consisting of multiple layers and custom scales:
79+
80+
```{ggsql}
81+
VISUALISE bill_len AS x, bill_dep AS y, species AS stroke FROM ggsql:penguins
82+
DRAW point
83+
MAPPING body_mass AS size
84+
SETTING fill => null
85+
DRAW smooth
86+
SETTING method => 'ols'
87+
SCALE stroke TO dark2
88+
SCALE BINNED size TO [4, 15]
89+
SETTING breaks => 4
90+
```
91+
92+
In the above we create a global mapping of bill_len to the `x` aesthetic and bill_dep to the `y` aesthetic using the built-in penguins dataset. We use `DRAW` to create two layers: A point layer for a scatter plot and a smooth layer for regression lines. For the point layer we _map_ the body_mass to size to create a bubble chart and _set_ the fill aesthetic to be empty (`null`) so only the outline is shown. For the smooth layer we set the layer parameter `method` to `'ols'` to estimate a straight regression line. Lastly, we modify the stroke scale to use the dark2 palette from the ColorBrewer project and apply a binned scale to `size` that goes from 4pt to 15pt with 4 breaks (resulting in 3 bins).
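
The relationship between breaks and bins in the binned `size` scale can be sketched in plain Python. This is a conceptual illustration only: the break values are hypothetical, and spacing the output sizes evenly between 4 and 15 is an assumption about how the output range is divided, not documented ggsql behavior.

```python
from bisect import bisect_right


def make_binned_scale(breaks, out_min, out_max):
    """Build a scale where N breaks delimit N - 1 bins, each with one output size."""
    n_bins = len(breaks) - 1
    # Assumption: output sizes are spread evenly across the output range.
    step = (out_max - out_min) / (n_bins - 1)
    sizes = [out_min + i * step for i in range(n_bins)]

    def scale(value):
        # Find the bin the value falls into, clamping values outside the breaks.
        index = bisect_right(breaks, value) - 1
        index = min(max(index, 0), n_bins - 1)
        return sizes[index]

    return scale, sizes


# Hypothetical break positions for body mass in grams: 4 breaks, hence 3 bins.
breaks = [3000, 4000, 5000, 6000]
size_scale, sizes = make_binned_scale(breaks, 4, 15)

mapped = [size_scale(mass) for mass in (3500, 4200, 5900)]
print(sizes, mapped)
```

Each observation ends up with one of only three sizes, which is what distinguishes a binned scale from a continuous one.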
93+
94+
While the query above may feel like a mouthful, remember that most visualizations are much simpler:
95+
96+
```{ggsql}
97+
VISUALISE body_mass AS x FROM ggsql:penguins
98+
DRAW histogram
99+
```
100+
101+
In the next section we will introduce the remaining parts of the grammar and the related syntax, but the parts covered here will already take you a very long way.

doc/get_started/the_rest.qmd

Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
1+
---
2+
title: The rest of the owl
3+
---
4+
5+
We have covered the three most important concepts of the ggsql syntax: `VISUALISE`, `DRAW`, and `SCALE`. Now it's time to learn how to draw the rest of the owl.
6+
7+
![](../assets/how_to_owl.png){style="max-width:500px; display:block; margin:auto;"}
8+
9+
Thankfully, we will give you a bit more help than the illustration above in understanding the last bits of ggsql.
10+
11+
## Coordinate systems
12+
In the earlier section we talked about position aesthetics being special because they are being orchestrated by the coordinate system. The coordinate system is the entity that takes care of the spatial arrangement of graphic objects based on their position aesthetic mapping. When thinking about a coordinate system we tend to think about a Cartesian coordinate system which has a horizontal x-axis and a vertical y-axis. There are others though, like polar systems, cartographic maps, and ternary systems.
13+
14+
At its most basic, a coordinate system is a projection function that takes the position aesthetics and projects them onto a 2-dimensional plane on the screen or paper. While we commonly have 2 position aesthetics that get projected onto a 2-dimensional plane, this is not a necessity. 3 position aesthetics could be projected to 2 dimensions using a perspective transform or by using a special coordinate system such as a ternary layout.
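
The notion of a coordinate system as a projection function can be sketched in a few lines of Python. This is a conceptual illustration rather than ggsql's implementation; the function names are made up, and the angle convention (degrees, counter-clockwise from the positive x-axis) is an assumption chosen for simplicity.

```python
import math


def project_cartesian(x, y):
    """A Cartesian system passes its two position aesthetics straight through."""
    return (x, y)


def project_polar(radius, angle_deg):
    """A polar system turns (radius, angle) into a point on the 2D plane."""
    theta = math.radians(angle_deg)
    return (radius * math.cos(theta), radius * math.sin(theta))


# The same pair of numbers lands in different places depending on the projection.
cartesian_point = project_cartesian(2.0, 90.0)
polar_point = project_polar(2.0, 90.0)
print(cartesian_point, polar_point)
```

The layers themselves do not change between the two calls; only the function that places them on the plane does.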
15+
16+
## Faceting
17+
Faceting is the process of dividing your data by one or more variables and visualizing each group as a small plot next to the others. This technique is also known as creating small multiples. Often, the individual plots share the same position scales so that it is easy to compare them against each other.
18+
19+
Using faceting is a very powerful way of comparing groups against each other as the sense of distribution within the group is not impaired by the presence of other data in the view.
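
Conceptually, faceting is a group-by followed by drawing the same plot once per group. The plain Python sketch below illustrates only the splitting step; the rows are invented for the example and the `split_into_panels` helper is hypothetical, not part of ggsql.

```python
from collections import defaultdict

# Made-up observations for illustration only.
rows = [
    {"island": "Biscoe", "species": "Adelie"},
    {"island": "Biscoe", "species": "Gentoo"},
    {"island": "Dream", "species": "Adelie"},
    {"island": "Dream", "species": "Chinstrap"},
    {"island": "Dream", "species": "Chinstrap"},
]


def split_into_panels(rows, by):
    """Divide the rows into one group per distinct value of the `by` column."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


panels = split_into_panels(rows, "island")

# Each panel would be rendered as its own small plot with the same layers.
panel_sizes = {name: len(members) for name, members in panels.items()}
print(panel_sizes)
```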
20+
21+
## Labelling and annotation
22+
While we all want our data to speak for itself, it is impossible to understand a visualization without context. If the visualization is embedded in some text then the context is often given there, but you are never in control of how your visualization is shared. Because of this you should strive for your plots to be self-explanatory, both in what they represent and in the main points they make. For the former, you will often use a title, a subtitle, and proper naming of the axes and legends. For the latter, you may want to add elements to the plot area that highlight certain aspects of what is shown.
23+
24+
## Syntax
25+
With the remaining part of the grammar under our belt let's examine how it is reflected in the syntax.
26+
27+
### `PROJECT`
28+
We use the [`PROJECT`](../syntax/clause/project.qmd) clause to control the coordinate system of the plot. It allows you both to control the naming of the position aesthetics in the coordinate system and to set various parameters that control the behavior of the coordinate system.
29+
30+
The above alludes to the fact that coordinate systems have different position aesthetics. Often you expect `x` and `y` as position aesthetics, and while these are indeed the default names for the [`cartesian`](../syntax/coord/cartesian.qmd) coordinate system they would be nonsensical for a [`polar`](../syntax/coord/polar.qmd) system, which uses `radius` and `angle` as defaults. You can, however, freely define your own names, e.g. `r` and `a` for a polar system if you value brevity over comprehension.
31+
32+
`PROJECT` also takes a `SETTING` clause which works much like the `SETTING` clause in `DRAW` and `SCALE`, allowing you to modify the behavior of the coordinate system. An example of a full `PROJECT` clause could be:
33+
34+
```ggsql
35+
PROJECT r, a TO polar
36+
SETTING start => -90, end => 90
37+
```
38+
39+
However, you may not need to specify anything at all. ggsql will automatically detect whether to use a Cartesian or a polar coordinate system from your mapping. If you map to the `x` or `y` aesthetics you implicitly use a Cartesian coordinate system, and if you map to `radius` or `angle` you implicitly use a polar coordinate system.
40+
41+
### `FACET`
42+
Faceting is applied with the [`FACET`](../syntax/clause/facet.qmd) clause. It allows you to either facet by a single variable (`FACET var`) or by a combination of two variables (`FACET var1 BY var2`). In the former case the small multiples are laid out row-wise, wrapping to the next row if there are more multiples than columns. In the latter case the first variable is related to the rows and the second is related to the columns.
43+
44+
There is an alternative to using the `FACET` clause, which is to map the variables directly to the facet aesthetics. There are three of these: `panel` is used when faceting by a single variable, and `row` and `column` are used when faceting by two variables. `FACET var` is thus equivalent to `VISUALISE var AS panel`. Which one you use is a matter of personal preference, and of whether you also need to modify faceting behavior (in which case you'd need a `FACET` clause anyway).
45+
46+
### `LABEL`
47+
ggsql automatically labels the axes and legends in your plot by the column name of the data mapped to them. However, you often want to provide more descriptive names as well as a title to give context to the plot. All of this is accomplished with the [`LABEL`](../syntax/clause/label.qmd) clause by setting the label text for titles, subtitles, etc. as well as any
48+
aesthetic you have mapped. A `LABEL` clause may end up looking like this:
49+
50+
```ggsql
51+
LABEL
52+
title => "Average wingspan of a cartoon owl",
53+
x => "Radius of first circle (cm)",
54+
y => "Wingspan (cm)"
55+
```
56+
57+
### `PLACE`
58+
When we want to add graphical objects to the plot that do not directly relate to data in your dataset we can use [`PLACE`](../syntax/clause/place.qmd). The clause works much like the `DRAW` clause except it doesn't take mappings or a data source. Instead you provide the data to place as literal values in the `SETTING` part of the clause. While you can place any type of layer, some are more useful than others and you will probably find yourself placing more text, segments, and rectangles than boxplots and histograms.
59+
60+
A standard `PLACE` query could look like this:
61+
62+
```ggsql
63+
PLACE text
64+
SETTING x => 30, y => 45, label => "Very long wings, right!"
65+
```
66+
67+
You may wonder why you wouldn't just do this using `DRAW`, since that would also be a legal query. The reason is that `DRAW` clauses expand their literals to be the same length as their data source. So if the plot is visualizing a table of 100 rows you will end up with 100 labels stacked on top of each other.
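
The difference can be modeled in a few lines of plain Python. This is only a conceptual sketch of the behavior described above; `draw_text` and `place_text` are hypothetical stand-ins, not ggsql functions.

```python
def draw_text(data_rows, **literals):
    """Model of DRAW: literal settings are repeated once per row of the data source."""
    return [dict(literals) for _ in data_rows]


def place_text(**literals):
    """Model of PLACE: the literals describe exactly one graphical object."""
    return [dict(literals)]


# A stand-in for a data source with 100 rows.
table = [{"id": i} for i in range(100)]

drawn = draw_text(table, x=30, y=45, label="Very long wings, right!")
placed = place_text(x=30, y=45, label="Very long wings, right!")

print(len(drawn), len(placed))
```

All 100 drawn labels share the same position and text, so they would overlap exactly, whereas the placed label exists once.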
68+
69+
## Examples
70+
Let's apply what we have learned to a couple of plots. First, we will create a pie chart by projecting a stacked bar chart to a polar coordinate system:
71+
72+
```{ggsql}
73+
VISUALISE species AS fill FROM ggsql:penguins
74+
DRAW bar
75+
PROJECT TO polar
76+
```
77+
78+
It may be easier to see how the bar chart turns into a pie by looking at it unstacked:
79+
80+
```{ggsql}
81+
VISUALISE species AS radius, species AS fill FROM ggsql:penguins
82+
DRAW bar
83+
```
84+
85+
See how we didn't have to specify the polar coordinate system in the last example because we have a mapping to radius, allowing ggsql to deduce the coordinate system automatically.
86+
87+
If we instead map the species to angle we end up with a rose plot:
88+
89+
```{ggsql}
90+
VISUALISE species AS angle, species AS fill FROM ggsql:penguins
91+
DRAW bar
92+
```
93+
94+
Moving back to the regular pie chart, we might be interested in comparing how the species distribution varies by island. We can do this with faceting:
95+
96+
```{ggsql}
97+
VISUALISE species AS fill FROM ggsql:penguins
98+
DRAW bar
99+
PROJECT TO polar
100+
FACET island
101+
SETTING free => 'angle'
102+
SCALE panel FROM ['Biscoe', 'Dream']
103+
```
104+
105+
Above, we use the `free` parameter of `FACET` to allow each facet to have its own angle scale. Further, we use `SCALE` on the `panel` aesthetic to only show panels for the Biscoe and Dream islands.
106+
107+
We can use `LABEL` to add a bit more context to our final plot:
108+
109+
```{ggsql}
110+
VISUALISE species AS fill FROM ggsql:penguins
111+
DRAW bar
112+
PROJECT TO polar
113+
FACET island
114+
SETTING free => 'angle'
115+
SCALE panel FROM ['Biscoe', 'Dream']
116+
LABEL
117+
title => 'Distribution of penguin species between islands',
118+
subtitle => 'Compared across 344 penguins',
119+
fill => 'Species'
120+
```
121+
122+
## The rest of the rest of the owl
123+
While we have now taken a quick tour through the main features of ggsql along with the theoretical backbone that underpins it there is still a lot to learn. The next step is to browse the [syntax documentation](../syntax/index.qmd), begin to build some visualizations on your own, and get some experience with ggsql.

doc/get_started/tooling.qmd

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
---
2+
title: Tooling
3+
---
4+
5+
Now that we understand some of the most important parts of the syntax, let's spend a bit of time on where and how to apply it. All the examples on this page are interactive and run directly in the browser, which is useful for teaching, but it will not suffice for your day-to-day work where you need to interact with your own data. ggsql is a general tool you can use in a multitude of ways, and we'll go over the most important ones below.
6+
7+
## VS Code extension
8+
We provide an extension for VS Code/Positron that brings language support to the IDE. Positron is generally superior for data analysis and the ggsql integration is deeper there, which we will showcase below. Still, using the extension with VS Code should provide you with a good developer experience. You can grab the [ggsql extension](https://open-vsx.org/extension/ggsql/ggsql) directly from the marketplace.
9+
10+
Once installed you will get access to ggsql as a language at the same level as R and Python. You can open and edit `.gsql` files with syntax highlighting and autocomplete, open a REPL in the console pane to execute queries, and see the resulting visualization appear in the plot pane. If you have any database connections in the connection pane you can directly attach these to your ggsql runtime and begin to visualize the tables in there.
11+
12+
## Jupyter kernel
13+
Once the Jupyter kernel is installed you can use ggsql as an engine in your Jupyter notebooks and Quarto documents. For a Jupyter notebook you can select the kernel when you start a new notebook. For a Quarto document you use the ggsql language name to tell the renderer to use the ggsql kernel e.g.
14+
15+
```{{ggsql}}
16+
VISUALISE ...
17+
```
18+
19+
Each block in the document uses the same session, so tables created in one block will be available in subsequent blocks.
20+
21+
## Python package
22+
We have a [Python package](https://pypi.org/project/ggsql/) which you can install through pip (`pip install ggsql`). The package provides bindings to ggsql and allows you to plot with ggsql directly from within Python and register alternative data backends.
23+
24+
A simple example could be:
25+
26+
```python
27+
import ggsql
28+
import polars as pl
29+
30+
# Create a DataFrame
31+
df = pl.DataFrame({
32+
"x": [1, 2, 3, 4, 5],
33+
"y": [10, 20, 15, 30, 25],
34+
"category": ["A", "B", "A", "B", "A"]
35+
})
36+
37+
# Render to Altair chart
38+
chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
39+
40+
# Display or save
41+
chart.display() # In Jupyter
42+
chart.save("chart.html") # Save to file
43+
```
44+
45+
## Command line interface
46+
While it may not be the most ergonomic way to interact directly with ggsql, there is a command-line interface if you need to build tools around ggsql. The CLI tool allows you to execute a file or string and to validate a query without executing it. A simple example of executing a query looks like this:
47+
48+
```bash
49+
ggsql --exec "VISUALISE species AS fill FROM ggsql:penguins DRAW bar"
50+
```

src/writer/vegalite/encoding.rs

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -899,8 +899,13 @@ fn build_column_encoding(
899899
(serde_json::Map::new(), false)
900900
};
901901

902-
// Position scales don't include zero by default
903-
if aesthetic_ctx.is_primary_internal(aesthetic) {
902+
// Position scales don't include zero by default — but only when we set
903+
// an explicit domain. With free facet scales (no domain), VL computes
904+
// the domain from data values. Setting zero:false in that case can exclude
905+
// 0 from the domain, breaking charts with pre-computed stacking (y2/theta2
906+
// starts at 0). Let VL's defaults handle it instead.
907+
let is_free = is_position_free_for_aesthetic(aesthetic, ctx.free_scales);
908+
if aesthetic_ctx.is_primary_internal(aesthetic) && !is_free {
904909
scale_obj.insert("zero".to_string(), json!(false));
905910
}
906911
