
Commit 0f8ffff

Additional gallery examples (#293)
* additional scatterplot variations
* heatmap examples
* density
* violin
* boxplot
* pie charts fix #231
* add minard example
* thumbnails and such
* change data order/location
* add bit about inspecting data
1 parent 1b3a265 commit 0f8ffff

20 files changed

Lines changed: 829 additions & 1 deletion

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -95,6 +95,8 @@ docs/_build/
 *.sqlite
 *.sqlite3
 !src/data/*.parquet
+!doc/assets/data/*.csv
+!doc/gallery/examples/*.csv

 # Configuration files
 .env

doc/_quarto.yml

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ project:
   type: website
   resources:
     - wasm/**
+    - assets/data/**

 website:
   title: "ggsql"

doc/assets/data/minard_cities.csv

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+"long","lat","city"
+24,55,"Kowno"
+25.3,54.7,"Wilna"
+26.4,54.4,"Smorgoni"
+26.8,54.3,"Moiodexno"
+27.7,55.2,"Gloubokoe"
+27.6,53.9,"Minsk"
+28.5,54.3,"Studienska"
+28.7,55.5,"Polotzk"
+29.2,54.4,"Bobr"
+30.2,55.3,"Witebsk"
+30.4,54.5,"Orscha"
+30.4,53.9,"Mohilow"
+32,54.8,"Smolensk"
+33.2,54.9,"Dorogobouge"
+34.3,55.2,"Wixma"
+34.4,55.5,"Chjat"
+36,55.5,"Mojaisk"
+37.6,55.8,"Moscou"
+36.6,55.3,"Tarantino"
+36.5,55,"Malo-Jarosewii"

doc/assets/data/minard_troops.csv

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+"long","lat","survivors","direction","group"
+37.7,55.7,100000,"R",1
+37.5,55.7,98000,"R",1
+37,55,97000,"R",1
+36.8,55,96000,"R",1
+35.4,55.3,87000,"R",1
+34.3,55.2,55000,"R",1
+33.3,54.8,37000,"R",1
+32,54.6,24000,"R",1
+30.4,54.4,20000,"R",1
+29.2,54.3,20000,"R",1
+28.5,54.2,20000,"R",1
+28.3,54.3,20000,"R",1
+27.5,54.5,20000,"R",1
+26.8,54.3,12000,"R",1
+26.4,54.4,14000,"R",1
+25,54.4,8000,"R",1
+24.4,54.4,4000,"R",1
+24.2,54.4,4000,"R",1
+24.1,54.4,4000,"R",1
+28.7,55.5,33000,"R",2
+29.2,54.2,30000,"R",2
+28.5,54.1,30000,"R",2
+28.3,54.2,28000,"R",2
+24.6,55.8,6000,"R",3
+24.2,54.4,6000,"R",3
+24.1,54.4,6000,"R",3
+24,54.9,340000,"A",1
+24.5,55,340000,"A",1
+25.5,54.5,340000,"A",1
+26,54.7,320000,"A",1
+27,54.8,300000,"A",1
+28,54.9,280000,"A",1
+28.5,55,240000,"A",1
+29,55.1,210000,"A",1
+30,55.2,180000,"A",1
+30.3,55.3,175000,"A",1
+32,54.8,145000,"A",1
+33.2,54.9,140000,"A",1
+34.4,55.5,127100,"A",1
+35.5,55.4,100000,"A",1
+36,55.5,100000,"A",1
+37.6,55.8,100000,"A",1
+24,55.1,60000,"A",2
+24.5,55.2,60000,"A",2
+25.5,54.7,60000,"A",2
+26.6,55.7,40000,"A",2
+27.4,55.6,33000,"A",2
+28.7,55.5,33000,"A",2
+24,55.2,22000,"A",3
+24.5,55.3,22000,"A",3
+24.6,55.8,6000,"A",3

doc/assets/minard.png

671 KB

doc/gallery/examples/boxplot.qmd

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+---
+title: "Box plots"
+description: "Showing groups of distributions of single numeric variables"
+image: thumbnails/boxplot.svg
+categories: [basic, boxplot, distribution]
+order: 3
+---
+
+Boxplots are a popular way to summarise the distribution of a single continuous variable.
+Keep in mind that a boxplot hides the actual shape of the distribution behind a summary, which matters most when the data is bi- or multi-modal.
+For every group, a boxplot displays the following six things:
+
+1. The 25^th^ percentile, or Q1, as the start of the box.
+2. The 50^th^ percentile, i.e. the median or Q2, as a line across the box.
+3. The 75^th^ percentile, or Q3, as the end of the box. Together with Q1 it gives the interquartile range: IQR = Q3 - Q1.
+4. The minimum data value or Q1 - 1.5 * IQR, whichever is larger. This is displayed as the lower whisker.
+5. The maximum data value or Q3 + 1.5 * IQR, whichever is smaller. This is displayed as the upper whisker.
+6. Outliers outside the whiskers, if present. These are drawn as individual points.
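The six components listed above can be worked out by hand. The following is a minimal Python sketch, not ggsql's implementation: the function name `boxplot_summary`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for illustration, and ggsql may compute quartiles differently. The whiskers follow the definition given in the list. Many plotting libraries instead end each whisker at the most extreme data point that lies inside the fence.

```python
import statistics


def boxplot_summary(values):
    """Return the six boxplot components for one group of numbers."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # The "inclusive" method is an assumption; other methods shift Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers as described in the list: data extreme or fence, whichever
    # lies closer to the box.
    lower_whisker = max(data[0], q1 - 1.5 * iqr)
    upper_whisker = min(data[-1], q3 + 1.5 * iqr)
    outliers = [v for v in data if v < lower_whisker or v > upper_whisker]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


# Made-up sample with one value far above the rest.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this sample the value 30 lies above Q3 + 1.5 * IQR, so it is reported as an outlier and the upper whisker stops at the fence.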
+
+## Code
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y FROM ggsql:penguins
+DRAW boxplot
+```
+
+## Explanation
+
+* `VISUALISE ... FROM ggsql:penguins` loads the built-in penguins dataset.
+* `species AS x` sets a categorical variable to separate different groups.
+* `bill_len AS y` sets the numeric variable to summarise.
+* `DRAW boxplot` gives instructions to draw the boxplot layer.
+
+## Variations
+
+### Dodging
+
+You can subdivide groups beyond the categorical axis variable, and the resulting boxplots are dodged, i.e. drawn side by side.
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y, island AS fill FROM ggsql:penguins
+DRAW boxplot
+```
+
+However, dodging might be unproductive or counterintuitive in some cases.
+For example, if we double-encode groups, like `species` as both `x` *and* `fill` in the plot below, dodging looks bad.
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y, species AS fill FROM ggsql:penguins
+DRAW boxplot
+```
+
+We can disable the dodging by setting `position => 'identity'`.
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y, species AS fill FROM ggsql:penguins
+DRAW boxplot SETTING position => 'identity'
+```
+
+### Horizontal
+
+To draw the boxplots horizontally, simply swap the `x` and `y` mappings.
+The orientation is detected automatically based on which variable is continuous and which is discrete.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS y, island AS fill FROM ggsql:penguins
+DRAW boxplot
+```
+
+### With individual datapoints
+
+Because a boxplot is a summary, it may be a good idea to supplement it with individual datapoints so that you can't be accused of 'hiding' the distribution.
+The datapoints can be jittered by setting `position => 'jitter'`.
+When you do this, set `outliers => false` so that the outlier points are not drawn twice across the two layers.
+
+<!-- TODO: Figure out why the boxplot width is so small -->
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y FROM ggsql:penguins
+DRAW point SETTING position => 'jitter'
+DRAW boxplot SETTING outliers => false
+```
+
+

doc/gallery/examples/density.qmd

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+---
+title: "Density plots"
+description: "Showing smooth distributions of single numeric variables"
+image: thumbnails/density-plot.svg
+categories: [basic, density, distribution]
+order: 3
+---
+
+Like histograms, density plots show the distribution of a numeric variable.
+Instead of binning, density plots use [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) to estimate a smooth, continuous probability density.
+A kernel (such as a Gaussian) is placed on each data point, and the kernels are summed.
+The level of smoothing is controlled via the bandwidth, which sets the width of the kernel.
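To make the idea of summing kernels concrete, here is a minimal Python sketch of a Gaussian kernel density estimate. It illustrates the general technique only: the function name `gaussian_kde`, the sample values, and the fixed bandwidths are assumptions for illustration, and nothing here is taken from how ggsql computes its density layer.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density at x from the sample."""
    n = len(points)

    def density(x):
        total = 0.0
        for p in points:
            # Standard Gaussian kernel evaluated at the scaled distance.
            u = (x - p) / bandwidth
            total += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        # Dividing by n * bandwidth makes the estimate integrate to 1.
        return total / (n * bandwidth)

    return density


# Made-up sample values.
sample = [1.0, 2.0, 2.5, 4.0, 7.0]
estimate = gaussian_kde(sample, bandwidth=0.8)

# Crude Riemann sum over [-10, 20) to check that the area is close to 1.
step = 0.01
area = sum(estimate(-10 + i * step) for i in range(3000)) * step

# A wider bandwidth spreads each kernel out, which flattens the peaks.
narrow_peak = gaussian_kde(sample, bandwidth=0.3)(2.0)
wide_peak = gaussian_kde(sample, bandwidth=2.0)(2.0)
print(area, narrow_peak, wide_peak)
```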
+
+## Code
+
+The x-axis gives the value of the numerical variable, whereas the y-axis gives the estimated probability density.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+DRAW density
+```
+
+## Explanation
+
+* `VISUALISE ... FROM ggsql:penguins` loads the built-in penguins dataset.
+* `bill_len AS x` sets the numeric variable to use for density estimation.
+* `species AS colour` sets implicit groups indicated by colour.
+* `DRAW density` gives instructions to draw the density layer.
+
+## Variations
+
+### Group contributions
+
+With density, each group's curve integrates to 1, so all groups have equal area.
+This masks differences between the sizes of groups.
+Instead of density, you can use `intensity`, which also reflects differences in group size.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+DRAW density REMAPPING intensity AS y
+```
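One way to read the difference, assuming that intensity is simply the density scaled by the number of observations in the group (this matches the description above but has not been checked against ggsql's implementation): for a group $g$ with $n_g$ observations $x_{g,i}$, kernel $K$ and bandwidth $h$,

$$
\hat f_g(x) = \frac{1}{n_g h} \sum_{i=1}^{n_g} K\!\left(\frac{x - x_{g,i}}{h}\right),
\qquad \int \hat f_g(x)\,dx = 1
$$

$$
\hat\lambda_g(x) = n_g\,\hat f_g(x),
\qquad \int \hat\lambda_g(x)\,dx = n_g
$$

Under that assumption the area under each intensity curve equals the group size, so a group with twice as many observations covers twice the area. The symbols $\hat f_g$ and $\hat\lambda_g$ are notation introduced here for illustration.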
+
+### Stacking
+
+Instead of having independent groups, the density can also be stacked.
+Note that stacking alone does not account for relative contributions per group.
+For that reason, you may want to show the intensity instead.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+DRAW density
+  REMAPPING intensity AS y
+  SETTING position => 'stack'
+```
+
+### Annotation
+
+You can use the [rule](../../syntax/layer/type/rule.qmd) layer to display precomputed summaries, like the mean.
+
+<!-- TODO: This should be updated once we have aggregates working -->
+
+```{ggsql}
+WITH mean_data AS (
+  SELECT
+    AVG(bill_len) AS bill_len,
+    species
+  FROM ggsql:penguins
+  GROUP BY species
+)
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+DRAW density SETTING opacity => 0.3
+DRAW rule MAPPING FROM mean_data
+```
+
+### Faceting
+
+Another way of comparing groups is by using facets to separate the groups into different panels.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+DRAW density
+FACET species SETTING ncol => 1
+```
+
+### Relation to violin plots
+
+Conceptually, violin plots also display densities.
+The similarity becomes clearer if you make a ridgeline plot by displaying the violin density on a single side.
+The plot below shows essentially the same thing as the faceted plot above, but gathered in a single panel.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS y, species AS colour FROM ggsql:penguins
+DRAW violin SETTING side => 'top', width => 2
+```

doc/gallery/examples/heatmap.qmd

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+---
+title: "Heatmap"
+description: "Arranging tiles on a grid"
+image: thumbnails/violin-plot.svg
+categories: [basic, heatmap]
+order: 3
+---
+
+A heatmap visualises data values as colours in a grid layout.
+It makes it easy to see patterns and relationships through colour intensity.
+It works best with discrete or ordinal arrangements.
+
+## Code
+
+```{ggsql}
+VISUALISE Day AS x, Month AS y, Temp AS fill FROM ggsql:airquality
+DRAW rect
+```
+
+## Explanation
+
+* `VISUALISE ... FROM ggsql:airquality` loads the built-in air quality dataset.
+* `Day AS x, Month AS y` defines a 2D grid 'map'. The default width and height of each cell is 1. Because both variables take consecutive whole-number values, the cells tile into a grid.
+* `Temp AS fill` declares the 'heat' variable to display as colour intensity.
+* `DRAW rect` gives instructions to draw a rectangle layer.
+
+## Variations
+
+As a stylistic choice, you can set the cells to be opaque without borders.
+
+```{ggsql}
+VISUALISE Month AS y, Day AS x, Temp AS fill FROM ggsql:airquality
+DRAW rect
+  SETTING stroke => null, opacity => 1
+```
+
+You can change the colours by adapting the scale.
+
+```{ggsql}
+VISUALISE Month AS y, Day AS x, Temp AS fill FROM ggsql:airquality
+DRAW rect
+SCALE fill TO magma
+  SETTING reverse => true
+```
+
+If you have centered data, you may want to use a divergent colour scale. It is important to set the two extremes in `FROM` symmetrically around the midpoint.
+
+```{ggsql}
+SELECT *,
+  Temp * 1.0 - AVG(Temp) OVER (PARTITION BY Month) AS centered
+FROM ggsql:airquality
+
+VISUALISE Month AS y, Day AS x, centered AS fill
+DRAW rect
+SCALE fill FROM [-20, 20] TO vik
+```
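The centering step in the query above, and the reason for choosing symmetric limits, can be sketched outside of SQL. The Python below is an illustration under stated assumptions: the helper names `center_by_group` and `symmetric_limits` are hypothetical, and the four rows are illustrative values rather than the actual `airquality` data.

```python
from collections import defaultdict


def center_by_group(rows, group_key, value_key):
    """Subtract each group's mean, like value - AVG(value) OVER (PARTITION BY group)."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = totals[row[group_key]]
        acc[0] += row[value_key]
        acc[1] += 1
    means = {key: total / count for key, (total, count) in totals.items()}
    return [
        {**row, "centered": row[value_key] - means[row[group_key]]}
        for row in rows
    ]


def symmetric_limits(values):
    """Return [-m, m] for the largest absolute value m, so 0 sits at the midpoint."""
    m = max(abs(v) for v in values)
    return [-m, m]


# Illustrative rows only; not read from the built-in dataset.
rows = [
    {"Month": 5, "Day": 1, "Temp": 67},
    {"Month": 5, "Day": 2, "Temp": 72},
    {"Month": 6, "Day": 1, "Temp": 78},
    {"Month": 6, "Day": 2, "Temp": 74},
]

centered_rows = center_by_group(rows, "Month", "Temp")
limits = symmetric_limits([row["centered"] for row in centered_rows])
print(limits)
```

With limits that are symmetric around zero, the neutral middle colour of a divergent palette lines up with "equal to the monthly mean". Asymmetric limits would shift that neutral colour away from zero.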
