posit-dev
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/doc/_quarto.yml b/doc/_quarto.yml
Lines changed: 1 addition & 0 deletions
diff --git a/doc/assets/data/minard_cities.csv b/doc/assets/data/minard_cities.csv
Lines changed: 21 additions & 0 deletions
diff --git a/doc/assets/data/minard_troops.csv b/doc/assets/data/minard_troops.csv
Lines changed: 52 additions & 0 deletions
diff --git a/doc/assets/minard.png b/doc/assets/minard.png
671 KB
diff --git a/doc/gallery/examples/boxplot.qmd b/doc/gallery/examples/boxplot.qmd
Lines changed: 84 additions & 0 deletions
diff --git a/doc/gallery/examples/density.qmd b/doc/gallery/examples/density.qmd
Lines changed: 94 additions & 0 deletions
diff --git a/doc/gallery/examples/heatmap.qmd b/doc/gallery/examples/heatmap.qmd
Lines changed: 56 additions & 0 deletions
@@ -95,6 +95,8 @@ docs/_build/
 *.sqlite
 *.sqlite3
 !src/data/*.parquet
+!doc/assets/data/*.csv
+!doc/gallery/examples/*.csv
 
 # Configuration files
 .env
 
@@ -2,6 +2,7 @@ project:
   type: website
   resources:
     - wasm/**
+    - assets/data/**
 
 website:
   title: "ggsql"
 
@@ -0,0 +1,21 @@
+"long","lat","city"
+24,55,"Kowno"
+25.3,54.7,"Wilna"
+26.4,54.4,"Smorgoni"
+26.8,54.3,"Moiodexno"
+27.7,55.2,"Gloubokoe"
+27.6,53.9,"Minsk"
+28.5,54.3,"Studienska"
+28.7,55.5,"Polotzk"
+29.2,54.4,"Bobr"
+30.2,55.3,"Witebsk"
+30.4,54.5,"Orscha"
+30.4,53.9,"Mohilow"
+32,54.8,"Smolensk"
+33.2,54.9,"Dorogobouge"
+34.3,55.2,"Wixma"
+34.4,55.5,"Chjat"
+36,55.5,"Mojaisk"
+37.6,55.8,"Moscou"
+36.6,55.3,"Tarantino"
+36.5,55,"Malo-Jarosewii"
@@ -0,0 +1,52 @@
+"long","lat","survivors","direction","group"
+37.7,55.7,100000,"R",1
+37.5,55.7,98000,"R",1
+37,55,97000,"R",1
+36.8,55,96000,"R",1
+35.4,55.3,87000,"R",1
+34.3,55.2,55000,"R",1
+33.3,54.8,37000,"R",1
+32,54.6,24000,"R",1
+30.4,54.4,20000,"R",1
+29.2,54.3,20000,"R",1
+28.5,54.2,20000,"R",1
+28.3,54.3,20000,"R",1
+27.5,54.5,20000,"R",1
+26.8,54.3,12000,"R",1
+26.4,54.4,14000,"R",1
+25,54.4,8000,"R",1
+24.4,54.4,4000,"R",1
+24.2,54.4,4000,"R",1
+24.1,54.4,4000,"R",1
+28.7,55.5,33000,"R",2
+29.2,54.2,30000,"R",2
+28.5,54.1,30000,"R",2
+28.3,54.2,28000,"R",2
+24.6,55.8,6000,"R",3
+24.2,54.4,6000,"R",3
+24.1,54.4,6000,"R",3
+24,54.9,340000,"A",1
+24.5,55,340000,"A",1
+25.5,54.5,340000,"A",1
+26,54.7,320000,"A",1
+27,54.8,300000,"A",1
+28,54.9,280000,"A",1
+28.5,55,240000,"A",1
+29,55.1,210000,"A",1
+30,55.2,180000,"A",1
+30.3,55.3,175000,"A",1
+32,54.8,145000,"A",1
+33.2,54.9,140000,"A",1
+34.4,55.5,127100,"A",1
+35.5,55.4,100000,"A",1
+36,55.5,100000,"A",1
+37.6,55.8,100000,"A",1
+24,55.1,60000,"A",2
+24.5,55.2,60000,"A",2
+25.5,54.7,60000,"A",2
+26.6,55.7,40000,"A",2
+27.4,55.6,33000,"A",2
+28.7,55.5,33000,"A",2
+24,55.2,22000,"A",3
+24.5,55.3,22000,"A",3
+24.6,55.8,6000,"A",3
@@ -0,0 +1,84 @@
+---
+title: "Box plots"
+description: "Showing groups of distributions of single numeric variables"
+image: thumbnails/boxplot.svg
+categories: [basic, boxplot, distribution]
+order: 3
+---
+
+Boxplots are a popular way to summarise the distribution of a single continuous variable.
+Keep in mind that a boxplot hides the actual shape of the distribution behind that summary, which matters, for example, when the data is bi- or multi-modal.
+For every group, a boxplot displays the following six things:
+
+1. The 25^th^ percentile, or Q1, as the start of the box.
+2. The 50^th^ percentile, i.e. median or Q2, as a line across the box.
+3. The 75^th^ percentile, or Q3, as the end of the box. Together with Q1 we can compute the interquartile range: IQR = Q3 - Q1.
+4. The minimum data value or Q1 - 1.5 * IQR, whichever is larger. This is displayed as the lower whisker.
+5. The maximum data value or Q3 + 1.5 * IQR, whichever is smaller. This is displayed as the upper whisker.
+6. Outliers outside the whiskers, if present. These are drawn as individual points.
+
+## Code
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y FROM ggsql:penguins
+  DRAW boxplot
+```
+
+## Explanation
+
+* `VISUALISE ... FROM ggsql:penguins` loads the built-in penguins dataset.
+* `species AS x` sets a categorical variable to separate different groups.
+* `bill_len AS y` sets the numeric variable to summarise.
+* `DRAW boxplot` gives instructions to draw the boxplot layer.
+
+## Variations
+
+### Dodging
+
+You can split the groups further by mapping an additional categorical variable, such as `fill`; the boxplots within each axis category are then dodged, i.e. drawn side by side.
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y, island AS fill FROM ggsql:penguins
+  DRAW boxplot
+```
+
+However, dodging can be unproductive or counterintuitive in some cases.
+For example, if we double-encode a group, such as mapping `species` to both `x` *and* `fill` in the plot below, the dodged result looks bad.
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y, species AS fill FROM ggsql:penguins
+  DRAW boxplot
+```
+
+We can disable the dodging by setting `position => 'identity'`.
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y, species AS fill FROM ggsql:penguins
+  DRAW boxplot SETTING position => 'identity'
+```
+
+### Horizontal
+
+To draw the boxplots horizontally, simply swap the `x` and `y` mappings.
+The orientation is detected automatically based on which variable is continuous and which is discrete.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS y, island AS fill FROM ggsql:penguins
+  DRAW boxplot
+```
+
+### With individual datapoints
+
+Because a boxplot is a summary, it can be a good idea to supplement it with the individual datapoints so that the distribution is not hidden.
+The datapoints can be jittered by setting `position => 'jitter'`.
+When you do this, make sure to set `outliers => false` so that the outlier points are not drawn twice across the two layers.
+
+<!-- TODO: Figure out why the boxplot width is so small -->
+
+```{ggsql}
+VISUALISE species AS x, bill_len AS y FROM ggsql:penguins
+  DRAW point SETTING position => 'jitter'
+  DRAW boxplot SETTING outliers => false
+```
+
+
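As a rough illustration of the six quantities listed in the box plot page above, the following Python sketch computes them for one group. It is not ggsql's implementation: the `boxplot_summary` helper is hypothetical, the quartile interpolation (`method="inclusive"`) is an assumption, and the whiskers follow the description in the text, so they are clamped to the fences at 1.5 × IQR.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return the quantities a boxplot displays for one group (illustrative only)."""
    data = sorted(values)
    # ASSUMPTION: linear interpolation between data points for the quartiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Minimum data value or Q1 - 1.5 * IQR, whichever is larger.
        "lower_whisker": max(data[0], lower_fence),
        # Maximum data value or Q3 + 1.5 * IQR, whichever is smaller.
        "upper_whisker": min(data[-1], upper_fence),
        # Points beyond the fences are drawn individually.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up values: nine regular observations and one extreme one.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

With these made-up values the single extreme observation ends up in `outliers`, while the upper whisker stops at the upper fence rather than at the maximum.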
@@ -0,0 +1,94 @@
+---
+title: "Density plots"
+description: "Showing smooth distributions of single numeric variables"
+image: thumbnails/density-plot.svg
+categories: [basic, density, distribution]
+order: 3
+---
+
+Like histograms, density plots show the distribution of a numeric variable.
+Instead of binning, density plots use [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) to estimate a smooth, continuous probability density.
+A kernel (such as a Gaussian) is placed on each data point, and the kernels are summed.
+The level of smoothing is controlled by the bandwidth, which sets the width of the kernel.
+
+## Code
+
+The x-axis gives the value of the numerical variable, whereas the y-axis gives the estimated probability density.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+  DRAW density
+```
+
+## Explanation
+
+* `VISUALISE ... FROM ggsql:penguins` loads the built-in penguins dataset.
+* `bill_len AS x` sets the numeric variable to use for density estimation.
+* `species AS colour` sets implicit groups indicated by colour.
+* `DRAW density` gives instructions to draw the density layer.
+
+## Variations
+
+### Group contributions
+
+Using the density gives every group the same area, because each group's density integrates to 1.
+This masks differences in group size.
+Instead of the density, you can use the `intensity`, which also reflects differences in group size.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+  DRAW density REMAPPING intensity AS y
+```
+
+### Stacking
+
+Instead of drawing the groups independently, the densities can also be stacked.
+Note that stacking alone does not account for relative contributions per group.
+For that reason, you may want to show the intensity instead.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+  DRAW density 
+    REMAPPING intensity AS y
+    SETTING position => 'stack'
+```
+
+### Annotation
+
+You can use the [rule](../../syntax/layer/type/rule.qmd) layer to display precomputed summaries, like the mean.
+
+<!-- TODO: This should be updated once we have aggregates working -->
+
+```{ggsql}
+WITH mean_data AS (
+  SELECT 
+    AVG(bill_len) AS bill_len, 
+    species 
+  FROM ggsql:penguins 
+  GROUP BY species
+)
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+  DRAW density SETTING opacity => 0.3
+  DRAW rule MAPPING FROM mean_data
+```
+
+### Faceting
+
+Another way of comparing groups is by using facets to separate the groups into different panels.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
+  DRAW density
+  FACET species SETTING ncol => 1
+```
+
+### Relation to violin plots
+
+Conceptually, violin plots also display densities.
+The similarity becomes clearer if you make a ridgeline plot by displaying the violin density on a single side.
+The plot below shows essentially the same thing as the plot above, but gathered in a single panel.
+
+```{ggsql}
+VISUALISE bill_len AS x, species AS y, species AS colour FROM ggsql:penguins
+  DRAW violin SETTING side => 'top', width => 2
+```
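To make the density page's description of kernel density estimation concrete, here is a minimal Python sketch. It places a Gaussian kernel on each point, sums the kernels and normalises by the number of points and the bandwidth. The function names are hypothetical and the data are made up. The `intensity` function encodes an assumption, namely that intensity is the density multiplied by the group size; the page only says that intensity reflects group size, so ggsql's actual definition may differ.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density(points, bandwidth, x):
    """Kernel density estimate at x: one kernel per point, summed and normalised."""
    total = sum(gaussian_kernel((x - p) / bandwidth) for p in points)
    return total / (len(points) * bandwidth)


def intensity(points, bandwidth, x):
    """ASSUMPTION: the density scaled by the number of points in the group."""
    return len(points) * density(points, bandwidth, x)


def area(curve, lo, hi, steps=4000):
    """Trapezoidal approximation of the area under `curve` between lo and hi."""
    h = (hi - lo) / steps
    inner = sum(curve(lo + i * h) for i in range(1, steps))
    return h * (0.5 * (curve(lo) + curve(hi)) + inner)


# Made-up measurements for two groups of different size.
small_group = [38.0, 39.5, 41.0]
large_group = [45.0, 46.0, 47.5, 48.0, 49.0, 50.5]

for name, group in [("small", small_group), ("large", large_group)]:
    d = area(lambda x: density(group, 1.0, x), 20.0, 70.0)
    i = area(lambda x: intensity(group, 1.0, x), 20.0, 70.0)
    print(name, round(d, 3), round(i, 3))
```

Under that assumption, both densities have an area of about 1 regardless of group size, while the area under each intensity curve is about the number of points in the group.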
@@ -0,0 +1,56 @@
+---
+title: "Heatmap"
+description: "Arranging tiles on a grid"
+image: thumbnails/violin-plot.svg
+categories: [basic, heatmap]
+order: 3
+---
+
+A heatmap visualises data values as colours in a grid layout.
+It makes it easy to see patterns and relationships through colour intensity.
+It works best with discrete or ordinal arrangements.
+
+## Code
+
+```{ggsql}
+VISUALISE Day AS x, Month AS y, Temp AS fill FROM ggsql:airquality
+  DRAW rect
+```
+
+## Explanation
+
+* `VISUALISE ... FROM ggsql:airquality` loads the built-in air quality dataset.
+* `Day AS x, Month AS y` defines a 2D grid 'map'. The default width and height of each cell are both 1. Because these variables are contiguous whole numbers, this creates a grid.
+* `Temp AS fill` declares the 'heat' variable to display as colour intensity.
+* `DRAW rect` gives instructions to draw a rectangle layer.
+
+## Variations
+
+As a stylistic choice, you can set the cells to be opaque without borders.
+
+```{ggsql}
+VISUALISE Month AS y, Day AS x, Temp AS fill FROM ggsql:airquality
+  DRAW rect
+    SETTING stroke => null, opacity => 1
+```
+
+You can change the colours by adapting the scale.
+
+```{ggsql}
+VISUALISE Month AS y, Day AS x, Temp AS fill FROM ggsql:airquality
+  DRAW rect
+  SCALE fill TO magma 
+    SETTING reverse => true
+```
+
+If you have centered data, you may want to use a divergent colour scale. It is important to set the two extremes in `FROM` symmetrically around the midpoint.
+
+```{ggsql}
+SELECT *, 
+  Temp * 1.0 - AVG(Temp) OVER (PARTITION BY Month) AS centered 
+FROM ggsql:airquality
+
+VISUALISE Month AS y, Day AS x, centered AS fill
+  DRAW rect
+  SCALE fill FROM [-20, 20] TO vik
+```
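The centering step in the last heatmap query, and the advice to keep the scale limits symmetric around the midpoint, can be sketched in plain Python. The helper names and the rows are made up for illustration and are not taken from the air quality dataset. `center_by_group` mirrors the `Temp - AVG(Temp) OVER (PARTITION BY Month)` expression, and `symmetric_limits` shows one way to derive limits like `[-20, 20]` from the data instead of hard-coding them.

```python
from collections import defaultdict


def center_by_group(rows, group_key, value_key):
    """Subtract each group's mean from its values, like AVG(...) OVER (PARTITION BY ...)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    means = {group: sums[group] / counts[group] for group in sums}
    return [
        {**row, "centered": row[value_key] - means[row[group_key]]}
        for row in rows
    ]


def symmetric_limits(values, midpoint=0.0):
    """Return (low, high) limits that are equally far from the midpoint."""
    half_range = max(abs(v - midpoint) for v in values)
    return (midpoint - half_range, midpoint + half_range)


# Made-up rows for illustration only.
rows = [
    {"Month": 5, "Day": 1, "Temp": 60},
    {"Month": 5, "Day": 2, "Temp": 70},
    {"Month": 6, "Day": 1, "Temp": 80},
    {"Month": 6, "Day": 2, "Temp": 86},
]

centered = center_by_group(rows, "Month", "Temp")
limits = symmetric_limits([row["centered"] for row in centered])
print(limits)
```

Symmetric limits place the neutral colour of a divergent palette at the midpoint, so equally large positive and negative deviations get equally strong colours.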