| 1 | +--- |
| 2 | +title: "Boxplot" |
| 3 | +--- |
| 4 | +> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it. |
| 5 | + |
| 6 | +A boxplot displays a summary of a continuous distribution. In the style of Tukey, it shows the median, two hinges and two whiskers, as well as outlying points. |
| 7 | + |
| 8 | +## Aesthetics |
| 9 | +The following aesthetics are recognised by the boxplot layer. |
| 10 | + |
| 11 | +### Required |
| 12 | +* `x`: Position on the x-axis |
| 13 | +* `y`: Position on the y-axis |
| 14 | + |
| 15 | +### Optional |
| 16 | +* `stroke`: The colour of the box contours, whiskers, median line and outliers. |
| 17 | +* `fill`: The colour of the box interior. |
| 18 | +* `colour`: Shorthand for setting `stroke` and `fill` simultaneously. Note that the median line will be hard to see if `stroke` and `fill` are the same. |
| 19 | +* `opacity`: The opacity of the box interior. |
| 20 | +* `linewidth`: The width of the box outline, whiskers, median line and outlier stroke. |
| 21 | +* `linetype`: The linetype of the box outline, whiskers, median line and outlier stroke. |
| 22 | +* `size`: The absolute size of outlier points. |
| 23 | +* `shape`: The shape of outlier points. |
| 24 | + |
| 25 | +## Settings |
| 26 | +* `outliers`: Whether to display outliers as points. Defaults to `true`. |
| 27 | +* `coef`: A number indicating the length of the whiskers as a multiple of the interquartile range (IQR). Defaults to `1.5`. |
| 28 | +* `width`: Relative width of the boxes. Defaults to `0.9`. |
| 29 | + |
| 30 | +## Data transformation |
| 31 | +Per group, the data is divided into four quartiles and the summary statistics are derived from their extremes. |
| 32 | +Because the number of observations per quartile may differ by one, the result may differ slightly from a pure quantile-based approach. |
| 33 | +The central line represents the median. |
| 34 | +The boxes are displayed from the 25th up to the 75th percentiles. |
| 35 | +The whiskers extend from the 25th and 75th percentiles by `coef` times the IQR, but no further than the data extrema. |
| 36 | +Observations are considered outliers when they are more extreme than the whiskers. |
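
The rules above can be illustrated with a minimal Python sketch. It is not the layer's implementation: the function name `boxplot_stats` is hypothetical, and the quartiles are computed with the standard library's pure quantile-based method, which the text notes may differ slightly from the quartile split used by the layer. The whiskers follow the wording above literally, so they are clamped to the data extrema.

```python
from statistics import quantiles


def boxplot_stats(values, coef=1.5):
    """Summarise one group of observations (hypothetical helper, for illustration only)."""
    data = sorted(values)
    # Assumption: pure quantile-based quartiles; the layer's own quartile
    # split may give slightly different hinges.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers reach coef * IQR beyond the hinges, but never past the data extrema.
    lower = max(q1 - coef * iqr, data[0])
    upper = min(q3 + coef * iqr, data[-1])
    # Observations more extreme than the whiskers are treated as outliers.
    outliers = [v for v in data if v < lower or v > upper]
    return {
        "lower": lower,
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper": upper,
        "outliers": outliers,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats["outliers"])  # the single value 30 lies beyond the upper whisker
```

Passing a smaller `coef` shortens the whiskers, so more observations fall outside them and are reported as outliers.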
| 37 | + |
| 38 | +### Calculated statistics |
| 39 | + |
| 40 | +* `type`: A string representing the type of metric (`upper`, `lower`, `q1`, `q3`, `median`, `outlier`). |
| 41 | +* `value`: The value corresponding to the metric. |
| 42 | + |
| 43 | +### Default remapping |
| 44 | + |
| 45 | +* `value AS y`: By default the values are displayed along the y-axis. |
| 46 | + |
| 47 | +## Examples |
| 48 | + |
| 49 | +A basic boxplot showing the bill length per species. |
| 50 | + |
| 51 | +```{ggsql} |
| 52 | +VISUALISE FROM ggsql:penguins |
| 53 | +DRAW boxplot |
| 54 | + MAPPING species AS x, bill_len AS y |
| 55 | +``` |
| 56 | + |
| 57 | +Additional groups will dodge the boxplots. |
| 58 | + |
| 59 | +```{ggsql} |
| 60 | +VISUALISE FROM ggsql:penguins |
| 61 | +DRAW boxplot |
| 62 | + MAPPING |
| 63 | + species AS x, |
| 64 | + bill_len AS y, |
| 65 | + island AS stroke |
| 66 | +``` |
| 67 | + |
| 68 | +Narrow the boxes by reducing the `width` setting. |
| 69 | + |
| 70 | +```{ggsql} |
| 71 | +VISUALISE FROM ggsql:penguins |
| 72 | +DRAW boxplot |
| 73 | + MAPPING species AS x, bill_len AS y |
| 74 | + SETTING width => 0.2 |
| 75 | +``` |
| 76 | + |
| 77 | +Treat more observations as outliers by setting a smaller `coef`. |
| 78 | + |
| 79 | +```{ggsql} |
| 80 | +VISUALISE FROM ggsql:penguins |
| 81 | +DRAW boxplot |
| 82 | + MAPPING species AS x, bill_len AS y |
| 83 | + SETTING coef => 0.1 |
| 84 | +``` |