Commit 41846fe

teunbrand and claude authored
Boxplots (#88)
* initial attempt by claude to support list-columns
* initial boxplot computation
* rework computation to be in long-format
* boxplot writer
* Add comprehensive test suite for boxplot geom

  - 34 tests covering SQL generation, orientation detection, parameter validation, and GeomTrait implementation
  - SQL snapshot tests using inline string comparison (no external dependencies)
  - Tests for all orientation scenarios (discrete/continuous, explicit params, ambiguous cases)
  - Parameter validation tests for coef and outliers parameters
  - Complete GeomTrait interface coverage

  Co-Authored-By: Claude (us.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
* Add tests for render_boxplot() in Vega-Lite writer

  - test_boxplot_vertical_with_outliers: Tests vertical boxplots with outlier handling, verifies 5-layer output (outliers + 4 box components), outlier data structure, and summary dataset
  - test_boxplot_horizontal_with_grouping: Tests horizontal orientation with multi-column grouping and dodging logic (note: orientation detection not fully implemented upstream yet)

  Both tests verify multi-layer composition, dataset separation, encoding correctness, and proper pivoting of boxplot statistics.

  Co-Authored-By: Claude (us.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
* Revert "initial attempt by claude to support list-columns"

  This reverts commit 6bc845a.
* add docs for boxplot
* inactivate outdated part of test
* translate linetype linewidth
* better support for point/line aesthetics
* forgot name change
* remove orientation logic
* prognosticate @thomasp85's scales PR
* remove outdated test assumptions

---------

Co-authored-by: Claude (us.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
1 parent bbbb2a0 commit 41846fe

7 files changed

Lines changed: 1153 additions & 10 deletions

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ csscolorparser = "0.8.1"

 # Data processing
 polars = { version = "0.52", features = ["lazy", "sql", "ipc"] }
+polars-ops = { version = "0.52", features = ["pivot"] }

 # Readers
 duckdb = { version = "1.4", features = ["bundled", "vtab-arrow"] }

doc/syntax/index.qmd

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ There are many different layers to choose from when visualising your data. Some
 - [`path`](layer/path.qmd) is like `line` above but does not sort the data but plot it according to its own order
 - [`bar`](layer/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar
 - [`histogram`](layer/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it
+- [`boxplot`](layer/boxplot.qmd) displays continuous variables as 5-number summaries

 ## Scales

doc/syntax/layer/boxplot.qmd

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+---
+title: "Boxplot"
+---
+> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
+
+Boxplots display a summary of a continuous distribution. In the style of Tukey, a boxplot displays the median, two hinges and two whiskers as well as outlying points.
+
+## Aesthetics
+The following aesthetics are recognised by the boxplot layer.
+
+### Required
+* `x`: Position on the x-axis
+* `y`: Position on the y-axis
+
+### Optional
+* `stroke`: The colour of the box contours, whiskers, median line and outliers.
+* `fill`: The colour of the box interior.
+* `colour`: Shorthand for setting `stroke` and `fill` simultaneously. Note that the median line will have bad visibility if `stroke` and `fill` are the same.
+* `opacity`: The opacity of the box interior.
+* `linewidth`: The width of the box outline, whiskers, median line and outlier stroke.
+* `linetype`: The linetype of the box outline, whiskers, median line and outlier stroke.
+* `size`: The absolute size of outlier points.
+* `shape`: The shape of outlier points.
+
+## Settings
+* `outliers`: Whether to display outliers as points. Defaults to `true`.
+* `coef`: A number indicating the length of the whiskers as a multiple of the interquartile range (IQR). Defaults to `1.5`.
+* `width`: Relative width of the boxes. Defaults to `0.9`.
+
+## Data transformation
+Per group, data will be divided into 4 quartiles and summary statistics will be derived from their extremes.
+Because the number of observations per quartile may differ by one, the result of this approach may slightly differ from a pure quantile-based approach.
+The central line represents the median.
+The boxes are displayed from the 25th up to the 75th percentiles.
+The whiskers are calculated from the 25th/75th percentiles +/- the IQR times `coef`, but no more extreme than the data extrema.
+Observations are considered outliers when they are more extreme than the whiskers.
+
+### Calculated statistics
+
+* `type`: A string representing the type of metric (`upper`, `lower`, `q1`, `q3`, `median`, `outlier`).
+* `value`: The value corresponding to the metric.
+
+### Default remapping
+
+* `value AS y`: By default the values are displayed along the y-axis.
+
+### Examples
+
+A basic boxplot showing the bill length per species.
+
+```{ggsql}
+VISUALISE FROM ggsql:penguins
+DRAW boxplot
+MAPPING species AS x, bill_len AS y
+```
+
+Additional groups will dodge the boxplots.
+
+```{ggsql}
+VISUALISE FROM ggsql:penguins
+DRAW boxplot
+MAPPING
+species AS x,
+bill_len AS y,
+island AS stroke
+```
+
+Narrow boxes by shrinking the `width` parameter.
+
+```{ggsql}
+VISUALISE FROM ggsql:penguins
+DRAW boxplot
+MAPPING species AS x, bill_len AS y
+SETTING width => 0.2
+```
+
+Consider more observations as outliers by setting a smaller `coef`:
+
+```{ggsql}
+VISUALISE FROM ggsql:penguins
+DRAW boxplot
+MAPPING species AS x, bill_len AS y
+SETTING coef => 0.1
+```
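
The "Data transformation" section of `doc/syntax/layer/boxplot.qmd` can be illustrated with a small standard-library sketch. It is not the code from this commit: the documented approach splits each group into four quartile groups, whereas this sketch substitutes plain linear-interpolation quantiles, so results may differ slightly. The names `Stat`, `quantile` and `boxplot_stats` are hypothetical. The whisker rule follows the documentation: hinge plus or minus `coef` times the IQR, clamped to the data extrema.

```rust
/// One row of the long-format output (mirrors the documented `type`/`value` columns).
#[derive(Debug, Clone, PartialEq)]
struct Stat {
    kind: &'static str, // hypothetical field name standing in for `type`
    value: f64,
}

/// Linear-interpolation quantile on a sorted, non-empty slice.
/// Assumption: the commit derives quartiles differently (from quartile groups).
fn quantile(sorted: &[f64], p: f64) -> f64 {
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Compute boxplot statistics for one group in long format.
fn boxplot_stats(values: &[f64], coef: f64, outliers: bool) -> Vec<Stat> {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    if sorted.is_empty() {
        return Vec::new();
    }

    let q1 = quantile(&sorted, 0.25);
    let median = quantile(&sorted, 0.5);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;

    // Whiskers: hinge -/+ coef * IQR, but no more extreme than the data extrema.
    let lower = (q1 - coef * iqr).max(sorted[0]);
    let upper = (q3 + coef * iqr).min(sorted[sorted.len() - 1]);

    let mut out = vec![
        Stat { kind: "lower", value: lower },
        Stat { kind: "q1", value: q1 },
        Stat { kind: "median", value: median },
        Stat { kind: "q3", value: q3 },
        Stat { kind: "upper", value: upper },
    ];

    // Observations more extreme than the whiskers are reported as outliers.
    if outliers {
        out.extend(
            sorted
                .iter()
                .filter(|v| **v < lower || **v > upper)
                .map(|&v| Stat { kind: "outlier", value: v }),
        );
    }
    out
}
```

Calling `boxplot_stats(&values, 1.5, true)` yields five summary rows plus one row per outlier; a smaller `coef` shortens the whiskers and therefore marks more observations as outliers, as in the last documentation example.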

src/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ csscolorparser.workspace = true

 # Data processing
 polars.workspace = true
+polars-ops.workspace = true

 # Readers
 duckdb = { workspace = true, optional = true }
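
The `pivot` feature enabled for `polars-ops`, together with the commit message's mention of "proper pivoting of boxplot statistics" and a separate outlier dataset, suggests that the long-format `type`/`value` rows are reshaped into one wide record per group before rendering. How the commit does this is not visible in the hunks shown here. The following standard-library sketch only illustrates the reshaping idea; it does not use Polars, and `BoxSummary` and `pivot_wider` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Wide-format summary for one group; a field stays `None` if its metric is missing.
#[derive(Debug, Default, Clone, PartialEq)]
struct BoxSummary {
    lower: Option<f64>,
    q1: Option<f64>,
    median: Option<f64>,
    q3: Option<f64>,
    upper: Option<f64>,
}

/// Reshape long-format rows `(group, type, value)` into one summary per group.
/// Outlier rows are kept long and returned as a separate dataset.
fn pivot_wider<'a>(
    rows: &[(&'a str, &'a str, f64)],
) -> (BTreeMap<&'a str, BoxSummary>, Vec<(&'a str, f64)>) {
    let mut summaries: BTreeMap<&'a str, BoxSummary> = BTreeMap::new();
    let mut outliers = Vec::new();

    for &(group, kind, value) in rows {
        if kind == "outlier" {
            outliers.push((group, value));
            continue;
        }
        let entry = summaries.entry(group).or_default();
        match kind {
            "lower" => entry.lower = Some(value),
            "q1" => entry.q1 = Some(value),
            "median" => entry.median = Some(value),
            "q3" => entry.q3 = Some(value),
            "upper" => entry.upper = Some(value),
            _ => {} // unknown metric names are ignored in this sketch
        }
    }
    (summaries, outliers)
}
```

Keeping outliers in their own dataset reflects the fact that a group has exactly one value per summary metric but any number of outliers, so outliers do not fit a one-row-per-group layout.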

src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -576,7 +576,7 @@ mod integration_tests {
 assert_eq!(vl_spec["layer"].as_array().unwrap().len(), 2);

 // Verify the color aesthetic is mapped to layer-indexed synthetic columns
-let layer0_color = &vl_spec["layer"][0]["encoding"]["linetype"];
+let layer0_color = &vl_spec["layer"][0]["encoding"]["strokeDash"];
 let layer1_color = &vl_spec["layer"][1]["encoding"]["shape"];

 // Constants should be field-mapped to layer-indexed columns
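
The test change above follows from the "translate linetype linewidth" step in the commit message: the `linetype` aesthetic now appears in the Vega-Lite spec under the `strokeDash` encoding channel. A minimal sketch of such a translation is shown below. The function name `vega_lite_channel` is hypothetical, and the `linewidth` to `strokeWidth` mapping is an assumption based on Vega-Lite's channel names rather than something visible in this diff.

```rust
/// Translate an aesthetic name into a Vega-Lite encoding channel name.
/// Hypothetical helper; names without a special case pass through unchanged.
fn vega_lite_channel(aesthetic: &str) -> &str {
    match aesthetic {
        "linetype" => "strokeDash",   // confirmed by the test change in src/lib.rs
        "linewidth" => "strokeWidth", // assumption: not shown in this diff
        other => other,
    }
}
```

With this mapping, `shape` is left untouched, which matches the unchanged `layer1_color` lookup in the test.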