A quick overview of total disk usage and row counts across both databases.
```sql
SELECT
    if(database = '', '=== TOTAL ===', database) AS database_name,
    formatReadableSize(sum(bytes_on_disk)) AS total_size_on_disk,
    sum(rows) AS total_rows
FROM system.parts
WHERE active AND (database IN ('default', 'system'))
GROUP BY database
    WITH ROLLUP
ORDER BY total_rows ASC;
```
### Top Tables by Disk Size
This query identifies which tables are consuming the most disk space.
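A minimal sketch of such a query, built on ClickHouse's `system.parts` (the `LIMIT 10` cutoff is illustrative):

```sql
SELECT
    database,
    `table`,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
    sum(rows) AS total_rows
FROM system.parts
WHERE active
GROUP BY database, `table`
ORDER BY sum(bytes_on_disk) DESC
LIMIT 10;
```

Filtering on `active` is important: inactive parts are leftovers from merges and would be double-counted.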
Choose the appropriate query based on your needs:
- **[On-Disk Size by Month (Fast)](#on-disk-size-by-month-fast)** – Actual compressed disk usage and row counts by month
- **[Row Count by Day (Fast)](#row-count-by-day-fast)** – Number of records by day
- **[Approximate Size by Day (Heavy)](#approximate-size-by-day-heavy)** – Decompresses data to calculate approximate size. Not actual disk usage – use only for comparing relative data volume between days
:::note
Per-day compressed size is not available because Langfuse partitions data by month (`PARTITION BY toYYYYMM()`).
:::
#### On-Disk Size by Month (Fast)
Shows actual compressed disk usage by month. Reads partition metadata from `system.parts`.
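As a sketch, such a query groups active parts by partition (the `traces` table name is illustrative – substitute the table you are inspecting):

```sql
SELECT
    partition AS month,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
    sum(rows) AS total_rows
FROM system.parts
WHERE active AND database = 'default' AND `table` = 'traces'
GROUP BY partition
ORDER BY partition ASC;
```

Because monthly partitioning makes `partition` a `YYYYMM` value, this reads only part metadata and returns instantly.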
The `blob_storage_file_log` table does not have `PARTITION BY` in its schema, so compressed size by month cannot be queried from `system.parts`. Use [Row Count by Day (Fast)](#row-count-by-day-fast) to analyze this table's data distribution.
</TabItem>
<TabItem value="system_logs" label="System Logs">
</TabItem>
</Tabs>
#### Row Count by Day (Fast)
Shows row count per day. Executes instantly by reading indices only.
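As a sketch (the `traces` table and its `timestamp` column are illustrative; the system log tables would use their `event_date` column instead):

```sql
SELECT
    toDate(timestamp) AS day,
    count() AS rows
FROM traces
GROUP BY day
ORDER BY day ASC;
```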
</TabItem>
</Tabs>
#### Approximate Size by Day (Heavy)
Estimates approximate on-disk size per day using the table's real compression ratio from `system.parts` and the size of the main text fields. The result is slightly lower than actual disk usage because not all columns are measured.
The `blob_storage_file_log` table does not have `PARTITION BY` in its schema, so approximate size by day cannot be queried from `system.parts`. Use [Row Count by Day (Fast)](#row-count-by-day-fast) to analyze this table's data distribution.
</TabItem>
<TabItem value="system_logs" label="System Logs">
You can replace `query_log` with a table from [this list](#system-log-tables).
```sql {6,14}
WITH table_compression AS (
    SELECT
        `table`,
        sum(data_uncompressed_bytes) / sum(data_compressed_bytes) AS ratio
    FROM system.parts
    WHERE active AND database = 'system' AND `table` = 'query_log'
    GROUP BY `table`
),
daily_payload AS (
    SELECT
        event_date AS day,
        count() AS rows,
        sum(length(query)) AS raw_text_bytes
    FROM system.query_log
    GROUP BY day
)
SELECT
    d.day,
    d.rows,
    formatReadableSize(d.raw_text_bytes) AS raw_text_size,
    formatReadableSize(d.raw_text_bytes / c.ratio) AS estimated_disk_usage
FROM daily_payload AS d
CROSS JOIN table_compression AS c
ORDER BY d.day ASC;
```
---

`docs/admin/deployment/extensions/02-assistants-evaluation/03-deployment-prerequisites.md`
To prevent disk overflow, configure [TTL](https://clickhouse.com/docs/guides/developer/ttl) policies in `values.yaml` to automatically remove old data. Default retention: 90 days.
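A TTL policy ultimately becomes a table-level clause in ClickHouse. For reference, a 90-day retention corresponds to something like the following (standard ClickHouse syntax; `traces` and `timestamp` are illustrative identifiers – in this deployment the policies are generated from `values.yaml` rather than applied by hand):

```sql
ALTER TABLE traces
    MODIFY TTL toDateTime(timestamp) + INTERVAL 90 DAY;
```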
### Recommended TTL by Usage
The table below provides recommended TTL values based on usage level, assuming the default **100 GB** ClickHouse disk size.
| Usage Level  | Active Users | Est. Ingestion | Recommended TTL |
| ------------ | ------------ | -------------- | --------------- |
| High usage   | 3,000–4,000  | ~40 GB/day     | 2 days          |
| Medium usage | ~1,500       | ~10 GB/day     | 9 days          |
| Low usage    | < 500        | ~1 GB/day      | 90 days         |
:::note
If your deployment does not fit within the recommended TTL for 100 GB, either lower the TTL or increase the ClickHouse disk size. To measure your actual ingestion rate, see [Data distribution by time period](../../../configuration/extensions/assistants-evaluation/data-volume-maintenance#data-distribution-by-time-period).
:::
### Langfuse Tables
Set `retention.langfuse.enabled: true` in `values.yaml`. TTL is configured for the following tables: