Publish file sizes into both legacy and 64-bit content database columns #5987

Description

@rtibbles

This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.

Overview

#5974 widens Studio's own File.file_size to 64-bit so Studio can store files larger than 2.1 GB (2,147,483,647 bytes, the signed 32-bit maximum). Once stored, the publish/export step must emit those sizes into the content database in a form new Kolibri reads in full while leaving old Kolibri unaffected.

Publish writes both the legacy 32-bit file_size and a new 64-bit file_size_bigint column into the exported content database, writing NULL to the legacy column when a size exceeds 2.1 GB. New Kolibri (learningequality/kolibri#14879) reads file_size_bigint; old Kolibri reads the legacy column and skips sizes it could not represent anyway.

Complexity: Low
Target branch: hotfixes

Context

Studio generates the content database through a vendored copy of Kolibri's content schema, the kolibri_content app (kolibri_content/base_models.py). create_associated_file_objects (publish.py:661-682) writes LocalFile and File rows via those models.

Unlike Kolibri — which renames its column to file_size_bigint and eventually drops the legacy file_size (learningequality/kolibri#14879) — Studio's export model must carry both columns indefinitely: the legacy file_size for old Kolibri and file_size_bigint for new Kolibri. Both are physically present in every exported database.

MIN_SCHEMA_VERSION stays "1" (publish.py:66), so the dual-column export remains importable by every Kolibri that supports the current schema — the legacy column is what keeps that true.

The Change

Add a 64-bit file_size_bigint column to the kolibri_content export models (base_models.py:151) alongside the retained legacy file_size, mirroring the column Kolibri reads (learningequality/kolibri#14879).

In create_associated_file_objects (publish.py:661-682), write both columns on the LocalFile and File rows:

  • file_size_bigint ← the real size, always.
  • file_size ← the real size when it fits in 32 bits, else NULL.
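
The two rules above can be sketched as a small pure function. This is only an illustration, not the code in publish.py: the names INT32_MAX, ExportedSizes and export_sizes are hypothetical, and it assumes "fits in 32 bits" means the signed 32-bit range and that the real size is always a known integer.

```python
from typing import NamedTuple, Optional

# Assumption: the legacy column is a signed 32-bit integer,
# so the largest size it can hold is 2**31 - 1 bytes (about 2.1 GB).
INT32_MAX = 2**31 - 1


class ExportedSizes(NamedTuple):
    """Hypothetical container for the two values written on export."""

    file_size: Optional[int]  # legacy 32-bit column, None (NULL) on overflow
    file_size_bigint: int     # 64-bit column, always the real size


def export_sizes(real_size: int) -> ExportedSizes:
    """Return the values to write to the legacy and 64-bit columns."""
    legacy = real_size if real_size <= INT32_MAX else None
    return ExportedSizes(file_size=legacy, file_size_bigint=real_size)
```

Keeping the decision in one helper would let the LocalFile and File writes share the same overflow rule, though the actual change may well inline it.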

The exported content database must advertise the schema version that carries file_size_bigint, so that new Kolibri maps the new column.

Out of Scope

  • Widening Studio's own File.file_size storage column.
  • The channel published_size aggregate (publish.py:936).
  • Resumable upload changes.

Acceptance Criteria

General

  • The kolibri_content export models carry both the legacy 32-bit file_size and a nullable 64-bit file_size_bigint.
  • Publishing writes the real size to file_size_bigint on both LocalFile and File.
  • Publishing writes the real size to file_size when it fits in 32 bits, and NULL when it exceeds 2.1 GB.
  • The exported content database advertises the schema version that carries file_size_bigint; MIN_SCHEMA_VERSION stays "1".
  • A channel containing a >2.1 GB file publishes to a content database that imports into current Kolibri with the full size, and into pre-rename Kolibri with NULL for that file's size.

Testing

  • Test: publishing a file ≤2.1 GB writes the same value to both file_size and file_size_bigint.
  • Test: publishing a file >2.1 GB writes the real value to file_size_bigint and NULL to file_size.
  • Test: the exported database imports into current Kolibri with the full size populated.
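
The first two tests could be prototyped against an in-memory SQLite table before being wired to the real publish code. The sketch below is an assumption-laden stand-in: the table name content_localfile and the helpers make_export_db, write_sizes and read_sizes are hypothetical, and the real tests would go through create_associated_file_objects and the kolibri_content models instead. SQLite does not enforce a 32-bit range on integer columns, so the overflow rule is applied in Python before the insert. The third test, importing into Kolibri, is not covered here.

```python
import sqlite3

INT32_MAX = 2**31 - 1  # signed 32-bit maximum, about 2.1 GB


def make_export_db():
    """Create an in-memory stand-in for the exported content database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_localfile ("
        "id TEXT PRIMARY KEY, "
        "file_size INTEGER, "
        "file_size_bigint BIGINT)"
    )
    return conn


def write_sizes(conn, file_id, real_size):
    """Write the real size to the 64-bit column and NULL to the legacy one on overflow."""
    legacy = real_size if real_size <= INT32_MAX else None
    conn.execute(
        "INSERT INTO content_localfile (id, file_size, file_size_bigint) "
        "VALUES (?, ?, ?)",
        (file_id, legacy, real_size),
    )


def read_sizes(conn, file_id):
    """Return (file_size, file_size_bigint) for one row."""
    return conn.execute(
        "SELECT file_size, file_size_bigint FROM content_localfile WHERE id = ?",
        (file_id,),
    ).fetchone()


def test_small_file_writes_same_value_to_both_columns():
    conn = make_export_db()
    write_sizes(conn, "small", 1024)
    assert read_sizes(conn, "small") == (1024, 1024)


def test_large_file_writes_null_to_legacy_column():
    conn = make_export_db()
    write_sizes(conn, "large", 3_000_000_000)
    assert read_sizes(conn, "large") == (None, 3_000_000_000)
```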

References

  • contentcuration/contentcuration/utils/publish.py: create_associated_file_objects, MIN_SCHEMA_VERSION
  • contentcuration/kolibri_content/base_models.py: file_size in the export models

AI usage

I used Claude (Opus 4.8, via le-skills:writing-github-issues) to draft this issue. The design — dual-column export, skip-small-on-overflow, holding MIN_SCHEMA_VERSION so old Kolibri still imports — was mine; Claude located the relevant publish/export code and wrote it up. I edited throughout and cut over-specification.
