Transcoding: Standard physical scale?

Starting the formal discussion we've alluded to in past meetings...

Right now I'm starting to [transcode](https://github.com/CodyCBakerPhD/pozu-transcode) all videos on EMBER and DANDI into a common encoding & resolution space

This approach ends up being multi-purpose:
- Facilitates more efficient web streaming
  - This was the original motivation, given how many videos on DANDI simply cannot be played on the web
- A side effect is that random frame seeking through [Pozu](https://codycbakerphd.github.io/pozu/) performs much better than it did on the original videos. This would likely have a similar effect on the SLEAP web app
- Another side effect is reduced storage size
  - I was able to take one chronic recording, **100 GB** with H.264 encoding in an MP4 container, and reduce it down to **10 GB** without a very noticeable loss in rendering quality
    - (you can maybe tell one is higher res if played side by side, but it's doubtful this would affect coarse-grained pose estimation)

[`sleap reencode`](https://github.com/talmolab/sleap-io/blob/acac75ecb5b28687fd015cd32cb3e9963a848212/sleap_io/io/cli.py#L6139-L6152) behaves very similarly to what I'm doing (with some modifications, I might be able to directly leverage it) but is more general in how it exposes flexible options:
- The MAJOR difference is that `sleap reencode` _scales_ the video space in a way that can be reversed (that is, pose labels in the lower res can theoretically be mapped back to the original pixel space), whereas mine is mostly non-reversible (technically the math could be done for some cases, but not generally)
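
To make "reversible" concrete, here is a minimal sketch of mapping pose coordinates between the original and re-encoded pixel spaces when the only change is a plain resize. This is not how `sleap reencode` is implemented; `ScaleTransform` and its methods are hypothetical names for illustration, and it assumes no cropping or padding and ignores pixel-center conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleTransform:
    """Per-axis scale factors from original pixels to re-encoded pixels.

    Hypothetical helper for illustration only (not part of sleap-io).
    """

    scale_x: float
    scale_y: float

    @classmethod
    def from_sizes(cls, original_size, reencoded_size):
        """Build the transform from (width, height) of both videos."""
        orig_w, orig_h = original_size
        new_w, new_h = reencoded_size
        return cls(scale_x=new_w / orig_w, scale_y=new_h / orig_h)

    def to_reencoded(self, x, y):
        """Map a point from original pixel space to re-encoded space."""
        return x * self.scale_x, y * self.scale_y

    def to_original(self, x, y):
        """Map a point labeled on the re-encoded video back to the original."""
        return x / self.scale_x, y / self.scale_y


# Example: a 1920x1080 video re-encoded at half resolution.
transform = ScaleTransform.from_sizes((1920, 1080), (960, 540))
x_orig, y_orig = transform.to_original(480.0, 270.0)
```

As long as the scale factors are stored alongside the re-encoded video, labels made at the lower resolution can be carried back to the original. Once cropping or content-dependent steps are involved, extra offsets would need to be recorded too.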

My main scientific question at this point is: towards the goal of training species-specific foundation models, is it more important that the video data be standardized into:
1. a common **resolution space** (so that the same pixel coordinates map across all training data)
2. a common **physical space** (so that all pixels in the reencoded space represent the same physical scale; this requires additional metadata about centimeters per pixel in each video, which might be recoverable from the videos themselves if items such as rulers are in the field of view, or if the full arena is in view and the arena size is known)
3. some combination of **both** (with padding or scaling used to facilitate)?

I suppose we might not know for sure unless we try all three, but I wanted to get people's thoughts on the matter at this point
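
For reference, the arithmetic behind options 2 and 3 can be sketched as below. This is only a rough illustration under assumptions: the centimeters-per-pixel value for each video is already known, the target scale and canvas size are made-up numbers, and `rescale_to_physical` and `pad_to_canvas` are hypothetical helper names, not part of any existing tool.

```python
def rescale_to_physical(width, height, cm_per_pixel, target_cm_per_pixel):
    """Return the frame size that puts a video at the target physical scale.

    A factor below 1 means the video must be downsampled; above 1, upsampled.
    """
    factor = cm_per_pixel / target_cm_per_pixel
    return round(width * factor), round(height * factor), factor


def pad_to_canvas(width, height, canvas_width, canvas_height):
    """Return (left, top) padding that centers a frame on a fixed-size canvas."""
    if width > canvas_width or height > canvas_height:
        raise ValueError("Frame does not fit on the canvas at this physical scale.")
    return (canvas_width - width) // 2, (canvas_height - height) // 2


# Option 2: a 1920x1080 video at an assumed 0.05 cm/pixel, brought to an
# assumed shared scale of 0.1 cm/pixel.
new_w, new_h, factor = rescale_to_physical(1920, 1080, 0.05, 0.1)

# Option 3: additionally center the rescaled frame on an assumed shared
# 1024x1024 canvas so every video also has the same resolution.
pad_left, pad_top = pad_to_canvas(new_w, new_h, 1024, 1024)
```

Note that under option 2 alone, videos with different fields of view end up at different resolutions, which is what the padding in option 3 would absorb.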
