Remove unsupported education source, add keywords, and fix formatting #98

Merged
brikin01 merged 3 commits into main from clean-up-education-sources
Jun 25, 2026
@brikin01 brikin01 commented Jun 25, 2026

Collaborator
  • Cleaned up embedding-generation/vector-db-sources.csv so that every row follows the six-column schema and the previously malformed, comma-shifted rows parse correctly
  • Removed unsupported education resource rows, mainly raw .pptx / .ipynb sources that the current parser chunks poorly (the follow-up to add support is tracked as STESOL-542)
  • Added keywords for transcript-backed education videos and the remaining education resource rows, which fixes the failing unit tests
  • Fixed visible encoding artifacts (e.g. μVision)

Support for chunking video transcript content will be added as part of STESOL-526.
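
For reviewers who want to reproduce the schema cleanup locally, the checks described in the bullets above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, the column order of vector-db-sources.csv is not assumed (the unsupported-format check scans every field), and only the six-column width and the .pptx / .ipynb suffixes come from the description.

```python
import csv
import io

EXPECTED_COLUMNS = 6  # the PR description states a six-column schema
UNSUPPORTED_SUFFIXES = (".pptx", ".ipynb")  # raw formats named in the PR description


def find_malformed_rows(csv_text, expected_columns=EXPECTED_COLUMNS):
    """Return (line_number, column_count) for each row with the wrong width.

    Hypothetical helper: an unquoted comma inside a field shifts the
    remaining columns and makes the row wider than the schema allows.
    """
    reader = csv.reader(io.StringIO(csv_text))
    problems = []
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        if len(row) != expected_columns:
            problems.append((reader.line_num, len(row)))
    return problems


def find_unsupported_rows(csv_text, suffixes=UNSUPPORTED_SUFFIXES):
    """Return line numbers of rows where any field ends with an unsupported suffix.

    Hypothetical helper: every field is scanned because the position of
    the source URL column is an unknown here.
    """
    reader = csv.reader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        if any(field.strip().lower().endswith(suffixes) for field in row):
            flagged.append(reader.line_num)
    return flagged
```

Reading the CSV file into a string and passing it to both functions lists the rows that need attention. A field that legitimately contains a comma passes the width check as long as it is quoted.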

Copilot AI review requested due to automatic review settings June 25, 2026 16:58

Copilot AI left a comment

Contributor

Copilot wasn't able to review any files in this pull request.


Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

Comment thread embedding-generation/tests/test_vector_db_sources.py
Comment thread embedding-generation/tests/test_vector_db_sources.py
@brikin01 brikin01 requested a review from NeethuESim June 25, 2026 17:26

@NeethuESim NeethuESim left a comment

Collaborator

LGTM

@brikin01 brikin01 merged commit c59a731 into main Jun 25, 2026
3 checks passed
@brikin01 brikin01 deleted the clean-up-education-sources branch June 25, 2026 17:53
3 participants