feat: Add support for .doc files using antiword #1771
Open
Jah-yee wants to merge 5 commits into microsoft:main from
- Add helper function _format_cell_value() to preserve currency symbols
- Support for USD ($), EUR (€), GBP (£), JPY (¥), and other currencies
- Support for percentage formatting
- Preserve decimal places from number format
- Use openpyxl directly instead of pandas for better format control

Fixes microsoft#53
- Changed [markitdown-mcp] to [markitdown_mcp] to match Python package naming convention
- Changed 'Youtube URLs' to 'YouTube URLs' to match proper branding
- Also updated comment in youtube-transcription to use consistent casing
Add DocConverter to convert legacy .doc files (pre-Office Open XML format) to Markdown using the antiword command-line tool. This resolves issue microsoft#23 by adding .doc extension support in addition to existing .docx support. Good day. Thank you for your work on this project. I hope this small fix is helpful. Please let me know if there's anything to adjust. Warmly, RoomWithOutRoof
Good day.
This PR adds support for converting legacy .doc files (the pre-Office Open XML binary format) to Markdown using the antiword command-line tool.
Changes
The converter uses antiword, which is a system dependency (available via the antiword apt package on Debian/Ubuntu).
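For reviewers who want to see the general shape of the approach, here is a minimal sketch of shelling out to antiword from Python. It is not the DocConverter code from this PR: the function name doc_to_text and its tool parameter are hypothetical, and the sketch only assumes that antiword prints the document text to standard output when given a file path.

```python
import shutil
import subprocess


def doc_to_text(path: str, tool: str = "antiword") -> str:
    """Return the text that `tool` prints for the .doc file at `path`.

    Hypothetical helper for illustration; not the converter in this PR.
    """
    # antiword is a system dependency, so fail with a clear message if it is missing.
    exe = shutil.which(tool)
    if exe is None:
        raise RuntimeError(f"{tool} was not found on PATH; install it first")

    # Run `<tool> <path>` and capture stdout as text.
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(
        [exe, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Checking for the executable up front gives a clearer error than a failed subprocess call on systems where antiword is not installed.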
This resolves issue #23.
Thank you for your work on this project. I hope this small fix is helpful. Please let me know if there's anything to adjust.
Warmly, RoomWithOutRoof