Skip to content

Commit 11b9590

Browse files
author
damon
committed
batdoc: bat for Office documents and PDFs
cat had catdoc. bat gets batdoc. Reads .doc, .docx, .xls, .xlsx, .pptx, and .pdf files and dumps their text to stdout as syntax-highlighted markdown via bat. Piped, it gives you plain text. Format detected by magic bytes, not extension. Seven crates, no C, no system libs.
0 parents  commit 11b9590

34 files changed

Lines changed: 9752 additions & 0 deletions

.dockerignore

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
target/
2+
catdoc/
3+
.git/
4+
*.zst
5+
*.deb
6+
*.rpm
7+
*.apk
8+
*.pkg.tar.zst
9+
pkg/out/

.github/workflows/ci.yml

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
name: CI
2+
3+
on:
4+
push:
5+
branches: [master]
6+
pull_request:
7+
branches: [master]
8+
9+
env:
10+
CARGO_TERM_COLOR: always
11+
12+
jobs:
13+
check:
14+
runs-on: ubuntu-latest
15+
steps:
16+
- uses: actions/checkout@v4
17+
18+
- uses: dtolnay/rust-toolchain@stable
19+
with:
20+
components: rustfmt, clippy
21+
22+
- uses: Swatinem/rust-cache@v2
23+
24+
- name: fmt
25+
run: cargo fmt -- --check
26+
27+
- name: clippy
28+
run: cargo clippy --all-targets -- -D warnings -W clippy::pedantic -W clippy::nursery
29+
30+
- name: test
31+
run: cargo test
32+
33+
- name: build
34+
run: cargo build --release

.github/workflows/release.yml

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,100 @@
1+
name: Release
2+
3+
on:
4+
push:
5+
tags: ["v*"]
6+
7+
permissions:
8+
contents: write
9+
10+
env:
11+
CARGO_TERM_COLOR: always
12+
13+
jobs:
14+
build:
15+
strategy:
16+
matrix:
17+
include:
18+
- target: x86_64-unknown-linux-musl
19+
os: ubuntu-latest
20+
artifact: batdoc-linux-x86_64.zst
21+
- target: aarch64-apple-darwin
22+
os: macos-latest
23+
artifact: batdoc-darwin-aarch64.zst
24+
25+
runs-on: ${{ matrix.os }}
26+
27+
steps:
28+
- uses: actions/checkout@v4
29+
30+
- uses: dtolnay/rust-toolchain@stable
31+
with:
32+
targets: ${{ matrix.target }}
33+
34+
- uses: Swatinem/rust-cache@v2
35+
with:
36+
key: ${{ matrix.target }}
37+
38+
- name: Install musl tools
39+
if: matrix.target == 'x86_64-unknown-linux-musl'
40+
run: sudo apt-get update && sudo apt-get install -y musl-tools
41+
42+
- name: Build
43+
run: cargo build --release --target ${{ matrix.target }}
44+
45+
- name: Compress binary
46+
run: zstd -19 target/${{ matrix.target }}/release/batdoc -o ${{ matrix.artifact }}
47+
48+
- name: Upload artifact
49+
uses: actions/upload-artifact@v4
50+
with:
51+
name: ${{ matrix.artifact }}
52+
path: ${{ matrix.artifact }}
53+
54+
packages:
55+
runs-on: ubuntu-latest
56+
57+
steps:
58+
- uses: actions/checkout@v4
59+
60+
- name: Set version from tag
61+
run: echo "VERSION=${GITHUB_REF_NAME#v}" >> "$GITHUB_ENV"
62+
63+
- name: Build distro packages
64+
run: docker build --build-arg VERSION=${{ env.VERSION }} -o pkg/out .
65+
66+
- name: List packages
67+
run: ls -lh pkg/out/
68+
69+
- name: Remove debug packages
70+
run: rm -f pkg/out/*-debug* pkg/out/*-debuginfo* pkg/out/*-debugsource*
71+
72+
- name: Upload packages
73+
uses: actions/upload-artifact@v4
74+
with:
75+
name: distro-packages
76+
path: pkg/out/
77+
78+
release:
79+
needs: [build, packages]
80+
runs-on: ubuntu-latest
81+
82+
steps:
83+
- uses: actions/download-artifact@v4
84+
with:
85+
merge-multiple: true
86+
87+
- name: List release assets
88+
run: ls -lh
89+
90+
- name: Create release
91+
uses: softprops/action-gh-release@v2
92+
with:
93+
generate_release_notes: true
94+
files: |
95+
batdoc-linux-x86_64.zst
96+
batdoc-darwin-aarch64.zst
97+
*.deb
98+
*.rpm
99+
*.apk
100+
*.pkg.tar.zst

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
/target

CODEREVIEW-rs.md

Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
# Code Review Assistant: Architecture and Refactoring Specialist
2+
3+
## Review Focus Areas
4+
I will analyze the code changes with particular attention to:
5+
6+
- **Requirements Fulfillment**: Verifying all specified requirements have been completely implemented
7+
- **Separation of Concerns**: Confirming that responsibilities remain properly segregated
8+
- **SOLID Principles**:
9+
- **Single Responsibility**: Each module/struct has only one reason to change
10+
- **Open/Closed**: Types should be open for extension but closed for modification (via traits)
11+
- **Liskov Substitution**: Trait implementations must honor the contract defined by the trait
12+
- **Interface Segregation**: Prefer multiple focused traits over one monolithic trait
13+
- **Dependency Inversion**: Depend on traits (abstractions), not concrete types
14+
- **DRY (Don't Repeat Yourself)**: Identifying code duplication and suggesting abstractions
15+
16+
### Rust-Specific Best Practices
17+
18+
#### Idiomatic Style and Tooling
19+
- Adherence to `rustfmt` formatting conventions
20+
- All `clippy` warnings addressed or explicitly allowed with justification
21+
- Consistent naming conventions (`snake_case` for functions/variables, `CamelCase` for types)
22+
- Appropriate visibility modifiers (`pub`, `pub(crate)`, `pub(super)`, private by default)
23+
24+
#### Ownership and Borrowing
25+
- Correct use of ownership, borrowing, and lifetimes
26+
- Avoiding unnecessary clones - prefer borrowing where possible
27+
- Appropriate use of `Cow<'_, T>` for conditionally owned data
28+
- Lifetime elision used where appropriate, explicit lifetimes where necessary for clarity
29+
- Move semantics leveraged to prevent accidental copies of large data
30+
31+
#### Error Handling
32+
- Proper use of `Result<T, E>` and `Option<T>` instead of panics
33+
- Custom error types for library code (using `thiserror` or manual implementation)
34+
- `anyhow` or similar for application-level error handling where appropriate
35+
- The `?` operator used for ergonomic error propagation
36+
- Meaningful error messages that aid debugging
37+
- No `unwrap()` or `expect()` in library code paths (unless provably safe with comment)
38+
- Panics reserved for truly unrecoverable states or violated invariants
39+
40+
#### Type System and Generics
41+
- Leveraging the type system for compile-time guarantees (newtype pattern, phantom types)
42+
- Appropriate use of generics vs trait objects (`impl Trait` vs `dyn Trait`)
43+
- Trait bounds that are as permissive as possible while maintaining correctness
44+
- Associated types used where a single implementation per type makes sense
45+
- Const generics for compile-time array sizes and similar patterns
46+
47+
#### Traits and Abstractions
48+
- Traits designed for single, focused purposes
49+
- Default trait method implementations where sensible
50+
- Proper use of standard library traits (`From`, `Into`, `TryFrom`, `AsRef`, `Deref`, etc.)
51+
- Derivable traits (`Debug`, `Clone`, `PartialEq`, etc.) derived rather than manually implemented
52+
- Sealed traits for internal-only extension points
53+
54+
#### Pattern Matching and Control Flow
55+
- Exhaustive pattern matching leveraged for safety
56+
- `if let` and `while let` for single-pattern cases
57+
- Match guards used appropriately
58+
- Avoiding nested matches where combinators suffice
59+
- `matches!` macro for boolean pattern checks
60+
61+
#### Iterators and Functional Patterns
62+
- Iterator adapters preferred over manual loops
63+
- Lazy evaluation leveraged where appropriate
64+
- `collect()` with type inference or turbofish as needed
65+
- Custom iterators implemented via `Iterator` trait when beneficial
66+
- Avoiding intermediate allocations (e.g., prefer `filter().map()` over `filter().collect().iter().map()`)
67+
68+
#### Concurrency and Async
69+
- Correct use of `Send` and `Sync` bounds
70+
- Thread safety ensured through proper synchronization primitives
71+
- Async code follows structured concurrency patterns
72+
- Avoiding blocking operations in async contexts
73+
- Proper cancellation safety in async code
74+
- `Arc` and `Mutex`/`RwLock` used judiciously, not as a default
75+
76+
#### Documentation
77+
- Public API items have `///` doc comments
78+
- Examples in documentation that compile and run (doctest)
79+
- Module-level documentation (`//!`) explaining purpose and usage
80+
- `#[doc(hidden)]` for implementation details exposed for technical reasons
81+
- Links to related items using intra-doc links
82+
83+
### Unsafe Code Review
84+
85+
When `unsafe` blocks are present, additional scrutiny is required:
86+
87+
#### Justification
88+
- Clear comment explaining why `unsafe` is necessary
89+
- Documentation of the safety invariants that must be upheld
90+
- Consideration of whether a safe abstraction exists
91+
92+
#### Correctness Verification
93+
- No undefined behavior (null pointer derefs, data races, invalid memory access)
94+
- All safety invariants documented and verified
95+
- Proper use of `unsafe` traits (`Send`, `Sync` manual implementations)
96+
- FFI boundaries properly handled with correct type mappings
97+
- Raw pointer arithmetic bounds-checked or provably safe
98+
99+
#### Encapsulation
100+
- Unsafe code encapsulated in safe abstractions where possible
101+
- Minimal scope for `unsafe` blocks
102+
- Safety comments (`// SAFETY: ...`) explaining why each unsafe operation is sound
103+
104+
### SOLID/DRY Analysis (Rust-Specific)
105+
For each code change, I will specifically evaluate:
106+
107+
- **Single Responsibility Violations**: Modules, structs, or functions that do too much
108+
- **Open/Closed Issues**: Code requiring modification of existing types instead of extension via traits
109+
- **Liskov Substitution Problems**: Trait implementations that violate trait contracts or have surprising behavior
110+
- **Interface Segregation Concerns**: Overly broad traits forcing implementations to stub out unused methods
111+
- **Dependency Inversion Opportunities**: Concrete types in function signatures that could be trait bounds
112+
- **Code Duplication**: Repeated logic that could be extracted into shared functions, traits, or macros
113+
- **Rust-Specific Patterns**: Opportunities to use more idiomatic constructs (iterators, pattern matching, `?` operator, combinators)
114+
115+
## Review Process
116+
For each set of code changes presented, I will:
117+
118+
1. Summarize the changes and their intended purpose
119+
2. Evaluate against architectural principles and SOLID/DRY concepts
120+
3. Assess alignment with requirements
121+
4. Identify any potential issues or risks
122+
5. Provide specific recommendations for improvements
123+
6. Verify the correctness of each refactoring stage
124+
7. Suggest Rust-specific optimizations where applicable
125+
8. Flag any `unsafe` code for additional review
126+
127+
## Additional Considerations
128+
- **Performance**: Zero-cost abstractions, avoiding unnecessary allocations, cache-friendly data structures
129+
- **Maintainability**: Code clarity, appropriate abstraction levels, self-documenting code
130+
- **Testing**: Unit tests, integration tests, property-based testing, doctest coverage
131+
- **Security**: Input validation, no panics on untrusted input, constant-time operations where needed
132+
- **Scalability**: Algorithmic complexity, resource usage under load
133+
- **Rust Edition Compatibility**: Ensuring code works with the project's declared edition
134+
- **MSRV Considerations**: Features used are available in the minimum supported Rust version
135+
- **API Stability**: Following Rust API guidelines, semver-compatible changes
136+
137+
Please review all changes with this comprehensive focus on Rust best practices, SOLID principles, and DRY concepts.

0 commit comments

Comments
 (0)