Merged
17 changes: 14 additions & 3 deletions .github/workflows/ci.yml
@@ -19,6 +19,17 @@ jobs:
- name: Lint Dockerfile
run: docker run --rm -i hadolint/hadolint < Dockerfile

quickstart:
needs:
- hadolint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Make quickstart verifier executable
run: chmod +x bin/quickstart-verify
- name: Verify quickstart compose contract
run: bin/quickstart-verify

ruby:
runs-on: ubuntu-latest
steps:
@@ -86,12 +97,12 @@ jobs:
run: |
git config --global user.name "github-actions[bot]"
git config --global user.email "github-actions[bot]@users.noreply.github.com"

make openapi
make openapi-client

git add public/openapi.yaml frontend/src/api/generated/

if git diff --staged --quiet; then
echo "No changes to commit"
else
115 changes: 15 additions & 100 deletions README.md
@@ -2,106 +2,21 @@

# html2rss-web

`html2rss-web` serves RSS/JSON feeds from website sources using a Ruby (Roda) backend and a Preact frontend.
`html2rss-web` turns website sources into RSS/JSON feeds.

## Use This Repo For
## Quickstart

- Running a self-hosted `html2rss-web` instance with Docker Compose.
- Creating signed, per-account feed URLs through `POST /api/v1/feeds`.
- Local development inside the repository Dev Container.
Test drive the app with these steps:

## Quick Links
1. Download [docker-compose.quickstart.yml](./docker-compose.quickstart.yml)
2. `docker compose -f docker-compose.quickstart.yml up -d`
3. Open [`http://localhost:4000/`](http://localhost:4000/) in your browser
4. When prompted for a token, use `CHANGE_ME_ADMIN_TOKEN`

- Public docs + feed directory: https://html2rss.github.io
- Docker Hub image: https://hub.docker.com/r/html2rss/web
- OpenAPI file in this repo: [public/openapi.yaml](public/openapi.yaml)
- Contributor guide: [docs/README.md](docs/README.md)
- Discussions: https://github.com/orgs/html2rss/discussions
- Sponsor: https://github.com/sponsors/gildesmarais

## Architecture Snapshot

- Backend: Ruby + Roda (`app.rb`, `app/web/**`)
- Frontend: Preact + Vite (built assets served from `frontend/dist`)
- Feed extraction: `html2rss` gem
- Distribution baseline: `docker-compose.yml`

For detailed architecture and contributor rules, see [docs/README.md](docs/README.md).

## Trial Run (Docker Compose)

Prerequisite: Docker Engine + Docker Compose.

Run from the repository root:

```bash
BUILD_TAG="$(date +%F)" \
GIT_SHA="trial" \
HTML2RSS_SECRET_KEY="$(openssl rand -hex 32)" \
HEALTH_CHECK_TOKEN="$(openssl rand -hex 24)" \
BROWSERLESS_IO_API_TOKEN="trial-browserless-token" \
docker compose up -d
```

Then open:

- `http://localhost:4000/` (UI)
- `http://localhost:4000/api/v1` (API metadata)
- `http://localhost:4000/openapi.yaml` (OpenAPI document)

Stop with:

```bash
docker compose down
```

## Deploy With Docker Compose
> [!IMPORTANT]
> This is a first-run demo path, not a production-ready setup.

The checked-in [`docker-compose.yml`](docker-compose.yml) requires these environment variables for `html2rss-web`:

- `BUILD_TAG`
- `GIT_SHA`
- `HTML2RSS_SECRET_KEY`
- `HEALTH_CHECK_TOKEN`
- `BROWSERLESS_IO_API_TOKEN`

Optional runtime variables:

- `SENTRY_DSN`
- `SENTRY_ENABLE_LOGS` (defaults to `false`)

Example:

```bash
export HTML2RSS_SECRET_KEY="$(openssl rand -hex 32)"
export HEALTH_CHECK_TOKEN="replace-with-a-strong-token"
export BROWSERLESS_IO_API_TOKEN="replace-with-your-browserless-token"
export BUILD_TAG="local"
export GIT_SHA="$(git rev-parse --short HEAD 2>/dev/null || echo dev)"
export AUTO_SOURCE_ENABLED=true

docker compose up -d
```

## Runtime Behavior That Affects Operations

- In production, missing `HTML2RSS_SECRET_KEY` stops startup.
- `BUILD_TAG` and `GIT_SHA` are expected in production; missing values produce a startup warning.
- `POST /api/v1/feeds` requires a bearer token and only works when `AUTO_SOURCE_ENABLED=true`.
- `AUTO_SOURCE_ENABLED` defaults to `true` in development/test and `false` otherwise.
- Strategy support comes from `Html2rss::RequestService` (`faraday` and `browserless` availability is runtime-dependent).

## Stable Integration Entry Points

- OpenAPI: `/openapi.yaml` (or [`public/openapi.yaml`](public/openapi.yaml) in-repo)
- API metadata: `/api/v1`
- Feed creation endpoint: `POST /api/v1/feeds`
- Health endpoints: `/api/v1/health`, `/api/v1/health/ready`, `/api/v1/health/live`

For feed config authoring/validation, use the `html2rss` schema:

- https://github.com/html2rss/html2rss/blob/master/schema/html2rss-config.schema.json
- `html2rss schema`
Continue with the [Getting Started](https://html2rss.github.io/web-application/getting-started) and deployment guides for real setup.

## Development (Dev Container Only)

@@ -113,9 +28,9 @@ make dev
make ready
```

See [docs/README.md](docs/README.md) for contributor workflows, verification gates, and architectural constraints.

## Contributing
## Development and Contributing

- Project guidelines: https://html2rss.github.io/get-involved/contributing
- Repo contributor guide: [docs/README.md](docs/README.md)
- Contributing guidelines: https://html2rss.github.io/get-involved/contributing
- Docker image: https://hub.docker.com/r/html2rss/web
- Discussions: https://github.com/orgs/html2rss/discussions
- Sponsor: https://github.com/sponsors/gildesmarais
12 changes: 9 additions & 3 deletions Rakefile
@@ -46,6 +46,11 @@ desc 'Build and run docker image/container, and send requests to it'
task :test do
current_dir = ENV.fetch('GITHUB_WORKSPACE', __dir__)
smoke_auto_source_enabled = ENV.fetch('SMOKE_AUTO_SOURCE_ENABLED', 'false')
default_smoke_health_token = 'docker-smoke-health-check-token-0123456789abcdef'
smoke_health_token = ENV.fetch('SMOKE_HEALTH_TOKEN', default_smoke_health_token)
default_smoke_api_token =
smoke_auto_source_enabled == 'true' ? 'docker-smoke-admin-token-0123456789abcdef' : 'CHANGE_ME_ADMIN_TOKEN'
smoke_api_token = ENV.fetch('SMOKE_API_TOKEN', default_smoke_api_token)
smoke_build_tag = ENV.fetch('SMOKE_BUILD_TAG', ENV.fetch('BUILD_TAG', 'docker-smoke'))
smoke_git_sha = ENV.fetch('SMOKE_GIT_SHA', ENV.fetch('GITHUB_SHA', ENV.fetch('GIT_SHA', 'docker-smoke')))
image_name = 'html2rss/web'
@@ -64,8 +69,9 @@ task :test do
'--env PUMA_LOG_CONFIG=1',
"--env BUILD_TAG=#{smoke_build_tag}",
"--env GIT_SHA=#{smoke_git_sha}",
'--env HEALTH_CHECK_TOKEN=CHANGE_ME_HEALTH_CHECK_TOKEN',
"--env HEALTH_CHECK_TOKEN=#{smoke_health_token}",
'--env HTML2RSS_SECRET_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef',
"--env HTML2RSS_ACCESS_TOKEN=#{smoke_api_token}",
"--env AUTO_SOURCE_ENABLED=#{smoke_auto_source_enabled}",
"--mount type=bind,source=#{current_dir}/config,target=/app/config",
'--name html2rss-web-test',
@@ -79,8 +85,8 @@ task :test do
Output.describe 'Running RSpec smoke suite against container'
smoke_env = {
'SMOKE_BASE_URL' => 'http://127.0.0.1:4000',
'SMOKE_HEALTH_TOKEN' => 'CHANGE_ME_HEALTH_CHECK_TOKEN',
'SMOKE_API_TOKEN' => 'CHANGE_ME_ADMIN_TOKEN',
'SMOKE_HEALTH_TOKEN' => smoke_health_token,
'SMOKE_API_TOKEN' => smoke_api_token,
'SMOKE_AUTO_SOURCE_ENABLED' => smoke_auto_source_enabled,
'RUN_DOCKER_SPECS' => 'true'
}
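
Reviewer note: the token defaults introduced in this hunk can be read in isolation as the sketch below. It is a minimal illustration, assuming a hypothetical helper `smoke_tokens` that takes any Hash-like environment; the helper is not part of the Rakefile. The default API token appears to switch away from the placeholder when auto source is enabled, presumably because the validator in this PR rejects the placeholder in that case.

```ruby
# Hypothetical helper (not in the Rakefile): resolves the smoke-test tokens
# the same way the task does, but from an injectable Hash-like `env`.
def smoke_tokens(env)
  auto_source = env.fetch('SMOKE_AUTO_SOURCE_ENABLED', 'false')

  # The placeholder is only used as a default when auto source is disabled.
  default_api_token =
    auto_source == 'true' ? 'docker-smoke-admin-token-0123456789abcdef' : 'CHANGE_ME_ADMIN_TOKEN'

  {
    health: env.fetch('SMOKE_HEALTH_TOKEN', 'docker-smoke-health-check-token-0123456789abcdef'),
    api: env.fetch('SMOKE_API_TOKEN', default_api_token)
  }
end

# Explicit values always win over the computed defaults.
puts smoke_tokens('SMOKE_API_TOKEN' => 'my-own-token-0123456789')[:api]
```

Passing `ENV` itself works as well, since `ENV.fetch` accepts the same key/default arguments.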
2 changes: 1 addition & 1 deletion app/web/api/v1/health.rb
@@ -94,7 +94,7 @@ def bearer_token(request)

# @return [void]
def verify_configuration!
LocalConfig.yaml
LocalConfig.load_snapshot
rescue StandardError
raise Html2rss::Web::HealthCheckFailedError
end
108 changes: 75 additions & 33 deletions app/web/config/environment_validator.rb
@@ -29,7 +29,6 @@ def validate_production_security!

validate_secret_key!
validate_account_configuration!
validate_build_metadata!
end

# @return [Boolean]
@@ -94,23 +93,79 @@ def validate_secret_key!
exit 1
end

# @return [void]
def validate_build_metadata!
return unless missing_build_metadata?

log_missing_build_metadata!
warn_lines(*missing_build_metadata_warning_lines)
nil
end

def validate_account_configuration!
accounts = AccountManager.accounts
validate_account_token_shapes!(accounts)
validate_health_check_token!(accounts)
validate_create_feed_token!(accounts)
weak_tokens = accounts.select { |acc| acc[:token].length < 16 }
return unless weak_tokens.any?

handle_weak_account_tokens!(weak_tokens)
end

# @param accounts [Array<Hash{Symbol=>Object}>]
# @return [void]
def validate_account_token_shapes!(accounts)
malformed_accounts = accounts.reject { |acc| acc[:token].is_a?(String) && !acc[:token].empty? }
return unless malformed_accounts.any?

handle_malformed_account_tokens!(malformed_accounts)
end

# @param accounts [Array<Hash{Symbol=>Object}>]
# @return [void]
def validate_create_feed_token!(accounts)
return unless invalid_placeholder_create_feed_token?(accounts)

SecurityLogger.log_config_validation_failure(
'access_token',
'Placeholder create-feed token is not allowed when auto source is enabled'
)
warn_lines(
'CRITICAL: Placeholder create-feed token detected in production!',
'Set HTML2RSS_ACCESS_TOKEN to a strong token before enabling automatic feed generation.'
)
exit 1
end

# @param accounts [Array<Hash{Symbol=>Object}>]
# @return [void]
def validate_health_check_token!(accounts)
return unless placeholder_health_check_token?(accounts)

SecurityLogger.log_config_validation_failure(
'health_check_token',
'Placeholder health-check token is not allowed in production'
)
warn_lines(
'CRITICAL: Placeholder health-check token detected in production!',
'Set a real token for the health-check account or remove the account from production config.'
)
exit 1
end

# @param accounts [Array<Hash{Symbol=>Object}>]
# @return [Boolean]
def invalid_placeholder_create_feed_token?(accounts)
auto_source_enabled? && placeholder_create_feed_token?(accounts)
end

# @param accounts [Array<Hash{Symbol=>Object}>]
# @return [Boolean]
def placeholder_create_feed_token?(accounts)
accounts.any? { |account| account[:token] == RuntimeEnv::ADMIN_ACCESS_TOKEN_PLACEHOLDER }
end

# @param accounts [Array<Hash{Symbol=>Object}>]
# @return [Boolean]
def placeholder_health_check_token?(accounts)
accounts.any? do |account|
account[:username] == 'health-check' &&
account[:token] == RuntimeEnv::HEALTH_CHECK_TOKEN_PLACEHOLDER
end
end

# @param lines [Array<String>]
# @return [void]
def warn_lines(*lines)
@@ -140,31 +195,18 @@ def handle_weak_account_tokens!(weak_tokens)
exit 1
end

# @return [Boolean]
def missing_build_metadata?
build_metadata_values.any?(&:empty?)
end

# @return [Array<String>]
def build_metadata_values
%w[BUILD_TAG GIT_SHA].map { |key| ENV.fetch(key, '').strip }
end

# @param malformed_accounts [Array<Hash{Symbol=>Object}>]
# @return [void]
def log_missing_build_metadata!
SecurityLogger.log_config_validation_failure(
'build_metadata',
'Missing BUILD_TAG or GIT_SHA',
severity: :warn
def handle_malformed_account_tokens!(malformed_accounts)
malformed_usernames = malformed_accounts.map { |acc| acc[:username] || '(unknown)' }.join(', ')
SecurityLogger.log_config_validation_failure('account_tokens',
"Invalid token configuration for users: #{malformed_usernames}")
warn_lines(
'CRITICAL: Invalid account token configuration detected in production!',
'Each account token must be a non-empty string.',
"Invalid token configuration found for users: #{malformed_usernames}"
)
end

# @return [Array<String>]
def missing_build_metadata_warning_lines
[
'WARNING: Missing build metadata for production deployment.',
'Set BUILD_TAG and GIT_SHA to improve release traceability.'
]
exit 1
end
end
# rubocop:enable Metrics/ClassLength
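
Reviewer note: the account checks added in this file (token shape, placeholder health-check token, placeholder create-feed token, minimum length) can be summarised in a standalone sketch. Everything below is an assumption made for illustration: the constant values stand in for `RuntimeEnv::ADMIN_ACCESS_TOKEN_PLACEHOLDER` and `RuntimeEnv::HEALTH_CHECK_TOKEN_PLACEHOLDER`, whose definitions are not shown in this diff, and `token_problems` is a hypothetical function that collects messages instead of logging and calling `exit 1` as the validator does.

```ruby
# Assumed placeholder values; the real constants live in RuntimeEnv.
ADMIN_PLACEHOLDER = 'CHANGE_ME_ADMIN_TOKEN'
HEALTH_PLACEHOLDER = 'CHANGE_ME_HEALTH_CHECK_TOKEN'
MIN_TOKEN_LENGTH = 16

# Hypothetical helper: returns one message per failed check instead of exiting.
def token_problems(accounts, auto_source_enabled:)
  accounts.flat_map do |account|
    name = account[:username] || '(unknown)'
    token = account[:token]

    # Shape check first, so the later checks can safely call String methods.
    next ["#{name}: token must be a non-empty string"] unless token.is_a?(String) && !token.empty?

    problems = []
    problems << "#{name}: placeholder health-check token" if name == 'health-check' && token == HEALTH_PLACEHOLDER
    problems << "#{name}: placeholder create-feed token" if auto_source_enabled && token == ADMIN_PLACEHOLDER
    problems << "#{name}: token shorter than #{MIN_TOKEN_LENGTH} characters" if token.length < MIN_TOKEN_LENGTH
    problems
  end
end

accounts = [
  { username: 'admin', token: 'CHANGE_ME_ADMIN_TOKEN' },
  { username: 'broken', token: nil }
]
puts token_problems(accounts, auto_source_enabled: true)
```

Running the shape check before the length check matters in the real validator too: without it, `acc[:token].length` would raise on a `nil` token rather than produce a readable configuration error.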
26 changes: 22 additions & 4 deletions app/web/config/local_config.rb
@@ -1,6 +1,8 @@
# frozen_string_literal: true

require 'erb'
require 'yaml'
require_relative 'runtime_env'
begin
require 'html2rss/configs'
rescue LoadError => error
@@ -56,17 +58,33 @@ def global
end

##
# @return [Html2rss::Web::ConfigSnapshot::Snapshot]
def snapshot
@mutex.synchronize { @snapshot ||= load_snapshot }
rescue KeyError, TypeError, ArgumentError => error
raise InvalidConfig, "Invalid local config: #{error.message}"
end

##
# Reparses the current config file without touching memoized runtime
# state. Health checks use this path so config drift shows up without
# forcing live request handlers onto a reload path.
#
# @return [Hash<Symbol, Any>]
def yaml
YAML.safe_load_file(CONFIG_FILE, symbolize_names: true).freeze
def load_yaml
template = File.read(CONFIG_FILE)
YAML.safe_load(ERB.new(template, trim_mode: '-').result, symbolize_names: true).freeze
rescue Errno::ENOENT => error
raise NotFound, "Configuration file not found: #{error.message}"
end

##
# Reparses and normalizes the current config file without mutating the
# memoized runtime snapshot.
#
# @return [Html2rss::Web::ConfigSnapshot::Snapshot]
def snapshot
@mutex.synchronize { @snapshot ||= ConfigSnapshot.load(yaml) }
def load_snapshot
ConfigSnapshot.load(load_yaml)
rescue KeyError, TypeError, ArgumentError => error
raise InvalidConfig, "Invalid local config: #{error.message}"
end
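
Reviewer note: the `load_yaml` change above renders the config file through ERB before handing it to `YAML.safe_load`. A minimal standalone sketch of that two-step load follows. The helper name `render_config`, the `EXAMPLE_TOKEN` variable, and the `auth`/`accounts` layout are illustrative assumptions, not the project's actual config schema.

```ruby
require 'erb'
require 'yaml'

# Hypothetical helper: render ERB tags first, then parse the result as YAML.
# trim_mode '-' allows `<%- ... -%>` tags to suppress surrounding whitespace.
def render_config(template)
  rendered = ERB.new(template, trim_mode: '-').result
  YAML.safe_load(rendered, symbolize_names: true).freeze
end

# Illustrative value only; a real deployment would set this outside the process.
ENV['EXAMPLE_TOKEN'] = 'example-token-0123456789'

template = <<~CONFIG
  auth:
    accounts:
      - username: admin
        token: <%= ENV.fetch('EXAMPLE_TOKEN') %>
CONFIG

config = render_config(template)
puts config[:auth][:accounts].first[:token]
```

Note that `.freeze` here only freezes the top-level hash; nested hashes and arrays stay mutable unless they are frozen separately.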