html2rss
diff --git a/astro-migration/src/content/docs/get-involved/index.mdx b/astro-migration/src/content/docs/get-involved/index.mdx
Lines changed: 15 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/how-to/advanced-content-extraction.mdx b/astro-migration/src/content/docs/ruby-gem/how-to/advanced-content-extraction.mdx
Lines changed: 16 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/how-to/index.mdx b/astro-migration/src/content/docs/ruby-gem/how-to/index.mdx
Lines changed: 8 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/index.mdx b/astro-migration/src/content/docs/ruby-gem/index.mdx
Lines changed: 18 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/installation.mdx b/astro-migration/src/content/docs/ruby-gem/installation.mdx
Lines changed: 68 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/reference/index.mdx b/astro-migration/src/content/docs/ruby-gem/reference/index.mdx
Lines changed: 8 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/reference/selectors.mdx b/astro-migration/src/content/docs/ruby-gem/reference/selectors.mdx
Lines changed: 137 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/tutorials/index.mdx b/astro-migration/src/content/docs/ruby-gem/tutorials/index.mdx
Lines changed: 8 additions & 0 deletions
diff --git a/astro-migration/src/content/docs/ruby-gem/tutorials/simple-blog-list.mdx b/astro-migration/src/content/docs/ruby-gem/tutorials/simple-blog-list.mdx
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+title: 'Get Involved'
+description: 'Engage with the html2rss project. Contribute and connect with the community.'
+---
+
+# Get Involved
+
+- [**Sponsoring**](/get-involved/sponsoring)
+
+Engage with the `html2rss` project. Contribute and connect with the community.
+
+- [**Project Roadmap**](https://github.com/orgs/html2rss/projects/3/views/1): View current work, plans, and priorities.
+- [**Report Bugs & Discuss Features**](/get-involved/issues-and-features): Report bugs or propose features.
+- [**Join Community Discussions**](/get-involved/discussions): Connect with users and contributors.
+- [**Contribute to html2rss**](/get-involved/contributing): Contribute code, documentation, or feed configurations.
@@ -0,0 +1,16 @@
+---
+title: 'Advanced Content Extraction'
+description: 'While basic selectors are straightforward, you can achieve very precise content extraction by combining selectors with different extractors and post-processors.'
+---
+
+# Advanced Content Extraction with Selectors
+
+While basic selectors are straightforward, you can achieve very precise content extraction by combining selectors with different extractors and post-processors.
+
+## Extractors
+
+Learn how to extract specific attributes (like `src` for images) or static values. See [Extractors](/ruby-gem/reference/selectors).
+
+## Post Processors
+
+Manipulate extracted text, sanitize HTML, convert Markdown, or apply custom logic. See [Post Processors](/ruby-gem/reference/selectors).
@@ -0,0 +1,8 @@
+---
+title: 'How-To Guides'
+description: 'This section provides practical examples and solutions for common tasks when using the html2rss gem.'
+---
+
+# How-To Guides
+
+This section provides practical examples and solutions for common tasks when using the `html2rss` gem.
@@ -0,0 +1,18 @@
+---
+title: 'Ruby Gem'
+description: 'This section provides comprehensive documentation for the html2rss Ruby gem.'
+---
+
+# The html2rss Ruby Gem
+
+This section provides comprehensive documentation for the `html2rss` Ruby gem.
+
+## Getting Started
+
+If you are new to `html2rss`, we recommend starting with the [tutorials](/ruby-gem/tutorials).
+
+## Documentation Sections
+
+- **[Tutorials](/ruby-gem/tutorials)**: Step-by-step guides to help you get started with `html2rss`.
+- **[How-To Guides](/ruby-gem/how-to)**: Practical examples and solutions for common tasks.
+- **[Reference](/ruby-gem/reference)**: Detailed information on configuration options.
@@ -0,0 +1,68 @@
+---
+title: 'Installation'
+description: 'This guide will walk you through the process of installing html2rss on your system.'
+---
+
+# Installation
+
+This guide walks you through installing `html2rss`. Several installation methods are available; pick the one that matches how you plan to use it.
+
+---
+
+### Prerequisites
+
+- **Ruby:** html2rss is built with Ruby. Ensure you have Ruby installed (version 3.2 or higher required). You can check your Ruby version by running `ruby -v` in your terminal. If you don't have Ruby, visit [ruby-lang.org](https://www.ruby-lang.org/en/documentation/installation/) for installation instructions.
+- **Bundler (Recommended):** Bundler is a Ruby gem that manages your application's dependencies. It's highly recommended for a smooth installation. Install it with `gem install bundler`.
+
+---
+
+### Method 1: Gem Installation (Recommended for CLI Usage)
+
+The simplest way to get html2rss for command-line usage is to install it as a Ruby gem.
+
+```bash
+gem install html2rss
+```
+
+After installation, you should be able to run `html2rss --version` to confirm it's working.
+
+---
+
+### Method 2: Using a Gemfile (For Ruby Projects)
+
+If you're integrating html2rss into an existing Ruby project, add it to your `Gemfile`:
+
+```ruby
+# Gemfile
+gem 'html2rss'
+```
+
+Then, run `bundle install` in your project directory.
+
+---
+
+### Method 3: GitHub Codespaces (For Cloud Development)
+
+For a quick start without local setup, you can develop html2rss directly in your browser using GitHub Codespaces:
+
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?repo=html2rss/html2rss)
+
+The Codespace comes pre-configured with Ruby 3.4, all dependencies, and VS Code extensions ready to go!
+
+---
+
+### Verifying Installation
+
+To ensure html2rss is installed correctly, open your terminal and run:
+
+```bash
+html2rss --version
+```
+
+You should see the installed version number. If you encounter any issues, please refer to the [Troubleshooting Guide](/support/troubleshooting).
+
+---
+
+### Next Steps
+
+Now that html2rss is installed, let's create your [first RSS feed](/ruby-gem/tutorials/your-first-feed)!
@@ -0,0 +1,8 @@
+---
+title: 'Reference'
+description: 'This section provides detailed information on the various configuration options available in html2rss.'
+---
+
+# Reference
+
+This section provides detailed information on the various configuration options available in `html2rss`.
@@ -0,0 +1,137 @@
+---
+title: 'Selectors'
+description: 'The selectors scraper gives you fine-grained control over content extraction using CSS selectors.'
+---
+
+# Selectors
+
+The `selectors` scraper gives you fine-grained control over content extraction using CSS selectors.
+
+> A valid RSS item requires at least a `title` or a `description`.
+
+## Basic Configuration
+
+At a minimum, you need an `items` selector to define the list of articles and a `title` selector for the article titles.
+
+```yml
+channel:
+  url: "https://example.com"
+selectors:
+  items:
+    selector: ".article"
+  title:
+    selector: "h1"
+```
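
Before handing a config like the one above to `html2rss`, it can help to confirm that the two selectors this section names as the minimum are present. The following is a minimal sketch using only Ruby's standard `yaml` library. `missing_selectors` and `REQUIRED_SELECTORS` are hypothetical names made up for illustration; they are not part of the html2rss API, and html2rss may perform its own validation differently.

```ruby
require 'yaml'

# Hypothetical constant and helper (not part of html2rss): the selector
# names this section describes as the minimum configuration.
REQUIRED_SELECTORS = %w[items title].freeze

# Returns the required selector names that are absent from the parsed
# config, or that lack a non-empty `selector` value.
def missing_selectors(config)
  selectors = config.fetch('selectors', {}) || {}
  REQUIRED_SELECTORS.reject do |name|
    entry = selectors[name]
    entry.is_a?(Hash) && !entry['selector'].to_s.empty?
  end
end

CONFIG_YAML = <<~YAML
  channel:
    url: "https://example.com"
  selectors:
    items:
      selector: ".article"
    title:
      selector: "h1"
YAML

config = YAML.safe_load(CONFIG_YAML)
puts missing_selectors(config).inspect
```

For the config shown, the helper reports no missing selectors. Removing the `title` entry would make it return `["title"]`.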
+
+## Automatic Item Enhancement
+
+To simplify configuration, `html2rss` can automatically extract the `title`, `url`, and `image` from each item. This feature is enabled by default.
+
+```yml
+selectors:
+  items:
+    selector: ".article"
+    enhance: true # default: true
+```
+
+## RSS 2.0 Selectors
+
+While you can define any named selector, only the following are used in the final RSS feed:
+
+| RSS 2.0 Tag   | `html2rss` Name | Notes                          |
+| ------------- | --------------- | ------------------------------ |
+| `title`       | `title`         |                                |
+| `description` | `description`   |                                |
+| `link`        | `url`           |                                |
+| `author`      | `author`        |                                |
+| `category`    | `categories`    |                                |
+| `guid`        | `guid`          |                                |
+| `enclosure`   | `enclosure`     |                                |
+| `pubDate`     | `published_at`  |                                |
+| `comments`    | `comments`      | ⚠️ _Not currently implemented_ |
+
+## Selector Options
+
+Each selector can be configured with the following options:
+
+| Name           | Description                                                  |
+| -------------- | ------------------------------------------------------------ |
+| `selector`     | The CSS selector for the target element.                     |
+| `extractor`    | The extractor to use for this selector.                      |
+| `attribute`    | The attribute name (required for the `attribute` extractor). |
+| `static`       | The static value (required for the `static` extractor).      |
+| `post_process` | A list of post-processors to apply to the value.             |
+
+### Extractors
+
+Extractors define how to get the value from a selected element.
+
+- `text`: The inner text of the element (default).
+- `html`: The outer HTML of the element.
+- `href`: The value of the `href` attribute.
+- `attribute`: The value of a specified attribute.
+- `static`: A static value.
+
+### Post-Processors
+
+Post-processors manipulate the extracted value.
+
+- `gsub`: Performs a global substitution on a string.
+- `html_to_markdown`: Converts HTML to Markdown.
+- `markdown_to_html`: Converts Markdown to HTML.
+- `parse_time`: Parses a string into a `Time` object.
+- `parse_uri`: Parses a string into a `URI` object.
+- `sanitize_html`: Sanitizes HTML to prevent security vulnerabilities.
+- `substring`: Extracts a substring from a string.
+- `template`: Creates a new string from a template and other selector values.
+
+> Always use the `sanitize_html` post-processor for any HTML content to prevent security risks.
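
To give a feel for what the string-oriented post-processors listed above do, here are rough plain-Ruby analogues built from core and standard-library methods only. This is an illustration of the ideas, not html2rss's implementation: the sample strings are invented, and the options each post-processor accepts in a config file are not shown here.

```ruby
require 'time'
require 'uri'

# Invented sample value standing in for text extracted by a selector.
raw_title = 'Breaking:   Ruby 3.4 released  '

# gsub: global substitution, here collapsing runs of whitespace.
collapsed = raw_title.gsub(/\s+/, ' ').strip

# substring: keep part of the string (here, everything from index 10 on,
# which drops the "Breaking: " prefix).
without_prefix = collapsed[10..]

# template: build a new string from named values.
templated = format('%<title>s (by %<author>s)', title: without_prefix, author: 'Jane')

# parse_time: turn a date string into a Time object.
published_at = Time.parse('2024-12-25 10:00:00 UTC')

# parse_uri: turn a string into a URI object.
link = URI.parse('https://example.com/blog/post-1')

puts collapsed
puts without_prefix
puts templated
puts published_at.year
puts link.host
```

The order of steps matters in the same way it would in a `post_process` list: each step receives the value produced by the previous one.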
+
+## Advanced Usage
+
+### Categories
+
+To add categories to an item, provide a list of selector names to the `categories` selector.
+
+```yml
+selectors:
+  genre:
+    selector: ".genre"
+  branch:
+    selector: ".branch"
+  categories:
+    - genre
+    - branch
+```
+
+### Custom GUID
+
+To create a custom GUID for an item, provide a list of selector names to the `guid` selector.
+
+```yml
+selectors:
+  title:
+    selector: "h1"
+  url:
+    selector: "a"
+    extractor: "href"
+  guid:
+    - url
+```
+
+### Enclosures
+
+To add an enclosure (e.g., an image, audio, or video file) to an item, use the `enclosure` selector to specify the URL of the file.
+
+```yml
+selectors:
+  items:
+    selector: ".post"
+  title:
+    selector: "h2"
+  enclosure:
+    selector: "audio"
+    extractor: "attribute"
+    attribute: "src"
+    content_type: "audio/mpeg"
+```
@@ -0,0 +1,8 @@
+---
+title: 'Tutorials'
+description: 'This section provides step-by-step tutorials to help you get started with the html2rss Ruby gem.'
+---
+
+# Tutorials
+
+This section provides step-by-step tutorials to help you get started with the `html2rss` Ruby gem.
@@ -0,0 +1,62 @@
+---
+title: 'Scraping a Simple Blog List'
+description: 'This example demonstrates how to create a feed from a typical blog that has a list of articles on its homepage.'
+---
+
+# Tutorial: Scraping a Simple Blog List
+
+This example demonstrates how to create a feed from a typical blog that has a list of articles on its homepage.
+
+---
+
+## The Goal
+
+We want to create an RSS feed that contains the title, link, and summary of each article on the blog.
+
+---
+
+## The HTML
+
+Here's a simplified view of the HTML structure we're targeting. The key is to find a container element that wraps each blog post (in this case, `.post-item`) and then find the selectors for the title, link, and summary within that container.
+
+```html
+<div class="posts">
+  <div class="post-item">
+    <h2 class="post-title"><a href="/blog/post-1">First Post Title</a></h2>
+    <p class="post-summary">Summary of the first post...</p>
+  </div>
+  <div class="post-item">
+    <h2 class="post-title"><a href="/blog/post-2">Second Post Title</a></h2>
+    <p class="post-summary">Summary of the second post...</p>
+  </div>
+</div>
+```
+
+---
+
+## The Configuration
+
+This configuration uses the `selectors` scraper to precisely extract the content we want.
+
+```yaml
+channel:
+  url: https://example.com/blog
+selectors:
+  items:
+    selector: ".post-item"
+  title:
+    selector: ".post-title a"
+  url:
+    selector: ".post-title a"
+    extractor: "href"
+  description:
+    selector: ".post-summary"
+```
+
+### Configuration Breakdown
+
+- **`items.selector: ".post-item"`**: This is the most important selector. It tells `html2rss` that every element with the class `post-item` is a single item in the RSS feed.
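+- **`title.selector: ".post-title a"`**: Within each `.post-item`, this finds the `<a>` tag inside the element with the class `post-title`. Because no extractor is specified, the default `text` extractor is used, so the link text becomes the item title.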
+- **`url.selector: ".post-title a"`**: This finds the same `<a>` tag.
+- **`url.extractor: "href"`**: This extracts the URL from the `href` attribute of the `<a>` tag.
+- **`description.selector: ".post-summary"`**: This finds the element with the class `post-summary`.
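
The breakdown above can be mimicked in a few lines of Ruby to show which values each selector picks out of the sample HTML. This is only a sketch of the idea and not how `html2rss` works internally. It uses `rexml` (bundled with Ruby) and XPath expressions in place of CSS selectors, and it relies on the sample markup being well-formed XML, which real-world HTML often is not.

```ruby
require 'rexml/document'

# The sample markup from "The HTML" section.
SAMPLE_HTML = <<~MARKUP
  <div class="posts">
    <div class="post-item">
      <h2 class="post-title"><a href="/blog/post-1">First Post Title</a></h2>
      <p class="post-summary">Summary of the first post...</p>
    </div>
    <div class="post-item">
      <h2 class="post-title"><a href="/blog/post-2">Second Post Title</a></h2>
      <p class="post-summary">Summary of the second post...</p>
    </div>
  </div>
MARKUP

doc = REXML::Document.new(SAMPLE_HTML)

# items: every element with class "post-item" becomes one feed item.
items = REXML::XPath.match(doc, "//div[@class='post-item']").map do |post|
  # title and url both point at the <a> inside .post-title.
  link = REXML::XPath.first(post, ".//h2[@class='post-title']/a")
  # description points at .post-summary.
  summary = REXML::XPath.first(post, ".//p[@class='post-summary']")

  {
    title: link.text,               # "text" extractor (the default)
    url: link.attributes['href'],   # "href" extractor
    description: summary.text
  }
end

items.each { |item| puts item.inspect }
```

Each hash corresponds to one `.post-item` container, which is why choosing the right `items` selector determines everything else.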