`html2rss` is an open-source project, and its development is made possible by the support of our community. If you find `html2rss` useful, please consider sponsoring the project.
11
+
12
+
## Why Sponsor?
13
+
14
+
- **Ensure the project's longevity:** Your sponsorship helps keep the project actively maintained and developed.
15
+
- **Support new features:** Your contribution helps fund the development of new features and improvements.
16
+
- **Show your appreciation:** Sponsoring is a great way to show your appreciation for the project and the work that goes into it.
17
+
18
+
## How to Sponsor
19
+
20
+
You can sponsor the project through [GitHub Sponsors](https://github.com/sponsors/gildesmarais).
This section provides a collection of ready-to-use `html2rss` configuration examples for various popular websites and common use cases. These examples demonstrate how to tackle different HTML structures and content types.
12
-
13
-
Use these as a starting point, modify them to fit your specific needs, or get inspiration for building your own custom feeds.
14
-
15
-
---
16
-
17
-
### How to Use an Example
18
-
19
-
1. **Copy the YAML:** Copy the entire YAML configuration block for the example you're interested in.
20
-
2. **Save as `.yml`:** Save the copied content into a file, e.g., `my-example.yml`.
21
-
3. **Generate the Feed:** Run `html2rss` from your terminal:
22
-
```bash
html2rss feed my-example.yml > my-example.xml
```
25
-
4. **Enjoy!** Open `my-example.xml` in your favorite RSS reader.
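For orientation, a minimal `my-example.yml` might look like the sketch below. The URL and the CSS selectors are placeholders rather than values from a real site, so adjust them to the page you want to scrape.

```yaml
# my-example.yml: hypothetical minimal configuration (all values are placeholders)
channel:
  url: https://example.com/blog   # the page to scrape
selectors:
  items:
    selector: "article"           # assumes each <article> element is one feed item
  title:
    selector: "h2"                # assumes the item title sits in an <h2>
  description:
    selector: "p"                 # assumes the first paragraph summarizes the item
```

Saving this file and running the command from step 3 against it should write the generated feed to `my-example.xml`.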
26
-
27
-
---
28
-
29
-
### Contribute Your Own Examples!
30
-
31
-
Have you created a useful `html2rss` configuration? We encourage you to share it with the community by contributing to the [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) repository.
11
+
This section provides practical examples and solutions for common tasks when using the `html2rss` gem.
For easier management, especially when using the CLI or `html2rss-web`, you can store your feed configurations in a YAML file.
12
+
13
+
## Global and Feed-Specific Configurations
14
+
15
+
You can define global settings that apply to all feeds, and then define individual feed configurations under the `feeds` key.
16
+
17
+
```yml
# Global settings
headers:
  "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
```
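As a rough sketch of how global settings and the `feeds` key can sit together in one file, consider the layout below. The feed names, URLs, selectors, and the shortened `User-Agent` value are hypothetical and only illustrate the structure.

```yaml
# Global settings: applied to every feed defined below
headers:
  "User-Agent": "Mozilla/5.0 (compatible; example-bot)"  # placeholder value

# Individual feed configurations live under the `feeds` key
feeds:
  example-blog:                # hypothetical feed name
    channel:
      url: https://example.com/blog
    auto_source: {}            # let html2rss find the items automatically
  example-news:                # hypothetical feed name
    channel:
      url: https://example.com/news
    selectors:
      items:
        selector: "article"    # assumed page structure
      title:
        selector: "h2"
```

Keeping shared request headers at the top level avoids repeating them in each feed.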
When a website returns a JSON response (i.e., with a `Content-Type` of `application/json`), `html2rss` converts the JSON to XML, allowing you to use CSS selectors for data extraction.
12
+
13
+
> [!NOTE]
14
+
> The JSON response must be an Array or a Hash for the conversion to work.
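The sketch below shows how selectors might target a converted JSON response. It assumes that JSON arrays become `<array>` elements and JSON hashes become `<object>` elements in the generated XML. The endpoint and field names are made up, so inspect the converted output for your own source before relying on these selectors.

```yaml
# Hypothetical endpoint returning:
#   { "data": [ { "title": "Headline", "summary": "Short text" } ] }
channel:
  url: https://example.com/api/posts.json   # placeholder URL
selectors:
  items:
    selector: "array > object"   # assumed element names after the JSON-to-XML conversion
  title:
    selector: "title"            # matches the "title" key of each hash
  description:
    selector: "summary"          # matches the "summary" key of each hash
```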
ruby-gem/index.md: 6 additions & 15 deletions
@@ -5,25 +5,16 @@ nav_order: 3
5
5
has_children: true
6
6
---
7
7
8
-
# The html2rss Ruby Gem ([GitHub Repo](https://github.com/html2rss/html2rss))
8
+
# The html2rss Ruby Gem
9
9
10
-
This section documents the `html2rss` Ruby gem, the core library for `html2rss-web`. This documentation targets developers using the gem directly. For an easier start, use the [web application]({{ '/web-application' | relative_url }}).
10
+
This section provides comprehensive documentation for the `html2rss` Ruby gem.
11
11
12
12
## Getting Started
13
13
14
-
Start with the [Installation guide]({{ '/ruby-gem/tutorials/installation' | relative_url }}). Then, create your [first feed]({{ '/ruby-gem/tutorials/your-first-feed' | relative_url }}).
14
+
If you are new to `html2rss`, we recommend starting with the [tutorials]({{ '/ruby-gem/tutorials' | relative_url }}).
15
15
16
16
## Documentation Sections
17
17
18
-
- **[Tutorials]({{ '/ruby-gem/tutorials' | relative_url }})**: Step-by-step guides to get you started.
19
-
- **[How-To Guides]({{ '/ruby-gem/how-to' | relative_url }})**: Solutions to common problems and tasks.
20
-
- **[Reference]({{ '/ruby-gem/reference' | relative_url }})**: Technical details and configuration options.
21
-
22
-
## Advanced Topics
23
-
24
-
- [**Handling Dynamic Content and JavaScript**]({{ '/ruby-gem/how-to/handling-dynamic-content' | relative_url }}): Process JavaScript-heavy websites.
ruby-gem/reference/auto-source.md: 12 additions & 22 deletions
@@ -6,37 +6,33 @@ parent: Reference
6
6
grand_parent: Ruby Gem
7
7
---
8
8
9
-
# `auto_source`
9
+
# Auto Source
10
10
11
-
The `auto_source` scraper is the easiest way to create a feed. It intelligently finds items on a page without requiring you to specify CSS selectors.
11
+
The `auto_source` scraper automatically finds items on a page, so you don't have to specify CSS selectors.
12
12
13
-
You can enable it in your YAML config like this:
13
+
To enable it, add `auto_source: {}` to your configuration:
14
14
15
15
```yaml
channel:
  url: https://example.com
auto_source: {}
```
20
20
21
-
---
22
-
23
-
## How it Works
24
-
25
-
The `auto_source` scraper uses a series of strategies to find content:
21
+
## How It Works
26
22
27
-
1. **`schema`:** It looks for structured data in `<script type="application/ld+json">` tags. Many websites use this to provide machine-readable information about their content, often following the [Schema.org](https://schema.org/) standard.
28
-
2. **`semantic_html`:** It searches for semantic HTML5 tags like `<article>`, `<main>`, and `<section>`. These tags are often used to define the main content of a page.
29
-
3. **`html`:** As a last resort, it analyzes the entire HTML structure to find frequently occurring selectors that are likely to contain the main content.
23
+
`auto_source` uses the following strategies to find content:
| `ttl` | Optional | Integer | Auto-generated | Time to live in minutes. `html2rss` will use the `max-age` from the response headers if available, otherwise it will default to `360`. |
35
-
| `language` | Optional | String | Auto-generated | Determined by the `lang` attribute of the `<html>` tag. |
36
-
| `time_zone` | Optional | String | `'UTC'` | The time zone to use for parsing dates. See a [list of valid time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). |
| `url` | **Required** | The URL of the website to scrape. |
29
+
| `title` | Optional | The title of the RSS feed. Defaults to the website's title. |
30
+
| `description` | Optional | A description for the RSS feed. Defaults to the website's meta description. |
31
+
| `author` | Optional | The author of the feed, in the format `email (Name)`. |
32
+
| `ttl` | Optional | The "time to live" for the feed in minutes. Defaults to the `max-age` from the response headers, or `360`. |
33
+
| `language` | Optional | The language of the feed. Defaults to the `lang` attribute of the `<html>` tag. |
34
+
| `time_zone` | Optional | The time zone for parsing dates. See the [list of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). |
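To illustrate the options in the table, a `channel` block that sets each of them explicitly could look like the sketch below. All values are invented for illustration. Only `url` is required, and the other keys fall back to the defaults described above when omitted.

```yaml
channel:
  url: https://example.com/blog                 # required
  title: Example Blog                           # otherwise taken from the website's title
  description: Latest posts from Example Blog   # otherwise taken from the meta description
  author: jane@example.com (Jane Doe)           # format: email (Name)
  ttl: 120                                      # time to live, in minutes
  language: en                                  # otherwise taken from <html lang="...">
  time_zone: Europe/Berlin                      # a tz database name
auto_source: {}
```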