A robust and flexible web scraper for Forex Factory calendar events. This tool uses Selenium and pandas to collect, update, and manage Forex Factory event data, with incremental scraping and optional detailed event information.
The main scraped CSV dataset is hosted on Hugging Face. The local forex_factory_cache.csv file is treated as generated/downloaded data and is intentionally ignored by git.
If you place forex_factory_cache.csv in the project root, the test suite will validate its basic schema without requiring a live scrape.
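As a rough illustration of what a basic schema check on the cached file could look like, the sketch below verifies the header and the DateTime format. It is not the project's actual test: the function name check_cache_schema is hypothetical, the column list is taken from the troubleshooting notes in this README, and only the standard library is used so the snippet runs without the project's dependencies.

```python
import csv
import io
from datetime import datetime

# Column names as listed in this README's troubleshooting notes.
EXPECTED_COLUMNS = ["DateTime", "Currency", "Impact", "Event",
                    "Actual", "Forecast", "Previous", "Detail"]


def check_cache_schema(handle):
    """Return a list of problems found in an open CSV file (empty list = OK).

    Hypothetical helper for illustration; the real test suite may check more or less.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {name}"
                for name in EXPECTED_COLUMNS if name not in header]
    if problems:
        return problems
    for row in reader:
        line_no = reader.line_num
        try:
            stamp = datetime.fromisoformat(row["DateTime"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: unparsable DateTime {row['DateTime']!r}")
            continue
        # Assumption: timestamps are expected to carry a UTC offset.
        if stamp.tzinfo is None:
            problems.append(f"line {line_no}: DateTime has no UTC offset")
    return problems


# Example with an in-memory file; the row values are made up.
sample = io.StringIO(
    "DateTime,Currency,Impact,Event,Actual,Forecast,Previous,Detail\n"
    "2024-03-21T08:30:00+03:30,USD,High,Example Event,1.0,0.9,0.8,\n"
)
print(check_cache_schema(sample))
```

To run it against a real file, open the file with open("forex_factory_cache.csv", newline="", encoding="utf-8") and pass the handle in.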
- Incremental Scraping: Only fetch new or updated events based on existing CSV data.
- Detailed Event Information: Optionally scrape detailed specifications for each event.
- Flexible Date Range: Specify custom date ranges for scraping.
- Timezone Support: Configure the timezone according to your preference.
- Data Management with pandas: Efficiently handle data merging and updates using pandas.
- Error Handling: Robust handling of common web scraping issues like stale elements and timeouts.
- Command-Line Interface: Easy-to-use CLI with configurable parameters.
- Python 3.12+: The current test environment uses Python 3.12.
- Google Chrome: Required for live Selenium scraping.
-
Clone the Repository
git clone https://github.com/yourusername/forexfactory_scraper.git
cd forexfactory_scraper
-
Create a Virtual Environment (Optional but Recommended)
python -m venv .venv
- Activate the Virtual Environment:
- Windows:
.venv\Scripts\activate
- macOS/Linux:
source .venv/bin/activate
-
Install Dependencies
Ensure pip is up to date:
pip install --upgrade pip
Install required packages:
pip install -r requirements.txt
For development and tests, install:
pip install -r requirements-dev.txt
-
Download WebDriver
The scraper uses undetected-chromedriver to handle dynamic content and bypass some scraping protections. No additional setup is required, as undetected-chromedriver manages the ChromeDriver version automatically.
The main script can be executed via the command line, allowing you to specify various parameters such as the date range, output CSV file, timezone, and whether to scrape detailed event information.
- --start: (Required) Start date for scraping in YYYY-MM-DD format.
- --end: (Required) End date for scraping in YYYY-MM-DD format.
- --csv: (Optional) Output CSV file path. Default is forex_factory_cache.csv.
- --tz: (Optional) Timezone for event dates. Default is Asia/Tehran.
- --details: (Optional) Flag to enable scraping of detailed event information. If omitted, only basic event data is scraped.
- --impact: (Optional) Comma-separated impact filter, such as high,medium.
- --keep-currencies: (Optional) Space-separated currency filter, such as USD EUR GBP.
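For readers who want to see how such a set of flags is typically declared, here is a minimal argparse sketch. It only mirrors the flags and defaults documented above; the function name build_parser is hypothetical, and the real parser in src.forexfactory.main may be organised differently (for example, it may validate dates or impact values).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags; not the project's code."""
    parser = argparse.ArgumentParser(prog="python -m src.forexfactory.main")
    parser.add_argument("--start", required=True, help="Start date, YYYY-MM-DD")
    parser.add_argument("--end", required=True, help="End date, YYYY-MM-DD")
    parser.add_argument("--csv", default="forex_factory_cache.csv",
                        help="Output CSV file path")
    parser.add_argument("--tz", default="Asia/Tehran",
                        help="Timezone for event dates")
    parser.add_argument("--details", action="store_true",
                        help="Also scrape detailed event information")
    parser.add_argument("--impact", default=None,
                        help="Comma-separated impact filter, e.g. high,medium")
    # Space-separated values on the command line become a list.
    parser.add_argument("--keep-currencies", nargs="*", default=None,
                        help="Currency filter, e.g. USD EUR GBP")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["--start", "2024-03-21", "--end", "2024-03-25",
     "--impact", "high,medium", "--keep-currencies", "USD", "EUR"]
)
print(args.csv, args.tz, args.details, args.impact.split(","), args.keep_currencies)
```

Using nargs="*" is one way to accept a space-separated list; a comma-separated value such as the impact filter is left as a string and split by the caller.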
Navigate to the project root directory and execute the script using Python:
python -m src.forexfactory.main --start YYYY-MM-DD --end YYYY-MM-DD [--csv OUTPUT_CSV] [--tz TIMEZONE] [--details]
The scraper reads the existing output CSV before scraping. Days that already exist in the CSV are skipped by incremental mode. When --details is enabled, days with missing detail values are refreshed.
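The incremental rule described above can be sketched as a small function that decides which days still need scraping. This is an illustration under stated assumptions, not the scraper's implementation: the name days_to_scrape is hypothetical, rows are plain dictionaries with the DateTime and Detail columns, and a "missing detail value" is assumed to be an empty or whitespace-only Detail field.

```python
from datetime import date, datetime, timedelta


def days_to_scrape(rows, start, end, details=False):
    """Return the days in [start, end] that are absent from rows.

    With details=True, days that have at least one empty Detail value
    are returned as well, so they can be refreshed.
    """
    seen = set()        # days that already have at least one event
    incomplete = set()  # days with at least one empty Detail value
    for row in rows:
        day = datetime.fromisoformat(row["DateTime"]).date()
        seen.add(day)
        if not (row.get("Detail") or "").strip():
            incomplete.add(day)

    wanted = []
    day = start
    while day <= end:
        if day not in seen or (details and day in incomplete):
            wanted.append(day)
        day += timedelta(days=1)
    return wanted


# Made-up cached rows: 21 March has details, 22 March does not, 23 March is absent.
cached = [
    {"DateTime": "2024-03-21T08:30:00+03:30", "Detail": "some detail text"},
    {"DateTime": "2024-03-22T10:00:00+03:30", "Detail": ""},
]
print(days_to_scrape(cached, date(2024, 3, 21), date(2024, 3, 23)))
print(days_to_scrape(cached, date(2024, 3, 21), date(2024, 3, 23), details=True))
```

Note that the day is taken from the timestamp as stored, so the result depends on the timezone the CSV was written in.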
-
Scrape Events from March 21, 2024, to March 25, 2024, Including Details
python -m src.forexfactory.main --start 2024-03-21 --end 2024-03-25 --csv forex_factory_cache.csv --tz Asia/Tehran --details
-
Scrape Events from January 1, 2024, to January 31, 2024, Without Details
python -m src.forexfactory.main --start 2024-01-01 --end 2024-01-31 --csv january_events.csv --tz Asia/Tehran
-
Scrape Events from February 15, 2024, to February 20, 2024, Saving to a Custom CSV File
python -m src.forexfactory.main --start 2024-02-15 --end 2024-02-20 --csv feb_events.csv --tz Asia/Tehran
All dependencies are listed in requirements.txt. Key libraries include:
- selenium: For browser automation.
- pandas: For data manipulation and management.
- undetected-chromedriver: To bypass Selenium detection mechanisms.
- python-dateutil: For advanced date handling.
Install dependencies using:
pip install -r requirements.txt
Install test dependencies using:
pip install -r requirements-dev.txt
Run the normal test suite with:
python -m pytest -q
The normal suite does not run a live Selenium scrape. If forex_factory_cache.csv exists in the project root, the integration test validates the cached dataset schema.
To run the live scrape integration test, set RUN_LIVE_SCRAPE=1:
RUN_LIVE_SCRAPE=1 python -m pytest -q tests/integration/test_full_scrape.py

python -m src.forexfactory.main --start 2024-03-21 --end 2024-03-25 --csv forex_factory_cache.csv --tz Asia/Tehran --details
This command scrapes Forex Factory events from March 21, 2024, to March 25, 2024, including detailed specifications for each event, and saves the data to forex_factory_cache.csv using the Asia/Tehran timezone.
python -m src.forexfactory.main --start 2024-03-21 --end 2024-03-25 --csv forex_factory_cache.csv --tz Asia/Tehran
This command performs the same scrape without fetching detailed event specifications, so it runs faster.
-
StaleElementReferenceException Errors
Cause: The web page's DOM has changed, making the reference to the web element invalid.
Solution:
- Increase the wait time using WebDriverWait.
- Re-fetch the web element after certain actions.
- Implement retry mechanisms.
-
CAPTCHA or Cloudflare Challenges
Cause: Forex Factory may employ CAPTCHA or Cloudflare protection to prevent automated scraping.
Solution:
- Use undetected-chromedriver to bypass some protections.
- Implement delays between requests to mimic human behavior.
- Use proxies if necessary.
- Be mindful of the scraping rate to avoid IP bans.
-
Incorrect Date Parsing
Cause: Mismatch between the date format in the CSV and the expected format in the script.
Solution:
- Ensure that dates in the CSV are in ISO format (YYYY-MM-DDTHH:MM:SS+TZ).
- Verify that the CSV contains the expected columns: DateTime, Currency, Impact, Event, Actual, Forecast, Previous, and Detail.
-
Missing or Incorrect XPath Selectors
Cause: Changes in the Forex Factory website structure leading to incorrect XPath selectors.
Solution:
- Verify the current structure of the Forex Factory website.
- Update XPath selectors in the scraper accordingly.
-
Browser Driver Issues
Cause: Incompatible or outdated ChromeDriver versions.
Solution:
- Ensure that undetected-chromedriver is up to date.
- Verify that Google Chrome is updated to the latest version.
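The "implement retry mechanisms" advice for StaleElementReferenceException errors can be sketched as a small decorator. This is a generic helper, not code from the scraper: the name retry_on and its parameters are hypothetical, and the Selenium usage is shown only in comments so the block runs without a browser.

```python
import time
from functools import wraps


def retry_on(exceptions, attempts=3, delay=0.5):
    """Retry the wrapped function when it raises one of `exceptions`.

    The last failure is re-raised once `attempts` calls have been made.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise
                    time.sleep(delay)  # give the page time to settle
        return wrapper
    return decorator


# Possible use with Selenium (assumed usage, left commented out):
#
# from selenium.common.exceptions import StaleElementReferenceException
#
# @retry_on(StaleElementReferenceException)
# def read_cell_text(driver, xpath):
#     # Looking the element up inside the function means every retry
#     # re-fetches it instead of reusing a stale reference.
#     return driver.find_element("xpath", xpath).text
```

The important design point is that the element lookup happens inside the retried function; retrying an operation on an already stale element object would fail the same way each time.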
Logs provide detailed information about the scraping process and can help identify issues.
- Info Logs: Provide general information about the scraping progress.
- Warning Logs: Indicate non-critical issues that do not stop the scraper.
- Error Logs: Highlight critical issues that may require attention.
Ensure that your terminal or log files capture these logs for effective debugging.
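One way to capture logs in both the terminal and a file is sketched below. It assumes the scraper emits its messages through Python's standard logging module, which this README does not state explicitly; the function name configure_logging and the file name scraper.log are placeholders.

```python
import logging


def configure_logging(log_path="scraper.log", level=logging.INFO):
    """Send log records at `level` and above to the terminal and to a file."""
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"
    )
    root = logging.getLogger()
    root.setLevel(level)
    handlers = (
        logging.StreamHandler(),                        # terminal (stderr)
        logging.FileHandler(log_path, encoding="utf-8"),  # persistent copy
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root
```

Call it once before starting a scrape; calling it repeatedly would attach duplicate handlers and repeat each message.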
Contributions are welcome! If you encounter bugs or have suggestions for improvements, feel free to open an issue or submit a pull request.
-
Fork the Repository
-
Create a Feature Branch
git checkout -b feature/YourFeatureName
-
Commit Your Changes
git commit -m "Add your message here"
-
Push to the Branch
git push origin feature/YourFeatureName
-
Open a Pull Request
Provide a clear description of your changes and the problem they solve.
This project is licensed under the MIT License.
Disclaimer: This scraper is intended for personal use and educational purposes only. Ensure compliance with Forex Factory's Terms of Service and avoid violating any usage policies. Use responsibly.