stephtowch/Web_Scraping_in_R

Web Scraping in R

Last updated 2024-03-21.

This GitHub repository contains R code for scraping course information from the Essex Summer School website, written as part of my exam for the Web Scraping and Data Management module on the MA Social Science Data Analytics course.

Web Scraping and Data Management using rvest.

🔭 Overview

The repository is organised into the following sections:

  • 1.0 User Guide: A procedure guide for web scraping with the rvest package, with tidyverse, stringr and purrr used for data wrangling.
  • 2.0 Instruction Sheet: The instruction sheet for the exam, detailing what to scrape and report.
  • 3.0 Data File: The .csv output file of course information from the Essex Summer School, including course names, URLs and descriptions.
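
As a rough illustration of the kind of workflow the User Guide covers, here is a minimal sketch that pulls course names, URLs and descriptions into a table and writes them to a .csv file. It is not the code from this repository: the inline HTML, the `.course` and `.summary` selectors, and the `https://example.org` base URL are all hypothetical stand-ins, and the markup of the real Essex Summer School site will differ.

```r
library(rvest)    # minimal_html(), html_elements(), html_element(), html_text2(), html_attr()
library(stringr)  # str_squish() for whitespace cleanup

# Hypothetical stand-in for a course listing page (assumption, not the real site)
page <- minimal_html('
  <div class="course">
    <a href="/courses/intro-to-r">  Introduction to R </a>
    <p class="summary">Learn the basics
       of R programming.</p>
  </div>
  <div class="course">
    <a href="/courses/web-scraping">Web Scraping</a>
    <p class="summary">Collect data from websites with rvest.</p>
  </div>
')

# One node per course; html_element() then returns exactly one match per node
course_nodes <- html_elements(page, ".course")
links <- html_element(course_nodes, "a")

base_url <- "https://example.org"  # hypothetical base URL for relative links

courses <- data.frame(
  name        = str_squish(html_text2(links)),
  url         = paste0(base_url, html_attr(links, "href")),
  description = str_squish(html_text2(html_element(course_nodes, ".summary")))
)

# Write the result to a temporary .csv file
out_file <- tempfile(fileext = ".csv")
write.csv(courses, out_file, row.names = FALSE)
```

To scrape a live page, `minimal_html()` would be swapped for `read_html()` with the page's URL, and the selectors adjusted to match that page's structure.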

📜 Notes

This repository assumes basic competence in R (setting the working directory, loading packages, etc.) and contains only materials relating to Web Scraping and Data Management in R. The focus is therefore on application rather than theory.

🛠️ Setup

To run the code, you will need:

  1. A fresh installation of R (preferably version 4.3.2, the version used here, or above).

  2. RStudio IDE (optional but recommended).

  3. Install the required packages by running:

    # Install the required packages
    install.packages(c("rvest", "tidyverse", "stringr", "pacman", "purrr"))

Package Versions

Run on Windows 11 x64 (build 22621), with R version 4.3.2.

The packages used here:

  • rvest 1.0.4 (CRAN)
  • tidyverse 2.0.0 (CRAN)
  • stringr 1.5.1 (CRAN)
  • purrr 1.0.2 (CRAN)
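
To compare your own environment against the versions above, something like the following sketch may help. It uses only base R; the helper name `installed_version` is made up for illustration, and the package names are taken from the install command in the Setup section.

```r
# Packages named in the install command in the Setup section
pkgs <- c("rvest", "tidyverse", "stringr", "pacman", "purrr")

# Hypothetical helper: installed version as text, or NA if the package is missing
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- data.frame(
  package   = pkgs,
  installed = vapply(pkgs, installed_version, character(1)),
  row.names = NULL
)

print(R.version.string)  # the R version in use
print(versions)          # one row per package; NA means not installed
```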

Feel free to adjust these versions to suit your own code and setup.
