Last updated 2024-03-21.
This GitHub repository contains R code for scraping course information from the Essex Summer School website, as part of my exam for the Web Scraping and Data Management module on the MA Social Science Data Analytics course.
Web Scraping and Data Management using rvest.
The repository is organised into the following sections:
- 1.0 User Guide: This is a procedure guide for using the rvest package for web scraping, as well as tidyverse, stringr and purrr for some data wrangling.
- 2.0 Instruction Sheet: This is the instruction sheet for the exam, detailing what to scrape and report.
- 3.0 Data File: This is the .csv output file for reviewing course information at the Essex Summer School, including the course names, URLs and descriptions.
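As a rough illustration of the workflow the User Guide covers, the sketch below extracts course names, URLs and descriptions and writes them to a .csv file. It is not the code from the guide: the inline HTML, the CSS selectors (`div.course`, `p.description`) and the output file name are all assumptions made so that the example runs without network access. The real Essex Summer School pages will use different markup.

```r
library(rvest)

# Hypothetical stand-in for a course listing page; the real site's markup
# and CSS classes will differ.
page <- minimal_html('
  <div class="course">
    <a href="/courses/intro-r">Introduction to R</a>
    <p class="description">  A first course   in R. </p>
  </div>
  <div class="course">
    <a href="/courses/web-scraping">Web Scraping</a>
    <p class="description">Collecting data from web pages.</p>
  </div>
')

# One node per course block
courses <- html_elements(page, "div.course")

# html_element() returns exactly one match per course node, so the three
# columns stay aligned row by row.
course_data <- data.frame(
  name = html_text2(html_element(courses, "a")),
  url = html_attr(html_element(courses, "a"), "href"),
  description = stringr::str_squish(
    html_text2(html_element(courses, "p.description"))
  ),
  stringsAsFactors = FALSE
)

# Write to a temporary location (assumed file name)
out_file <- file.path(tempdir(), "courses.csv")
write.csv(course_data, out_file, row.names = FALSE)
```

For a live page, `minimal_html()` would be replaced by `read_html()` with the page URL, and the selectors adjusted to match the site's actual structure.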
This repository assumes basic competence in R (setting the working directory, loading packages, etc.) and contains only materials relating to Web Scraping and Data Management in R. The focus is therefore on application rather than theory.
To run the code, you will need:
- A fresh installation of R (preferably version 4.4.1 or above).
- RStudio IDE (optional but recommended).
- The required packages, which you can install by running:

    # Install the required packages
    install.packages(c("rvest", "tidyverse", "stringr", "pacman", "purrr"))
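If some of these packages are already present, it may be convenient to check which ones are missing before installing anything. The following is a minimal sketch; `find_missing()` is a hypothetical helper written for this example, not part of the repository or of any package.

```r
# Packages named in the install command above
required <- c("rvest", "tidyverse", "stringr", "pacman", "purrr")

# Hypothetical helper: returns the packages that are not yet installed
find_missing <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- find_missing(required)

# Report the result without installing anything
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are already installed.")
}
```

The resulting `to_install` vector can be passed to `install.packages()` when it is not empty.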
Package Versions
Run on Windows 11 x64 (build 22621), with R version 4.3.2.
The packages used here:
- rvest 1.0.4 (CRAN)
- tidyverse 2.0.0 (CRAN)
- stringr 1.5.1 (CRAN)
- purrr 1.0.2 (CRAN)
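To compare a local setup with the versions listed above, something like the following sketch could be used. The package names are taken from the install command earlier in this README; `installed_version()` is a hypothetical helper for this example and returns `NA` for any package that is not installed.

```r
pkgs <- c("rvest", "tidyverse", "stringr", "pacman", "purrr")

# Hypothetical helper: installed version as text, or NA if not installed
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- data.frame(
  package = pkgs,
  installed = vapply(pkgs, installed_version, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Show the R version string and the package versions side by side
print(R.version.string)
print(versions)
```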
