[WhoScored] read_schedule() fails with JSONDecodeError in 1.9.0

**Describe the bug**
`WhoScored.read_schedule()` fails with `JSONDecodeError` in soccerdata `1.9.0`. The same workflow worked in my `1.8.8` environment.

This also affects `WhoScored.read_missing_players()` and `WhoScored.read_events()` when they need to call `read_schedule()` internally.

Expected behavior: `read_schedule()` should return the match schedule DataFrame instead of failing while decoding the response.

Python version: `3.12.13`

**Affected scrapers**
This affects the following scrapers:

- [ ] ClubElo
- [ ] ESPN
- [ ] FBref
- [ ] FiveThirtyEight
- [ ] Match History
- [ ] SoFIFA
- [ ] Understat
- [X] WhoScored

**Code example**
A minimal code example that fails. I used `no_cache=True` to make sure an invalid cached file was not causing the bug.

```python
import soccerdata as sd

ws = sd.WhoScored(leagues="ESP-La Liga", seasons="24-25", no_cache=True)
ws.read_schedule()
```

**Error message**

```text
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

**Additional context**
I reproduced this in `soccerdata==1.9.0` but not in my `soccerdata==1.8.8` environment.

The same underlying failure also affects:

```python
ws.read_missing_players(match_id=...)
ws.read_events(match_id=...)
```

because both methods call `read_schedule()` before retrieving match-level data.

From a local comparison of the `1.8.8` and `1.9.0` source code, this may be related to a change in the common Selenium download path. In `1.8.8`, requests with `var=None` returned `document.body.innerHTML`; in `1.9.0`, the response goes through the new page validation path based on `page_source`.

`WhoScored.read_schedule()` then calls `json.load(reader)`. If the response is now HTML-wrapped instead of raw JSON, its first character would be `<` rather than `{` or `[`, which would raise the observed `JSONDecodeError` at `line 1 column 1 (char 0)`.
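To illustrate the suspected mechanism without involving soccerdata or Selenium, here is a minimal sketch using only the standard library. The payload and the `<html><body><pre>` wrapper are assumptions about what a browser's page source might look like for a raw JSON response; they are not taken from an actual WhoScored response, and `decode_error_message` is a hypothetical helper written for this demonstration.

```python
import json

# Hypothetical payload; the real WhoScored response has a different shape.
RAW_JSON = '{"tournaments": []}'

# Assumed wrapper: browsers typically render a raw JSON response inside
# an HTML document, so the page source no longer starts with "{".
HTML_WRAPPED = f"<html><head></head><body><pre>{RAW_JSON}</pre></body></html>"


def decode_error_message(text):
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


print(decode_error_message(RAW_JSON))      # parses fine, so no error message
print(decode_error_message(HTML_WRAPPED))  # fails on the leading "<"
```

If the second call reports the same message as the traceback above, that would be consistent with the HTML-wrapping hypothesis, although it does not prove that this is what happens inside `1.9.0`.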

**Contributor Action Plan**

- [ ] I can fix this issue and will submit a pull request.
- [ ] I’m unsure how to fix this, but I'm willing to work on it with guidance.
- [X] I’m not able to fix this issue.

**Reproduction notebook**

I attached the notebook I used to reproduce the issue and check the behavior in my environment:

[Guía SoccerData (1.9.0).ipynb](https://github.com/user-attachments/files/27013133/Guia.SoccerData.1.9.0.ipynb)

**Local workaround**

I also found a local workaround that fixed the issue in my environment.

The patch adds a helper that first tries to parse the response as JSON. If that fails, it checks whether the response is HTML-wrapped and then extracts the text from the `<body>` before parsing it as JSON.
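The idea behind the helper can be sketched as follows. This is a simplified, standalone illustration and not the exact code from the attached file: the names `parse_json_response` and `_BodyTextExtractor` are hypothetical, and it assumes the wrapped response keeps the JSON as plain text somewhere inside `<body>`.

```python
import json
from html.parser import HTMLParser


class _BodyTextExtractor(HTMLParser):
    """Collect the text nodes found between <body> and </body>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.chunks.append(data)


def parse_json_response(raw):
    """Parse `raw` as JSON, falling back to the text inside <body>."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    try:
        # Fast path: the response is already raw JSON.
        return json.loads(text)
    except json.JSONDecodeError:
        # Only fall back when the response looks HTML-wrapped;
        # otherwise re-raise the original error unchanged.
        if "<body" not in text.lower():
            raise
    extractor = _BodyTextExtractor()
    extractor.feed(text)
    extractor.close()
    return json.loads("".join(extractor.chunks))
```

With this sketch, both `parse_json_response('{"a": 1}')` and `parse_json_response('<html><body><pre>{"a": 1}</pre></body></html>')` return the same dictionary, while input that is neither JSON nor HTML still raises `JSONDecodeError`.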

This fixed the failing `WhoScored.read_schedule()` call locally. Since `read_missing_players()` and `read_events()` call `read_schedule()` first, it also allowed those workflows to continue.

I am attaching the modified `whoscored.py` file for reference. I understand that this may not be the preferred final implementation, and that the maintainers may prefer to fix this in the common Selenium reader instead.

[whoscored_issue_940_local_patch.py](https://github.com/user-attachments/files/27049390/whoscored_issue_940_local_patch.py)
