[WhoScored] read_schedule() fails with JSONDecodeError in 1.9.0 #940

@alexrguezzz

Describe the bug
WhoScored.read_schedule() fails with JSONDecodeError in soccerdata 1.9.0. The same workflow worked in my 1.8.8 environment.

This also affects WhoScored.read_missing_players() and WhoScored.read_events() when they need to call read_schedule() internally.

Expected behavior: read_schedule() should return the match schedule DataFrame instead of failing while decoding the response.

Python version: 3.12.13

Affected scrapers
This affects the following scrapers:

  • ClubElo
  • ESPN
  • FBref
  • FiveThirtyEight
  • Match History
  • SoFIFA
  • Understat
  • WhoScored

Code example
The following minimal example fails. I set no_cache=True to rule out an invalid cached file as the cause.

import soccerdata as sd

ws = sd.WhoScored(leagues="ESP-La Liga", seasons="24-25", no_cache=True)
ws.read_schedule()

Error message

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Additional context
I reproduced this in soccerdata==1.9.0 but not in my soccerdata==1.8.8 environment.

The same underlying failure also affects:

ws.read_missing_players(match_id=...)
ws.read_events(match_id=...)

because both methods call read_schedule() before retrieving match-level data.

From a local comparison of the 1.8.8 and 1.9.0 source code, this may be related to a change in the common Selenium download path. In 1.8.8, requests with var=None returned document.body.innerHTML; in 1.9.0, the response goes through the new page validation path based on page_source.

WhoScored.read_schedule() then calls json.load(reader). If the response is now HTML-wrapped instead of raw JSON, this raises the observed JSONDecodeError.
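To illustrate the suspected failure mode outside of soccerdata, here is a minimal sketch using only the standard library. The payload and the HTML wrapper are made up for illustration (I am assuming the browser wraps the raw JSON in a `<pre>` element inside `<body>`); this is not the actual WhoScored response.

```python
import io
import json

# Made-up payload standing in for the raw JSON the scraper expects.
payload = '{"tournaments": []}'

# Assumed shape of the same response when it is taken from the full page source.
wrapped = f"<html><head></head><body><pre>{payload}</pre></body></html>"

# Raw JSON parses as expected.
print(json.load(io.StringIO(payload)))

# The HTML-wrapped text fails on the very first character ("<").
try:
    json.load(io.StringIO(wrapped))
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message produced here matches the one reported above, which is consistent with the response starting with markup rather than JSON. It does not prove that this is what happens inside the Selenium reader.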

Contributor Action Plan

  • I can fix this issue and will submit a pull request.
  • I’m unsure how to fix this, but I'm willing to work on it with guidance.
  • I’m not able to fix this issue.

Reproduction notebook

I also attached the notebook I used while reproducing the issue and checking the behavior in my environment:

Guía SoccerData (1.9.0).ipynb

Local workaround

I also found a local workaround that fixed the issue in my environment.

The patch adds a helper that first tries to parse the response as JSON. If that fails, it checks whether the response is HTML-wrapped and then extracts the text from the <body> before parsing it as JSON.
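The idea can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the exact code from the attached file; `parse_json_or_html_wrapped` and `_BodyTextExtractor` are hypothetical names chosen for this example.

```python
import json
from html.parser import HTMLParser


class _BodyTextExtractor(HTMLParser):
    """Collect the text nodes found inside <body> (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.chunks.append(data)


def parse_json_or_html_wrapped(text):
    """Parse raw JSON; if that fails, try the text inside an HTML <body>."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Only fall back when the response looks HTML-wrapped.
        if "<body" not in text.lower():
            raise
    extractor = _BodyTextExtractor()
    extractor.feed(text)
    extractor.close()
    # Character references such as &amp; are already unescaped by HTMLParser.
    return json.loads("".join(extractor.chunks).strip())


# Usage with made-up inputs:
print(parse_json_or_html_wrapped('{"a": 1}'))
print(parse_json_or_html_wrapped('<html><body><pre>{"a": 1}</pre></body></html>'))
```

Trying plain JSON first keeps the behavior unchanged for responses that are already raw JSON, so the fallback only runs in the failing case.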

This fixed the failing WhoScored.read_schedule() call locally. Since read_missing_players() and read_events() call read_schedule() first, it also allowed those workflows to continue.

I am attaching the modified whoscored.py file for reference. I understand that this may not be the preferred final implementation, and that the maintainers may prefer to fix this in the common Selenium reader instead.

whoscored_issue_940_local_patch.py

Labels

bug: Something isn't working