Skip to content

Add node_mcf for Eurostat imports to fix missing reference validation errors#2082

Open
ajaits wants to merge 7 commits into
datacommonsorg:masterfrom
ajaits:dc-import-fix-autorefresh
Open

Add node_mcf for Eurostat imports to fix missing reference validation errors#2082
ajaits wants to merge 7 commits into
datacommonsorg:masterfrom
ajaits:dc-import-fix-autorefresh

Conversation

@ajaits

@ajaits ajaits commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Fixed with Antigravity:

Summary of Changes
We updated the shared manifest.json for Eurostat health determinants to include the missing node_mcf inputs. This resolves validation failures where the imports were "stuck" because they were not loading the node MCFs.

Target File:
manifest.json
Modified Imports:
EuroStatHealth_AlcoholConsumption
EuroStatHealth_FruitsAndVegetables
EuroStatHealth_SocialEnvironment
EurostatHealth_Tobacco_Consumption

  1. Detailed Issues & Fixes
    Eurostat Health Determinants (4 Imports)
    Issue: The automation runs for these imports were failing validation (status Stuck / Failed Validation ⚠️) because they were not configured to load their respective node_mcf files, which contain the Statistical Variable definitions.
    Fix: Added the node_mcf property to the import_inputs section for each of the following imports in
    manifest.json
    :
    diff

    "import_name": "EuroStatHealth_AlcoholConsumption",
    ...
    "import_inputs": [
    {
    "template_mcf": "alcohol_consumption/output/eurostat_population_alcoholconsumption.tmcf",

  •        "cleaned_csv": "alcohol_consumption/output/eurostat_population_alcoholconsumption.csv"
    
  •        "cleaned_csv": "alcohol_consumption/output/eurostat_population_alcoholconsumption.csv",
    
  •        "node_mcf": "alcohol_consumption/output/eurostat_population_alcoholconsumption.mcf"
       }
    
    ],
    (Similar changes applied to EuroStatHealth_FruitsAndVegetables, EuroStatHealth_SocialEnvironment, and EurostatHealth_Tobacco_Consumption)

Note on EuroStatHealth_BodyMassIndex: The report suggested the same fix, but upon inspection, it was already present in the target workspace.
Expected Outcome
In the next automated run, these 4 imports should successfully load their node MCFs, allowing them to pass validation and transition to a healthy status.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes hardcoded dc_api_root configurations across multiple scripts, metadata CSVs, and configuration files, transitioning to retrieving the API root from the DC_API_ROOT environment variable with a fallback to the production API. Additionally, node_mcf entries were added to the Eurostat health determinants manifest. The reviewer identified two issues: first, removing dc_api_root from datacommons_wrappers.py ignores the use_autopush parameter, silently changing the default behavior; second, using config.get in dc_api_wrapper.py does not safely handle cases where dc_api_root is explicitly set to None or an empty string, which could cause a TypeError. Suggestions were provided to resolve both issues.

Comment on lines 64 to 66
wrapper_config = {
'dc_api_batch_size':
max_items,
'dc_api_root':
'https://autopush.api.datacommons.org'
if use_autopush else 'https://api.datacommons.org'
'dc_api_batch_size': max_items,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

By completely removing dc_api_root from wrapper_config, the use_autopush parameter (which defaults to True) is now completely ignored when DC_API_ROOT is not set in the environment. This silently changes the default behavior of dc_check_existence from querying the autopush API to querying the production API.

To preserve the intended behavior while still allowing the DC_API_ROOT environment variable to take precedence, we can conditionally set dc_api_root only when DC_API_ROOT is not present in the environment.

Suggested change
wrapper_config = {
'dc_api_batch_size':
max_items,
'dc_api_root':
'https://autopush.api.datacommons.org'
if use_autopush else 'https://api.datacommons.org'
'dc_api_batch_size': max_items,
}
wrapper_config = {
'dc_api_batch_size': max_items,
}
if 'DC_API_ROOT' not in os.environ:
wrapper_config['dc_api_root'] = (
'https://autopush.api.datacommons.org'
if use_autopush else 'https://api.datacommons.org'
)

Comment thread util/dc_api_wrapper.py
Comment on lines +547 to +548
api_root = config.get('dc_api_root',
os.environ.get('DC_API_ROOT', _DEFAULT_API_ROOT))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If config explicitly contains dc_api_root set to None or an empty string, config.get('dc_api_root', ...) will return that falsy value instead of falling back to the environment variable or default. This can lead to a TypeError when api_root is concatenated with the API path on line 552.

Using the or operator ensures a robust fallback in case of None or empty string values.

Suggested change
api_root = config.get('dc_api_root',
os.environ.get('DC_API_ROOT', _DEFAULT_API_ROOT))
api_root = (config.get('dc_api_root') or
os.environ.get('DC_API_ROOT') or _DEFAULT_API_ROOT)
References
  1. Avoid hardcoding environment-specific values within utility modules. Instead, pass these values as parameters or retrieve them from environment variables to ensure portability.

@ajaits ajaits requested a review from niveditasing July 3, 2026 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants