Add node_mcf for Eurostat imports to fix missing reference validation errors#2082
Add node_mcf for Eurostat imports to fix missing reference validation errors#2082ajaits wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request removes hardcoded dc_api_root configurations across multiple scripts, metadata CSVs, and configuration files, transitioning to retrieving the API root from the DC_API_ROOT environment variable with a fallback to the production API. Additionally, node_mcf entries were added to the Eurostat health determinants manifest. The reviewer identified two issues: first, removing dc_api_root from datacommons_wrappers.py ignores the use_autopush parameter, silently changing the default behavior; second, using config.get in dc_api_wrapper.py does not safely handle cases where dc_api_root is explicitly set to None or an empty string, which could cause a TypeError. Suggestions were provided to resolve both issues.
| wrapper_config = { | ||
| 'dc_api_batch_size': | ||
| max_items, | ||
| 'dc_api_root': | ||
| 'https://autopush.api.datacommons.org' | ||
| if use_autopush else 'https://api.datacommons.org' | ||
| 'dc_api_batch_size': max_items, | ||
| } |
There was a problem hiding this comment.
By completely removing dc_api_root from wrapper_config, the use_autopush parameter (which defaults to True) is now completely ignored when DC_API_ROOT is not set in the environment. This silently changes the default behavior of dc_check_existence from querying the autopush API to querying the production API.
To preserve the intended behavior while still allowing the DC_API_ROOT environment variable to take precedence, we can conditionally set dc_api_root only when DC_API_ROOT is not present in the environment.
| wrapper_config = { | |
| 'dc_api_batch_size': | |
| max_items, | |
| 'dc_api_root': | |
| 'https://autopush.api.datacommons.org' | |
| if use_autopush else 'https://api.datacommons.org' | |
| 'dc_api_batch_size': max_items, | |
| } | |
| wrapper_config = { | |
| 'dc_api_batch_size': max_items, | |
| } | |
| if 'DC_API_ROOT' not in os.environ: | |
| wrapper_config['dc_api_root'] = ( | |
| 'https://autopush.api.datacommons.org' | |
| if use_autopush else 'https://api.datacommons.org' | |
| ) |
| api_root = config.get('dc_api_root', | ||
| os.environ.get('DC_API_ROOT', _DEFAULT_API_ROOT)) |
There was a problem hiding this comment.
If config explicitly contains dc_api_root set to None or an empty string, config.get('dc_api_root', ...) will return that falsy value instead of falling back to the environment variable or default. This can lead to a TypeError when api_root is concatenated with the API path on line 552.
Using the or operator ensures a robust fallback in case of None or empty string values.
| api_root = config.get('dc_api_root', | |
| os.environ.get('DC_API_ROOT', _DEFAULT_API_ROOT)) | |
| api_root = (config.get('dc_api_root') or | |
| os.environ.get('DC_API_ROOT') or _DEFAULT_API_ROOT) |
References
- Avoid hardcoding environment-specific values within utility modules. Instead, pass these values as parameters or retrieve them from environment variables to ensure portability.
…dc-import-fix-autorefresh
Fixed with Antigravity:
Summary of Changes
We updated the shared manifest.json for Eurostat health determinants to include the missing node_mcf inputs. This resolves validation failures where the imports were "stuck" because they were not loading the node MCFs.
Target File:
manifest.json
Modified Imports:
EuroStatHealth_AlcoholConsumption
EuroStatHealth_FruitsAndVegetables
EuroStatHealth_SocialEnvironment
EurostatHealth_Tobacco_Consumption
Detailed Issues & Fixes⚠️ ) because they were not configured to load their respective node_mcf files, which contain the Statistical Variable definitions.
Eurostat Health Determinants (4 Imports)
Issue: The automation runs for these imports were failing validation (status Stuck / Failed Validation
Fix: Added the node_mcf property to the import_inputs section for each of the following imports in
manifest.json
:
diff
"import_name": "EuroStatHealth_AlcoholConsumption",
...
"import_inputs": [
{
"template_mcf": "alcohol_consumption/output/eurostat_population_alcoholconsumption.tmcf",
(Similar changes applied to EuroStatHealth_FruitsAndVegetables, EuroStatHealth_SocialEnvironment, and EurostatHealth_Tobacco_Consumption)
Note on EuroStatHealth_BodyMassIndex: The report suggested the same fix, but upon inspection, it was already present in the target workspace.
Expected Outcome
In the next automated run, these 4 imports should successfully load their node MCFs, allowing them to pass validation and transition to a healthy status.