Implement parametrisation for standardising model runs#392
Implement parametrisation for standardising model runs#392mo-nikosbaltas wants to merge 19 commits intomainfrom
Conversation
Zubair Maalick (zmaalick)
left a comment
There was a problem hiding this comment.
All good, nothing to add
Naomi Parsons (NParsonsMO)
left a comment
There was a problem hiding this comment.
I remember Emma Hogan (@ehogan) saying that this PR should make a variables file per dataset, which is not currently happening.

I've also queried a couple of other bits.
There was a problem hiding this comment.
I can believe it was necessary to pull this step out into its own app (if it just needs running once overall rather than separately for each dataset), so if Emma Hogan (@ehogan) is happy with the addition of a new app(?) then this seems fine.
There was a problem hiding this comment.
The extra API (restrucre_dirs) is required because it acts as the fan-in api for synchronisation. We have two instances of standardise_model_data running in parallel and unless they both finish you cannot restructure the data. I had discussed this with Alistair as well.
There was a problem hiding this comment.
There may be a way to run restructure_for_cmip6 in a way that can be done in parallel, but I propose that any investigation related to this is deferred. The name of the app should be changed, but I proposed that this is also deferred (to #318).
There was a problem hiding this comment.
Following the agreement we made at the technical meeting this morning to revert configure_standardise.sh, restructure_dirs.sh should be reverted to this previous version.
There was a problem hiding this comment.
Done. Although not different it had only some useful logging which have been removed. 7d8aef1
There was a problem hiding this comment.
The version of restructure_dirs.sh in this PR has not been reverted to this previous version. Please revert the changes to restructure_dirs.sh so that it matches this previous version.
There was a problem hiding this comment.
I removed all comments and checks and reverted back to the serial version. f889818
| fi | ||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # 1. Create variables.txt once (shared by both runs) |
There was a problem hiding this comment.
I think we are meant to make two now, not one shared one.
There was a problem hiding this comment.
We had changed our minds!!! Perhaps you missed that. Not allowed to have a separate 'create_variables' app.
There was a problem hiding this comment.
Would it not just be a case of making a variables_<dataset>.txt filepath akin to REQUEST_PATH = ${CYLC_WORKFLOW_SHARE_DIR}/etc/request_%(dataset)s.cfg while keeping it in the same (combined) app?
There was a problem hiding this comment.
As discussed at the technical meeting this morning, the reason we didn't want to introduce a create_variables app that created a single variables file is because it's possible for the same variables to come from different streams depending on the model evaluation run (/ dataset) used, which means a variables file is required for each dataset. Keeping the variables file creation within configure_standardise is preferred, since it minimises the number of changes compared to the previous version of the code. Edited to add that since configure_standardise is running in parallel, each job should write its own variables file, rather than try to write a single file with the same name, which is not safe.
There was a problem hiding this comment.
Keep it as it is.
There was a problem hiding this comment.
All defensive programming and logging have been removed.
| [[[environment]]] | ||
| ROSE_TASK_APP = standardise_model_data | ||
| SITE = {{ SITE }} | ||
| RUN_LABEL = %(dataset)s |
There was a problem hiding this comment.
It doesn't seem good to define these things (RUN_LABEL and REQUEST_PATH) twice (once in configure_standardise<recipe,dataset> and once in standardise_model_data) - could you just define them once then inherit?
There was a problem hiding this comment.
Also, please use the appropriate Cylc variable for the parameterisation variable, i.e. ${CYLC_TASK_PARAM_dataset} rather than %(dataset)s. An example of this for the recipe parameterised variable is available at https://github.com/MetOffice/CMEW/blob/main/CMEW/flow.cylc#L51.
There was a problem hiding this comment.
Would it be possible to use only ${CYLC_TASK_PARAM_dataset} in favour of ${RUN_LABEL}, please? (Using ${RUN_LABEL} obfuscates the recognisable and understandable variable ${CYLC_TASK_PARAM_dataset})
|
|
||
| def create_request(): | ||
| """Retrieve CDDS request information from Rose suite configuration. | ||
| def create_request() -> configparser.ConfigParser: |
There was a problem hiding this comment.
I note from an earlier comment of Emma's on another review that CMEW should not contain type hints:
#349 (comment)
I'm not sure why or if that's an absolute, again probably one for Emma Hogan (@ehogan).
Should there still be a docstring?
There was a problem hiding this comment.
Python functions must have docstrings. Documentation requirements: Python documentation provides more information.
There was a problem hiding this comment.
Type hinting has been removed and docstrings reinstated. docstrings were removed after previous changes were made.
There was a problem hiding this comment.
All type hinting and the inclusion of docstrings had been done 7d8aef1
| BASH_XTRACEFD=1 | ||
| set -xeuo pipefail | ||
|
|
||
| echo "[INFO] Running configure_standardise for REF and EVAL runs" |
There was a problem hiding this comment.
At the technical meeting this morning, we agreed that configure_standardise.sh should be reverted to this previous version. Specifically:
-o pipefailis only needed inrose-app.conffiles, as described in the Developer Guide: Rose requirements- The echoing and defensive programming is not required; the options provided to
setat the top of the script do the following:-x Print commands and their arguments as they are executed.-e Exit immediately if a command exits with a non-zero status.-u Treat unset variables as an error when substituting.
- If we want to validate variables, this should be done in the
flow.cylcfile, e.g. https://github.com/MetOffice/CMEW/blob/main/CMEW/flow.cylc#L5, and not in an individual app. However, this is out of scope of this issue (a new issue should be opened should we want to validate variables)
There was a problem hiding this comment.
The version of configure_standardise.sh in this PR has not been reverted to this previous version. Please revert the changes to configure_standardise.sh so that it matches this previous version.
There was a problem hiding this comment.
Basically removed some useful comments and checks. The only diffrence is that the create_variables_file.py script has to be executed before create_request_file.py otherwise it breaks. f889818
As discussed in the technical meeting this morning, |
| @@ -1,4 +1,4 @@ | |||
| # (C) Crown Copyright 2024-2025, Met Office. | |||
| # (C) Crown Copyright 2024-2026, Met Office. | |||
There was a problem hiding this comment.
Following the agreement we made at the technical meeting this morning to revert configure_standardise.sh, the rose-app.conf file for standardise_model_data should be reverted to this previous version and the standardise_model_data.sh script removed.
There was a problem hiding this comment.
The version of standardise_model_data/rose-app.conf in this PR has not been reverted to this previous version. Please revert the changes to standardise_model_data/rose-app.conf so that it matches this previous version, and then remove the standardise_model_data.sh script.
There was a problem hiding this comment.
standardise_model_data tasks run in parallel and they are standalone tasks so rose-app.conf had to be changed. You cannot revert to the serial version.
mo-nikosbaltas
left a comment
There was a problem hiding this comment.
Making comment visible.
There was a problem hiding this comment.
Done. Although not different it had only some useful logging which have been removed. 7d8aef1
| fi | ||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # 1. Create variables.txt once (shared by both runs) |
|
|
||
| def create_request(): | ||
| """Retrieve CDDS request information from Rose suite configuration. | ||
| def create_request() -> configparser.ConfigParser: |
There was a problem hiding this comment.
All type hinting and the inclusion of docstrings had been done 7d8aef1
| @@ -1,4 +1,4 @@ | |||
| # (C) Crown Copyright 2024-2025, Met Office. | |||
| # (C) Crown Copyright 2024-2026, Met Office. | |||
| fi | ||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # 1. Create variables.txt once (shared by both runs) |
Naomi Parsons (NParsonsMO)
left a comment
There was a problem hiding this comment.
All good!
Approved.
Swaps create_variables and create_request, relative to last commit
And remove script
(to retain the file history)

Closes #338
PR creation checklist for the developer
<issue_number>above ☝️ has been replaced with the issue number.mainhas been selected as the base branch.<issue_number>_<short_description_of_feature>.good first issuelabel) have been added to the PR.Climate Model Evaluation Workflow (CMEW)project has been added to the PR.Definition of Done for the developer
docdirectory, including the Quick Start section; select one of the following):PR creation checklist for the reviewer
<issue_number>above ☝️ has been replaced with the issue number.mainhas been selected as the base branch.<issue_number>_<short_description_of_feature>.good first issuelabel) have been added to the PR.Climate Model Evaluation Workflow (CMEW)project has been added to the PR.Definition of Done for the reviewer
docdirectory, including the Quick Start section; select one of the following):Important
#<pull_request_number>: <pull_request_title>when writing the merge commit message for the pull request, so the pull request number is immediately visible on GitHub, regardless of the length of the pull request title.