
[Celestica] Support long duration test for AgentEnsembleLinkSanityTestDataPlaneFlood and ASIC-ASIC PRBS Tests #1066

Open
lihua-cls wants to merge 1 commit into facebook:main from lihua-cls:tahansb_link_stress_duration


Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run
[INFO] Stashing unstaged files to /root/.cache/pre-commit/patch1775635465-3066528.
clang-format.............................................................Passed
shellcheck...........................................(no files to check)Skipped
shfmt................................................(no files to check)Skipped
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for merge conflicts................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed
[INFO] Restored changes from /root/.cache/pre-commit/patch1775635465-3066528.

Summary

Modify the tests below to support long-duration runs, e.g. a 48-hour continuous run:

Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity
AgentEnsembleLinkSanityTestDataPlaneFlood.warmbootIsHitLess
AgentEnsembleLinkSanityTestDataPlaneFlood.qsfpWarmbootIsHitLess

Solution

  1. Add a flag "--link_stress_duration" to both run_test.py and the single-binary run; it specifies how long (in minutes) the test case runs.

    Note: The existing flag --link_stress_test already runs the PRBS test for 10 minutes. If both "link_stress_test" and "link_stress_duration" are specified, only the older "link_stress_test" takes effect, so existing behavior is unchanged. If "link_stress_duration" is not specified, behavior is also unchanged.

  2. Lengthen the PRBS check interval from 10 seconds to 3 minutes when the duration is more than 10 minutes.

  3. For the DataPlaneFlood test cases, pump traffic every 10 seconds during the test until the duration elapses.

  4. In run_test.py, override test_run_timeout so that the test case does not time out.
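
The flag precedence and interval rules above can be sketched in Python as follows. This is only an illustration of the described behavior, not the code in this PR: the function names, the constants, and the 600-second timeout margin are hypothetical, and `None` stands in for "flag not given, keep the existing default".

```python
from typing import Optional

# Values taken from the description above.
LEGACY_STRESS_MINUTES = 10    # --link_stress_test runs PRBS for 10 minutes
SHORT_CHECK_INTERVAL_S = 10   # previous PRBS check interval
LONG_CHECK_INTERVAL_S = 180   # 3 minutes, for durations above 10 minutes


def resolve_duration_minutes(
    link_stress_test: bool, link_stress_duration: Optional[int]
) -> Optional[int]:
    """Return the effective duration; the older flag wins if both are set."""
    if link_stress_test:
        return LEGACY_STRESS_MINUTES
    # None means the new flag was not given, so behavior stays unchanged.
    return link_stress_duration


def prbs_check_interval_seconds(duration_minutes: Optional[int]) -> int:
    """Use the longer interval only when the run exceeds 10 minutes."""
    if duration_minutes is not None and duration_minutes > 10:
        return LONG_CHECK_INTERVAL_S
    return SHORT_CHECK_INTERVAL_S


def effective_timeout_seconds(
    default_timeout_s: int,
    duration_minutes: Optional[int],
    margin_s: int = 600,  # hypothetical headroom, not a value from this PR
) -> int:
    """Raise the run timeout so a long run is not killed early."""
    if duration_minutes is None:
        return default_timeout_s
    return max(default_timeout_s, duration_minutes * 60 + margin_s)


if __name__ == "__main__":
    minutes = resolve_duration_minutes(False, 48 * 60)
    print(minutes, prbs_check_interval_seconds(minutes))
```

With a 48-hour request (2880 minutes) and no `--link_stress_test`, the sketch selects the 3-minute check interval; with both flags set it falls back to the 10-minute legacy run and the 10-second interval.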

Test Plan

  1. Run the single binary with link_stress_duration for the 3 test cases and ensure the run duration matches the expected value.
  2. Run the single binary with link_stress_duration for other test cases and ensure the parameter does not take effect.
  3. Run the single binary without link_stress_duration and ensure the behavior is the same as before.
  4. Use run_test.py to repeat steps 1-3 and ensure the results are the same.
  5. Run a 24-hour duration test for the 3 test cases and ensure they all pass.

Test Result

[       OK ] cold_boot.Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity (86569893 ms)
[       OK ] cold_boot.AgentEnsembleLinkSanityTestDataPlaneFlood.warmbootIsHitLess (86485485 ms)
[       OK ] cold_boot.AgentEnsembleLinkSanityTestDataPlaneFlood.qsfpWarmbootIsHitLess (86507338 ms)
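
As a quick sanity check on the reported timings, the millisecond values above can be converted to hours. The snippet below is a small helper written for this description, not part of the PR; the dictionary simply copies the three numbers from the test output.

```python
# Elapsed times copied from the gtest output above, in milliseconds.
RESULTS_MS = {
    "Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity": 86569893,
    "AgentEnsembleLinkSanityTestDataPlaneFlood.warmbootIsHitLess": 86485485,
    "AgentEnsembleLinkSanityTestDataPlaneFlood.qsfpWarmbootIsHitLess": 86507338,
}

MS_PER_HOUR = 60 * 60 * 1000


def ms_to_hours(ms: int) -> float:
    """Convert a millisecond duration to hours."""
    return ms / MS_PER_HOUR


if __name__ == "__main__":
    for name, ms in RESULTS_MS.items():
        print(f"{name}: {ms_to_hours(ms):.2f} h")
```

Each run comes out slightly above 24 hours (between 24.02 and 24.05 hours), which is consistent with step 5 of the test plan.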

Full logs in Gdrive

@lihua-cls lihua-cls requested review from a team as code owners April 8, 2026 08:11
@meta-cla meta-cla bot added the CLA Signed label Apr 8, 2026