
Reward gap model 20260405#316

Draft
shnarazk wants to merge 3 commits into dev-0.19.0 from reward-gap-model-20260405

Conversation


@shnarazk shnarazk commented Apr 5, 2026

"Conflict frequency" switches to "focus mode" if the average of conflict intervals is getting smaller. Therefore finding a "core" is the trigger. But it only supports learning-rate-based rewarding. Both picks recently used literals. We can't expect much.

"Reward gap model" is for the case we find new important-but-less-rewarded literals. Since they are new for conflict analysis, they would have small rewards and RL-based rewarding can't focus them immediately even the other literals have pretty big values (they are "core"). So switching to VMTF can handle the big range of rewards situations.

"splr",  1,"0a8a4c28d27228e954354ea0a6e7f16c-sum_of_three_cubes_42_kno",  TIMEOUT
"splr",  2,"1a936d41e3439d602c3ddcf96458a38c-arles_thres10_p10_r8185.c",    0.015
"splr",  3,"2b4467a5ac4ac41b36d4c3432b07f767-oddball_69_5_tto_zp.norma", 1921.757
"splr",  4,"3a1f1b4b9a521737cc760017fe9d8b43-MVRoundRobin_n16_d10_v3.c",  TIMEOUT
"splr",  5,"4ba2c1aa580b6497df6baf5e7e2c87be-at-least-two-vmpc_28.cnf" ,   64.577
"splr",  6,"5aed29ce52192a55ffbd2a6f340017e7-oisc-subrv-and-nested-12.",  TIMEOUT
"splr",  7,"6cb995b1c550beb579c53e27f6ca881a-RoundRobin_n16_d13.cnf"   ,  TIMEOUT
"splr",  8,"7a044c997ede14d00002f1db39d45170-sum_of_3_cubes_37_bits_87", 4821.444
"splr",  9,"8a05f9b6bf49285e40d0a197967ea5d3-arles_thres10_p10_r7466.c",    0.037
"splr", 10,"9a839badecb20dcf505ec79eedd3753a-anbul-dated-5-15-u.cnf"   ,  175.928
"splr", 11,"a0bcdaffb0ea36b678899fd86bdc7f18-arles_thres10_p10_r8186.c",    0.015
"splr", 12,"b1c8eaa002ac2fa1c8bfd1002738e78e-cliquecolouring_n15_k7_c6",  TIMEOUT
"splr", 13,"c0bd86bd7ca2c65e44311de374168150-goldcrest-and-14.cnf"     ,  TIMEOUT
"splr", 14,"d0298807e51730261ef65db827dcd70f-Break_triple_16_70.xml.cn", 1080.762
"splr", 15,"e2d2b011b0805782df6adba648db92e8-59-129706.cnf"            ,  926.741
"splr", 16,"f0bafebdcce23ccfbaf6c27a7522069b-div-mitern172.cnf"        ,  411.601
med:   293.764, max:  4821.444, total except 6 timeouts: 9402.877

Not so good. The reward itself is defined as an average, so its tendency is a second-level average, and it may be difficult to define a meaningful threshold over such varied problems.
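To illustrate why a threshold on the tendency is hard to tune: if the reward is an exponential moving average and the tendency an EMA of that EMA, the second-level signal reacts much more weakly to changes than the first. The struct below is purely illustrative, assuming a single smoothing factor `alpha`; it is not splr's implementation.

```rust
// Illustrative only: reward as an EMA, tendency as an EMA of the EMA.
// The second-level average is heavily smoothed, which makes a single
// fixed threshold unlikely to transfer across different problems.

struct RewardTrend {
    reward: f64,   // first-level average
    tendency: f64, // second-level average (average of the average)
    alpha: f64,    // smoothing factor, 0 < alpha < 1 (assumed)
}

impl RewardTrend {
    fn new(alpha: f64) -> Self {
        Self { reward: 0.0, tendency: 0.0, alpha }
    }

    fn update(&mut self, sample: f64) {
        self.reward = self.alpha * sample + (1.0 - self.alpha) * self.reward;
        self.tendency = self.alpha * self.reward + (1.0 - self.alpha) * self.tendency;
    }
}

fn main() {
    let mut t = RewardTrend::new(0.2);
    for _ in 0..10 {
        t.update(1.0);
    }
    let (r_before, g_before) = (t.reward, t.tendency);
    t.update(0.0); // a sudden drop in the raw signal
    let d_reward = (t.reward - r_before).abs();
    let d_tendency = (t.tendency - g_before).abs();
    // The second-level average reacts far less than the first-level one.
    assert!(d_tendency < d_reward);
    println!("d_reward={:.4} d_tendency={:.4}", d_reward, d_tendency);
}
```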

@shnarazk shnarazk self-assigned this Apr 5, 2026
@shnarazk shnarazk added the `experimental project` and `new scheme import some idea on papers` labels and removed the `experimental project` label Apr 5, 2026
@shnarazk shnarazk changed the base branch from main to dev-0.19.0 April 5, 2026 23:35
