
Reward gap model 20260405#316

Draft
shnarazk wants to merge 3 commits into dev-0.19.0 from reward-gap-model-20260405

Conversation


@shnarazk shnarazk commented Apr 5, 2026

"Conflict frequency" switches to "focus mode" if the average of conflict intervals is getting smaller. Therefore finding a "core" is the trigger. But it only supports learning-rate-based rewarding. Both picks recently used literals. We can't expect much.

"Reward gap model" is for the case we find new important-but-less-rewarded literals. Since they are new for conflict analysis, they would have small rewards and RL-based rewarding can't focus them immediately even the other literals have pretty big values (they are "core"). So switching to VMTF can handle the big range of rewards situations.

"splr",  1,"0a8a4c28d27228e954354ea0a6e7f16c-sum_of_three_cubes_42_kno",  TIMEOUT
"splr",  2,"1a936d41e3439d602c3ddcf96458a38c-arles_thres10_p10_r8185.c",    0.015
"splr",  3,"2b4467a5ac4ac41b36d4c3432b07f767-oddball_69_5_tto_zp.norma", 1921.757
"splr",  4,"3a1f1b4b9a521737cc760017fe9d8b43-MVRoundRobin_n16_d10_v3.c",  TIMEOUT
"splr",  5,"4ba2c1aa580b6497df6baf5e7e2c87be-at-least-two-vmpc_28.cnf" ,   64.577
"splr",  6,"5aed29ce52192a55ffbd2a6f340017e7-oisc-subrv-and-nested-12.",  TIMEOUT
"splr",  7,"6cb995b1c550beb579c53e27f6ca881a-RoundRobin_n16_d13.cnf"   ,  TIMEOUT
"splr",  8,"7a044c997ede14d00002f1db39d45170-sum_of_3_cubes_37_bits_87", 4821.444
"splr",  9,"8a05f9b6bf49285e40d0a197967ea5d3-arles_thres10_p10_r7466.c",    0.037
"splr", 10,"9a839badecb20dcf505ec79eedd3753a-anbul-dated-5-15-u.cnf"   ,  175.928
"splr", 11,"a0bcdaffb0ea36b678899fd86bdc7f18-arles_thres10_p10_r8186.c",    0.015
"splr", 12,"b1c8eaa002ac2fa1c8bfd1002738e78e-cliquecolouring_n15_k7_c6",  TIMEOUT
"splr", 13,"c0bd86bd7ca2c65e44311de374168150-goldcrest-and-14.cnf"     ,  TIMEOUT
"splr", 14,"d0298807e51730261ef65db827dcd70f-Break_triple_16_70.xml.cn", 1080.762
"splr", 15,"e2d2b011b0805782df6adba648db92e8-59-129706.cnf"            ,  926.741
"splr", 16,"f0bafebdcce23ccfbaf6c27a7522069b-div-mitern172.cnf"        ,  411.601
med:   293.764, max:  4821.444, total except 6 timeouts: 9402.877

Not so good. The reward itself is defined as an average, so its tendency is a second-level average, and it may be difficult to define a meaningful threshold over such varied problems.
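To illustrate why a threshold on the tendency is hard to tune: if the reward is an exponential moving average and the tendency an EMA of that EMA, the second-level signal reacts much more weakly to changes than the first. The struct below is purely illustrative, assuming a single smoothing factor `alpha`; it is not splr's implementation.

```rust
// Illustrative only: reward as an EMA, tendency as an EMA of the EMA.
// The second-level average is heavily smoothed, which makes a single
// fixed threshold unlikely to transfer across different problems.

struct RewardTrend {
    reward: f64,   // first-level average
    tendency: f64, // second-level average (average of the average)
    alpha: f64,    // smoothing factor, 0 < alpha < 1 (assumed)
}

impl RewardTrend {
    fn new(alpha: f64) -> Self {
        Self { reward: 0.0, tendency: 0.0, alpha }
    }

    fn update(&mut self, sample: f64) {
        self.reward = self.alpha * sample + (1.0 - self.alpha) * self.reward;
        self.tendency = self.alpha * self.reward + (1.0 - self.alpha) * self.tendency;
    }
}

fn main() {
    let mut t = RewardTrend::new(0.2);
    for _ in 0..10 {
        t.update(1.0);
    }
    let (r_before, g_before) = (t.reward, t.tendency);
    t.update(0.0); // a sudden drop in the raw signal
    let d_reward = (t.reward - r_before).abs();
    let d_tendency = (t.tendency - g_before).abs();
    // The second-level average reacts far less than the first-level one.
    assert!(d_tendency < d_reward);
    println!("d_reward={:.4} d_tendency={:.4}", d_reward, d_tendency);
}
```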

@shnarazk shnarazk self-assigned this Apr 5, 2026
@shnarazk shnarazk added the `experimental project` and `new scheme import some idea on papers` labels and removed the `experimental project` label Apr 5, 2026
@shnarazk shnarazk changed the base branch from main to dev-0.19.0 April 5, 2026 23:35
