If an AI or a human is able to implement code that solves test tasks generated by the algorithm in FIRST-NB.html from https://github.com/ogrnv/random-intelligence-tests, then their coding intelligence can be evaluated from the test results produced by those implementations.
Because this method can measure intelligence with arbitrarily high precision and can be repeated as often as needed, it makes it possible to determine whether an increase in intelligence occurs for:
- a specific AI or person,
- a specific group of communicating AIs or people, or
- a mixed group of communicating AIs and people.
If AI self-improvement is possible, then it could be driven by reinforcement feedback from the results of RIT (Random Intelligence Tests).
The best known results of intelligence tests of AI-generated code for Prompt2 and example.c follow;
the data was obtained by saving global variables before each call of the AI code and restoring them afterwards:
8*8 7 42 500 12 2.324298549 Claude.ai Sonnet 4.5 us 2025-12-13 18:59:26
8*8 7 42 500 12 1.515516602 Chatgpt.com unknown us 2025-12-19 00:04:14
8*8 7 42 5 12 0.27 Monte Carlo method mc 2025-12-20 07:00:00
An artificial general intelligence must achieve an intelligence score greater than 300 for the 8*8 7 42 5 12 configuration.
Typical results of intelligence tests of AI-generated code for Prompt2 and example.c
demonstrate that the more complex the tasks, the higher the probability of infinite loops:
8*8 2 59 1 2 666.667 aLLM n/a 2025-12-14 <-- without infinite loops
8*8 3 59 1 2 285.714 aLLM n/a 2025-12-14
8*8 3 59 1 2 000.000 aLLM n/a 2025-12-14 <-- and infinite loops
8*8 4 59 1 2 000.000 aLLM n/a 2025-12-14 <-- infinite loops
Each result row above lists, in order:
- the number of:
  - cells of the board
  - chip types
  - chips on the board
  - rounds in a test
  - steps in a round
- intelligence = 1000 / average number of moves made per step
- AI name
- country
- date and time of the code generation
pv4.c and pv4i.py are prompts that do not require saving/restoring global variables in the generated code, but AIs produce worse coding results for these prompts.