This repository has been archived by the owner on May 15, 2024. It is now read-only.

68000 test generation and random data #28

Open
rasky opened this issue Apr 13, 2023 · 4 comments

Comments

@rasky

rasky commented Apr 13, 2023

Looking at the 68000 tests, the initial CPU state for each test appears to be random. I can't find the generator (I guess it's not public), so I'm not sure how random it is: whether all registers are fully random, or whether some are tweaked from an initial random state to reproduce the specific behaviors being tested.

I am trying to run the tests on embedded devices, and this is proving a bit complex because of the sheer size of the test vectors. I don't mind them being big as in "many tests" (I could use even more!), but the test data itself compresses very poorly because the initial and final states are full of random numbers.

I was wondering if, in general, it would be possible to make public the PRNG algorithm used to generate the random initial state, and maybe put the seed for the PRNG in each test. If each test contained the seed used to generate its initial state (and the PRNG was documented), I could in theory regenerate the state from the seed only, without having to embed it altogether. If the initial state was then tweaked a bit after the PRNG pass, I could store just the differences, which would probably be much smaller.
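To make the idea concrete, here is a minimal sketch of regenerating a state from a seed and storing only the registers that were tweaked afterwards. Everything in it is an assumption for illustration: xorshift32 merely stands in for whatever PRNG the real generator uses, and the register list and its order are hypothetical.

```python
# Hypothetical sketch: xorshift32 is a stand-in PRNG, not the generator's actual one.
# The register names and their order are assumptions for illustration.
REG_NAMES = [f"d{i}" for i in range(8)] + [f"a{i}" for i in range(7)] + ["usp", "ssp", "pc"]


def xorshift32(seed):
    """Yield an endless stream of 32-bit values from a nonzero 32-bit seed."""
    state = seed & 0xFFFFFFFF
    while True:
        state ^= (state << 13) & 0xFFFFFFFF
        state ^= state >> 17
        state ^= (state << 5) & 0xFFFFFFFF
        yield state


def regenerate(seed, overrides=None):
    """Rebuild the initial registers from the seed, then apply stored tweaks."""
    rng = xorshift32(seed)
    regs = {name: next(rng) for name in REG_NAMES}
    regs.update(overrides or {})
    return regs


def diff_against_seed(seed, actual):
    """Return only the registers whose value differs from the pure PRNG output."""
    base = regenerate(seed)
    return {name: value for name, value in actual.items() if base[name] != value}
```

With something like this, a test would only need to carry the seed plus the output of `diff_against_seed`, and `regenerate(seed, diff)` would reproduce the full initial state.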

Does this make any sense?

@larsbrinkhoff

So I gather you are running the tests on real hardware? If so, do you have access to a genuine MC68000? I'm curious about some of the obscure corner cases, and if the test data really match hardware. In particular, what side effects are applied before an address error happens.

@rasky
Author

rasky commented Apr 14, 2023

Actually, I'm not: I'm just running the tests on an embedded device where I'm emulating an MC68000. I guess the change I'm proposing here might also eventually facilitate a hardware test, though.

@galibert

galibert commented Sep 14, 2023

> In particular, what side effects are applied before an address error happens.

All of them. The access happens with address bit A0 dropped (technically, not connected), and the write part of the microcode instruction that waits for the end of the access executes fully. Only after that is the exception taken.

A subtlety, though: that doesn't mean that, for instance, a tst.w (a1) to an odd address is going to change the flags. That's because the flags are changed in the next microcode instruction (which actually does an and with #ffff to set them), and that instruction is never reached. On some other instructions, though, it matters. It is identical for bus errors, by the way.

@dbalsom
Contributor

dbalsom commented May 7, 2024

I talked this over a bit with rasky on Discord, and I think the general idea is that the tests as they stand waste a lot of space by including random data that could either be regenerated or replaced by a reference.

If we want to randomize our starting registers, it's not strictly necessary to list the starting register state at all. We can simply provide a seed value to a specified byte-producing PRNG and state that the registers are set, in some defined order of bytes, from the output of that PRNG.

An even simpler method for the user, perhaps, is to pre-generate a random pad file (i.e., random.bin) and specify an offset into that file. Then we simply specify that the registers are written in a specific byte order from the 28 bytes (or whatever) within the random pad file at offset XXXX (wrapping), so we only need to store the offset. Registers could still be listed in the initial state, in addition to the offset, if they are modified somehow, such as masking CX in x86 string operations so the tests don't run for one million cycles. The presence of a register in the initial state would override the value taken from the pad.
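A rough sketch of what the consumer side of the pad approach might look like. The register list (fourteen 16-bit x86 registers, which happens to give 28 bytes), its order, and the little-endian byte order are all assumptions here; the real format would have to pin these down.

```python
import struct

# Assumed register order: 14 registers x 2 bytes = 28 bytes read from the pad.
REG_NAMES = ["ax", "bx", "cx", "dx", "cs", "ss", "ds", "es",
             "sp", "bp", "si", "di", "ip", "flags"]


def pad_bytes(pad, offset, count):
    """Read `count` bytes starting at `offset`, wrapping at the end of the pad."""
    return bytes(pad[(offset + i) % len(pad)] for i in range(count))


def initial_registers(pad, offset, overrides=None):
    """Build the initial registers from the pad, then apply explicit overrides."""
    raw = pad_bytes(pad, offset, 2 * len(REG_NAMES))
    values = struct.unpack(f"<{len(REG_NAMES)}H", raw)  # little-endian (assumed)
    regs = dict(zip(REG_NAMES, values))
    regs.update(overrides or {})  # a register listed in the test wins over the pad
    return regs
```

For example, with `pad = bytes(range(256))`, calling `initial_registers(pad, 250, {"cx": 3})` reads across the end of the pad and wraps back to the start, while `cx` takes the explicitly listed value instead of the pad bytes.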

Correspondingly, we can modify the way the 'final' state is output to exclude register and memory states that have not changed from the 'initial' state.
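As a sketch of that, assuming each state is a dict with a `regs` mapping and a `ram` list of `[address, value]` pairs (an assumed layout, not necessarily what the suites use), the generator could emit only the changed entries and the consumer could expand them again:

```python
def final_delta(initial, final):
    """Keep only registers and memory bytes that differ from the initial state."""
    regs = {name: value for name, value in final["regs"].items()
            if initial["regs"].get(name) != value}
    init_ram = {addr: value for addr, value in initial["ram"]}
    ram = [[addr, value] for addr, value in final["ram"]
           if init_ram.get(addr) != value]
    return {"regs": regs, "ram": ram}


def expand_final(initial, delta):
    """Rebuild the full final state from the initial state plus the delta."""
    regs = {**initial["regs"], **delta["regs"]}
    ram = {addr: value for addr, value in initial["ram"]}
    ram.update((addr, value) for addr, value in delta["ram"])
    return {"regs": regs, "ram": [[a, v] for a, v in sorted(ram.items())]}
```

This assumes the final state never drops an address that the initial state listed; unchanged entries are simply inherited from the initial state.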

The pad approach has an advantage over the seed approach, I think, in that it relieves the consumer from having to translate a PRNG into the language of their choice and removes any ambiguity over implementing it. Even parsing JSON can be a tall barrier depending on the language, and I'd hate to add more.

We can even keep a mapping of pad offsets to memory data when, for example, doing string moves, and then our memory states could use some notation for run-length encodings (write X bytes at offset Y in the pad to memory location M).
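One possible shape for that notation, purely as an illustration: each run is a hypothetical `(memory_address, pad_offset, length)` tuple, and the generator looks for memory contents that already appear somewhere in the pad.

```python
def expand_memory(pad, runs):
    """Expand (memory_address, pad_offset, length) runs into {address: byte}."""
    ram = {}
    for mem_addr, pad_offset, length in runs:
        for i in range(length):
            # Offsets wrap at the end of the pad, as with the register bytes.
            ram[mem_addr + i] = pad[(pad_offset + i) % len(pad)]
    return ram


def find_run(pad, data):
    """Return a pad offset where `data` occurs (without wrapping), or None."""
    index = bytes(pad).find(bytes(data))
    return index if index >= 0 else None
```

A generator could call `find_run` on each block of memory it wants to record and fall back to listing the bytes explicitly when it returns `None`.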

I think this approach would significantly reduce the size of the test suites, at only a mild inconvenience for consumers of the test data. Even when space is not necessarily a concern, smaller test files are more quickly processed when executing a test suite.
