Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing #161

Merged
merged 2 commits into from
Oct 28, 2023
Merged

Hashing #161

merged 2 commits into from
Oct 28, 2023

Conversation

martindevans
Copy link
Contributor

@martindevans martindevans commented Oct 27, 2023

I noticed that Arch contains a custom implementation of MurmurHash3. Benchmarking shows that the built in System.HashCode is faster (and of course simpler):

|     Method |     N |       Mean |   Error |  StdDev |
|----------- |------ |-----------:|--------:|--------:|
| MurmurHash |  1000 |   209.1 ns | 0.11 ns | 0.11 ns |
|   HashCode |  1000 |   133.8 ns | 0.18 ns | 0.17 ns |
| MurmurHash | 10000 | 2,117.4 ns | 2.74 ns | 2.43 ns |
|   HashCode | 10000 | 1,349.3 ns | 2.95 ns | 2.62 ns |

Caveats!

There are some caveats to this method which may or may not matter.

Stability

System.HashCode is not stable between processes. Each process has a global hash seed that is mixed into the hash, so hashes must never be persisted or shared between processes any other way. I doubt this matters for Arch.

Order Invariance

The comments in the code previously said:

The order of the elements does not change the hashcode, so it depends on the elements themselves.

The new method does not provide this.

However neither did the old method! This is either a bug in the old code (in which case I'll replace this with an order invariant hash) or a stale/misleading comment (in which case it's now fixed).

```
|     Method |     N |       Mean |   Error |  StdDev |
|----------- |------ |-----------:|--------:|--------:|
| MurmurHash |  1000 |   209.1 ns | 0.11 ns | 0.11 ns |
|   HashCode |  1000 |   133.8 ns | 0.18 ns | 0.17 ns |
| MurmurHash | 10000 | 2,117.4 ns | 2.74 ns | 2.43 ns |
|   HashCode | 10000 | 1,349.3 ns | 2.95 ns | 2.62 ns |
```

Based on this removed MurmurHash3.cs entirely and replaced all uses with `HashCode`.
@genaray
Copy link
Owner

genaray commented Oct 27, 2023

Hmm... this is difficult. Murmurhash3 was chosen for a reason. It has the lowest collision rate among the hash methods. In the past it has happened several times that some users complained that their program crashed or showed undefined behavior. This problem was caused by the hash and solved with the Murmur3hash. Therefore I hold back this PR for now until clarity was created ^^

@martindevans
Copy link
Contributor Author

Does the order invariance thing indicate a bug, or is it just a stale comment?

@genaray
Copy link
Owner

genaray commented Oct 27, 2023

Does the order invariance thing indicate a bug, or is it just a stale comment?

Thats intended behaviour which is quite important because e.g. Transform, Velocity, Sprite should return the same hash as Velocity, Sprite, Transform. So the hash should not depend on the order of the elements ^^

@martindevans
Copy link
Contributor Author

Murmurhash3 was chosen for a reason. It has the lowest collision rate among the hash methods.

I just tested this out by running this program for 160M iterations. The final results were:

Total:166,800,000 SYS:3,154,500 MUR:3,201,524 (System.HashCode is the winner)

i.e. System.HashCode is slightly better in terms of collisions, as well as being faster.

[order invariance is] intended behaviour which is quite important

In that case it looks like there's a much more serious bug here! MurmurHash3 is not order invariant:

MurmurHash3.Hash32(new byte[] { 1, 2, 3 }, 0) == 2161234436
MurmurHash3.Hash32(new byte[] { 3, 2, 1 }, 0) == 19304811

@genaray
Copy link
Owner

genaray commented Oct 27, 2023

Well i was referring that issue from the past here : #87
However i assume the issue was caused by a design flaw in the past. In the current version Span<ComponentType> is being converted to a SpanBitSet which is then converted to hash. I think that prevents the collision issue from the past. So it might make sense to replace the hash function fully. But I'm gonna investigate further, since we need to make sure that nothing breaks, especially with high number of components.

@genaray genaray merged commit b5fa959 into genaray:master Oct 28, 2023
1 check passed
@martindevans martindevans deleted the hashing branch October 28, 2023 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants