Loading from disk fails #5

Open
dare05 opened this issue Jan 16, 2017 · 4 comments
Comments

@dare05

dare05 commented Jan 16, 2017

Here's the code:

filter = BloomFilter.new size: 100_000, error_rate: 0.00001
100_000.times do
  filter.insert "#{rand(1..1_000_000)}"
end
filter.dump "hii"

Now when I do:

filter = BloomFilter.load "hii"

I get:

in `load': unable to load BloomFilter, expected 299534 but got 420 bytes (StandardError)

I suspect this is related to the operating system. I'm using Windows 7, which has two ways of reading and writing files on disk (text mode and binary mode).

I actually had the same problem with bloomfilter-rb and I fixed it by forcing File.open to write and read in binary mode (it was as easy as appending "b" to the "r" and "w" modes). But here I see the code for serialization is written in C, so that wouldn't be possible.
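
For reference, a minimal sketch of what forcing binary mode looks like for a pure-Ruby serializer, using only the standard library. The payload and the file name `binary_mode_demo.bin` are made up for illustration; this is not the code of bloomfilter-rb or of this gem. On Windows, text mode can translate newlines and stop reading at a 0x1A byte, which would be consistent with short reads like the one above, though that is only a guess at the cause.

```ruby
require "tmpdir"

# Every byte value 0..255, repeated, so the payload contains "\n" (0x0A),
# "\r" (0x0D) and Ctrl-Z (0x1A), the bytes that text mode may alter on Windows.
payload = (0..255).to_a.pack("C*") * 4

# Hypothetical file name, used only for this demonstration.
path = File.join(Dir.tmpdir, "binary_mode_demo.bin")

# "wb" and "rb" request binary mode: no newline translation, no early EOF.
File.open(path, "wb") { |f| f.write(payload) }
restored = File.open(path, "rb") { |f| f.read }

puts "wrote #{payload.bytesize} bytes, read back #{restored.bytesize} bytes"
puts "round trip intact: #{restored == payload}"
```

Dropping the "b" from both modes and rerunning on Windows is a quick way to see whether text mode changes the byte count.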

@deepfryed
Owner

The reads and writes are done in the C extension (defaults to binary mode). You're dumping to a file named test and reading from a file named hii -- seems incorrect.

@dare05
Author

dare05 commented Jan 16, 2017

@deepfryed Sorry, that was a typo; I've fixed it. I am dumping to and loading from the same file, and I still get the same error, only with different "got" numbers: it always expects 299534 bytes, but sometimes it gets 600 bytes, sometimes 224, 35, 15, 170, etc.

@deepfryed
Owner

Roger, I'll check it out.

@dare05
Author

dare05 commented Jan 16, 2017

I was wrong about bloomfilter-rb, by the way. Even after I changed it to binary mode, the filter loaded from the saved file always reports 'true', no matter how large a number I query (I feed it the same input as in the code above, same everything). So the cause may not be the binary format but something else entirely. After you fix the size mismatch, please also test for correctness: query the loaded filter for a very large number and see what happens.
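
A rough sketch of the kind of round-trip correctness test I mean. `ToyBloomFilter` is a hypothetical stand-in written only for this example (its class name, on-disk layout, hashing, and `include?` method are all assumptions); it is not this gem's implementation or API. The point is the shape of the check: every inserted key must still be found after loading, and keys far outside the inserted range must not all come back 'true'.

```ruby
require "digest"
require "tmpdir"

# Hypothetical toy filter for illustration only; NOT the gem's code.
class ToyBloomFilter
  def initialize(bits:, hashes:, data: nil)
    @bits = bits
    @hashes = hashes
    # One bit per slot, packed into a binary string of zero bytes.
    @data = data || ("\x00".b * ((bits + 7) / 8))
  end

  def insert(key)
    positions(key).each do |pos|
      @data.setbyte(pos / 8, @data.getbyte(pos / 8) | (1 << (pos % 8)))
    end
  end

  def include?(key)
    positions(key).all? { |pos| @data.getbyte(pos / 8)[pos % 8] == 1 }
  end

  # Assumed layout: two 32-bit big-endian integers, then the raw bit array.
  def dump(path)
    File.binwrite(path, [@bits, @hashes].pack("NN") + @data)
  end

  def self.load(path)
    raw = File.binread(path)
    bits, hashes = raw.unpack("NN")
    data = raw.byteslice(8, raw.bytesize - 8)
    expected = (bits + 7) / 8
    unless data.bytesize == expected
      raise "unable to load: expected #{expected} but got #{data.bytesize} bytes"
    end
    new(bits: bits, hashes: hashes, data: data)
  end

  private

  # Derive @hashes bit positions per key from salted SHA-256 digests.
  def positions(key)
    (0...@hashes).map do |i|
      Digest::SHA256.digest("#{i}:#{key}").unpack1("Q>") % @bits
    end
  end
end

path = File.join(Dir.tmpdir, "toy_bloom_roundtrip.bin")
filter = ToyBloomFilter.new(bits: 2_000_000, hashes: 7)
inserted = (1..10_000).map(&:to_s)
inserted.each { |key| filter.insert(key) }
filter.dump(path)

loaded = ToyBloomFilter.load(path)

# Check 1: no false negatives after the round trip.
missing = inserted.reject { |key| loaded.include?(key) }
# Check 2: huge, never-inserted numbers should almost never be reported present.
probes = (10_000_001..10_001_000).map(&:to_s)
false_hits = probes.count { |key| loaded.include?(key) }

puts "inserted keys missing after load: #{missing.size}"
puts "never-inserted keys reported present: #{false_hits} of #{probes.size}"
```

If the second number comes out equal to the number of probes, the loaded filter is effectively all ones, which matches what I saw with bloomfilter-rb.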
