Skip to content

An ISO-C program to prove no string of bit's is unique and then makes a map of all the strings against the local OS.

Notifications You must be signed in to change notification settings

benbunk/yournotunique

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Purpose:
I want to map the largest possible chunk of data (bits) in a source file to an identical chunk of bits in some other file on the same computer. THe reason for this is to prove that no file is unique, that it is actually a composition of patterns already found in other files. Ultimately, this data can be used for building a map of all the chunks in the source to their various destinations. Once the map is built it can be shared and the original source can be constructed by the user of the map without ever transfering the original source file.

Unknowns:
1. It is still unknown if the map can be made smaller then the original source file. Because the mapping algorithm is supposed to use a sliding window for finding chunks it depends entirely on how many chunks it must find. For instance, if a single chunk of bits that represents 25% of the file can be mapped, than your map can potentially be 25% smaller then the original source file...but based on the initial results a chunk of that size is not realistic, but it is possible and hasn't yet been disproven entirely.

2. It is also unknown what a map will actually look like, it's possible that the data format and the data required for a map point could easily be so verbose that it consumes more memory then the chunk of data that it points to.

Usage:
In order to use this program compile scanner.c and execute the resulting binary with no arguments to see help text.

Sliding Window:
The sliding window works in 2 steps.
1. Set the minimum chunk or buffer size equal to the size of the text being searched for.
2. Starting at the first byte grab the first matching chunk size from the searched file.
3. Compare the two buffers to see if they're identical.
4. *Check the current file buffer for the first occurrence of the first byte from the search buffer.
5. *If the byte exists change the read head to that position and grab a new buffer. Go back to step 3.
5b. *If the byte doesn't exist move the read head to the end of the buffer and go to step 2.
6. If there is a match finish.

*Not implemented.

Notes:
Directory Scanner is intended to replace r_search in searcher.c so that you can pipe the results of directoryscanner.c into searcher.c so a command will eventually be something like: 

scandirectory -d /usr/bin | searcher

But directory scanner is imcomplete and doesnt work so searcher.c still contains all of the recursive directory scanning logic.

02/2013 - As my knowledge of linux has grown I feel like directory scanner is a waste of time. The find program can likely scan the directories and do the piping we require with many more options and flexibilities then I intended for directory scanner. I also looked into grep for replacing searcher but the sliding window technique isn't supported so maybe if this matures we could port it over as a patch to grep.

Priority #1 is to make sure that the sliding window of the search works correctly currently it's brute forcing through the chunks of the files. 

About

An ISO-C program to prove no string of bit's is unique and then makes a map of all the strings against the local OS.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages