-
Notifications
You must be signed in to change notification settings - Fork 236
Performance improvement for zipping with AES encryption #273
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple issues with this change:
- One of the supported features (and why the
ZipEntry
class has a password field) is adding individual files to an archive, each having a different password. Because you are caching the generated key, but not checking to see if the currentZipEntry
has a different password. Maybe building a cache of generated passwords would be a good idea, but my gut feeling is that adding the addition of the password comparison would solve the issue. You still keep the happy path more optimized while only taking a small hit for an additional string comparison in the unhappy path. That is just a gut feeling though, please do some performance testing around the test suite which simulates large files and please add tests around this change including at least one that is generating an archive with many files, each with a different password (or at the very least, a set that uses a rotating set of passwords so you are testing around the worst case path) and run a performance check on them. - The build is failing and so there are some errors that need to be addressed. From looking at the diff in the ZipFile.Save.cs you have a misplaced
#if AESCRYPTO
that is causing a build error. - Please be sure to run the current tests to be sure all is working as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in reviewing this. I was traveling and did not get a chance to sit down and look it over and verify it.
There is still a breaking change in your code that changes behavior compared to the existing code. You have addressed the issue of encrypting files with different passwords, but you are not handling the case of the same password but a different encryption strength, i.e. file1
is added with EncryptionAlgorithm.WinZipAes256
and then file2
with EncryptionAlgorithm.WinZipAes128
. With your current code you are only checking the password to see if it ihas been calculated already. You need to include the key strength in your code. You should also add test cases for that.
I am attaching a sample zip file that contains encrypted files using the same password, but with different key strengths. The passwords for the files are the first part of the file name.
sample.zip
I ran this code in LINQPad to generate that file using the DotNetZip library:
var passwords = new [] {"test123", "test456", "test789"};
using var zip = new ZipFile();
zip.Encryption = EncryptionAlgorithm.WinZipAes256;
var rand = new Random(42);
Span<byte> declaration = Encoding.UTF8.GetBytes(File.ReadAllText(@"D:\Temp\zip\Declaration.txt"));
const string pathBase = @"D:\temp\zip";
for(int i=0; i < 12; ++i)
{
var pass = passwords[i % 3];
var file = Path.Combine(pathBase, $"{pass}.{Path.GetRandomFileName()}");
zip.Password = passwords[i % 3];
using (var fs = File.OpenWrite(file))
{
var count = rand.Next(1,8);
for (int j = 0; j < count; ++j)
fs.Write(declaration);
fs.Flush();
}
var e = zip.AddFile(file, "");
if (i % 2 == 0)
{
e.Encryption = EncryptionAlgorithm.WinZipAes128;
}
}
zip.Save(Path.Combine(pathBase, "Output.zip"));
Corrected this error and added two more unit tests. Let me know if there are any other issues. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction of it, just a typo and a minor improvement and then I think it should be all set. Nice job.
made those changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to keep coming back on this, please understand that due to the fact that this is crypto and thus dealing with security and so I want to be sure that this is done properly and handled correctly, and I just realized that there is a major flaw that is lurking with this - the salt value. One of the things that takes place when you aare computing that key is that each file gets encoded with a different salt value. It took me a few reviews of this code to catch that. I was focusing on the performance and other things, but in looking at this again I now realize that this change actually weakens the security and it does not follow what WinZip is doing within its own crypto library. With that in mind, this is a great performance improvement, but it breaks the security side. Please don't be discouraged by this. I had been simply looking at the code, and hadn't thought through what the deeper implications of this change is. But because of the issue with the salt now being duplicated for each file, this is a change that cannot be brought into the library. For more information you can read the WinZip documentation here: WinZip Encryption Salt
I left an issue about this (#271), but these changes will fix the issue.
These were the results I got from some performance testing:
I tested with varying the encryption types (PKzipWeak vs WinZipAes256) with different compression types (Level9, Level4, and Level0) when compressing an 86.9MB folder with 2282 files. When using WinZipAes256 encryption (bottom 3 rows) we can see that after the performance changes made by this commit (last column) we saw ~85-73% (depending on the compression level) reduction in runtime compared to before the performance changes made by this commit (second last column).
If you need me to clean up the code at all before merging these changes, please let me know. Thanks!