Skip to content
This repository has been archived by the owner on May 15, 2024. It is now read-only.

Performance improvement for zipping with AES encryption #273

Closed
wants to merge 9 commits into from

Conversation

AndrewBSzabo
Copy link

I left an issue about this (#271), but these changes will fix the issue.

These were the results I got from some performance testing:
image

I tested with varying the encryption types (PKzipWeak vs WinZipAes256) with different compression types (Level9, Level4, and Level0) when compressing an 86.9MB folder with 2282 files. When using WinZipAes256 encryption (bottom 3 rows) we can see that after the performance changes made by this commit (last column) we saw ~85-73% (depending on the compression level) reduction in runtime compared to before the performance changes made by this commit (second last column).

If you need me to clean up the code at all before merging these changes, please let me know. Thanks!

Copy link
Collaborator

@jshergal jshergal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple issues with this change:

  1. One of the supported features (and why the ZipEntry class has a password field) is adding individual files to an archive, each having a different password. Because you are caching the generated key, but not checking to see if the current ZipEntry has a different password. Maybe building a cache of generated passwords would be a good idea, but my gut feeling is that adding the addition of the password comparison would solve the issue. You still keep the happy path more optimized while only taking a small hit for an additional string comparison in the unhappy path. That is just a gut feeling though, please do some performance testing around the test suite which simulates large files and please add tests around this change including at least one that is generating an archive with many files, each with a different password (or at the very least, a set that uses a rotating set of passwords so you are testing around the worst case path) and run a performance check on them.
  2. The build is failing and so there are some errors that need to be addressed. From looking at the diff in the ZipFile.Save.cs you have a misplaced #if AESCRYPTO that is causing a build error.
  3. Please be sure to run the current tests to be sure all is working as expected.

@AndrewBSzabo
Copy link
Author

  1. I modified the code so that the passwords of the ZipEntry objects are used. I then will store encryption keys that have already been calculated in a dictionary (Dictionary<Password,EncryptionKey>) so that if the same password is used, we can just grab it from the dictionary rather than recalculating it again. With this change the performance improvement is now dependent on how many different passwords are being used when creating/adding to a zip. The following screenshot are some performance numbers when the number of passwords used in a file is varied from 1, 10, 100, and 1000:
    image
    We can see that we get the largest performance improvement when using <10 passwords, and the smallest performance improvement when using >1000 passwords (still some improvement, just not as much).

  2. Fixed the build issue

  3. All related unit tests are still passing, and I added a new WinZipAes test for creating a zip with entries that have different passwords.

Copy link
Collaborator

@jshergal jshergal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing this. I was traveling and did not get a chance to sit down and look it over and verify it.

There is still a breaking change in your code that changes behavior compared to the existing code. You have addressed the issue of encrypting files with different passwords, but you are not handling the case of the same password but a different encryption strength, i.e. file1 is added with EncryptionAlgorithm.WinZipAes256 and then file2 with EncryptionAlgorithm.WinZipAes128. With your current code you are only checking the password to see if it ihas been calculated already. You need to include the key strength in your code. You should also add test cases for that.
I am attaching a sample zip file that contains encrypted files using the same password, but with different key strengths. The passwords for the files are the first part of the file name.
sample.zip

I ran this code in LINQPad to generate that file using the DotNetZip library:

Declaration.txt

var passwords = new [] {"test123", "test456", "test789"};
using var zip = new ZipFile();
zip.Encryption = EncryptionAlgorithm.WinZipAes256;
var rand = new Random(42);
Span<byte> declaration = Encoding.UTF8.GetBytes(File.ReadAllText(@"D:\Temp\zip\Declaration.txt"));
const string pathBase = @"D:\temp\zip";
for(int i=0; i < 12; ++i)
{
	var pass = passwords[i % 3];
	var file = Path.Combine(pathBase, $"{pass}.{Path.GetRandomFileName()}");
	zip.Password = passwords[i % 3];
	using (var fs = File.OpenWrite(file))
	{
		var count = rand.Next(1,8);
		for (int j = 0; j < count; ++j)
			fs.Write(declaration);
		fs.Flush();
	}
	var e = zip.AddFile(file, "");
	if (i % 2 == 0)
	{
		e.Encryption = EncryptionAlgorithm.WinZipAes128;
	}
}

zip.Save(Path.Combine(pathBase, "Output.zip"));

@AndrewBSzabo
Copy link
Author

Corrected this error and added two more unit tests. Let me know if there are any other issues. Thanks!

Copy link
Collaborator

@jshergal jshergal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction of it, just a typo and a minor improvement and then I think it should be all set. Nice job.

src/Zip Tests/WinZipAesTests.cs Outdated Show resolved Hide resolved
src/Zip.Shared/ZipEntry.Write.cs Outdated Show resolved Hide resolved
@AndrewBSzabo
Copy link
Author

made those changes.

Copy link
Collaborator

@jshergal jshergal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep coming back on this, please understand that due to the fact that this is crypto and thus dealing with security and so I want to be sure that this is done properly and handled correctly, and I just realized that there is a major flaw that is lurking with this - the salt value. One of the things that takes place when you aare computing that key is that each file gets encoded with a different salt value. It took me a few reviews of this code to catch that. I was focusing on the performance and other things, but in looking at this again I now realize that this change actually weakens the security and it does not follow what WinZip is doing within its own crypto library. With that in mind, this is a great performance improvement, but it breaks the security side. Please don't be discouraged by this. I had been simply looking at the code, and hadn't thought through what the deeper implications of this change is. But because of the issue with the salt now being duplicated for each file, this is a change that cannot be brought into the library. For more information you can read the WinZip documentation here: WinZip Encryption Salt

@jshergal jshergal closed this Dec 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants