EDais :: Tutorials :: ZLib in VB :: Chapter 2: Decompression and validation
OK, we’ve got the data compressed, but it’s not of much use to us unless we can decompress it again, so let’s move on to decompression.
The uncompress function matches the compress function signature, which makes porting the call to VB simple:
We’ll go on from the previous chapter’s code, and simply decompress the compressed data buffer (assuming the compression operation succeeded). Note: the function returns Z_BUF_ERROR if the supplied buffer was too small, so it is possible to decompress a ZLib compressed buffer without knowing its uncompressed size by trying larger and larger buffers until it no longer returns Z_BUF_ERROR. This approach is very inefficient, since the data has to be decompressed from scratch on each attempt, so it should only be used as a last resort; in most cases you should already know the length of the decompressed buffer. In this case we know the size of the decompressed data (the file data), so we can simply allocate a buffer of that size.
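The declarations in this chapter are VB, but the retry idea from the note above is easy to sketch in Python, whose built-in zlib module wraps the same library. Python’s zlib.decompress grows its own output buffer, so Z_BUF_ERROR never surfaces there; the sketch below only imitates the approach by capping the output size and starting over with a bigger cap each time. The names decompress_unknown_size and first_guess are made up for illustration.

```python
import zlib

def decompress_unknown_size(data: bytes, first_guess: int = 16) -> bytes:
    """Retry with a growing output limit until the whole stream fits."""
    limit = first_guess
    while True:
        d = zlib.decompressobj()         # fresh state: every attempt starts over
        out = d.decompress(data, limit)  # produce at most `limit` bytes
        if d.eof:                        # end of stream reached, so it all fitted
            return out
        if len(out) < limit:             # ran out of input, not output space
            raise ValueError("compressed data is truncated")
        limit *= 2                       # too small: double the guess and retry

original = b"hello zlib " * 100
packed = zlib.compress(original)
print(decompress_unknown_size(packed) == original)
```

Every failed attempt throws away the work already done, which is exactly why this should stay a last resort.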
Now simply call the decompression method:
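For comparison, here is the same round trip as a minimal Python sketch using the built-in zlib module rather than the VB declarations. Python allocates the destination buffer itself, so only the size check carries over; file_data is a stand-in for the file contents loaded in the previous chapter.

```python
import zlib

# Stand-in for the file data read in the previous chapter (an assumption).
file_data = b"The quick brown fox jumps over the lazy dog. " * 50

compressed = zlib.compress(file_data)    # what the previous chapter produced
restored = zlib.decompress(compressed)   # the decompression step

# The decompressed size should equal the original size.
print(len(file_data), len(compressed), len(restored))
```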
Assuming all went well, the size of the decompressed buffer should be the same as the original file size. To further verify that the data is indeed correct we can use what’s known as a cyclic redundancy check, or CRC, which takes a buffer and performs some mathematics on each byte to get a final result. If even one byte of the two buffers differs then the two results will almost certainly differ too, which allows us to check the validity of the data. The ZLib library exposes two such methods: the first, crc32, performs a full CRC-32 check on the data, whereas the second, adler32, computes an Adler-32 checksum, which is quicker but less reliable (strictly speaking it is a checksum rather than a CRC). If you want to find out more about how these two methods work then the full source code is available; you’ll find the CRC method implemented in crc32.c and its corresponding header file, and the Adler method in adler32.c.
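To see both checks react to a single changed byte, here is a small sketch using Python’s zlib module, which exposes the same crc32 and adler32 routines. The sample buffer is made up for illustration.

```python
import zlib

good = bytearray(b"some data worth protecting " * 20)
bad = bytearray(good)
bad[100] ^= 0x01   # flip one bit in one byte of the copy

# Run both checks over the intact and the damaged buffer.
for name, check in (("crc32", zlib.crc32), ("adler32", zlib.adler32)):
    a, b = check(good), check(bad)
    print(f"{name}: {a:08x} vs {b:08x} ->", "match" if a == b else "differ")
```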
Again these are pretty simple to port to VB, each taking an initial value then a pointer to a data buffer and its length:
The way these methods work is to take an initial value and use that as a base to calculate the rest of the CRC from the given buffer. This allows the CRC of a multi-part buffer to be calculated piece by piece, rather than having to send the entire thing in one go. The only question is what initial value to use for the first piece of data we send, and whether it even matters. The answer depends on what you’re using the check for. If you simply want to check inside your own application then it really doesn’t matter which initial value you specify, as long as you use the same one for both the source CRC and the destination CRC. If, however, you’re receiving the CRC as calculated by another application (common in things such as network transfer, where data is prone to ‘go missing’ or get corrupted) then you must be sure to specify the same initial value as the other application. You could get the application to send you its initial CRC value, but that value could get corrupted in transit just like the rest of the data. Luckily there is a better way: call the function with a null pointer for the buffer and it will simply return its preferred initial value, so as long as the other application does the same you know you’re starting from the correct value. For this test we’ll use the full CRC method (the Adler method works in exactly the same way), so go ahead and find the initial value:
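The multi-part behaviour and the starting values can be sketched with Python’s zlib module, which wraps the same two routines. Python has no null pointer to pass, so this sketch checksums an empty buffer instead, which leaves the library’s default starting value untouched; that substitution is an assumption of the sketch, not the VB call described above.

```python
import zlib

# Checksumming nothing returns each routine's default starting value.
crc_start = zlib.crc32(b"")
adler_start = zlib.adler32(b"")
print(crc_start, adler_start)   # prints "0 1"

# Feed the data in two pieces, carrying the running value forward.
part1, part2 = b"first half, ", b"second half"
running = zlib.crc32(part1, crc_start)
running = zlib.crc32(part2, running)

# The result matches a single pass over the joined buffer.
print(running == zlib.crc32(part1 + part2))
```

Note that the two routines do not share a starting value (CRC-32 starts from 0, Adler-32 from 1), which is one more reason to ask the library rather than guess.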
With the version of the library I’m using, the initial value of the CRC is zero; however, it’s always best to get the library to tell you rather than hard-coding it, since this could (but shouldn’t) change in future versions.
Now calculate the CRC of the decompressed buffer (I’ll be using a slightly more condensed version by getting the initial value inline) and compare it with the CRC of the original data:
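For the network case mentioned earlier, a sender can ship the CRC of the original data alongside the compressed buffer and the receiver can verify it after decompressing. This is a Python sketch with the built-in zlib module; pack and unpack are hypothetical helper names, not part of ZLib.

```python
import zlib

def pack(data: bytes):
    """Sender side: compress and record the CRC of the original data."""
    return zlib.compress(data), zlib.crc32(data)

def unpack(compressed: bytes, expected_crc: int) -> bytes:
    """Receiver side: decompress, then validate against the sender's CRC."""
    data = zlib.decompress(compressed)
    if zlib.crc32(data) != expected_crc:
        raise ValueError("CRC mismatch: data differs from what was sent")
    return data

payload, crc = pack(b"file contents go here " * 10)
print(unpack(payload, crc) == b"file contents go here " * 10)
```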
As long as all went well, the CRCs should match.