There is two way to optimize the bzip2 compression time.
In the mainsort function of blocksort.c
- Using memset() function instead a for loop to set each elements in an array to zero. By doing this, it will reduce the memory usage by using shared library, and reduce the branch amount from the for loop.
-for (i = 65536; i >= 0; i–) ftab[i] = 0;
+memset(&ftab[0],0,65537*sizeof(Int32));
2.Manually unroll the loop it will increase the run time memory usage, as the author said it will produced wrong code while compile with gcc 2.7.x.
It will reduce the branch of the program, which will reduce the compression time.
– //for (i = 1; i <= 65536; i++) ftab[i] += ftab[i-1]
+for (i = 1; i<= 65536; i+=8){
+ ftab[i] += ftab[i-1];
+ ftab[i+1] += ftab[i];
+ ftab[i+2] += ftab[i+1];
+ ftab[i+3] += ftab[i+2];
+ ftab[i+4] += ftab[i+3];
+ ftab[i+5] += ftab[i+4];
+ ftab[i+6] += ftab[i+5];
+ ftab[i+7] += ftab[i+6];
+ }
After compile the program, it passed 6 tests from its own on x86_64 and AArch64 machine.
Doing 6 tests (3 compress, 3 uncompress) …
If there’s a problem, things might stop at this point.
./bzip2 -1 < sample1.ref > sample1.rb2
./bzip2 -2 < sample2.ref > sample2.rb2
./bzip2 -3 < sample3.ref > sample3.rb2
./bzip2 -d < sample1.bz2 > sample1.tst
./bzip2 -d < sample2.bz2 > sample2.tst
./bzip2 -ds < sample3.bz2 > sample3.tst
cmp sample1.bz2 sample1.rb2
cmp sample2.bz2 sample2.rb2
cmp sample3.bz2 sample3.rb2
cmp sample1.tst sample1.ref
cmp sample2.tst sample2.ref
cmp sample3.tst sample3.ref
If you got this far and the ‘cmp’s didn’t complain, it looks
like you’re in business.
On x86_64 machine, while compression the linux-kernel tarball(version 4.5) with 628MB(658,053,120 bytes)
The compression time before
real 1m39.944s
user 1m39.464s
sys 0m0.388s
The compression time after
real 1m27.951s
user 1m27.428s
sys 0m0.443s
After running the cmp between the file compressed before and after, it was the same file
It reduced the compression by nearly 12% from the previous version
On AArch64 machine, while compression the linux-kernel tarball(version 4.5) with 628MB(658,053,120 bytes)
The compression time before
real 13m32.371s
user 13m29.980s
sys 0m1.890s
The compression time after
real 11m59.926s
user 11m58.790s
sys 0m0.960s
After running the cmp between the file compressed before and after, it was the same file
It reduced the compression time by nearly 11.5% from the previous version