bzip2 optimization-phase two

There is two way to optimize the bzip2 compression time.

In the mainsort function of blocksort.c

  1. Using memset() function instead a for loop to set each elements in an array to zero. By doing this, it will reduce the memory usage by using shared library, and reduce the branch amount from the for loop.

-for (i = 65536; i >= 0; i–) ftab[i] = 0;
 +memset(&ftab[0],0,65537*sizeof(Int32));

2.Manually unroll the loop it will increase the run time memory usage, as the author said it will produced wrong code while compile with gcc 2.7.x.

It will reduce the branch of the program, which will reduce the compression time.

–   //for (i = 1; i <= 65536; i++) ftab[i] += ftab[i-1]

+for (i = 1; i<= 65536; i+=8){
+       ftab[i] += ftab[i-1];
+       ftab[i+1] += ftab[i];
+       ftab[i+2] += ftab[i+1];
+       ftab[i+3] += ftab[i+2];
+       ftab[i+4] += ftab[i+3];
+       ftab[i+5] += ftab[i+4];
+       ftab[i+6] += ftab[i+5];
+       ftab[i+7] += ftab[i+6];
+       }

After compile the program, it passed 6 tests from its own on x86_64 and AArch64 machine.
Doing 6 tests (3 compress, 3 uncompress) …
If there’s a problem, things might stop at this point.

./bzip2 -1  < sample1.ref > sample1.rb2
./bzip2 -2  < sample2.ref > sample2.rb2
./bzip2 -3  < sample3.ref > sample3.rb2
./bzip2 -d  < sample1.bz2 > sample1.tst
./bzip2 -d  < sample2.bz2 > sample2.tst
./bzip2 -ds < sample3.bz2 > sample3.tst
cmp sample1.bz2 sample1.rb2
cmp sample2.bz2 sample2.rb2
cmp sample3.bz2 sample3.rb2
cmp sample1.tst sample1.ref
cmp sample2.tst sample2.ref
cmp sample3.tst sample3.ref

If you got this far and the ‘cmp’s didn’t complain, it looks
like you’re in business.

On x86_64 machine, while compression the linux-kernel tarball(version 4.5) with 628MB(658,053,120 bytes)

The compression time before

real    1m39.944s
user    1m39.464s
sys     0m0.388s

The compression time after

real    1m27.951s
user    1m27.428s
sys     0m0.443s

After running the cmp between the file compressed before and after, it was the same file

It reduced the compression by nearly 12% from the previous version

On AArch64 machine, while compression the linux-kernel tarball(version 4.5) with 628MB(658,053,120 bytes)

The compression time before
real    13m32.371s
user    13m29.980s
sys     0m1.890s

The compression time after

real    11m59.926s
user    11m58.790s
sys     0m0.960s

After running the cmp between the file compressed before and after, it was the same file

It reduced the compression time  by nearly 11.5% from the previous version