Saturday, April 24, 2021

Compression under Linux

I tried to re-compress some files. Goals:

  • Best possible compression
    • But still reasonably fast
    • Also check decompression time
  • Use all CPU threads
  • Use most of the memory
  • Use a relatively fast drive

Hardware used for tests:

  • AMD Ryzen 7 3700X (8 cores, 16 threads)
  • 96GB RAM (DDR4)
  • M.2 SSD with about 2GiB/s read and write speed

Test file: a .tar.gz archive of Android sources, 46 GiB. The test was only partially successful: I did not use the right commands and failed to gather all the information.

Hints for next attempt:

  • The source file must be owned by another user and be read-only
  • Use the `time` command at least
  • Monitor CPU load and temperature throughout the measurement
    • Ryzen raises its clock speed when it runs cool (and I used a very weak cooler taken from an older CPU with a similar TDP, only because it is small and looks nice)
  • Monitor disk buffers
    • Ideally the whole file is kept in memory, or is not buffered at all
    • Ideally the compression buffers won't push the file out of the disk buffer
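
The hints above could be combined into one small wrapper. This is only a sketch: `measure`, `SAMPLE_INTERVAL` and the log file names are made up for illustration, and the `thermal_zone0` path is an assumption that will not match every board. It expects bash on Linux.

```shell
#!/usr/bin/env bash
# measure LABEL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under bash's `time` keyword while a
# background loop samples load average, page-cache size and temperature.
measure() {
    local label=$1; shift
    local interval=${SAMPLE_INTERVAL:-5}

    (
        while true; do
            printf '%s load=%s cached_kB=%s' \
                "$(date +%T)" \
                "$(cut -d' ' -f1-3 /proc/loadavg)" \
                "$(awk '/^Cached:/ {print $2}' /proc/meminfo)"
            # ASSUMPTION: thermal_zone0 is the CPU sensor; this differs per board.
            if [ -r /sys/class/thermal/thermal_zone0/temp ]; then
                printf ' temp_mC=%s' "$(cat /sys/class/thermal/thermal_zone0/temp)"
            fi
            printf '\n'
            sleep "$interval"
        done
    ) > "$label.samples.log" 2>&1 &
    local sampler=$!

    # `time` reports real/user/sys on stderr; the command's own stderr
    # lands in the same file.
    local status=0
    { time "$@"; } 2> "$label.time.log" || status=$?

    kill "$sampler" 2>/dev/null
    wait "$sampler" 2>/dev/null
    return "$status"
}
```

A run would then look like `measure lrzip lrzip -Uz -L 9 src.tar`, leaving `lrzip.time.log` and `lrzip.samples.log` next to the output. Watching the `cached_kB` column gives only a rough idea of whether the source file stays in the page cache.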

Initial results:

| Tool | Output size (GB) | Time | Output size (B) | Load | Command |
|---|---|---|---|---|---|
| LRZip | 10 | 142m01s | 10'731'879'579 | real 138m15s, user 1801m37s, sys 4m1s | `lrzip -Uz -L 9 src.tar` |
| zpaq | 12 | 215m16s | 12'114'694'539 | 215m16s, user 198336, sys 948s, cpu 1542% | `zpaq a src.tar.zpaq src.tar -m5` |
| 7z | 13 | 36m32s | 13'838'803'720 | | `7z a -mx=9 -mmt=14 src.tar.7z src.tar` |
| lbzip2 | 20 | | 20'891'605'828 | | `lbzip2 -9 src.tar` |
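
To compare the tools more directly, the byte counts above can be turned into a share of the input size. The sketch below assumes the input is exactly 46 GiB, which is only the rounded figure given earlier, so the percentages are approximate. The function name `ratios` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: express each output size as a percentage of the input.
# ASSUMPTION: the input is exactly 46 GiB (the post gives a rounded figure).
ratios() {
    awk -v input_gib=46 '
        BEGIN { input = input_gib * 1024 * 1024 * 1024 }
        { printf "%-7s %5.2f%% of input\n", $1, 100 * $2 / input }
    ' <<'EOF'
lrzip  10731879579
zpaq   12114694539
7z     13838803720
lbzip2 20891605828
EOF
}
ratios
```

Under that assumption the outputs come to roughly 22%, 25%, 28% and 42% of the input for lrzip, zpaq, 7z and lbzip2 respectively.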

 

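So far LRZip compresses best, as expected. By size zpaq is second, which is no surprise because lrzip's -z option uses the same ZPAQ method, but it was also the slowest. 7z is third by size and much faster than either (36m32s). lbzip2 compresses worst, but it is the fastest.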

Results:

  • Use 7z in most cases
  • For archiving purposes, use lrzip
  • Use gzip at the lowest compression level for any text files (e.g. logs)
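
As a rough illustration, the three recommendations could be wrapped in one helper. The name `pack` and its `log|archive|general` keywords are hypothetical. The 7z and lrzip flags are simply the ones from the test run above, and `-mmt=14` only makes sense on a machine with a similar thread count.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper mapping the three recommendations onto commands.
pack() {
    local kind=$1 src=$2
    case "$kind" in
        log)     gzip -1 -c "$src" > "$src.gz" ;;            # fastest, lowest level
        archive) lrzip -Uz -L 9 "$src" ;;                     # smallest output, slow
        general) 7z a -mx=9 -mmt=14 "$src.7z" "$src" ;;       # everyday choice
        *)       echo "usage: pack log|archive|general FILE" >&2
                 return 2 ;;
    esac
}
```

For example, `pack log app.log` writes `app.log.gz` and leaves the original file in place.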

Next attempt:

  • Use different tools and settings:
    • 7z (try different settings)
    • gzip lowest and highest
    • bzip2 lowest and highest
    • zpaq
    • zpac
    • lrzip
    • xz
    • paq8l
    • kgb
    • lzma
    • pax
    • cpio
    • ar
  • Monitor CPU
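
Part of this plan could be scripted as below. It is a sketch that covers only the stream compressors from the list (gzip, bzip2, xz), using whole seconds from `date`. The function names and output file naming are made up, and any tool that is not installed is skipped.

```shell
#!/usr/bin/env bash
# bench_stream SRC NAME COMMAND [ARGS...]
# Sketch: feed SRC to a compressor on stdin, then report output size and time.
bench_stream() {
    local src=$1 name=$2; shift 2
    if ! command -v "$1" >/dev/null 2>&1; then
        printf '%-8s skipped (not installed)\n' "$name"
        return 0
    fi
    local out="$src.$name.out" start elapsed
    start=$(date +%s)
    "$@" < "$src" > "$out"      # each tool here reads stdin and writes stdout
    elapsed=$(( $(date +%s) - start ))
    printf '%-8s %12d bytes %5d s\n' "$name" "$(wc -c < "$out")" "$elapsed"
}

run_all() {
    local src=$1
    bench_stream "$src" gzip-1  gzip  -1 -c
    bench_stream "$src" gzip-9  gzip  -9 -c
    bench_stream "$src" bzip2-1 bzip2 -1 -c
    bench_stream "$src" bzip2-9 bzip2 -9 -c
    bench_stream "$src" xz      xz    -T0 -c   # -T0: use all CPU threads
}
```

Calling `run_all src.tar` prints one line per tool. Decompression time could be measured the same way by feeding each `.out` file back through the matching `-d -c` command.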
