7 Zip Archive Formats For Essays

7zip is an award-winning open source file archiver. Besides the 7z format, it supports many other popular archive formats and works with them seamlessly. The 7zip project was started in 1999 by a Russian freelance programmer, who is still its developer and maintainer. 7zip claims to have the highest compression ratio. As an end user, I have used 7zip many times and found it better than many other archivers, especially when compressing files into the 7z format. It's a great tool to have in your kitty, so I decided to write a basic tutorial on how to use 7zip from the Linux command line.

7zip is distributed as free software under the LGPL license. The version available for Linux is known as the p7zip package. I am using Linux Mint, so the installation part of this tutorial is best suited to Linux Mint, Ubuntu and other Debian-based distributions, while the examples apply to any Linux distribution.

How to install p7zip package

When I started exploring the 7zip package on my Linux Mint machine, I soon found that it was not currently installed, so I decided to install it. The first command that I used to install this package was:

$ sudo apt-get install p7zip
[sudo] password for himanshu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
p7zip is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.

The output indicated that p7zip was already installed. I then researched a bit and found that, to get the 7z archiver as a command line utility, I needed to install the p7zip-full package. So I installed this package:

$ sudo apt-get install p7zip-full
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  p7zip-rar
The following NEW packages will be installed:
  p7zip-full
0 upgraded, 1 newly installed, 0 to remove and 50 not upgraded.
Need to get 1,419kB of archives.
After this operation, 3,662kB of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
  p7zip-full
Authentication warning overridden.
Get:1 http://archive.ubuntu.com/ubuntu/ lucid/universe p7zip-full 9.04~dfsg.1-1 [1,419kB]
Fetched 1,419kB in 29s (48.0kB/s)
Selecting previously deselected package p7zip-full.
(Reading database ... 133376 files and directories currently installed.)
Unpacking p7zip-full (from .../p7zip-full_9.04~dfsg.1-1_amd64.deb) ...
Processing triggers for man-db ...
Setting up p7zip-full (9.04~dfsg.1-1) ...

After both p7zip and p7zip-full are installed, the following three command line utilities are available on your Linux box: 7z, 7za and 7zr.


From p7zip wiki :
The package includes three binaries, /usr/bin/7z, /usr/bin/7za, and /usr/bin/7zr. Their manpages explain the differences:

7z uses plugins to handle archives.
7za is a stand-alone executable. 7za handles fewer archive formats than 7z, but does not need any others.
7zr is a stand-alone executable. 7zr handles fewer archive formats than 7z, but does not need any others. 7zr is a "light-version" of 7za that only handles 7z archives.

One thing that was different at my end was that the utility 7zr was installed as part of p7zip package while the other two were installed as part of p7zip-full package. I still don't know the reason behind this.


Anyway, all three utilities were now present, which can be confirmed with the 'whereis' command.

$ whereis 7z
7z: /usr/bin/7z /usr/share/man/man1/7z.1.gz
$ whereis 7za
7za: /usr/bin/7za /usr/share/man/man1/7za.1.gz
$ whereis 7zr
7zr: /usr/bin/7zr /usr/share/man/man1/7zr.1.gz

That covers installation. Now let's explore the 7z utility. Since 7z is the main utility of the three, we will discuss only 7z here.

The syntax of 7z utility is :

7z [adeltux] [-] [SWITCH] <ARCHIVE_NAME> <ARGUMENTS>...

7z command line examples

In all the examples below, I'll use the following files :

$ ls
abc.txt  basic  bufferoverflow.c

In the above output, 'basic' is a directory while the other two are files.

1. Create an archive

This can be done by using the function letter 'a'.

Here is a small example :

$ 7z a basic.7z basic

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)
Scanning

Creating archive basic.7z

Compressing  basic/helloworld.c
Compressing  basic/helloworld.o
Compressing  basic/helloworld.i
Compressing  basic/helloworld.s
Compressing  basic/helloworld

Everything is Ok

$ ls
abc.txt  basic  basic.7z  bufferoverflow.c

So we can see that, using 7z an archive basic.7z was created for the directory 'basic'.

2. Extract an archive

This can be done using the function letter 'e'.

Let's extract the archive created in the previous example:

$ 7z e basic.7z

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)

Processing archive: basic.7z

Extracting  basic/helloworld.c
Extracting  basic/helloworld.o
Extracting  basic/helloworld.i
Extracting  basic/helloworld.s
Extracting  basic/helloworld
Extracting  basic

Everything is Ok

Folders: 1
Files: 5
Size:       27541
Compressed: 5805

$ ls
abc.txt  basic  basic.7z  bufferoverflow.c  helloworld  helloworld.c  helloworld.i  helloworld.o  helloworld.s
$

So we see that basic.7z was extracted and all the files landed in the current folder: helloworld, helloworld.c, helloworld.i, helloworld.o and helloworld.s. This is because the function letter 'e' ignores the directory paths stored in the archive. To extract files with their full paths, use the function letter 'x' instead.

3. List archive details

This can be done by using the function letter 'l'.

Here is an example :

$ 7z l basic.7z

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)

Listing archive: basic.7z

----
Path = basic.7z
Type = 7z
Method = LZMA BCJ
Solid = +
Blocks = 2
Physical Size = 5805
Headers Size = 232

   Date      Time    Attr         Size   Compressed Name
------------------- ----- ------------ ------------ ------------------------
2012-09-09 16:47:17 ....A          192         3600 basic/helloworld.c
2012-09-09 16:47:17 ....A         1568              basic/helloworld.o
2012-09-09 16:47:17 ....A        16700              basic/helloworld.i
2012-09-09 16:47:17 ....A          577              basic/helloworld.s
2012-09-09 16:47:17 ....A         8504         1973 basic/helloworld
2012-09-09 16:47:17 D....            0            0 basic
------------------- ----- ------------ ------------ ------------------------
                                 27541         5573 5 files, 1 folders

So we see that the details of the archive basic.7z were listed in the output.

4. Test integrity of the archive

This can be done using the function letter 't'.

Here is an example :

$ 7z t basic.7z basic

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)

Processing archive: basic.7z

Testing     basic/helloworld.c
Testing     basic/helloworld.o
Testing     basic/helloworld.i
Testing     basic/helloworld.s
Testing     basic/helloworld
Testing     basic

Everything is Ok

Folders: 1
Files: 5
Size:       27541
Compressed: 17566

So we see that integrity check was done.

5. Update an existing archive

This can be done using the function letter 'u'.

Here is an example :

$ 7z u basic.7z basic

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)
Scanning

Updating archive basic.7z

Everything is Ok

No files were compressed this time, which means the archive was already up to date. Let's now add a new file to the directory 'basic' and run the update command again:

$ cp bufferoverflow.c basic/
$ ls basic/
bufferoverflow.c  helloworld  helloworld.c  helloworld.i  helloworld.o  helloworld.s
$ 7z u basic.7z basic

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)
Scanning

Updating archive basic.7z

Compressing  basic/bufferoverflow.c

Everything is Ok
$

So firstly the file bufferoverflow.c was copied to directory 'basic' and then the update command was run again. It can be seen in the output that the archive was updated by compressing this new file and adding it to the archive.

6. Delete a file from the archive

This can be done using the function letter 'd' along with the switch -r. This switch tells the 7zip utility to traverse the subdirectories.

Here is an example :

$ 7z l basic.7z

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)

Listing archive: basic.7z

----
Path = basic.7z
Type = 7z
Method = LZMA BCJ
Solid = +
Blocks = 3
Physical Size = 6154
Headers Size = 269

   Date      Time    Attr         Size   Compressed Name
------------------- ----- ------------ ------------ ------------------------
2012-09-09 16:47:17 ....A          192         3600 basic/helloworld.c
2012-09-09 16:47:17 ....A         1568              basic/helloworld.o
2012-09-09 16:47:17 ....A        16700              basic/helloworld.i
2012-09-09 16:47:17 ....A          577              basic/helloworld.s
2012-09-09 17:33:51 ....A          634          312 basic/bufferoverflow.c
2012-09-09 16:47:17 ....A         8504         1973 basic/helloworld
2012-09-09 17:33:34 D....            0            0 basic
------------------- ----- ------------ ------------ ------------------------
                                 28175         5885 6 files, 1 folders

$ 7z d basic.7z helloworld -r

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)

Updating archive basic.7z

Everything is Ok

$ 7z l basic.7z

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_IN,Utf16=on,HugeFiles=on,2 CPUs)

Listing archive: basic.7z

----
Path = basic.7z
Type = 7z
Method = LZMA
Solid = +
Blocks = 2
Physical Size = 4165
Headers Size = 253

   Date      Time    Attr         Size   Compressed Name
------------------- ----- ------------ ------------ ------------------------
2012-09-09 16:47:17 ....A          192         3600 basic/helloworld.c
2012-09-09 16:47:17 ....A         1568              basic/helloworld.o
2012-09-09 16:47:17 ....A        16700              basic/helloworld.i
2012-09-09 16:47:17 ....A          577              basic/helloworld.s
2012-09-09 17:33:51 ....A          634          312 basic/bufferoverflow.c
2012-09-09 17:33:34 D....            0            0 basic
------------------- ----- ------------ ------------ ------------------------
                                 19671         3912 5 files, 1 folders
$

First we checked the files in the archive, next we tried to delete the 'helloworld' executable. Again when the entries in the archive were listed, no trace of 'helloworld' was found. So we can say that this file was successfully deleted from the archive.


NOTE : Besides the function letters used in the examples above, the 7z utility also accepts numerous switches. For information on the switches, see the man page of the 7z utility.

An example, from the man page that describes the use of switches :

$ 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z dir1

adds all files from directory "dir1" to archive archive.7z using "ultra settings"

-t7z      7z archive
-m0=lzma  lzma method
-mx=9     level of compression = 9 (Ultra)
-mfb=64   number of fast bytes for LZMA = 64
-md=32m   dictionary size = 32 megabytes
-ms=on    solid archive = on

So we see that switches can be used to customize the settings.
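For readers who want to experiment with similar knobs without the 7z binary, here is a minimal Python sketch using the standard library's lzma module. It writes an .xz stream, not a 7z archive, and the mapping of -mx to preset, -md to dict_size and -mfb to nice_len is my own assumption about the closest equivalents. The helper name compress_with_settings and the sample data are made up for illustration, and the dictionary is kept at 4 MiB (instead of 32m) to limit memory use.

```python
import lzma

# Filter chain loosely mirroring the switches above (the mapping is an assumption):
#   -mx=9   -> preset 9
#   -md=... -> dict_size (4 MiB here; the man page example uses 32m)
#   -mfb=64 -> nice_len=64
ULTRA_LIKE_FILTERS = [
    {
        "id": lzma.FILTER_LZMA2,
        "preset": 9,
        "dict_size": 4 * 1024 * 1024,
        "nice_len": 64,
    }
]


def compress_with_settings(data: bytes, filters=ULTRA_LIKE_FILTERS) -> bytes:
    """Compress data into an .xz container using an explicit filter chain."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


if __name__ == "__main__":
    # Hypothetical sample input: a repetitive C snippet.
    sample = b"int main(void) { return 0; }\n" * 2000
    tuned = compress_with_settings(sample)
    default = lzma.compress(sample)
    print("original:", len(sample), "default:", len(default), "tuned:", len(tuned))
    # The tuned stream must still decompress to the original bytes.
    assert lzma.decompress(tuned) == sample
```

The point is only that the settings are independent parameters of the compressor. The sizes you get will depend on your data.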

Some important points

The following section from the man page is worth mentioning here :

DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :

- 7-zip does not store the owner/group of the file.

On Linux/Unix, in order to backup directories you must use tar :

- to backup a directory :
tar cf - directory | 7za a -si directory.tar.7z

- to restore your backup :
7za x -so directory.tar.7z | tar xf -

- If you want to send files and directories (not the owner of file) to others Unix/MacOS/Windows users, you can use the 7-zip format.
example : 7za a directory.7z directory

Do not use "-r" because this flag does not do what you think.

Do not use directory/* because of ".*" files (example : "directory/*" does not match "directory/.profile")
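The last warning is easy to verify. The following small Python sketch (the helper name visible_and_all_entries and the file names are made up for illustration) creates a temporary directory with one hidden and one regular file, then compares what the pattern "directory/*" matches with what the directory really contains. Python's glob module follows the same rule as the shell here: "*" does not match names that start with a dot.

```python
import glob
import os
import tempfile


def visible_and_all_entries(directory: str):
    """Return (names matched by 'directory/*', all names present in directory)."""
    pattern = os.path.join(directory, "*")
    matched = sorted(os.path.basename(path) for path in glob.glob(pattern))
    everything = sorted(os.listdir(directory))
    return matched, everything


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        # Create one hidden file and one regular file.
        for name in (".profile", "notes.txt"):
            with open(os.path.join(directory, name), "w") as handle:
                handle.write("example\n")
        matched, everything = visible_and_all_entries(directory)
        print("directory/* matches:", matched)
        print("directory contains: ", everything)
```

So an archive built from directory/* would silently miss .profile, which is why the man page recommends passing the directory name itself.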

For those who want to download the 7zip tool or want to look at the code, here is the project's home page on sourceforge.


7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 16.[1]

The 7z file format specification is distributed with 7-Zip's source code. The specification can be found in plain text format in the 'doc' sub-directory of the source code distribution.

Features and enhancements

The 7z format provides the following main features:

  • Open, modular architecture that allows any compression, conversion, or encryption method to be stacked.
  • High compression ratios (depending on the compression method used)
  • AES-256 encryption.
  • Large file support (up to approximately 16 exbibytes, or 2^64 bytes).
  • Unicode file names
  • Support for solid compression, where multiple files of like type are compressed within a single stream, in order to exploit the combined redundancy inherent in similar files.
  • Compression and encryption of archive headers.
  • Support for multi-part archives : e.g. xxx.7z.001, xxx.7z.002, ... (see the context menu items Split File... to create them and Combine Files... to re-assemble an archive from a set of multi-part component files)
  • Support for custom codec plugin DLLs.

The format's open architecture allows additional future compression methods to be added to the standard.
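The effect of solid compression, listed among the features above, can be approximated with Python's standard lzma module. This is only a rough sketch: it compares compressing each file as its own .xz stream against compressing the concatenation of all files as one stream. It does not produce 7z archives, and the function names and sample files are invented for the example.

```python
import lzma


def non_solid_size(files) -> int:
    """Total size when every file is compressed as a separate stream."""
    return sum(len(lzma.compress(content)) for content in files)


def solid_size(files) -> int:
    """Size when all files are compressed together as one stream."""
    return len(lzma.compress(b"".join(files)))


if __name__ == "__main__":
    # Hypothetical input: 20 small, very similar source files.
    files = [
        (f"/* module {i} */\n" + "int helper(int x) { return x * 2 + 1; }\n" * 30).encode()
        for i in range(20)
    ]
    print("separate streams:", non_solid_size(files), "bytes")
    print("single stream:   ", solid_size(files), "bytes")
```

With similar inputs the single stream comes out much smaller, because the compressor can reuse what it learned from earlier files. The trade-off is that extracting one file from a solid block requires decompressing the data that precedes it.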

Compression methods

The following compression methods are currently defined:

  • LZMA – A variation of the LZ77 algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by entropy coding using a Markov chain-based range coder and binary trees.
  • LZMA2 – modified version of LZMA providing better multithreading support and less expansion of incompressible data.[2]
  • Bzip2 – The standard Burrows–Wheeler transform algorithm. Bzip2 uses two reversible transformations; BWT, then Move to front with Huffman coding for symbol reduction (the actual compression element).
  • PPMd – Dmitry Shkarin's 2002 PPMdH (PPMII/cPPMII) with small changes: PPMII is an improved version of the 1984 PPM compression algorithm (prediction by partial matching).
  • DEFLATE – Standard algorithm based on 32 kB LZ77 and Huffman coding. Deflate is found in several file formats including ZIP, gzip, PNG and PDF. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the de facto standard zlib version in compression size, but at the expense of CPU usage.

A suite of recompression tools called AdvanceCOMP contains a copy of the DEFLATE encoder from the 7-Zip implementation; these utilities can often be used to further compress the size of existing gzip, ZIP, PNG, or MNG files.
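Several of the methods listed above have counterparts in Python's standard library, which makes a quick side-by-side comparison possible. The sketch below is an illustration under assumptions, not a benchmark of 7-Zip: the lzma module's legacy .lzma format stands in for LZMA, its .xz format for LZMA2, and zlib for DEFLATE (zlib's encoder, not the 7-Zip one). PPMd has no standard-library implementation and is left out. The function name compressed_sizes is made up.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Compress the same bytes with several methods and report output sizes."""
    return {
        "LZMA": len(lzma.compress(data, format=lzma.FORMAT_ALONE)),
        "LZMA2": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
        "Bzip2": len(bz2.compress(data, 9)),
        "DEFLATE": len(zlib.compress(data, 9)),
    }


if __name__ == "__main__":
    # Hypothetical sample text; results vary a lot with the input.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 4000
    print("original:", len(sample), "bytes")
    for method, size in compressed_sizes(sample).items():
        print(f"{method:8s} {size} bytes")
```

Each size includes that format's own container overhead, so small differences between methods should not be over-interpreted.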

Pre-processing filters

The LZMA SDK comes with the BCJ and BCJ2 preprocessors included, so that later stages are able to achieve greater compression. For x86, ARM, PowerPC (PPC), IA-64 Itanium, and ARM Thumb processors, jump targets are normalized before compression by changing relative positions into absolute values. For x86, this means that near jumps, calls and conditional jumps (but not short jumps and short conditional jumps) are converted from the machine language "jump 1655 bytes backwards" style notation to normalized "jump to address 5554" style notation; all jumps to 5554, perhaps a common subroutine, are thus encoded identically, making them more compressible.
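To make the jump normalization concrete, here is a deliberately simplified Python sketch. It handles only the x86 near CALL opcode (0xE8), treats every 0xE8 byte it scans as a call, and skips the heuristics a real BCJ filter uses to avoid false positives, so it should be read as an illustration of the idea and not as the 7-Zip implementation. The function names are made up.

```python
import struct

CALL_OPCODE = 0xE8  # x86 near CALL with a 32-bit relative operand


def bcj_x86_encode(code: bytes) -> bytes:
    """Rewrite relative CALL operands as absolute target addresses."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == CALL_OPCODE:
            (relative,) = struct.unpack_from("<i", out, i + 1)
            # The target is relative to the address of the next instruction.
            absolute = (i + 5 + relative) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, absolute)
            i += 5
        else:
            i += 1
    return bytes(out)


def bcj_x86_decode(code: bytes) -> bytes:
    """Inverse transform: turn absolute targets back into relative operands."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == CALL_OPCODE:
            (absolute,) = struct.unpack_from("<I", out, i + 1)
            relative = (absolute - (i + 5)) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, relative)
            i += 5
        else:
            i += 1
    return bytes(out)


if __name__ == "__main__":
    target = 0x1000
    # Two calls to the same subroutine from offsets 0 and 100, separated by NOPs.
    code = (
        b"\xE8" + struct.pack("<i", target - 5)
        + b"\x90" * 95
        + b"\xE8" + struct.pack("<i", target - 105)
    )
    encoded = bcj_x86_encode(code)
    print("before:", code[1:5].hex(), code[101:105].hex())
    print("after: ", encoded[1:5].hex(), encoded[101:105].hex())
    assert bcj_x86_decode(encoded) == code
```

Before the transform the two operands differ because each call sits at a different offset. Afterwards both hold the same absolute address, which gives the LZ stage a repeated byte string to match.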

  • BCJ – Converter for 32-bit x86 executables. Normalise target addresses of near jumps and calls from relative distances to absolute destinations.
  • BCJ2 – Pre-processor for 32-bit x86 executables. BCJ2 is an improvement on BCJ, adding additional x86 jump/call instruction processing. Near jump, near call, conditional near jump targets are split out and compressed separately in another stream.
  • Delta encoding – delta filter, basic preprocessor for multimedia data.
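The delta filter can be tried out with Python's standard lzma module, which exposes a delta filter for .xz streams. The sketch below is an illustration under assumptions: the input is a synthetic random walk standing in for slowly varying sample data, the function names are made up, and the output is .xz, not 7z.

```python
import lzma
import random


def xz_plain(data: bytes) -> bytes:
    """Compress with LZMA2 only."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def xz_delta(data: bytes, dist: int = 1) -> bytes:
    """Apply a delta filter (byte distance `dist`) before LZMA2."""
    filters = [
        {"id": lzma.FILTER_DELTA, "dist": dist},
        {"id": lzma.FILTER_LZMA2, "preset": 6},
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def random_walk(length: int, seed: int = 0) -> bytes:
    """Synthetic signal: each byte differs from the previous one by -1, 0 or +1."""
    rng = random.Random(seed)
    value = 128
    samples = bytearray()
    for _ in range(length):
        value = (value + rng.choice((-1, 0, 1))) % 256
        samples.append(value)
    return bytes(samples)


if __name__ == "__main__":
    data = random_walk(50_000)
    print("LZMA2 only:   ", len(xz_plain(data)), "bytes")
    print("delta + LZMA2:", len(xz_delta(data)), "bytes")
    assert lzma.decompress(xz_delta(data)) == data
```

After the delta step the stream contains only three distinct byte values, which is far easier for the later stage to model than the raw samples. For multi-byte samples (for example 16-bit audio), dist would be set to the sample stride.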

Similar executable pre-processing technology is included in other software; the RAR compressor features displacement compression for 32-bit x86 executables and IA-64 executables, and the UPX runtime executable file compressor includes support for working with 16-bit values within DOS binary files.

Encryption

The 7z format supports encryption with the AES algorithm with a 256-bit key. The key is generated from a user-supplied passphrase using an algorithm based on the SHA-256 hash function. The SHA-256 is executed 2^18 (262144) times,[3] which causes a significant delay on slow PCs before compression or extraction starts. This technique is called key stretching and is used to make a brute-force search for the passphrase more difficult. Current GPU-based and custom hardware attacks limit the effectiveness of this particular method of key stretching,[4] so it is still important to choose a strong password. The 7z format provides the option to encrypt the filenames of a 7z archive.
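Key stretching of this kind can be sketched in a few lines of Python. The function below is a hedged illustration and is not guaranteed to reproduce 7-Zip's exact key derivation: it feeds a salt, the UTF-16LE passphrase and an 8-byte counter into one running SHA-256 hash 2^cycles_power times, and uses the final 32-byte digest as the AES-256 key. The name stretch_key, its parameters and the byte layout are assumptions made for this example.

```python
import hashlib


def stretch_key(passphrase: str, salt: bytes = b"", cycles_power: int = 18) -> bytes:
    """Derive a 32-byte key by hashing the passphrase 2**cycles_power times."""
    digest = hashlib.sha256()
    encoded = passphrase.encode("utf-16-le")
    for counter in range(1 << cycles_power):
        # Each round mixes in the salt, the passphrase and a round counter.
        digest.update(salt)
        digest.update(encoded)
        digest.update(counter.to_bytes(8, "little"))
    return digest.digest()


if __name__ == "__main__":
    key = stretch_key("correct horse battery staple")
    print(len(key) * 8, "bit key:", key.hex())
```

Every password guess costs an attacker the same 2^18 rounds, which is what slows down a brute-force search. Raising cycles_power by one doubles that cost for both the attacker and the legitimate user.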

Limitations

The 7z format does not store filesystem permissions (such as UNIX owner/group permissions or NTFS ACLs), and hence can be inappropriate for backup/archival purposes. A workaround on UNIX-like systems is to convert the data to a tar bitstream before compressing with 7z. It is worth noting, however, that GNU tar (common in many UNIX environments) can also compress with the LZMA algorithm natively, without the use of 7z, and that in this case the suggested[5] file extension for the archive is ".tar.lzma" (or just ".tlz"), and not ".tar.7z". On the other hand, tar does not save the filesystem encoding, which means that filenames in a tar archive can become unreadable if extracted on a different computer. It is also possible to use LZMA2 by running the data through the xz tool. Recent versions of GNU tar support the -J switch, which runs tar through xz. The file extension is ".tar.xz" or ".txz". This method of compression has been adopted by many distributions for packaging, such as Arch, Debian (deb), Fedora (rpm) and Slackware.
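The tar-based workaround can be illustrated with Python's standard tarfile module, which can write ".tar.xz" archives directly. This is a minimal in-memory sketch with made-up file names, IDs and helper names; it shows that the owner, group and mode recorded for a member survive the round trip through the compressed tar stream.

```python
import io
import tarfile


def make_tar_xz(name: str, payload: bytes, uid: int, gid: int, mode: int) -> bytes:
    """Build a .tar.xz archive in memory holding one file with explicit metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:xz") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        info.uid, info.gid, info.mode = uid, gid, mode
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_metadata(blob: bytes):
    """Return (name, uid, gid, mode) for each member of a .tar.xz archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as archive:
        return [(m.name, m.uid, m.gid, oct(m.mode)) for m in archive.getmembers()]


if __name__ == "__main__":
    # Hypothetical file owned by uid/gid 1000 with mode rw-r-----.
    blob = make_tar_xz("etc/profile", b"export EDITOR=vi\n", 1000, 1000, 0o640)
    print(read_metadata(blob))
```

Whether the ownership is actually restored on extraction still depends on the privileges of the user running the extraction.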

The 7z format does not allow extraction of some "broken files"—that is (for example) if one has the first segment of a series of 7z files, 7z cannot give the start of the files within the archive—it must wait until all segments are downloaded. The 7z format also lacks recovery records, making it vulnerable to data degradation. By way of comparison, zip files also lack a recovery feature. In contrast the proprietary rar format permits recoveries as well as the extraction of broken files and file spanning.

Further reading

  • Salomon, David (2007). Data compression: the complete reference. Springer. p. 241. ISBN 1-84628-602-6. 
