4mLIBARCHIVE-FORMATS24m(5)	   File Formats Manual	      4mLIBARCHIVE-FORMATS24m(5)

1mNAME0m
       libarchive-formats  —  archive  formats supported by the libarchive li‐
       brary

1mDESCRIPTION0m
       The 4mlibarchive24m(3) library reads  and  writes  a  variety  of  streaming
       archive formats.	 Generally speaking, all of these archive formats con‐
       sist  of a series of “entries”.	Each entry stores a single file system
       object, such as a file, directory, or symbolic link.

       The following provides a brief description of each format supported  by
       libarchive,  with some information about recognized extensions or limi‐
       tations of the current library support.	Note that just because a  for‐
       mat  is supported by libarchive does not imply that a program that uses
       libarchive will support that format.  Applications that use  libarchive
       specify which formats they wish to support, though many programs do use
       libarchive convenience functions to enable all supported formats.

   1mTar Formats0m
       The  4mlibarchive24m(3)	library	 can  read most tar archives.  It can write
       POSIX-standard “ustar” and “pax interchange” formats as well as v7  tar
       format and a subset of the legacy GNU tar format.

       All  tar formats store each entry in one or more 512-byte records.  The
       first record is used for file metadata, including filename,  timestamp,
       and  mode  information,	and  the  file	data  is  stored in subsequent
       records.	 Later variants have extended this by either appropriating un‐
       defined areas of the header record, extending the  header  to  multiple
       records,	 or  by storing special entries that modify the interpretation
       of subsequent entries.

       1mgnutar	 22mThe	 4mlibarchive24m(3)  library  can  read	 most	GNU-format   tar
	       archives.   It  currently  supports the most popular GNU exten‐
	       sions, including modern long filename and linkname support,  as
	       well  as atime and ctime data.  The libarchive library does not
	       support multi-volume archives, nor the old  GNU	long  filename
	       format.	It can read GNU sparse file entries, including the new
	       POSIX-based formats.

	       The  4mlibarchive24m(3)	library can write GNU tar format, including
	       long filename and linkname support, as well as atime and	 ctime
	       data.

       1mpax	 22mThe	 4mlibarchive24m(3)  library  can read and write POSIX-compliant
	       pax  interchange	 format	 archives.   Pax  interchange	format
	       archives are an extension of the older ustar format that adds a
	       separate	 entry	with additional attributes stored as key/value
	       pairs immediately before each regular entry.  The  presence  of
	       these additional entries is the only difference between pax in‐
	       terchange  format and the older ustar format.  The extended at‐
	       tributes are of unlimited length and are stored as  UTF-8  Uni‐
	       code strings.  Keywords defined in the standard are in all low‐
	       ercase;	vendors are allowed to define custom keys by preceding
	       them with the vendor name in all uppercase.  When  writing  pax
	       archives,  libarchive  uses  many of the SCHILY keys defined by
	       Joerg Schilling's “star” archiver and a	few  LIBARCHIVE	 keys.
	       The  libarchive	library	 can  read most of the SCHILY keys and
	       most of the GNU keys introduced by GNU tar.   It	 silently  ig‐
	       nores any keywords that it does not understand.

	       The  pax	 interchange  format converts filenames to Unicode and
	       stores them using the UTF-8 encoding.  Prior to libarchive 3.0,
	       libarchive erroneously assumed that the	system	wide-character
	       routines	 natively  supported  Unicode.	This caused it to mis-
	       handle non-ASCII filenames on systems that did not satisfy this
	       assumption.

       1mrestricted pax0m
	       The libarchive library can also write pax archives in which  it
	       attempts	 to  suppress  the  extended attributes entry whenever
	       possible.  The result will be identical to a ustar archive  un‐
	       less  the extended attributes entry is required to store a long
	       file name, long linkname, extended ACL, file flags, or  if  any
	       of  the	standard  ustar data (user name, group name, UID, GID,
	       etc) cannot be fully represented in the ustar header.   In  all
	       cases,  the  result  can	 be dearchived by any program that can
	       read POSIX-compliant pax interchange format archives.  Programs
	       that correctly read ustar format (see below) will also be  able
	       to  read this format; any extended attributes will be extracted
	       as separate files stored in 4mPaxHeader24m directories.

       1mustar	 22mThe libarchive library can both read  and  write  this  format.
	       This format has the following limitations:
	       1m•   22mDevice	major  and  minor  numbers  are limited to 21 bits.
		   Nodes with larger numbers will not be added to the archive.
	       1m•   22mPath names  in	the  archive  are  limited  to	255  bytes.
		   (Shorter  if	 there	is no / character in exactly the right
		   place.)
	       1m•   22mSymbolic links and hard links are  stored  in  the  archive
		   with the name of the referenced file.  This name is limited
		   to 100 bytes.
	       1m•   22mExtended  attributes,  file flags, and other extended secu‐
		   rity information cannot be stored.
	       1m•   22mArchive entries are limited to 8 gigabytes in size.
	       Note that the pax interchange format has none of these restric‐
	       tions.  The ustar format is old and widely  supported.	It  is
	       recommended when compatibility is the primary concern.

       1mv7	 22mThe	 libarchive  library  can  read and write the legacy v7 tar
	       format.	This format has the following limitations:
	       1m•   22mOnly regular files, directories, and symbolic links can	 be
		   archived.   Block  and  character  device nodes, FIFOs, and
		   sockets cannot be archived.
	       1m•   22mPath names in the archive are limited to 100 bytes.
	       1m•   22mSymbolic links and hard links are  stored  in  the  archive
		   with the name of the referenced file.  This name is limited
		   to 100 bytes.
	       1m•   22mUser and group information are stored as numeric IDs; there
		   is no provision for storing user or group names.
	       1m•   22mExtended  attributes,  file flags, and other extended secu‐
		   rity information cannot be stored.
	       1m•   22mArchive entries are limited to 8 gigabytes in size.
	       Generally, users should prefer the ustar format for portability
	       as the v7 tar format is both less useful and less portable.

       The libarchive library also reads a variety of commonly-used extensions
       to the basic tar format.	 These extensions are recognized automatically
       whenever they appear.

       Numeric extensions.
	       The POSIX standards require fixed-length numeric fields	to  be
	       written	with some character position reserved for terminators.
	       Libarchive allows these fields to be written without terminator
	       characters.  This extends the allowable range;  in  particular,
	       ustar archives with this extension can support entries up to 64
	       gigabytes  in size.  Libarchive also recognizes base-256 values
	       in most numeric fields.	This essentially removes  all  limita‐
	       tions on file size, modification time, and device numbers.

       Solaris extensions
	       Libarchive  recognizes ACL and extended attribute records writ‐
	       ten by Solaris tar.

       The first tar program appeared in Seventh Edition Unix  in  1979.   The
       first  official	standard for the tar file format was the “ustar” (Unix
       Standard Tar) format defined by POSIX in 1988.	POSIX.1-2001  extended
       the ustar format to create the “pax interchange” format.

   1mCpio Formats0m
       The libarchive library can read and write a number of common cpio vari‐
       ants.  A cpio archive stores each entry as a fixed-size header followed
       by a variable-length filename and variable-length data.	Unlike the tar
       format, the cpio format does only minimal padding of the header or file
       data.   There  are several cpio variants, which differ primarily in how
       they store the initial header: some store the values as octal or	 hexa‐
       decimal numbers in ASCII, others as binary values of varying byte order
       and length.

       1mbinary	 22mThe	 libarchive library transparently reads both big-endian and
	       little-endian variants of the the two binary cpio formats;  the
	       original	 one  from  PWB/UNIX, and the later, more widely used,
	       variant.	 This format used 32-bit binary values for  file  size
	       and  mtime, and 16-bit binary values for the other fields.  The
	       formats support only the file types present in UNIX at the time
	       of their creation.  File sizes are limited to 24	 bits  in  the
	       PWB format, because of the limits of the file system, and to 31
	       bits in the newer binary format, where signed 32 bit longs were
	       used.

       1modc	 22mThis  is  the  POSIX  standardized	format, which is officially
	       known as the “cpio interchange format” or  the  “octet-oriented
	       cpio  archive format” and sometimes unofficially referred to as
	       the “old character format”.  This format stores the header con‐
	       tents as octal values in ASCII.	It is standard, portable,  and
	       immune  from  byte-order	 confusion.   File sizes and mtime are
	       limited to 33 bits (8GB file size), other fields are limited to
	       18 bits.

       1mSVR4/newc0m
	       The libarchive library can read both CRC and  non-CRC  variants
	       of  this	 format.  The SVR4 format uses eight-digit hexadecimal
	       values for all header fields.  This limits file	size  to  4GB,
	       and  also  limits  the  mtime and other fields to 32 bits.  The
	       SVR4 format can optionally include a CRC of the file  contents,
	       although libarchive does not currently verify this CRC.

       Cpio  first appeared in PWB/UNIX 1.0, which was released within AT&T in
       1977.  PWB/UNIX 1.0 formed the basis of System III Unix, released  out‐
       side  of	 AT&T  in 1981.	 This makes cpio older than tar, although cpio
       was not included in Version 7 AT&T Unix.	 As a result, the tar  command
       became  much better known in universities and research groups that used
       Version 7.  The combination of the 1mfind  22mand  1mcpio  22mutilities	provided
       very  precise  control  over file selection.  Unfortunately, the format
       has many limitations that make it unsuitable for widespread use.	  Only
       the  POSIX format permits files over 4GB, and its 18-bit limit for most
       other fields makes it unsuitable for modern systems.  In addition, cpio
       formats only store numeric UID/GID  values  (not	 usernames  and	 group
       names), which can make it very difficult to correctly transfer archives
       across systems with dissimilar user numbering.

   1mShar Formats0m
       A “shell archive” is a shell script that, when executed on a POSIX-com‐
       pliant  system, will recreate a collection of file system objects.  The
       libarchive library can write two different kinds of shar archives:

       1mshar	 22mThe traditional shar format uses a limited set  of	POSIX  com‐
	       mands, including 4mecho24m(1), 4mmkdir24m(1), and 4msed24m(1).  It is suitable
	       for  portably  archiving small collections of plain text files.
	       However, it is not generally  well-suited  for  large  archives
	       (many  implementations  of  4msh24m(1)  have limits on the size of a
	       script) nor should it be used with non-text files.

       1mshardump0m
	       This  format  is	 similar  to  shar  but	 encodes  files	 using
	       4muuencode24m(1)	 so  that  the result will be a plain text file re‐
	       gardless of the file contents.	It  also  includes  additional
	       shell  commands	that attempt to reproduce as many file attrib‐
	       utes as possible, including owner, mode, and flags.  The	 addi‐
	       tional  commands	 used to restore file attributes make shardump
	       archives less portable than plain shar archives.

   1mISO9660 format0m
       Libarchive can read and extract from files containing ISO9660-compliant
       CDROM images.  In many cases, this can remove the need to burn a physi‐
       cal CDROM just in order to read the files contained in an  ISO9660  im‐
       age.  It also avoids security and complexity issues that come with vir‐
       tual  mounts and loopback devices.  Libarchive supports the most common
       Rockridge extensions and has partial support for Joliet extensions.  If
       both extensions are present, the Joliet extensions will be used and the
       Rockridge extensions will be ignored.  In particular, this  can	create
       problems	 with hardlinks and symlinks, which are supported by Rockridge
       but not by Joliet.

       Libarchive reads ISO9660 images using a streaming strategy.   This  al‐
       lows  it	 to read compressed images directly (decompressing on the fly)
       and allows it to read images directly from network sockets, pipes,  and
       other  non-seekable  data  sources.  This strategy works well for opti‐
       mized ISO9660 images created by many popular programs.	Such  programs
       collect all directory information at the beginning of the ISO9660 image
       so it can be read from a physical disk with a minimum of seeking.  How‐
       ever, not all ISO9660 images can be read in this fashion.

       Libarchive  can also write ISO9660 images.  Such images are fully opti‐
       mized with the directory information preceding all file data.  This  is
       done  by storing all file data to a temporary file while collecting di‐
       rectory information in memory.  When the image is finished,  libarchive
       writes  out the directory structure followed by the file data.  The lo‐
       cation used for the temporary file can be changed by the usual environ‐
       ment variables.

   1mZip format0m
       Libarchive can read and write zip  format  archives  that  have	uncom‐
       pressed	entries	 and  entries compressed with the “deflate” , “LZMA” ,
       “XZ” , “BZIP2” and “ZSTD” algorithms.  Libarchive can  also  read,  but
       not  write,  zip	 format archives that have entries compressed with the
       “PPMd” algorithm.  Other zip compression algorithms are not  supported.
       The  extensions	supported by libarchive are Zip64, Libarchive's exten‐
       sions to better support streaming, PKZIP's traditional ZIP  encryption,
       Info-ZIP's  Unix extra fields, extra time, and Unicode path, as well as
       WinZIP's AES encryption.	 It can extract	 jar  archives,	 __MACOSX  re‐
       source  forks  extension	 for  OS  X, and self-extracting zip archives.
       Libarchive can use either of two different strategies for  reading  Zip
       archives:  a  streaming strategy which is fast and can handle extremely
       large archives, and a seeking  strategy	which  can  correctly  process
       self-extracting Zip archives and archives with deleted members or other
       in-place modifications.

       The  streaming  reader processes Zip archives as they are read.	It can
       read archives of arbitrary size from tape or network sockets,  and  can
       decode  Zip  archives  that have been separately compressed or encoded.
       However, self-extracting Zip archives and archives with	certain	 types
       of  modifications  cannot  be correctly handled.	 Such archives require
       that the reader first process the Central Directory, which is  ordinar‐
       ily located at the end of a Zip archive and is thus inaccessible to the
       streaming  reader.   If	the  program using libarchive has enabled seek
       support, then libarchive will use this to processes the central	direc‐
       tory first.

       In  particular,	the  seeking  reader  must be used to correctly handle
       self-extracting archives.  Such archives consist of a program  followed
       by  a  regular Zip archive.  The streaming reader cannot parse the ini‐
       tial program portion, but the seeking reader starts by reading the Cen‐
       tral Directory from the end of the archive.   Similarly,	 Zip  archives
       that  have  been	 modified  in-place  can have deleted entries or other
       garbage data that can only be accurately detected by first reading  the
       Central Directory.

   1mArchive (library) file format0m
       The  Unix  archive format (commonly created by the 4mar24m(1) archiver) is a
       general-purpose format which is	used  almost  exclusively  for	object
       files  to  be  read  by the link editor 4mld24m(1).  The ar format has never
       been standardised.  There are two common variants: the GNU  format  de‐
       rived  from  SVR4,  and the BSD format, which first appeared in 4.4BSD.
       The two differ primarily in their handling of filenames longer than  15
       characters:  the GNU/SVR4 variant writes a filename table at the begin‐
       ning of the archive; the BSD format stores each long filename in an ex‐
       tension area adjacent to the entry.  Libarchive can  read  both	exten‐
       sions,  including  archives  that  may include both types of long file‐
       names.  Programs using libarchive can write  GNU/SVR4  format  if  they
       provide	an  entry  called 4m//24m containing a filename table to be written
       into the archive before any of the entries.  Any	 entries  whose	 names
       are  not	 in  the  filename  table will be written using BSD-style long
       filenames.  This can cause problems for programs such as GNU ld that do
       not support the BSD-style long filenames.

   1mmtree0m
       Libarchive can read and write files in 4mmtree24m(5) format.  This format is
       not a true archive format, but rather a textual description of  a  file
       hierarchy  in which each line specifies the name of a file and provides
       specific metadata about that file.  Libarchive can read all of the key‐
       words supported by both the NetBSD and FreeBSD  versions	 of  4mmtree24m(8),
       although	 many  of  the	keywords  cannot  currently  be	 stored	 in an
       archive_entry object.  When writing, libarchive	supports  use  of  the
       4marchive_write_set_options24m(3) interface to specify which keywords should
       be  included  in the output.  If libarchive was compiled with access to
       suitable cryptographic libraries (such as the  OpenSSL  libraries),  it
       can  compute  hash  entries  such as 1msha512 22mor 1mmd5 22mfrom file data being
       written to the mtree writer.

       When reading an mtree file, libarchive will  locate  the	 corresponding
       files  on  disk	using  the  1mcontents 22mkeyword if present or the regular
       filename.  If it can locate and open the file on disk, it will use that
       to fill in any metadata that is missing from the mtree  file  and  will
       read   the  file	 contents  and	return	those  to  the	program	 using
       libarchive.  If it cannot locate and open the file on disk,  libarchive
       will return an error for any attempt to read the entry body.

   1m7-Zip0m
       Libarchive  can	read and write 7-Zip format archives.  TODO: Need more
       information

   1mCAB0m
       Libarchive can read Microsoft Cabinet ( “CAB”) format archives.	 TODO:
       Need more information.

   1mLHA0m
       TODO: Information about libarchive's LHA support

   1mRAR0m
       Libarchive  has	limited support for reading RAR format archives.  Cur‐
       rently, libarchive can read RARv3 format archives which have  been  ei‐
       ther  created  uncompressed, or compressed using any of the compression
       methods supported by the RARv3 format.  Libarchive can also read	 self-
       extracting RAR archives.

   1mWarc0m
       Libarchive can read and write “web archives”.  TODO: Need more informa‐
       tion

   1mXAR0m
       Libarchive  can read and write the XAR format used by many Apple tools.
       TODO: Need more information

1mSEE ALSO0m
       4mar24m(1), 4mcpio24m(1), 4mmkisofs24m(1), 4mshar24m(1), 4mtar24m(1), 4mzip24m(1), 4mzlib24m(3),	 4mcpio24m(5),
       4mmtree24m(5), 4mtar24m(5)

Debian			       December 27, 2016	 4mLIBARCHIVE-FORMATS24m(5)
