In this how-to article I demonstrate using Python to compress files into zip archives and to extract the files a zip archive contains. Python's venerable, batteries-included standard library provides the zipfile module, which exposes a well-designed, cross-platform API for working with zip archives and is the focus of this article.
For this first section on compressing files into a zip archive, along with other parts in this tutorial, I will be working with the following directory structure and test files.
$ tree .
.
└── testdata
    ├── file01.txt
    ├── file02.txt
    ├── file03.txt
    ├── file04.txt
    ├── file05.txt
    ├── file06.txt
    ├── file07.txt
    ├── file08.txt
    ├── file09.txt
    └── file10.txt
The zipfile module provides the ZipFile class, which I will be working with for most of this article. One thing to mention right off the bat is that a ZipFile object is a context manager, so it can be used in a with statement, which automatically closes the archive upon exiting the with block. This should be the preferred method of usage.
To start the discussion I will first demonstrate how to compress a single file into a zip archive.
# compress_single.py
from zipfile import ZipFile

if __name__ == '__main__':
    single_file = 'testdata/file01.txt'
    with ZipFile('file01.zip', mode='w') as zf:
        zf.write(single_file)
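Running the above program adds the single testdata/file01.txt file to a zip archive named file01.zip. It does so by constructing a ZipFile object in write mode and then calling its write(...) method with the path of the file to add. Note that ZipFile defaults to compression=ZIP_STORED, which stores entries without compressing them, so the archive below is larger than the 20-byte input file. Pass compression=ZIP_DEFLATED to the constructor to actually compress the data.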
$ python compress_single.py
$ ls -l
-rw-r--r-- 1 adammcquistan staff 192 Apr 15 22:45 compress.py
-rw-r--r-- 1 adammcquistan staff 156 Apr 15 22:48 file01.zip
drwxr-xr-x 12 adammcquistan staff 384 Apr 15 22:26 testdata
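Because ZipFile stores entries uncompressed unless told otherwise, here is a minimal sketch of requesting DEFLATE compression explicitly. The zip_one_file helper and the sample.txt data are hypothetical names made up for illustration; ZIP_DEFLATED and the compression parameter come from the zipfile module and rely on zlib being available.

```python
# compress_deflated.py
import os
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile


def zip_one_file(src_path, archive_path):
    """Write src_path into a new archive using DEFLATE compression.

    Hypothetical helper for illustration; returns the archive size in bytes.
    """
    # compression defaults to ZIP_STORED (no compression), so ask for
    # ZIP_DEFLATED explicitly
    with ZipFile(archive_path, mode='w', compression=ZIP_DEFLATED) as zf:
        zf.write(src_path)
    return os.path.getsize(archive_path)


if __name__ == '__main__':
    # Made-up sample data: a highly repetitive file compresses well
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'sample.txt')
        with open(src, 'w') as f:
            f.write('hello zip\n' * 1000)
        size = zip_one_file(src, os.path.join(tmp, 'sample.zip'))
        print(f"original: {os.path.getsize(src)}B, archive: {size}B")
```

For very small inputs like the 20-byte test files, the zip headers can still outweigh any savings, so the benefit mostly shows up on larger or repetitive files.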
Compressing multiple files into a single zip archive is a simple extension of the previous example. Call the write(...) method on the ZipFile object once for each file you want to add to the archive, as shown below.
# compress_many.py
from zipfile import ZipFile

if __name__ == '__main__':
    input_files = [
        'testdata/file01.txt',
        'testdata/file02.txt',
        'testdata/file03.txt',
        'testdata/file04.txt',
        'testdata/file05.txt',
        'testdata/file06.txt',
        'testdata/file07.txt',
        'testdata/file08.txt',
        'testdata/file09.txt',
        'testdata/file10.txt',
    ]
    with ZipFile('files.zip', mode='w') as zf:
        for f in input_files:
            zf.write(f)
Running the script produces the expected archive.
$ ls -l
-rw-r--r-- 1 adammcquistan staff 514 Apr 15 23:12 compress_many.py
-rw-r--r-- 1 adammcquistan staff 1367 Apr 15 23:16 files.zip
drwxr-xr-x 12 adammcquistan staff 384 Apr 15 22:26 testdata
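Listing every input file by hand gets tedious for larger directories. As a rough sketch of one alternative, the hypothetical zip_directory helper below walks a directory with pathlib and uses the arcname parameter of write(...) to control the name stored in the archive. It assumes the archive itself is written outside the directory being zipped.

```python
# compress_dir.py
from pathlib import Path
from zipfile import ZipFile


def zip_directory(src_dir, archive_path):
    """Add every regular file under src_dir to a new archive.

    Hypothetical helper for illustration; archive_path is assumed to
    live outside src_dir so the archive does not try to include itself.
    """
    src_dir = Path(src_dir)
    with ZipFile(archive_path, mode='w') as zf:
        # sorted() keeps the entry order predictable across platforms
        for path in sorted(src_dir.rglob('*')):
            if path.is_file():
                # arcname sets the name stored in the archive; here it is
                # the path relative to the parent of src_dir, so entries
                # keep the top-level directory name as a prefix
                arcname = path.relative_to(src_dir.parent).as_posix()
                zf.write(path, arcname=arcname)
```

Calling zip_directory('testdata', 'files.zip') from the directory shown earlier should store entries named testdata/file01.txt through testdata/file10.txt, matching the hand-written list.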
It is often useful to programmatically peek into a zip archive and inspect its contents. For this, the zipfile module provides the ZipInfo class, which represents an individual archive entry and exposes details such as the entry's name and decompressed size. Once a ZipFile object is constructed, you can call its infolist() method, which returns a list of ZipInfo objects.
As an example, the following module named peek_zip.py queries the files.zip archive created in the last section and displays the name and size of each ZipInfo object representing the contents of the files.zip archive.
# peek_zip.py
from zipfile import ZipFile

if __name__ == '__main__':
    with ZipFile('files.zip') as zf:
        for zipinfo in zf.infolist():
            print(f"{zipinfo.filename} ({zipinfo.file_size}B)")
As you can see, running the program shows the expected output.
$ python peek_zip.py
testdata/file01.txt (20B)
testdata/file02.txt (21B)
testdata/file03.txt (20B)
testdata/file04.txt (20B)
testdata/file05.txt (20B)
testdata/file06.txt (20B)
testdata/file07.txt (22B)
testdata/file08.txt (21B)
testdata/file09.txt (21B)
testdata/file10.txt (20B)
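Peeking does not have to stop at metadata. As a small sketch, the hypothetical read_text_entry helper below uses getinfo(...) and read(...) to pull one entry's contents into memory without extracting anything to disk. The UTF-8 default is an assumption about how the test files are encoded.

```python
# read_entry.py
from zipfile import ZipFile


def read_text_entry(archive_path, entry_name, encoding='utf-8'):
    """Return the decoded text of one archive entry without extracting it.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    with ZipFile(archive_path) as zf:
        # getinfo() raises KeyError if the entry is not in the archive
        info = zf.getinfo(entry_name)
        # read() accepts a name or a ZipInfo and returns decompressed bytes
        data = zf.read(info)
    return data.decode(encoding)
```

With the files.zip archive from earlier, read_text_entry('files.zip', 'testdata/file01.txt') would return the text of the first test file.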
Extracting the contents of a zip archive is a fairly trivial task as well. Construct a ZipFile object, passing it the path to the archive you wish to extract along with 'r' for the mode parameter (the default) to indicate that you are reading from the archive. Then you can either extract individual files with the extract(...) method or all contents with the extractall(...) method.
To extract an individual entry, supply the entry's name (or its ZipInfo object) and the path of the directory to extract it to. If you omit the path, the entry is extracted to the current working directory.
# extract_single.py
import os
from zipfile import ZipFile

if __name__ == '__main__':
    output_dir = 'extract_singles'
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)
    with ZipFile('files.zip', mode='r') as zf:
        zf.extract('testdata/file01.txt', path=output_dir)
Running the program and then listing the output directory gives the following.
$ python extract_single.py
$ ls -l extract_singles/testdata
-rw-r--r-- 1 adammcquistan staff 20 Apr 15 23:57 file01.txt
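Two details of extract(...) are worth a quick sketch: it returns the path of the file it created, and it creates any missing directories along the way, so the explicit os.mkdir call is optional. The extract_entry helper below is a hypothetical wrapper written for illustration.

```python
# extract_entry.py
from zipfile import ZipFile


def extract_entry(archive_path, entry_name, output_dir):
    """Extract one entry and return the path of the file written.

    Hypothetical helper for illustration.
    """
    with ZipFile(archive_path) as zf:
        # extract() creates output_dir and any intermediate directories
        # if they do not exist yet, then returns the normalized path
        return zf.extract(entry_name, path=output_dir)
```

For example, extract_entry('files.zip', 'testdata/file01.txt', 'extract_singles') should return a path ending in extract_singles/testdata/file01.txt (with the platform's own path separators).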
Similarly, you can use the extractall(...) method to extract the entire contents of a zip archive to a specified location as seen below.
# extract_all.py
import os
from zipfile import ZipFile

if __name__ == '__main__':
    output_dir = 'extract_all'
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)
    with ZipFile('files.zip', mode='r') as zf:
        zf.extractall(output_dir)
Then for completeness here is the output.
$ python3 extract_all.py
$ ls -l extract_all/testdata
-rw-r--r-- 1 adammcquistan staff 20 Apr 16 00:05 file01.txt
-rw-r--r-- 1 adammcquistan staff 21 Apr 16 00:05 file02.txt
-rw-r--r-- 1 adammcquistan staff 20 Apr 16 00:05 file03.txt
-rw-r--r-- 1 adammcquistan staff 20 Apr 16 00:05 file04.txt
-rw-r--r-- 1 adammcquistan staff 20 Apr 16 00:05 file05.txt
-rw-r--r-- 1 adammcquistan staff 20 Apr 16 00:05 file06.txt
-rw-r--r-- 1 adammcquistan staff 22 Apr 16 00:05 file07.txt
-rw-r--r-- 1 adammcquistan staff 21 Apr 16 00:05 file08.txt
-rw-r--r-- 1 adammcquistan staff 21 Apr 16 00:05 file09.txt
-rw-r--r-- 1 adammcquistan staff 20 Apr 16 00:05 file10.txt
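The extractall(...) method also accepts a members argument for extracting only part of an archive. The sketch below combines it with the module's is_zipfile(...) check. The extract_matching helper and its suffix filter are hypothetical, chosen only to illustrate the idea.

```python
# extract_some.py
from zipfile import ZipFile, is_zipfile


def extract_matching(archive_path, output_dir, suffix='.txt'):
    """Extract only the entries whose names end with suffix.

    Hypothetical helper for illustration; returns the extracted names.
    """
    # is_zipfile() checks for the zip magic number before we try to open
    if not is_zipfile(archive_path):
        raise ValueError(f"{archive_path} is not a zip archive")
    with ZipFile(archive_path) as zf:
        wanted = [name for name in zf.namelist() if name.endswith(suffix)]
        # members restricts extraction to a subset of namelist()
        zf.extractall(output_dir, members=wanted)
    return wanted
```

Filtering by suffix is only one option. Any rule that picks a subset of namelist() would work the same way.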
In this article I have discussed and provided several code samples demonstrating how to work with zip files using the Python programming language utilizing the zipfile module from the standard library.
As always, I thank you for reading and please feel free to ask questions or critique in the comments section below.