Unpacking archives¶
Let’s say our data file is actually a zip (or tar) archive with a collection of
files.
We may want to store an unpacked version of the archive or extract just a
single file from it.
We can do both operations with the pooch.Unzip and
pooch.Untar processors.
For example, to extract a single file from a zip archive:
from pooch import Unzip
def fetch_zipped_file():
"""
Load a large zipped sample data as a pandas.DataFrame.
"""
# Extract the file "actual-data-file.txt" from the archive
unpack = Unzip(members=["actual-data-file.txt"])
# Pass in the processor to unzip the data file
fnames = GOODBOY.fetch("zipped-data-file.zip", processor=unpack)
# Returns the paths of all extract members (in our case, only one)
fname = fnames[0]
# fname is now the path of the unzipped file ("actual-data-file.txt")
# which can be loaded by pandas directly
data = pandas.read_csv(fname)
return data
By default, the Unzip processor (and similarly the
Untar processor) will create a new folder in the same location
as the downloaded archive file, and give it the same name as the archive file
with the suffix .unzip (or .untar) appended.
If you want to change the location of the unpacked files, you can provide a
parameter extract_dir to the processor to tell it where you want to unpack
the files:
from pooch import Untar
def fetch_and_unpack_tar_file():
"""
Unpack a file from a tar archive to a custom subdirectory in the cache.
"""
# Extract a single file from the archive, to a specific location
unpack_to_custom_dir = Untar(members=["actual-data-file.txt"],
extract_dir="custom_folder")
# Pass in the processor to untar the data file
fnames = GOODBOY.fetch("tarred-data-file.tar.gz", processor=unpack)
# Returns the paths of all extract members (in our case, only one)
fname = fnames[0]
return fname
To extract all files into a folder and return the path to each file, omit the
members parameter:
def fetch_zipped_archive():
"""
Load all files from a zipped archive.
"""
fnames = GOODBOY.fetch("zipped-archive.zip", processor=Unzip())
return fnames
Use pooch.Untar to do the exact same for tar archives (with optional
compression).