pooch.DOIDownloader
- class pooch.DOIDownloader(progressbar=False, chunk_size=1024, **kwargs)
Download manager for fetching files from Digital Object Identifiers (DOIs).
Open-access data repositories often issue Digital Object Identifiers (DOIs) for datasets, providing a stable link and citation point. The hard part is finding the download URL for a file given only its DOI.
When called, this downloader uses the repository’s public API to find the download URL from the DOI and file name. It then uses pooch.HTTPDownloader to download the URL into the specified local file. This allows “URL”s to be specified with the DOI instead of the actual HTTP download link. Uses the requests library to manage downloads and interact with the APIs.
The format of the “URL” is: doi:{DOI}/{file name}. Notice that there are no // like in HTTP/FTP and you must specify a file name after the DOI (separated by a /).
Use with pooch.Pooch.fetch or pooch.retrieve to be able to download files given the DOI instead of an HTTP link.
Supported repositories:
Attention
DOIs from other repositories will not work since we need to access their particular APIs to find the download links. We welcome suggestions and contributions adding new repositories.
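The “URL” format described above can be sketched as follows. This is a minimal illustration, not part of Pooch: build_doi_url and fetch_by_doi are hypothetical helper names, and passing known_hash=None (which skips the integrity check) is an assumption made only to keep the example short.

```python
def build_doi_url(doi: str, file_name: str) -> str:
    """Return a 'doi:{DOI}/{file name}' string (hypothetical helper, not part of pooch)."""
    if not doi or not file_name:
        raise ValueError("both a DOI and a file name are required")
    # Unlike http:// or ftp://, there is no '//' after the scheme.
    return f"doi:{doi}/{file_name}"


def fetch_by_doi(doi: str, file_name: str, known_hash=None) -> str:
    """Download one file from a supported repository and return its local path."""
    # Imported lazily so build_doi_url can be used without pooch installed.
    import pooch

    return pooch.retrieve(
        url=build_doi_url(doi, file_name),
        known_hash=known_hash,  # None skips the hash check; prefer a real hash
        downloader=pooch.DOIDownloader(),
    )


url = build_doi_url("10.5281/zenodo.4924875", "tiny-data.txt")
print(url)  # doi:10.5281/zenodo.4924875/tiny-data.txt
```

Calling fetch_by_doi("10.5281/zenodo.4924875", "tiny-data.txt") would then need network access and a DOI from one of the supported repositories.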
- Parameters:
progressbar (bool or an arbitrary progress bar object) – If True, will print a progress bar of the download to standard error (stderr). Requires tqdm to be installed. Alternatively, an arbitrary progress bar object can be passed. See Using custom progress bars for details.
chunk_size (int) – Files are streamed chunk_size bytes at a time instead of loading everything into memory at once. Usually doesn’t need to be changed.
**kwargs – All keyword arguments given when creating an instance of this class will be passed to requests.get.
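To illustrate what streaming chunk_size bytes at a time means, here is a small standard-library sketch. It is not Pooch’s implementation: copy_in_chunks is a hypothetical name, and in-memory streams stand in for the HTTP response and the output file.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024):
    """Copy a binary stream chunk_size bytes at a time; return total bytes written."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


payload = b"x" * 2500  # stand-in for the body of a downloaded file
out = io.BytesIO()
written = copy_in_chunks(io.BytesIO(payload), out, chunk_size=1024)
print(written)  # 2500
```

At most chunk_size bytes are held in memory per iteration, which is why the default rarely needs changing even for large files.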
Examples
Download one of the data files from the figshare archive of Pooch test data:
>>> import os
>>> from pooch import DOIDownloader
>>> downloader = DOIDownloader()
>>> url = "doi:10.6084/m9.figshare.14763051.v1/tiny-data.txt"
>>> # Not using with Pooch.fetch so no need to pass an instance of Pooch
>>> downloader(url=url, output_file="tiny-data.txt", pooch=None)
>>> os.path.exists("tiny-data.txt")
True
>>> with open("tiny-data.txt") as f:
...     print(f.read().strip())
# A tiny data file for test purposes only
1 2 3 4 5 6
>>> os.remove("tiny-data.txt")
Same thing but for our Zenodo archive:
>>> url = "doi:10.5281/zenodo.4924875/tiny-data.txt"
>>> downloader(url=url, output_file="tiny-data.txt", pooch=None)
>>> os.path.exists("tiny-data.txt")
True
>>> with open("tiny-data.txt") as f:
...     print(f.read().strip())
# A tiny data file for test purposes only
1 2 3 4 5 6
>>> os.remove("tiny-data.txt")
Methods Summary
DOIDownloader.__call__(url, output_file, pooch) – Download the given DOI URL over HTTP to the given output file.
- DOIDownloader.__call__(url, output_file, pooch)
Download the given DOI URL over HTTP to the given output file.
Uses the repository’s API to determine the actual HTTP download URL from the given DOI.
Uses requests.get.
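The two-step flow described here (resolve the DOI to an HTTP URL, then hand off to an HTTP downloader) can be sketched with injected stand-ins. Everything below is hypothetical and only mirrors the (url, output_file, pooch) call signature: SketchDOIDownloader, fake_resolver, fake_http_downloader, and the example.org URL are not part of Pooch, and the real class queries each repository’s API instead.

```python
class SketchDOIDownloader:
    """Illustrative stand-in: resolve a DOI URL, then delegate to an HTTP downloader."""

    def __init__(self, resolver, http_downloader):
        self.resolver = resolver  # (doi, file_name) -> HTTP download URL
        self.http_downloader = http_downloader  # (url, output_file, pooch) -> None

    def __call__(self, url, output_file, pooch):
        if not url.startswith("doi:"):
            raise ValueError(f"expected a 'doi:' URL, got {url!r}")
        # The file name is whatever follows the last '/'.
        doi, sep, file_name = url[len("doi:"):].rpartition("/")
        if not sep or not doi or not file_name:
            raise ValueError("expected the form doi:{DOI}/{file name}")
        http_url = self.resolver(doi, file_name)
        self.http_downloader(http_url, output_file, pooch)


calls = []


def fake_resolver(doi, file_name):
    # Placeholder for a repository API lookup (hypothetical URL layout).
    return f"https://example.org/{doi}/files/{file_name}"


def fake_http_downloader(url, output_file, pooch):
    # Records the request instead of touching the network.
    calls.append((url, output_file))


downloader = SketchDOIDownloader(fake_resolver, fake_http_downloader)
downloader("doi:10.5281/zenodo.4924875/tiny-data.txt", "tiny-data.txt", None)
print(calls)
```

Keeping the lookup and the transfer as separate callables makes the sketch testable offline; it says nothing about how Pooch structures its own internals.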