Multiple download URLs¶
You can set different download URLs for individual files with the urls
argument of pooch.create
.
It should be a dictionary with the file names as keys and the URLs for
downloading the files as values.
For example, say we have a citadel.csv
file that we want to download from
https://www.some-data-hosting-site.com
instead:
# The basic setup is the same
POOCH = pooch.create(
path=pooch.os_cache("plumbus"),
base_url="https://github.com/rick/plumbus/raw/{version}/data/",
version=version,
version_dev="main",
registry={
"c137.csv": "19uheidhlkjdwhoiwuhc0uhcwljchw9ochwochw89dcgw9dcgwc",
"cronen.csv": "1upodh2ioduhw9celdjhlfvhksgdwikdgcowjhcwoduchowjg8w",
# Still include the file in the registry
"citadel.csv": "893yprofwjndcwhx9c0ehp3ue9gcwoscjwdfgh923e0hwhcwiyc",
},
# Now specify custom URLs for some of the files in the registry.
urls={
"citadel.csv": "https://www.some-data-hosting-site.com/files/citadel.csv",
},
)
When POOCH.fetch("citadel.csv")
is called, the download will by from the
specified URL instead of the base_url
.
The file name will not be appended automatically to the URL in case you want to
change the file name in local storage.
Attention
Versioning of custom URLs is not supported since they are assumed to be data files independent of your project. The file will still be placed in a versioned cache folder.
Tip
Custom URLs can be used along side base_url
or you can omit
base_url
entirely by setting it to an empty string (base_url=""
).
Doing so requires setting a custom URL for every file in the registry.
Usage with registry files¶
You can also include custom URLs in a registry file by adding the URL for a file to end of the line (separated by a space):
c137.csv 19uheidhlkjdwhoiwuhc0uhcwljchw9ochwochw89dcgw9dcgwc
cronen.csv 1upodh2ioduhw9celdjhlfvhksgdwikdgcowjhcwoduchowjg8w
citadel.csv 893yprofwjndcwhx9c0ehp3ue9gcwoscjwdfgh923e0hwhcwiyc https://www.some-data-hosting-site.com/files/citadel.csv
pooch.Pooch.load_registry
will automatically populate the urls
attribute.
This way, custom URLs don’t need to be set in the code.
In fact, the module code doesn’t change at all:
# Define the Pooch exactly the same (urls is None by default)
POOCH = pooch.create(
path=pooch.os_cache("plumbus"),
base_url="https://github.com/rick/plumbus/raw/{version}/data/",
version=version,
version_dev="main",
registry=None,
)
# If custom URLs are present in the registry file, they will be set
# automatically.
POOCH.load_registry(os.path.join(os.path.dirname(__file__), "registry.txt"))