pooch.create
pooch.create(path, base_url, version, version_dev, env=None, registry=None)
Create a new Pooch with sensible defaults to fetch data files.
The Pooch will be versioned, meaning that the local storage folder and the base URL depend on the project version. This is necessary if your users have multiple versions of your library installed (using virtual environments) and you updated the data files between versions. Otherwise, every switch between environments would trigger a re-download of the data.
The version string will be appended to the local storage path (for example, ~/.mypooch/cache/v0.1) and inserted into the base URL (for example, https://github.com/fatiando/pooch/raw/v0.1/data). If the version string contains +XX.XXXXX, it will be interpreted as a development version and version_dev will be used in its place.
If the local storage path doesn’t exist, it will be created.
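The versioning rule above can be sketched in a few lines. This is an illustrative approximation, not Pooch's actual implementation: the helper names `resolve_version` and `versioned_locations` are hypothetical, and treating any `+` in the version string as the development marker is an assumption drawn from the description above.

```python
from pathlib import Path


def resolve_version(version, version_dev):
    """Return version_dev for development versions, else version (hypothetical helper)."""
    # Assumption: a "+" marks a development version such as "v0.1+12.do9iwd".
    return version_dev if "+" in version else version


def versioned_locations(path, base_url, version, version_dev):
    """Build the versioned storage path and base URL without touching the disk."""
    resolved = resolve_version(version, version_dev)
    local = Path(path) / resolved  # version appended to the storage path
    url = base_url.format(version=resolved)  # version inserted into the URL
    return local, url


release = versioned_locations(
    "myproject", "http://some.link.com/{version}/", "v0.1", "master"
)
development = versioned_locations(
    "myproject", "http://some.link.com/{version}/", "v0.1+12.do9iwd", "master"
)
print(release[0].parts, release[1])
print(development[0].parts, development[1])
```

Unlike create, this sketch only computes the two locations; it does not create the storage folder.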
Parameters:
- path : str, PathLike, list or tuple
The path to the local data storage folder. If this is a list or tuple, we’ll join the parts with the appropriate separator. The version will be appended to the end of this path. Use pooch.os_cache for a sensible default.
- base_url : str
Base URL for the remote data source. All requests will be made relative to this URL. The string should have a {version} formatting mark in it. We will call .format(version=version) on this string. If the URL is a directory path, it must end in a '/' because we will not add one.
- version : str
The version string for your project. Should be PEP 440 compatible.
- version_dev : str
The name used for the development version of a project. If your data is hosted on GitHub (and base_url is a GitHub raw link), then "master" is a good choice.
- env : str
An environment variable that can be used to override path. This allows users to control where they want the data to be stored. We’ll append version to the end of this value as well.
- registry : dict
A record of the files that are managed by this Pooch. Keys should be the file names and the values should be their SHA256 hashes. Only files in the registry can be fetched from the local storage. Files in subdirectories of path must use Unix-style separators ('/') even on Windows.
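As a rough illustration of the registry format described above, the following sketch builds a dict of file names and SHA256 hashes from a folder using only the standard library. The helpers `sha256_of` and `build_registry` are hypothetical names introduced here and are not part of the API documented on this page; the temporary directory and its contents exist only for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(file_path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_registry(directory):
    """Map each file under directory to its SHA256 hash (hypothetical helper)."""
    root = Path(directory)
    return {
        # as_posix() yields '/' separators even on Windows.
        entry.relative_to(root).as_posix(): sha256_of(entry)
        for entry in sorted(root.rglob("*"))
        if entry.is_file()
    }


# Demonstrate on a throwaway directory with one nested file.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "subdir"
    nested.mkdir()
    (nested / "data.txt").write_bytes(b"hello")
    registry = build_registry(tmp)

print(registry)
```

The resulting dict has the shape expected by the registry argument: keys relative to the data folder with Unix-style separators, values as hexadecimal SHA256 digests.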
Returns:
- pooch : Pooch
The Pooch initialized with the given arguments.
Examples
Create a Pooch for a release (v0.1):
>>> from pooch import create
>>> pup = create(path="myproject",
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master",
...              registry={"data.txt": "9081wo2eb2gc0u..."})
>>> print(pup.path.parts)  # The path is a pathlib.Path
('myproject', 'v0.1')
>>> # We'll create the directory if it doesn't exist yet.
>>> pup.path.exists()
True
>>> print(pup.base_url)
http://some.link.com/v0.1/
>>> print(pup.registry)
{'data.txt': '9081wo2eb2gc0u...'}
If this is a development version (12 commits ahead of v0.1):
>>> pup = create(path="myproject",
...              base_url="http://some.link.com/{version}/",
...              version="v0.1+12.do9iwd",
...              version_dev="master")
>>> print(pup.path.parts)
('myproject', 'master')
>>> pup.path.exists()
True
>>> print(pup.base_url)
http://some.link.com/master/
To place the storage folder at a subdirectory, pass in a list and we’ll join the path for you using the appropriate separator for your operating system:
>>> pup = create(path=["myproject", "cache", "data"],
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master")
>>> print(pup.path.parts)
('myproject', 'cache', 'data', 'v0.1')
>>> pup.path.exists()
True
The user can override the storage path by setting an environment variable:
>>> # The variable is not set so we'll use *path*
>>> pup = create(path=["myproject", "not_from_env"],
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master",
...              env="MYPROJECT_DATA_DIR")
>>> print(pup.path.parts)
('myproject', 'not_from_env', 'v0.1')
>>> # Set the environment variable and try again
>>> import os
>>> os.environ["MYPROJECT_DATA_DIR"] = os.path.join("myproject", "from_env")
>>> pup = create(path=["myproject", "not_from_env"],
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master",
...              env="MYPROJECT_DATA_DIR")
>>> print(pup.path.parts)
('myproject', 'from_env', 'v0.1')
Clean up the files we created:
>>> import shutil; shutil.rmtree("myproject")
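The way the env argument takes precedence over path in the examples above can be pictured with a small standalone sketch. It is only an approximation of the documented behaviour: `resolve_storage_path` is a hypothetical helper, and treating an empty environment variable as unset is an assumption rather than something stated on this page.

```python
import os
from pathlib import Path


def resolve_storage_path(path, version, env=None):
    """Pick the storage folder, preferring the environment variable (hypothetical helper)."""
    # Assumption: an unset or empty variable falls back to *path*.
    override = os.environ.get(env) if env is not None else None
    if override:
        base = Path(override)
    elif isinstance(path, (list, tuple)):
        base = Path(*path)  # join the parts with the OS separator
    else:
        base = Path(path)
    return base / version  # the version is appended in every case


os.environ.pop("MYPROJECT_DATA_DIR", None)
default = resolve_storage_path(
    ["myproject", "not_from_env"], "v0.1", env="MYPROJECT_DATA_DIR"
)

os.environ["MYPROJECT_DATA_DIR"] = os.path.join("myproject", "from_env")
overridden = resolve_storage_path(
    ["myproject", "not_from_env"], "v0.1", env="MYPROJECT_DATA_DIR"
)
os.environ.pop("MYPROJECT_DATA_DIR", None)  # leave the environment as we found it

print(default.parts)
print(overridden.parts)
```

This sketch never touches the file system, so there is nothing to clean up afterwards.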