etoolbox.utils.cloud

Tools for working with RMI’s Azure storage.

Functions

cloud_clean(*[, dry, all_])

Clean up cache and config directories.

cloud_setup()

Interactive cloud setup.

cloud_init(account_name, token, *[, dry_run, clobber])

Write SAS token file to disk.

read_token()

Read SAS token from disk or environment variable.

read_account_name()

Read account name from disk or environment variable.

storage_options()

Simplify reading from Azure using polars.

rmi_cloud_fs([account_name, token])

Work with files on Azure.

cache_info()

Return info about cloud cache contents.

cached_path(cloud_path, *[, download])

Get the local cache path of a cloud file.

cloud_list(path, *[, detail])

List cloud files in a folder.

get(to_get_path, destination[, fs, quiet, clobber, ...])

Download a remote file from the cloud.

put(to_put_path, destination[, fs, quiet, clobber, ...])

Upload local files or directories to the cloud.

read_patio_resource_results(datestr)

Reads patio resource results from Azure.

read_cloud_file(filename)

Read parquet, csv, or DataZip files from Azure.

write_cloud_file(data, filename)

Writes economic results for patio data to a specified filename in Azure storage.

Module Contents

etoolbox.utils.cloud.cloud_clean(*, dry=False, all_=False)[source]

Clean up cache and config directories.

Parameters:
  • dry – show what would be removed without deleting anything.

  • all_ – also remove the config directory, not just the cache.
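
Examples

A minimal sketch, assuming dry performs a dry run that only reports what would be removed:

>>> from etoolbox.utils.cloud import cloud_clean
>>> cloud_clean(dry=True)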
etoolbox.utils.cloud.cloud_setup()[source]

Interactive cloud setup.

etoolbox.utils.cloud.cloud_init(account_name, token, *, dry_run=False, clobber=False)[source]

Write SAS token file to disk.

Parameters:
  • account_name (str) – name of the Azure storage account.

  • token (str) – SAS token used to authenticate to the account.

  • dry_run – show what would be written without writing anything.

  • clobber – overwrite an existing token file if True.
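
Examples

A minimal sketch; the account name and SAS token below are placeholders, not real credentials:

>>> from etoolbox.utils.cloud import cloud_init
>>> cloud_init("myaccount", "?sv=...&sig=...", dry_run=True)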
etoolbox.utils.cloud.read_token()[source]

Read SAS token from disk or environment variable.

Return type:

str

etoolbox.utils.cloud.read_account_name()[source]

Read account name from disk or environment variable.

Return type:

str
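
Examples

A minimal sketch showing both helpers together; the returned values depend on your local configuration:

>>> from etoolbox.utils.cloud import read_account_name, read_token
>>> account = read_account_name()
>>> token = read_token()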

etoolbox.utils.cloud.storage_options()[source]

Simplify reading from Azure using polars.

When using pandas or writing to Azure, see rmi_cloud_fs().

Examples

>>> import polars as pl
>>> from etoolbox.utils.cloud import storage_options
>>> df = pl.read_parquet("az://raw-data/test_data.parquet", **storage_options())
>>> df.select("plant_id_eia", "re_type").head()
shape: (5, 2)
┌──────────────────────┬─────────┐
│ plant_id_eia         ┆ re_type │
│ ---                  ┆ ---     │
│ i64                  ┆ str     │
╞══════════════════════╪═════════╡
│ -1065799821027645681 ┆ solar   │
│ 500701449105794732   ┆ solar   │
│ 5264981444132581172  ┆ solar   │
│ 8596148642566783026  ┆ solar   │
│ 8293386810295812914  ┆ solar   │
└──────────────────────┴─────────┘

etoolbox.utils.cloud.rmi_cloud_fs(account_name=None, token=None)[source]

Work with files on Azure.

This can be used to read or write arbitrary files to or from Azure. For files read from Azure, it also creates and manages a local cache.

Examples

>>> import pandas as pd
>>> from etoolbox.utils.cloud import rmi_cloud_fs
>>> fs = rmi_cloud_fs()
>>> df = pd.read_parquet("az://raw-data/test_data.parquet", filesystem=fs)
>>> df[["plant_id_eia", "re_type"]].head()
          plant_id_eia re_type
0 -1065799821027645681   solar
1   500701449105794732   solar
2  5264981444132581172   solar
3  8596148642566783026   solar
4  8293386810295812914   solar

Read with polars using the same filecache as with pandas.

>>> import polars as pl
>>> with fs.open("az://raw-data/test_data.parquet") as f:
...     df = pl.read_parquet(f)
>>> df.select("plant_id_eia", "re_type").head()
shape: (5, 2)
┌──────────────────────┬─────────┐
│ plant_id_eia         ┆ re_type │
│ ---                  ┆ ---     │
│ i64                  ┆ str     │
╞══════════════════════╪═════════╡
│ -1065799821027645681 ┆ solar   │
│ 500701449105794732   ┆ solar   │
│ 5264981444132581172  ┆ solar   │
│ 8596148642566783026  ┆ solar   │
│ 8293386810295812914  ┆ solar   │
└──────────────────────┴─────────┘

Write a parquet file, or really anything to Azure…

>>> with fs.open("az://raw-data/file.parquet", mode="wb") as f:
...     df.write_parquet(f)

Return type:

fsspec.implementations.cached.WholeFileCacheFileSystem

etoolbox.utils.cloud.cache_info()[source]

Return info about cloud cache contents.
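
Examples

A minimal sketch; the exact structure of the returned information is not specified here:

>>> from etoolbox.utils.cloud import cache_info
>>> info = cache_info()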

etoolbox.utils.cloud.cached_path(cloud_path, *, download=False)[source]

Get the local cache path of a cloud file.

Parameters:
  • cloud_path (str) – path on Azure, e.g. az://raw-data/test_data.parquet

  • download – download the file from Azure to create a local cache if it does not exist.

Return type:

str | None

Examples

>>> import polars as pl
>>> from etoolbox.utils.cloud import rmi_cloud_fs, cached_path
>>> fs = rmi_cloud_fs()
>>> cloud_path = "az://raw-data/test_data.parquet"
>>> with fs.open(cloud_path) as f:
...     df = pl.read_parquet(f)
>>> cached_path(cloud_path)
'2a722b95bfff23b14d1deaa81cca3b697b875934df3858159d205d20dcf1e305'

etoolbox.utils.cloud.cloud_list(path, *, detail=False)[source]

List cloud files in a folder.

Parameters:
  • path (str) – remote folder whose contents to list, e.g. ‘<container>/…’

  • detail – include detailed information about each file

Return type:

list[str] | dict
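
Examples

A short sketch using the container from the earlier examples; the returned listing depends on the container contents:

>>> from etoolbox.utils.cloud import cloud_list
>>> files = cloud_list("raw-data")
>>> detailed = cloud_list("raw-data", detail=True)  # returns a dict with per-file details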

etoolbox.utils.cloud.get(to_get_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')[source]

Download a remote file from the cloud.

Uses azcopy CLI if available.

Parameters:
  • to_get_path (str) – remote file or folder to download of the form <container>/...

  • destination (pathlib.Path | str) – local destination for the downloaded files

  • fs – filesystem

  • quiet – disable logging of adlfs output

  • clobber – overwrite existing files and directories if True

  • azcopy_path – path to azcopy executable

Return type:

None
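
Examples

A minimal sketch reusing the test file from the earlier examples; the local destination is illustrative:

>>> from etoolbox.utils.cloud import get
>>> get("raw-data/test_data.parquet", "test_data.parquet", clobber=True)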

etoolbox.utils.cloud.put(to_put_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')[source]

Upload local files or directories to the cloud.

Copies a single file or a tree of files. If destination ends with a “/”, it is treated as a directory and the copied files are placed within it.

Uses azcopy CLI if available.

Parameters:
  • to_put_path (pathlib.Path) – local file or folder to copy

  • destination (str) – copy destination of the form <container>/...

  • fs – filesystem

  • quiet – disable logging of adlfs output

  • clobber – force overwriting of existing files (only works when azcopy is used)

  • azcopy_path – path to azcopy executable

Return type:

None
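
Examples

A minimal sketch; the local folder and remote destination are illustrative. The trailing “/” marks the destination as a directory, as described above:

>>> from pathlib import Path
>>> from etoolbox.utils.cloud import put
>>> put(Path("local_results"), "raw-data/results/", clobber=True)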

etoolbox.utils.cloud.read_patio_resource_results(datestr)[source]

Reads patio resource results from Azure.

Reads patio resource results from Azure and returns the extracted data as a dictionary of DataFrames. The function handles the specific format of patio resource files and manages filesystem interactions and caching.

Parameters:

datestr (str) – Date string that identifies the model run.

Return type:

dict[str, pandas.DataFrame]
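
Examples

A minimal sketch; the date string is illustrative and the available keys depend on the model run:

>>> from etoolbox.utils.cloud import read_patio_resource_results
>>> results = read_patio_resource_results("20241031")
>>> names = sorted(results)  # keys name the individual result DataFrames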

etoolbox.utils.cloud.read_cloud_file(filename)[source]

Read parquet, csv, or DataZip files from Azure.

The function handles the supported file formats and manages filesystem interactions and caching.

Parameters:

filename (str) – the full path to the file including container and file extension.

Return type:

dict[str, pandas.DataFrame] | pandas.DataFrame

Examples

>>> from etoolbox.utils.cloud import read_cloud_file
>>> df = read_cloud_file("patio-data/20241031/utility_ids.parquet")
>>> df.head()
   utility_id_ferc1  ...  public_private_unmapped
0               1.0  ...                 unmapped
1             342.0  ...                   public
2             294.0  ...                   public
3             394.0  ...                   public
4             349.0  ...                   public

[5 rows x 37 columns]

etoolbox.utils.cloud.write_cloud_file(data, filename)[source]

Writes economic results for patio data to a specified filename in Azure storage.

Parameters:
  • data (pandas.DataFrame | str | bytes) – DataFrame, or str or bytes representing the file contents to write.

  • filename (str) – Target filename for storing the results; it must include the container, full path, and an appropriate file extension, i.e., parquet for a DataFrame; csv, json, yaml, yml, toml, or txt for str/bytes.

Return type:

None
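
Examples

A minimal sketch writing a small DataFrame as parquet; the filename is illustrative:

>>> import pandas as pd
>>> from etoolbox.utils.cloud import write_cloud_file
>>> df = pd.DataFrame({"plant_id_eia": [1, 2], "re_type": ["solar", "wind"]})
>>> write_cloud_file(df, "patio-data/20241031/example_results.parquet")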