etoolbox.utils.cloud

Tools for working with RMI’s Azure storage.

Functions

cloud_clean(*[, dry, all_])

Clean up cache and config directories.

cloud_setup()

Interactive cloud setup.

cloud_init(account_name, token, *[, dry_run, clobber])

Write SAS token file to disk.

read_token()

Read SAS token from disk or environment variable.

read_account_name()

Read account name from disk or environment variable.

storage_options()

Simplify reading from Azure using polars.

rmi_cloud_fs([account_name, token])

Work with files on Azure.

cache_info()

Return info about cloud cache contents.

cached_path(cloud_path, *[, download])

Get the local cache path of a cloud file.

cloud_list(path, *[, detail])

List cloud files in a folder.

get(to_get_path, destination[, fs, quiet, clobber, ...])

Download a remote file from the cloud.

put(to_put_path, destination[, fs, quiet, clobber, ...])

Upload local files or directories to the cloud.

read_patio_resource_results(datestr)

Reads patio resource results from Azure.

read_cloud_file(filename)

Read parquet, csv, or DataZip files from Azure.

write_cloud_file(data, filename)

Writes economic results for patio data to a specified filename in Azure storage.

Module Contents

etoolbox.utils.cloud.cloud_clean(*, dry=False, all_=False)[source]

Clean up cache and config directories.

Parameters:
  • dry – show what would be removed without deleting anything.

  • all_ – also remove the config directory, not just the cache.
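
Examples

A minimal sketch, assuming dry performs a dry run that only reports what would be removed:

>>> from etoolbox.utils.cloud import cloud_clean
>>> cloud_clean(dry=True)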
etoolbox.utils.cloud.cloud_setup()[source]

Interactive cloud setup.

etoolbox.utils.cloud.cloud_init(account_name, token, *, dry_run=False, clobber=False)[source]

Write SAS token file to disk.

Parameters:
  • account_name (str) – name of the Azure storage account.

  • token (str) – SAS token used to authenticate to the account.

  • dry_run – show what would be written without writing anything.

  • clobber – overwrite an existing token file if True.
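
Examples

A minimal sketch; the account name and SAS token below are placeholders, not real credentials:

>>> from etoolbox.utils.cloud import cloud_init
>>> cloud_init("myaccount", "?sv=...&sig=...", dry_run=True)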
etoolbox.utils.cloud.read_token()[source]

Read SAS token from disk or environment variable.

Return type:

str

etoolbox.utils.cloud.read_account_name()[source]

Read account name from disk or environment variable.

Return type:

str
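
Examples

A minimal sketch showing both helpers together; the returned values depend on your local configuration:

>>> from etoolbox.utils.cloud import read_account_name, read_token
>>> account = read_account_name()
>>> token = read_token()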

etoolbox.utils.cloud.storage_options()[source]

Simplify reading from Azure using polars.

When using pandas or writing to Azure, see rmi_cloud_fs().

Examples

>>> import polars as pl
>>> from etoolbox.utils.cloud import storage_options
>>> df = pl.read_parquet("az://raw-data/test_data.parquet", **storage_options())
>>> df.select("plant_id_eia", "re_type").head()
shape: (5, 2)
┌──────────────────────┬─────────┐
│ plant_id_eia         ┆ re_type │
│ ---                  ┆ ---     │
│ i64                  ┆ str     │
╞══════════════════════╪═════════╡
│ -1065799821027645681 ┆ solar   │
│ 500701449105794732   ┆ solar   │
│ 5264981444132581172  ┆ solar   │
│ 8596148642566783026  ┆ solar   │
│ 8293386810295812914  ┆ solar   │
└──────────────────────┴─────────┘

etoolbox.utils.cloud.rmi_cloud_fs(account_name=None, token=None)[source]

Work with files on Azure.

This can be used to read or write arbitrary files to or from Azure. For files read from Azure, it also creates and manages a local cache.

Examples

>>> import pandas as pd
>>> from etoolbox.utils.cloud import rmi_cloud_fs
>>> fs = rmi_cloud_fs()
>>> df = pd.read_parquet("az://raw-data/test_data.parquet", filesystem=fs)
>>> df[["plant_id_eia", "re_type"]].head()
          plant_id_eia re_type
0 -1065799821027645681   solar
1   500701449105794732   solar
2  5264981444132581172   solar
3  8596148642566783026   solar
4  8293386810295812914   solar

Read with polars using the same filecache as with pandas.

>>> import polars as pl
>>> with fs.open("az://raw-data/test_data.parquet") as f:
...     df = pl.read_parquet(f)
>>> df.select("plant_id_eia", "re_type").head()
shape: (5, 2)
┌──────────────────────┬─────────┐
│ plant_id_eia         ┆ re_type │
│ ---                  ┆ ---     │
│ i64                  ┆ str     │
╞══════════════════════╪═════════╡
│ -1065799821027645681 ┆ solar   │
│ 500701449105794732   ┆ solar   │
│ 5264981444132581172  ┆ solar   │
│ 8596148642566783026  ┆ solar   │
│ 8293386810295812914  ┆ solar   │
└──────────────────────┴─────────┘

Write a parquet file, or really anything to Azure…

>>> with fs.open("az://raw-data/file.parquet", mode="wb") as f:
...     df.write_parquet(f)

Return type:

fsspec.implementations.cached.WholeFileCacheFileSystem

etoolbox.utils.cloud.cache_info()[source]

Return info about cloud cache contents.
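
Examples

A minimal sketch; the exact structure of the returned information is not specified here:

>>> from etoolbox.utils.cloud import cache_info
>>> info = cache_info()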

etoolbox.utils.cloud.cached_path(cloud_path, *, download=False)[source]

Get the local cache path of a cloud file.

Parameters:
  • cloud_path (str) – path on Azure, e.g. az://raw-data/test_data.parquet

  • download – download the file from Azure to create a local cache if it does not exist.

Return type:

str | None

Examples

>>> import polars as pl
>>> from etoolbox.utils.cloud import rmi_cloud_fs, cached_path
>>> fs = rmi_cloud_fs()
>>> cloud_path = "az://raw-data/test_data.parquet"
>>> with fs.open(cloud_path) as f:
...     df = pl.read_parquet(f)
>>> cached_path(cloud_path)
'2a722b95bfff23b14d1deaa81cca3b697b875934df3858159d205d20dcf1e305'

etoolbox.utils.cloud.cloud_list(path, *, detail=False)[source]

List cloud files in a folder.

Parameters:
  • path (str) – remote folder whose contents to list, e.g. ‘<container>/…’

  • detail – include detailed information about each file

Return type:

list[str] | dict
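
Examples

A short sketch using the container from the earlier examples; the returned listing depends on the container contents:

>>> from etoolbox.utils.cloud import cloud_list
>>> files = cloud_list("raw-data")
>>> detailed = cloud_list("raw-data", detail=True)  # returns a dict with per-file details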

etoolbox.utils.cloud.get(to_get_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')[source]

Download a remote file from the cloud.

Uses azcopy CLI if available.

Parameters:
  • to_get_path (str) – remote file or folder to download of the form <container>/...

  • destination (pathlib.Path | str) – local destination for the downloaded files

  • fs – filesystem

  • quiet – disable logging of adlfs output

  • clobber – overwrite existing files and directories if True

  • azcopy_path – path to azcopy executable

Return type:

None
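
Examples

A minimal sketch reusing the test file from the earlier examples; the local destination is illustrative:

>>> from etoolbox.utils.cloud import get
>>> get("raw-data/test_data.parquet", "test_data.parquet", clobber=True)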

etoolbox.utils.cloud.put(to_put_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')[source]

Upload local files or directories to the cloud.

Copies a single file or a tree of files. If destination ends with a “/”, it is treated as a directory and the copied files are placed within it.

Uses azcopy CLI if available.

Parameters:
  • to_put_path (pathlib.Path) – local file or folder to copy

  • destination (str) – copy destination of the form <container>/...

  • fs – filesystem

  • quiet – disable logging of adlfs output

  • clobber – force overwriting of existing files (only works when azcopy is used)

  • azcopy_path – path to azcopy executable

Return type:

None
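
Examples

A minimal sketch; the local folder and remote destination are illustrative. The trailing “/” marks the destination as a directory, as described above:

>>> from pathlib import Path
>>> from etoolbox.utils.cloud import put
>>> put(Path("local_results"), "raw-data/results/", clobber=True)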

etoolbox.utils.cloud.read_patio_resource_results(datestr)[source]

Reads patio resource results from Azure.

Reads patio resource results from Azure and returns the extracted data as a dictionary of DataFrames. The function handles the specific format of patio resource files and manages filesystem interactions and caching.

Parameters:

datestr (str) – Date string that identifies the model run.

Return type:

dict[str, pandas.DataFrame]
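
Examples

A minimal sketch; the date string is illustrative and the available keys depend on the model run:

>>> from etoolbox.utils.cloud import read_patio_resource_results
>>> results = read_patio_resource_results("20241031")
>>> names = sorted(results)  # keys name the individual result DataFrames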

etoolbox.utils.cloud.read_cloud_file(filename)[source]

Read parquet, csv, or DataZip files from Azure.

The function handles the supported file formats and manages filesystem interactions and caching.

Parameters:

filename (str) – the full path to the file including container and file extension.

Return type:

dict[str, pandas.DataFrame] | pandas.DataFrame

Examples

>>> from etoolbox.utils.cloud import read_cloud_file
>>> df = read_cloud_file("patio-data/20241031/utility_ids.parquet")
>>> df.head()
   utility_id_ferc1  ...  public_private_unmapped
0               1.0  ...                 unmapped
1             342.0  ...                   public
2             294.0  ...                   public
3             394.0  ...                   public
4             349.0  ...                   public

[5 rows x 37 columns]

etoolbox.utils.cloud.write_cloud_file(data, filename)[source]

Writes economic results for patio data to a specified filename in Azure storage.

Parameters:
  • data (pandas.DataFrame | str | bytes) – DataFrame, or str or bytes representing the file contents to write.

  • filename (str) – Target filename for storing the results; it must include the container, full path, and an appropriate file extension, i.e., parquet for a DataFrame; csv, json, yaml, yml, toml, or txt for str/bytes.

Return type:

None
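
Examples

A minimal sketch writing a small DataFrame as parquet; the filename is illustrative:

>>> import pandas as pd
>>> from etoolbox.utils.cloud import write_cloud_file
>>> df = pd.DataFrame({"plant_id_eia": [1, 2], "re_type": ["solar", "wind"]})
>>> write_cloud_file(df, "patio-data/20241031/example_results.parquet")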