etoolbox.utils.cloud
Tools for working with RMI’s Azure storage.
Functions

| Function | Description |
| --- | --- |
| `cloud_clean` | Clean up cache and config directories. |
|  | Interactive cloud setup. |
| `cloud_init` | Write SAS token file to disk. |
| `read_token` | Read SAS token from disk or environment variable. |
| `read_account_name` | Read account name from disk or environment variable. |
| `storage_options` | Simplify reading from Azure using `polars`. |
| `rmi_cloud_fs` | Work with files on Azure. |
|  | Return info about cloud cache contents. |
| `cached_path` | Get the local cache path of a cloud file. |
|  | List cloud files in a folder. |
| `get` | Download a remote file from the cloud. |
| `put` | Upload local files or directories to the cloud. |
| `read_patio_resource_results` | Reads patio resource results from Azure. |
| `read_cloud_file` | Read parquet, csv, or DataZip files from Azure. |
| `write_cloud_file` | Writes economic results for patio data to a specified filename in Azure storage. |
Module Contents
- etoolbox.utils.cloud.cloud_clean(*, dry=False, all_=False)
Clean up cache and config directories.
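Examples
An illustrative sketch, assuming `dry=True` previews what would be removed without deleting anything; output depends on your local cache and config:
>>> from etoolbox.utils.cloud import cloud_clean
>>> cloud_clean(dry=True)  # preview only; nothing is deleted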
- etoolbox.utils.cloud.cloud_init(account_name, token, *, dry_run=False, clobber=False)
Write SAS token file to disk.
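Examples
An illustrative sketch; the account name and token are placeholders, not real credentials, and `dry_run=True` is assumed to skip writing the token file:
>>> from etoolbox.utils.cloud import cloud_init
>>> cloud_init("<account_name>", "<sas_token>", dry_run=True)  # placeholders, not real credentials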
- etoolbox.utils.cloud.read_token()
Read SAS token from disk or environment variable.
- Return type:
str
- etoolbox.utils.cloud.read_account_name()
Read account name from disk or environment variable.
- Return type:
str
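Examples
An illustrative sketch; both functions take no arguments, and the returned values depend on your local configuration:
>>> from etoolbox.utils.cloud import read_account_name, read_token
>>> account = read_account_name()  # value depends on your setup
>>> token = read_token()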
- etoolbox.utils.cloud.storage_options()
Simplify reading from Azure using `polars`. When using `pandas` or writing to Azure, see `rmi_cloud_fs()`.
Examples
>>> import polars as pl
>>> from etoolbox.utils.cloud import storage_options
>>> df = pl.read_parquet("az://raw-data/test_data.parquet", **storage_options())
>>> df.select("plant_id_eia", "re_type").head()
shape: (5, 2)
┌──────────────────────┬─────────┐
│ plant_id_eia         ┆ re_type │
│ ---                  ┆ ---     │
│ i64                  ┆ str     │
╞══════════════════════╪═════════╡
│ -1065799821027645681 ┆ solar   │
│ 500701449105794732   ┆ solar   │
│ 5264981444132581172  ┆ solar   │
│ 8596148642566783026  ┆ solar   │
│ 8293386810295812914  ┆ solar   │
└──────────────────────┴─────────┘
- etoolbox.utils.cloud.rmi_cloud_fs(account_name=None, token=None)
Work with files on Azure.
This can be used to read or write arbitrary files to or from Azure. For files read from Azure, it also creates and manages a local cache.
Examples
>>> import pandas as pd
>>> from etoolbox.utils.cloud import rmi_cloud_fs
>>> fs = rmi_cloud_fs()
>>> df = pd.read_parquet("az://raw-data/test_data.parquet", filesystem=fs)
>>> df[["plant_id_eia", "re_type"]].head()
           plant_id_eia re_type
0  -1065799821027645681   solar
1    500701449105794732   solar
2   5264981444132581172   solar
3   8596148642566783026   solar
4   8293386810295812914   solar
Read with `polars` using the same filecache as with `pandas`.
>>> import polars as pl
>>> with fs.open("az://raw-data/test_data.parquet") as f:
...     df = pl.read_parquet(f)
>>> df.select("plant_id_eia", "re_type").head()
shape: (5, 2)
┌──────────────────────┬─────────┐
│ plant_id_eia         ┆ re_type │
│ ---                  ┆ ---     │
│ i64                  ┆ str     │
╞══════════════════════╪═════════╡
│ -1065799821027645681 ┆ solar   │
│ 500701449105794732   ┆ solar   │
│ 5264981444132581172  ┆ solar   │
│ 8596148642566783026  ┆ solar   │
│ 8293386810295812914  ┆ solar   │
└──────────────────────┴─────────┘
Write a parquet file, or really anything to Azure…
>>> with fs.open("az://raw-data/file.parquet", mode="wb") as f:
...     df.write_parquet(f)
- etoolbox.utils.cloud.cached_path(cloud_path, *, download=False)
Get the local cache path of a cloud file.
- Parameters:
cloud_path (str) – path on Azure, e.g. az://raw-data/test_data.parquet
download – download the file from Azure to create a local cache if it does not exist.
- Return type:
str | None
Examples
>>> import polars as pl
>>> from etoolbox.utils.cloud import rmi_cloud_fs, cached_path
>>> fs = rmi_cloud_fs()
>>> cloud_path = "az://raw-data/test_data.parquet"
>>> with fs.open(cloud_path) as f:
...     df = pl.read_parquet(f)
>>> cached_path(cloud_path)
'2a722b95bfff23b14d1deaa81cca3b697b875934df3858159d205d20dcf1e305'
- etoolbox.utils.cloud.get(to_get_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')
Download a remote file from the cloud.
Uses the `azcopy` CLI if available.
- Parameters:
to_get_path (str) – remote file or folder to download of the form <container>/...
destination (pathlib.Path | str) – local destination for the downloaded files
fs – filesystem
quiet – disable logging of adlfs output
clobber – overwrite existing files and directories if True
azcopy_path – path to azcopy executable
- Return type:
None
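Examples
An illustrative sketch; the remote path mirrors the `read_cloud_file` example below, and the local destination is hypothetical:
>>> from etoolbox.utils.cloud import get
>>> get("patio-data/20241031/utility_ids.parquet", "utility_ids.parquet")  # hypothetical local destination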
- etoolbox.utils.cloud.put(to_put_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')
Upload local files or directories to the cloud.
Copies a specific file or tree of files. If destination ends with a “/”, it will be assumed to be a directory, and target files will go within.
Uses the `azcopy` CLI if available.
- Parameters:
to_put_path (pathlib.Path) – local file or folder to copy
destination (str) – copy destination of the form <container>/...
fs – filesystem
quiet – disable logging of adlfs output
clobber – force overwriting of existing files (only works when azcopy is used)
azcopy_path – path to azcopy executable
- Return type:
None
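Examples
An illustrative sketch; `results.parquet` is a hypothetical local file, and `raw-data` is the container used in the examples above:
>>> from pathlib import Path
>>> from etoolbox.utils.cloud import put
>>> put(Path("results.parquet"), "raw-data/results.parquet")  # hypothetical local file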
- etoolbox.utils.cloud.read_patio_resource_results(datestr)
Reads patio resource results from Azure.
Reads patio resource results from Azure and returns the extracted data as a dictionary. This function handles the specific format of patio resource files and manages file system interactions as well as cache mechanisms.
- Parameters:
datestr (str) – Date string that identifies the model run.
- Return type:
dict
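Examples
An illustrative sketch; the date string matches the model run used in the `read_cloud_file` example below, and the dictionary's keys depend on the run:
>>> from etoolbox.utils.cloud import read_patio_resource_results
>>> results = read_patio_resource_results("20241031")
>>> keys = list(results)  # inspect what the run contains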
- etoolbox.utils.cloud.read_cloud_file(filename)
Read parquet, csv, or DataZip files from Azure.
This function handles the specific format of patio resource files and manages file system interactions as well as cache mechanisms.
- Parameters:
filename (str) – the full path to the file including container and file extension.
- Return type:
Examples
>>> from etoolbox.utils.cloud import read_cloud_file
>>> df = read_cloud_file("patio-data/20241031/utility_ids.parquet")
>>> df.head()
   utility_id_ferc1  ... public_private_unmapped
0               1.0  ...                unmapped
1             342.0  ...                  public
2             294.0  ...                  public
3             394.0  ...                  public
4             349.0  ...                  public

[5 rows x 37 columns]
- etoolbox.utils.cloud.write_cloud_file(data, filename)
Writes economic results for patio data to a specified filename in Azure storage.
- Parameters:
data (pandas.DataFrame | str | bytes) – DataFrame, or str or bytes representing the contents of the file to be written.
filename (str) – Target filename for storing the results; it must include the container, full path, and appropriate file extension, i.e., parquet for a DataFrame; csv, json, yaml, yml, toml, or txt for str/bytes.
- Return type:
None
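Examples
An illustrative sketch; the cloud path is hypothetical but follows the container/path/extension rule described above:
>>> import pandas as pd
>>> from etoolbox.utils.cloud import write_cloud_file
>>> df = pd.DataFrame({"plant_id_eia": [1, 2], "re_type": ["solar", "wind"]})
>>> write_cloud_file(df, "patio-data/20241031/example_results.parquet")  # hypothetical cloud path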