etoolbox.utils.cloud¶
Tools for working with RMI’s Azure storage.
Functions¶
- Cleanup cache and config directories.
- Interactive cloud setup.
- Write SAS token file to disk.
- Read SAS token from disk or environment variable.
- Read account name from disk or environment variable.
- Simplify reading from Azure using polars.
- Work with files on Azure.
- Return info about cloud cache contents.
- Get the local cache path of a cloud file.
- List cloud files in a folder.
- Download a remote file from the cloud.
- Upload local files or directories to the cloud.
- Reads patio resource results from Azure.
- Read parquet, csv, or DataZip files from Azure.
- Writes economic results for patio data to a specified filename in Azure storage.
Module Contents¶
- etoolbox.utils.cloud.cloud_clean(*, dry=False, all_=False)[source]¶
Cleanup cache and config directories.
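A minimal usage sketch, assuming dry previews removals and all_ extends cleanup beyond the cache to the config directory (both inferred from the signature above):
>>> from etoolbox.utils.cloud import cloud_clean
>>> cloud_clean(dry=True)  # assumed: report what would be removed without deleting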
- etoolbox.utils.cloud.cloud_init(account_name, token, *, dry_run=False, clobber=False)[source]¶
Write SAS token file to disk.
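A hedged first-time setup sketch; the account name and SAS token are placeholders, and dry_run=True is used so nothing is written:
>>> from etoolbox.utils.cloud import cloud_init
>>> cloud_init("myaccount", "sv=2025-01-01&sig=PLACEHOLDER", dry_run=True)  # hypothetical credentials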
- etoolbox.utils.cloud.read_token()[source]¶
Read SAS token from disk or environment variable.
- Return type:
str
- etoolbox.utils.cloud.read_account_name()[source]¶
Read account name from disk or environment variable.
- Return type:
str
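A short sketch of the two accessors together; per the docstrings, both fall back to an environment variable when no file is on disk:
>>> from etoolbox.utils.cloud import read_account_name, read_token
>>> account = read_account_name()  # from the config written by cloud_init, or an env var
>>> token = read_token()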
- etoolbox.utils.cloud.storage_options()[source]¶
Simplify reading from Azure using polars. When using pandas or writing to Azure, see rmi_cloud_fs().
Examples
>>> import polars as pl
>>> from etoolbox.utils.cloud import storage_options

>>> df = pl.read_parquet("az://patio-data/test_data.parquet", **storage_options())
>>> df.head()
shape: (5, 2)
┌────────────────────┬──────────────────┐
│ energy_source_code ┆ co2_mt_per_mmbtu │
│ ---                ┆ ---              │
│ str                ┆ f64              │
╞════════════════════╪══════════════════╡
│ AB                 ┆ 1.1817e-7        │
│ ANT                ┆ 1.0369e-7        │
│ BFG                ┆ 2.7432e-7        │
│ BIT                ┆ 9.3280e-8        │
│ BLQ                ┆ 9.4480e-8        │
└────────────────────┴──────────────────┘
- etoolbox.utils.cloud.rmi_cloud_fs(account_name=None, token=None)[source]¶
Work with files on Azure.
This can be used to read or write arbitrary files to or from Azure. For files read from Azure, it creates and manages a local cache.
Examples
>>> import pandas as pd
>>> from etoolbox.utils.cloud import rmi_cloud_fs

>>> fs = rmi_cloud_fs()
>>> df = pd.read_parquet("az://patio-data/test_data.parquet", filesystem=fs)
>>> df.head()
  energy_source_code  co2_mt_per_mmbtu
0                 AB      1.181700e-07
1                ANT      1.036900e-07
2                BFG      2.743200e-07
3                BIT      9.328000e-08
4                BLQ      9.448000e-08
Read with polars using the same filecache as with pandas.
>>> import polars as pl
>>> with fs.open("az://patio-data/test_data.parquet") as f:
...     df = pl.read_parquet(f)
>>> df.head()
shape: (5, 2)
┌────────────────────┬──────────────────┐
│ energy_source_code ┆ co2_mt_per_mmbtu │
│ ---                ┆ ---              │
│ str                ┆ f64              │
╞════════════════════╪══════════════════╡
│ AB                 ┆ 1.1817e-7        │
│ ANT                ┆ 1.0369e-7        │
│ BFG                ┆ 2.7432e-7        │
│ BIT                ┆ 9.3280e-8        │
│ BLQ                ┆ 9.4480e-8        │
└────────────────────┴──────────────────┘
Write a parquet file, or really anything, to Azure…
>>> with fs.open("az://patio-data/file.parquet", mode="wb") as f:
...     df.write_parquet(f)
- etoolbox.utils.cloud.cached_path(cloud_path, *, download=False)[source]¶
Get the local cache path of a cloud file.
- Parameters:
cloud_path (str) – path on Azure, e.g. az://raw-data/test_data.parquet
download – download the file from Azure to create a local cache if it does not exist.
- Return type:
str | None
Examples
>>> import polars as pl
>>> from etoolbox.utils.cloud import rmi_cloud_fs, cached_path

>>> fs = rmi_cloud_fs()
>>> cloud_path = "az://patio-data/test_data.parquet"
>>> with fs.open(cloud_path) as f:
...     df = pl.read_parquet(f)
>>> cached_path(cloud_path)
'656706c40cb490423b652aa6d3b4903c56ab6c798ac4eb2fa3ccbab39ceebc4a'
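If the file has not yet been read through rmi_cloud_fs(), there may be no cache entry; a sketch of creating one with the download flag described above:
>>> cached_path("az://patio-data/test_data.parquet", download=True)  # fetch from Azure if not cached
'656706c40cb490423b652aa6d3b4903c56ab6c798ac4eb2fa3ccbab39ceebc4a'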
- etoolbox.utils.cloud.get(to_get_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')[source]¶
Download a remote file from the cloud.
Uses the azcopy CLI if available.
- Parameters:
to_get_path (str) – remote file or folder to download, of the form <container>/...
destination (pathlib.Path | str) – local destination for the downloaded files
fs – filesystem
quiet – disable logging of adlfs output
clobber – overwrite existing files and directories if True
azcopy_path – path to azcopy executable
- Return type:
None
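A hedged download sketch; the remote path and local destination are placeholders:
>>> from etoolbox.utils.cloud import get
>>> get("patio-data/20241031/results", "downloads/", clobber=True)  # hypothetical paths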
- etoolbox.utils.cloud.put(to_put_path, destination, fs=None, *, quiet=True, clobber=False, azcopy_path='/opt/homebrew/bin/azcopy')[source]¶
Upload local files or directories to the cloud.
Copies a specific file or tree of files. If destination ends with a “/”, it will be assumed to be a directory, and target files will go within.
Uses the azcopy CLI if available.
- Parameters:
to_put_path (pathlib.Path) – local file or folder to copy
destination (str) – copy destination of the form <container>/...
fs – filesystem
quiet – disable logging of adlfs output
clobber – force overwriting of existing files (only works when azcopy is used)
azcopy_path – path to azcopy executable
- Return type:
None
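A hedged upload sketch; the local path and container are placeholders. The trailing "/" marks the destination as a directory, per the note above:
>>> from pathlib import Path
>>> from etoolbox.utils.cloud import put
>>> put(Path("results/run_2024"), "patio-data/run_2024/")  # hypothetical paths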
- etoolbox.utils.cloud.read_patio_resource_results(datestr)[source]¶
Reads patio resource results from Azure.
Reads patio resource results from Azure and returns the extracted data as a dictionary. The function handles the specific format of patio resource files and manages filesystem interactions as well as the local cache.
- Parameters:
datestr (str) – Date string that identifies the model run.
- Return type:
dict
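A minimal sketch; the date string is a placeholder for a real model-run identifier:
>>> from etoolbox.utils.cloud import read_patio_resource_results
>>> results = read_patio_resource_results("20241031")  # hypothetical run date
>>> sorted(results)  # inspect the keys of the returned dict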
- etoolbox.utils.cloud.read_cloud_file(filename)[source]¶
Read parquet, csv, or DataZip files from Azure.
The function handles the specific format of patio resource files and manages filesystem interactions as well as the local cache.
- Parameters:
filename (str) – the full path to the file including container and file extension.
- Return type:
Examples
>>> from etoolbox.utils.cloud import read_cloud_file
>>> df = read_cloud_file("patio-data/20241031/utility_ids.parquet")
>>> df.head()
   utility_id_ferc1  ...  public_private_unmapped
0               1.0  ...                 unmapped
1             342.0  ...                   public
2             294.0  ...                   public
3             394.0  ...                   public
4             349.0  ...                   public

[5 rows x 37 columns]
- etoolbox.utils.cloud.write_cloud_file(data, filename)[source]¶
Writes economic results for patio data to a specified filename in Azure storage.
- Parameters:
data (pandas.DataFrame | str | bytes) – DataFrame, or str or bytes representing the contents of the file.
filename (str) – Target filename for storing the results; it must include the container, full path, and appropriate file extension, i.e., parquet for a DataFrame; csv, json, yaml, yml, toml, or txt for str/bytes.
- Return type:
None
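A hedged sketch of writing a DataFrame; the container and path are placeholders:
>>> import pandas as pd
>>> from etoolbox.utils.cloud import write_cloud_file
>>> df = pd.DataFrame({"a": [1, 2]})
>>> write_cloud_file(df, "patio-data/examples/a.parquet")  # hypothetical destination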