etoolbox.utils.pudl
Functions and objects for accessing PUDL data.
Classes
A DataZip of a PudlTabl can be recreated with this to avoid importing PUDL.
Functions
rmi_pudl_clean() – Remove rmi.pudl local cache.
Return info about the contents of the PUDL cache.
pudl_list() – List PUDL tables in AWS using the ls command.
pd_read_pudl() – Read PUDL table from AWS as pandas.DataFrame.
pl_scan_pudl() – Read PUDL table from AWS as polars.LazyFrame.
pl_read_pudl() – Read PUDL table from AWS as polars.DataFrame.
generator_ownership() – Generator ownership.
conform_pudl_dtypes() – Conform types of PUDL columns to those in PudlTabl.
get_pudl_sql_url() – Get the URL for the pudl.sqlite DB.
Module Contents
- etoolbox.utils.pudl.rmi_pudl_clean(args=None, *, dry=True, legacy=False)
Remove rmi.pudl local cache.
- etoolbox.utils.pudl.pudl_list(release='nightly', token=None, *, detail=False)
List PUDL tables in AWS using the ls command.
- Parameters:
- Return type:
Examples
>>> from etoolbox.utils.pudl import pudl_list
List PUDL releases; the actual release name is the part after the /.
>>> pudl_list(None)
['pudl.catalyst.coop/nightly', 'pudl.catalyst.coop/stable', ...]
For the most recent release, take the last item in the list, i.e. releases[-1].
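The listing returns bucket-qualified entries, while the release argument of the reader functions takes only the part after the /. The sketch below shows one way to do that conversion. release_name is a hypothetical helper, not part of etoolbox. The first two sample entries come from the example above; the third reuses the release name v2024.10.0 that appears elsewhere on this page, and its position in a real listing is an assumption.

```python
def release_name(entry: str) -> str:
    """Return the part of a listing entry after the last '/'."""
    return entry.rsplit("/", 1)[-1]


# Sample listing: the first two entries come from the example above; the
# third is illustrative only and its position in a real listing is assumed.
releases = [
    "pudl.catalyst.coop/nightly",
    "pudl.catalyst.coop/stable",
    "pudl.catalyst.coop/v2024.10.0",
]

names = [release_name(entry) for entry in releases]
latest = release_name(releases[-1])
print(names)
print(latest)
```

An entry without a / is returned unchanged, so the helper is safe to apply to a name that is already bare.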
- etoolbox.utils.pudl.pd_read_pudl(table_name, release='nightly', token=None, filters=None, *, date_as_object=False, **kwargs)
Read PUDL table from AWS as pandas.DataFrame.
- Parameters:
table_name (str) – name of table in PUDL sqlite database
release (str) – nightly, stable or versioned; use pudl_list() to see releases.
filters – passed to pyarrow.parquet.read_table()
date_as_object (bool) – Cast dates to objects. If False, convert to datetime64 dtype with the equivalent time unit (if supported). False is the default here, which differs from the default in pyarrow.Table.to_pandas().
kwargs – passed to pyarrow.Table.to_pandas()
- Return type:
pandas.DataFrame
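Because filters is handed to pyarrow.parquet.read_table(), it can use pyarrow's list-of-tuples form, where each tuple is (column, operator, value). The sketch below builds such a filter for the plant_id_eia column that appears in the generator_ownership() output on this page. plant_filter and read_plants are hypothetical helpers, and whether a given table actually has a plant_id_eia column is an assumption to check per table.

```python
def plant_filter(plant_ids):
    """Build a pyarrow-style row filter selecting the given plant ids."""
    ids = sorted(set(plant_ids))
    if not ids:
        raise ValueError("at least one plant id is required")
    # pyarrow ANDs together the (column, op, value) tuples in a flat list.
    return [("plant_id_eia", "in", ids)]


def read_plants(table_name, plant_ids, release="nightly"):
    """Read only rows for the given plants (needs etoolbox and network access)."""
    # Imported lazily so that building a filter does not require etoolbox.
    from etoolbox.utils.pudl import pd_read_pudl

    return pd_read_pudl(
        table_name, release=release, filters=plant_filter(plant_ids)
    )


print(plant_filter([3, 1, 1]))
```

Filtering at read time, rather than after loading, lets pyarrow skip rows that do not match, which can matter for the larger tables.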
- etoolbox.utils.pudl.pl_scan_pudl(table_name, release='nightly', token=None, *, use_polars=False, **kwargs)
Read PUDL table from AWS as polars.LazyFrame.
Note
Accessing PUDL tables directly from AWS using polars requires version 0.20 or higher.
- Parameters:
table_name (str) – name of table in PUDL sqlite database
release (str) – nightly, stable or versioned; use pudl_list() to see releases.
token (str | pathlib.Path | None) – ignored
use_polars – If True, use the polars AWS client (currently nonfunctional); this does not work with local caching. If False, use fsspec.implementations.cached.WholeFileCacheFileSystem for file access and caching.
kwargs – passed to polars.scan_parquet()
- Return type:
polars.LazyFrame
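The version requirement in the note can be checked before attempting a direct read. The sketch below reads the note as referring to the polars package version, which is an assumption, and compares only the major and minor components. meets_minimum and installed_polars_ok are hypothetical helpers, not part of etoolbox.

```python
import re
from importlib import metadata

MINIMUM = (0, 20)  # minimum (major, minor) taken from the note above


def meets_minimum(version: str, minimum=MINIMUM) -> bool:
    """Return True if a 'major.minor...' version string is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def installed_polars_ok() -> bool:
    """Check the installed polars, returning False if it is not installed."""
    try:
        return meets_minimum(metadata.version("polars"))
    except metadata.PackageNotFoundError:
        return False


print(meets_minimum("0.19.19"), meets_minimum("0.20.0"), meets_minimum("1.2.0"))
```

Comparing integer tuples instead of raw strings avoids the trap where "0.9" sorts after "0.20" lexically.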
- etoolbox.utils.pudl.pl_read_pudl(table_name, release='nightly', token=None, *, use_polars=False, **kwargs)
Read PUDL table from AWS as polars.DataFrame.
Note
Accessing PUDL tables directly from AWS using polars requires version 0.20 or higher.
- Parameters:
table_name (str) – name of table in PUDL sqlite database
release (str) – nightly, stable or versioned; use pudl_list() to see releases.
token (str | None) – ignored
use_polars – use the polars AWS client rather than s3fs; this does not work with local caching and must be False until that is fixed.
kwargs – passed to polars.scan_parquet()
- Return type:
polars.DataFrame
- etoolbox.utils.pudl.generator_ownership(year=None, release='nightly')
Generator ownership.
- Parameters:
year (int | None) – year of report date to use
release (str) – nightly, stable or versioned; use pudl_list() to see releases.
- Return type:
polars.DataFrame
Examples
>>> from etoolbox.utils.pudl import generator_ownership
>>>
>>> generator_ownership(year=2023, release="v2024.10.0").sort(
...     "plant_id_eia"
... ).select("plant_id_eia", "generator_id", "owner_utility_id_eia").head()
shape: (5, 3)
┌──────────────┬──────────────┬──────────────────────┐
│ plant_id_eia ┆ generator_id ┆ owner_utility_id_eia │
│ ---          ┆ ---          ┆ ---                  │
│ i64          ┆ str          ┆ i64                  │
╞══════════════╪══════════════╪══════════════════════╡
│ 1            ┆ 1            ┆ 63560                │
│ 1            ┆ 2            ┆ 63560                │
│ 1            ┆ 3            ┆ 63560                │
│ 1            ┆ 5.1          ┆ 63560                │
│ 1            ┆ WT1          ┆ 63560                │
└──────────────┴──────────────┴──────────────────────┘
- etoolbox.utils.pudl.conform_pudl_dtypes(df)
Conform types of PUDL columns to those in PudlTabl.
- Parameters:
df (pandas.DataFrame) – a dataframe with columns from PUDL
- Return type:
Returns: the pudl table with standardized dtypes
Examples
import pandas as pd
import sqlalchemy as sa
from etoolbox.utils.pudl import get_pudl_sql_url, conform_pudl_dtypes

pd.read_sql_table(table_name, sa.create_engine(get_pudl_sql_url())).pipe(
    conform_pudl_dtypes
)