etoolbox.utils.pudl

Functions and objects for accessing PUDL data.

Classes

PretendPudlTablCore

A DataZip of a PudlTabl can be recreated with this to avoid importing PUDL.

Functions

rmi_pudl_clean([args, dry, legacy])

Remove rmi.pudl local cache.

pudl_cache()

Return info about the contents of the PUDL cache.

pudl_list([release, token, detail])

List PUDL tables in AWS using ls command.

pd_read_pudl(table_name[, release, token, filters, ...])

Read PUDL table from AWS as pandas.DataFrame.

pl_scan_pudl(table_name[, release, token, use_polars])

Read PUDL table from AWS as polars.LazyFrame.

pl_read_pudl(table_name[, release, token, use_polars])

Read PUDL table from AWS as polars.DataFrame.

generator_ownership([year, release])

Generator ownership.

conform_pudl_dtypes(df)

Conform types of PUDL columns to those in PudlTabl.

get_pudl_sql_url([file])

Get the URL for the pudl.sqlite DB.

Module Contents

etoolbox.utils.pudl.rmi_pudl_clean(args=None, *, dry=True, legacy=False)[source]

Remove rmi.pudl local cache.

Parameters:
Return type:

None

etoolbox.utils.pudl.pudl_cache()[source]

Return info about the contents of the PUDL cache.

etoolbox.utils.pudl.pudl_list(release='nightly', token=None, *, detail=False)[source]

List PUDL tables in AWS using ls command.

Parameters:
  • release (str) – nightly, stable, or a versioned release; pass None to list all releases

  • token (dict | str | None) – ignored

  • detail (bool) – if True, return details of each table, otherwise just names

Return type:

list[str | dict[str, str | int]]

Examples

>>> from etoolbox.utils.pudl import pudl_list

List PUDL releases, the actual release is the part after the /.

>>> pudl_list(None)  
['pudl.catalyst.coop/nightly', 'pudl.catalyst.coop/stable', ...]

For the most recent release, take the last item in the returned list, i.e. releases[-1].
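As a minimal sketch of the note above, the release name can be split off each listing entry with plain string handling. The helper name release_name is hypothetical, and the sample list only repeats the two entries shown in the example output; a real call to pudl_list(None) returns more.

```python
def release_name(entry: str) -> str:
    """Return the part of a listing entry after the last '/'."""
    return entry.rsplit("/", 1)[-1]


# Sample entries copied from the example output above (truncated there),
# used here in place of a live pudl_list(None) call.
releases = ["pudl.catalyst.coop/nightly", "pudl.catalyst.coop/stable"]

names = [release_name(r) for r in releases]
last_name = release_name(releases[-1])  # last entry of this sample list
```

The resulting name is what the release argument of the read functions expects.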

etoolbox.utils.pudl.pd_read_pudl(table_name, release='nightly', token=None, filters=None, *, date_as_object=False, **kwargs)[source]

Read PUDL table from AWS as pandas.DataFrame.

Parameters:
  • table_name (str) – name of table in PUDL sqlite database

  • release (str) – nightly, stable or versioned, use pudl_list() to see releases.

  • token (dict | str | None) – ignored

  • filters – passed to pyarrow.parquet.read_table()

  • date_as_object (bool) – Cast dates to objects. If False (the default here), convert to datetime64 dtype with the equivalent time unit, if supported. This default differs from that of pyarrow.Table.to_pandas().

  • kwargs – passed to pyarrow.Table.to_pandas()

Return type:

pandas.DataFrame
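Because filters is handed to pyarrow.parquet.read_table(), it follows pyarrow's filter format: a list of (column, operator, value) tuples. The sketch below builds such a filter and wraps the call. The helper names plant_filter and read_plants are hypothetical, and it assumes the chosen table has a plant_id_eia column.

```python
def plant_filter(plant_ids):
    """Build a pyarrow-style filter keeping only the given plant ids."""
    # Assumption: the table being read has a plant_id_eia column.
    return [("plant_id_eia", "in", list(plant_ids))]


def read_plants(table_name, plant_ids, release="nightly"):
    """Read only the rows for ``plant_ids`` from a PUDL table."""
    # Imported lazily so this sketch can be loaded without etoolbox installed.
    from etoolbox.utils.pudl import pd_read_pudl

    return pd_read_pudl(
        table_name, release=release, filters=plant_filter(plant_ids)
    )


example_filters = plant_filter([1, 3])
```

Passing a filter lets pyarrow skip rows while reading, rather than loading the whole table and subsetting afterwards.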

etoolbox.utils.pudl.pl_scan_pudl(table_name, release='nightly', token=None, *, use_polars=False, **kwargs)[source]

Read PUDL table from AWS as polars.LazyFrame.

Note

Accessing PUDL tables directly from AWS using polars requires version 0.20 or higher.

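Parameters:
  • table_name (str) – name of table in PUDL sqlite database

  • release (str) – nightly, stable or versioned, use pudl_list() to see releases.

  • token (str | None) – ignored

  • use_polars – use the polars AWS client rather than s3fs. This does not work with local caching, so it must remain False until that is fixed.

  • kwargs – passed to polars.scan_parquet()
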
Return type:

polars.LazyFrame

etoolbox.utils.pudl.pl_read_pudl(table_name, release='nightly', token=None, *, use_polars=False, **kwargs)[source]

Read PUDL table from AWS as polars.DataFrame.

Note

Accessing PUDL tables directly from AWS using polars requires version 0.20 or higher.

Parameters:
  • table_name (str) – name of table in PUDL sqlite database

  • release (str) – nightly, stable or versioned, use pudl_list() to see releases.

  • token (str | None) – ignored

  • use_polars – use the polars AWS client rather than s3fs. This does not work with local caching, so it must remain False until that is fixed.

  • kwargs – passed to polars.scan_parquet()

Return type:

polars.DataFrame
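The version requirement in the note can be checked before calling the polars readers. This is a sketch using only the standard library; the helper name meets_minimum is hypothetical, and it compares just the leading major.minor numbers of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum=(0, 20)) -> bool:
    """Return True if the leading 'major.minor' of ``installed`` is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", installed)
    if match is None:
        raise ValueError(f"unrecognized version string: {installed!r}")
    return (int(match[1]), int(match[2])) >= tuple(minimum)


try:
    # Look up the installed polars version without importing polars itself.
    polars_ok = meets_minimum(version("polars"))
except PackageNotFoundError:
    polars_ok = False  # polars is not installed in this environment
```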

etoolbox.utils.pudl.generator_ownership(year=None, release='nightly')[source]

Generator ownership.

Parameters:
  • year (int | None) – year of report date to use

  • release (str) – nightly, stable or versioned, use pudl_list() to see releases.

Return type:

polars.DataFrame

Examples

>>> from etoolbox.utils.pudl import generator_ownership
>>>
>>> generator_ownership(year=2023, release="v2024.10.0").sort(
...     "plant_id_eia"
... ).select("plant_id_eia", "generator_id", "owner_utility_id_eia").head()
shape: (5, 3)
┌──────────────┬──────────────┬──────────────────────┐
│ plant_id_eia ┆ generator_id ┆ owner_utility_id_eia │
│ ---          ┆ ---          ┆ ---                  │
│ i64          ┆ str          ┆ i64                  │
╞══════════════╪══════════════╪══════════════════════╡
│ 1            ┆ 1            ┆ 63560                │
│ 1            ┆ 2            ┆ 63560                │
│ 1            ┆ 3            ┆ 63560                │
│ 1            ┆ 5.1          ┆ 63560                │
│ 1            ┆ WT1          ┆ 63560                │
└──────────────┴──────────────┴──────────────────────┘

etoolbox.utils.pudl.conform_pudl_dtypes(df)[source]

Conform types of PUDL columns to those in PudlTabl.

Parameters:

df (pandas.DataFrame) – a dataframe with columns from PUDL

Return type:

pandas.DataFrame

Returns: the PUDL table with standardized dtypes

Examples

import pandas as pd
import sqlalchemy as sa

from etoolbox.utils.pudl import get_pudl_sql_url, conform_pudl_dtypes

# table_name is a placeholder: set it to the name of a table in pudl.sqlite
pd.read_sql_table(table_name, sa.create_engine(get_pudl_sql_url())).pipe(
    conform_pudl_dtypes
)

etoolbox.utils.pudl.get_pudl_sql_url(file=PUDL_CONFIG)[source]

Get the URL for the pudl.sqlite DB.

Return type:

str
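The returned string is passed to sa.create_engine() in the example above, so it is presumably a SQLAlchemy-style URL. As a hedged sketch, assuming it has the form sqlite:///<path>, the same file can be inspected with the standard-library sqlite3 module. The helper names and the stand-in database are hypothetical; a real URL would come from get_pudl_sql_url().

```python
import sqlite3
import tempfile
from pathlib import Path


def sqlite_url_to_path(url: str) -> Path:
    """Strip the assumed 'sqlite:///' prefix to get a file path."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"not a sqlite file URL: {url!r}")
    return Path(url[len(prefix):])


def list_tables(url: str) -> list[str]:
    """Return the sorted table names in the sqlite file behind ``url``."""
    con = sqlite3.connect(sqlite_url_to_path(url))
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


# Stand-in database so the sketch runs without a PUDL download.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "demo.sqlite"
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE demo_table (plant_id_eia INTEGER)")
    con.commit()
    con.close()
    tables = list_tables(f"sqlite:///{db}")
```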

class etoolbox.utils.pudl.PretendPudlTablCore[source]

A DataZip of a PudlTabl can be recreated with this to avoid importing PUDL.

DeprecationWarning

PretendPudlTablCore will be removed in a future version; read tables directly from the sqlite database instead.