API

Table

class pydiverse.transform.Table
__init__(
resource: pl.DataFrame | pl.LazyFrame | pd.DataFrame | sqlalchemy.Table | str | dict,
backend: Target | None = None,
*,
name: str | None = None,
)

Creates a new table.

Parameters:
  • resource – The data source to construct the table from. This can be a polars data frame or lazy frame, a pandas data frame, a Python dictionary, a SQLAlchemy table, or the name of a table in a SQL database.

  • backend – The execution backend. This must be one of the pydiverse.transform backend objects (see Backends / Export Targets). It may carry additional information on how to interpret the resource argument, such as a SQLAlchemy engine.

  • name – The name of the table. Naming the table is optional, but a name may make printed output more readable.

Examples

Python dictionary.

>>> t = pdt.Table(
...     {
...         "a": [4, 3, -35, 24, 105],
...         "b": [4, 4, 0, -23, 42],
...     },
...     name="T",
... )
>>> t >> show()
Table T, backend: PolarsImpl
shape: (5, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 4   │
│ 3   ┆ 4   │
│ -35 ┆ 0   │
│ 24  ┆ -23 │
│ 105 ┆ 42  │
└─────┴─────┘

Polars data frame.

>>> df = pl.DataFrame(
...     {
...         "a": [4, 3, -35, 24, 105],
...         "b": ["a", "o", "---", "i23", "  "],
...     },
... )
>>> t = pdt.Table(df, name="T")
>>> t >> show()
Table T, backend: PolarsImpl
shape: (5, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 4   ┆ a   │
│ 3   ┆ o   │
│ -35 ┆ --- │
│ 24  ┆ i23 │
│ 105 ┆     │
└─────┴─────┘

Pandas data frame. Note that the data frame is converted to a polars data frame and the backend is polars.

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "a": [4, 3, -35, 24, 105],
...         "b": ["a", "o", "---", "i23", "  "],
...     },
... )
>>> t = pdt.Table(df, name="T")
>>> t >> show()
Table T, backend: PolarsImpl
shape: (5, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 4   ┆ a   │
│ 3   ┆ o   │
│ -35 ┆ --- │
│ 24  ┆ i23 │
│ 105 ┆     │
└─────┴─────┘

SQL. Assuming you have a SQLAlchemy engine named engine that is connected to a database containing a table t1 in a schema s1, you can create a pydiverse.transform Table from it as follows.

>>> t = pdt.Table("t1", SqlAlchemy(engine, schema="s1"))
>>> t >> show()
Table t1, backend: PostgresImpl
shape: (5, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 4   ┆ a   │
│ 3   ┆ o   │
│ -35 ┆ --- │
│ 24  ┆ i23 │
│ 105 ┆     │
└─────┴─────┘

Note that the name argument to the pdt.Table constructor was not specified, so pydiverse.transform used the name of the SQL table. This example assumes that a database connection is set up and that the table above already exists in the database. For more information on how to set up a connection, see Database testing.

ColExpr

class pydiverse.transform.ColExpr
dtype() → Dtype

Returns the data type of the expression.

Col

class pydiverse.transform.Col
export(
target: Target,
) → pl.Series | pd.Series

Exports a column expression.

Parameters:

target – The data frame library to export to. This can be a Polars or Pandas target object. The lazy keyword argument of Polars is ignored.

Returns:

A polars or pandas Series.

Note

Not every column expression can be exported. Unlike with mutate or other verbs, there is no ambient table the expression lives in, which is required to resolve C-columns and to deal correctly with columns from different tables. Thus, the expression must contain one column whose table contains all other columns appearing in the expression. The table of this column is then used to export the expression.
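
The rule in this note can be illustrated with a small standalone sketch. It does not use pydiverse.transform at all: ToyTable, ToyCol and find_export_table are hypothetical names made up for this illustration, and columns are identified by name only, which is a simplification. The sketch only shows the containment check described above, not how the library actually resolves the table.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for illustration; not pydiverse.transform classes.
@dataclass(frozen=True)
class ToyTable:
    name: str
    columns: frozenset[str]  # names of the columns this table contains


@dataclass(frozen=True)
class ToyCol:
    name: str
    table: ToyTable  # the table this column was accessed from


def find_export_table(cols: list[ToyCol]) -> ToyTable | None:
    """Return the table of the first column whose table contains every
    column in the expression, or None if no such column exists."""
    needed = {c.name for c in cols}
    for col in cols:
        if needed <= col.table.columns:
            return col.table
    return None


t1 = ToyTable("t1", frozenset({"a", "b"}))
t2 = ToyTable("t2", frozenset({"a", "b", "c"}))  # e.g. t1 plus a derived column
t3 = ToyTable("t3", frozenset({"x"}))

# Columns a (from t1) and c (from t2): t2 contains both, so t2 is chosen.
ok = find_export_table([ToyCol("a", t1), ToyCol("c", t2)])
# Columns a (from t1) and x (from t3): neither table contains both columns.
bad = find_export_table([ToyCol("a", t1), ToyCol("x", t3)])

print(ok.name if ok else None)  # prints: t2
print(bad)                      # prints: None
```

In this toy model, the second expression corresponds to a column expression that cannot be exported, because no single table covers all of its columns.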

Examples

>>> t1 = pdt.Table({"h": [2.465, 0.22, -4.477, 10.8, -81.2, 0.0]})
>>> t1.h.export(Polars)
shape: (6,)
Series: 'h' [f64]
[
        2.465
        0.22
        -4.477
        10.8
        -81.2
        0.0
]
>>> t1.h.export(Pandas())
0    2.465
1     0.22
2   -4.477
3     10.8
4    -81.2
5      0.0
Name: h, dtype: double[pyarrow]