collect

pydiverse.transform.collect(
target: Target | None = None,
*,
keep_col_refs: bool = True,
) → Pipeable

Execute all accumulated operations and write the result to a new Table.

This verb is available only for polars-backed tables. All operations lazily stored in the table are executed, and a new table containing the result is returned. The returned table always stores its data in a polars LazyFrame. You can choose whether subsequent operations on the table are executed via polars or via DuckDB on the LazyFrame (see also DuckDB, Polars, and Parquet).

Parameters:

target – The execution engine to be used from here on. Can be either Polars or DuckDb.

Examples

Here, collect does not change the result, only the point at which the work is done. The mutate is executed on the DataFrame when collect is called, whereas the arrange is executed only when the result is materialized (by show in this example, or by export). Without collect, the mutate would also have been deferred until then.

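>>> import pydiverse.transform as pdt
>>> from pydiverse.transform import C, arrange, collect, mutate, show
>>> t = pdt.Table({"a": [4, 2, 1, 4], "b": ["l", "g", "uu", "--   r"]})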
>>> (
...     t
...     >> mutate(z=t.a + t.b.str.len())
...     >> collect()
...     >> arrange(C.z, t.a)
...     >> show()
... )
shape: (4, 3)
┌─────┬────────┬─────┐
│ a   ┆ b      ┆ z   │
│ --- ┆ ---    ┆ --- │
│ i64 ┆ str    ┆ i64 │
╞═════╪════════╪═════╡
│ 1   ┆ uu     ┆ 3   │
│ 2   ┆ g      ┆ 3   │
│ 4   ┆ l      ┆ 5   │
│ 4   ┆ --   r ┆ 10  │
└─────┴────────┴─────┘
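
The timing described in this example can be illustrated with a toy model in plain Python. This is only a sketch of the general idea of deferred execution and is not how pydiverse.transform is implemented: the class `LazyTable`, its `pending` list, and the method signatures are hypothetical names chosen for illustration.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated table (hypothetical, not pydiverse code)."""

    def __init__(self, rows, pending=()):
        self.rows = rows              # already materialized data, a list of dicts
        self.pending = list(pending)  # operations recorded but not yet executed

    def mutate(self, name, fn):
        # Record the operation; nothing is computed at this point.
        def op(rows):
            return [{**row, name: fn(row)} for row in rows]
        return LazyTable(self.rows, self.pending + [op])

    def arrange(self, *keys):
        # Record a sort by the given columns, in order of priority.
        def op(rows):
            return sorted(rows, key=lambda row: tuple(row[k] for k in keys))
        return LazyTable(self.rows, self.pending + [op])

    def collect(self):
        # Run everything recorded so far and start over with an empty queue.
        rows = self.rows
        for op in self.pending:
            rows = op(rows)
        return LazyTable(rows)

    def export(self):
        # Materialize the final result, running whatever is still pending.
        return self.collect().rows


t = LazyTable([
    {"a": 4, "b": "l"},
    {"a": 2, "b": "g"},
    {"a": 1, "b": "uu"},
    {"a": 4, "b": "--   r"},
])

step = t.mutate("z", lambda row: row["a"] + len(row["b"]))  # recorded only
collected = step.collect()                # the mutate runs here
ordered = collected.arrange("z", "a")     # recorded only
result = ordered.export()                 # the arrange runs here
```

In this sketch, `step.pending` holds one operation before `collect` and `collected.pending` is empty afterwards, while `result` has the same rows and ordering as the output above. Dropping the `collect` call would give the same `result`, with both operations running inside `export`.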