collect
- pydiverse.transform.collect( ) Pipeable
Execute all accumulated operations and write the result to a new Table.
This verb is only available for polars-backed tables. All operations lazily stored in the table are executed, and a new table containing the result is returned. The returned table always stores its data in a polars LazyFrame. The target parameter selects whether subsequent operations on the table are executed by polars or by DuckDB on the LazyFrame (see also DuckDB, Polars, and Parquet).
- Parameters:
target – The execution engine to be used from here on. Can be either Polars or DuckDb.
Examples
Here, collect does not change anything in the result, but the mutate is executed on the DataFrame when collect is called, whereas the arrange is only executed when export is called. Without collect, the mutate would only have been executed with the export, too.

>>> t = pdt.Table({"a": [4, 2, 1, 4], "b": ["l", "g", "uu", "--   r"]})
>>> (
...     t
...     >> mutate(z=t.a + t.b.str.len())
...     >> collect()
...     >> arrange(C.z, t.a)
...     >> show()
... )
shape: (4, 3)
┌─────┬────────┬─────┐
│ a   ┆ b      ┆ z   │
│ --- ┆ ---    ┆ --- │
│ i64 ┆ str    ┆ i64 │
╞═════╪════════╪═════╡
│ 1   ┆ uu     ┆ 3   │
│ 2   ┆ g      ┆ 3   │
│ 4   ┆ l      ┆ 5   │
│ 4   ┆ --   r ┆ 10  │
└─────┴────────┴─────┘
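
The timing described above (mutate runs at collect, arrange runs only at export) can be illustrated with a small toy model in plain Python. This is only a sketch of the general idea of deferred execution with a checkpoint; it is not how pydiverse.transform is implemented, and the class LazyTable, its methods, and the log list are hypothetical names invented for this illustration.

```python
log = []  # names of operations, in the order they actually run


class LazyTable:
    """Hypothetical toy table that records operations instead of running them."""

    def __init__(self, rows, pending=()):
        self.rows = rows                # data materialized so far (list of dicts)
        self.pending = tuple(pending)   # (name, function) pairs not yet executed

    def _defer(self, name, op):
        # Return a new table with one more recorded, unexecuted operation.
        return LazyTable(self.rows, self.pending + ((name, op),))

    def mutate(self, column, fn):
        return self._defer(
            "mutate", lambda rows: [{**r, column: fn(r)} for r in rows]
        )

    def arrange(self, *keys):
        return self._defer(
            "arrange",
            lambda rows: sorted(rows, key=lambda r: tuple(r[k] for k in keys)),
        )

    def collect(self):
        # Run everything recorded so far; the new table has nothing pending.
        rows = self.rows
        for name, op in self.pending:
            rows = op(rows)
            log.append(name)
        return LazyTable(rows)

    def export(self):
        # Exporting also has to run whatever is still pending.
        return self.collect().rows


t = LazyTable(
    [
        {"a": 4, "b": "l"},
        {"a": 2, "b": "g"},
        {"a": 1, "b": "uu"},
        {"a": 4, "b": "--   r"},
    ]
)

lazy = t.mutate("z", lambda r: r["a"] + len(r["b"]))
ran_before_collect = list(log)   # nothing has executed yet

checkpoint = lazy.collect()
ran_after_collect = list(log)    # the mutate has executed

result = checkpoint.arrange("z", "a").export()
ran_after_export = list(log)     # the arrange executed only at export
```

In this toy model, dropping the collect() call yields the same rows in result; only the point at which the mutate executes moves to the export, which mirrors the behavior described for the real verb.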