DuckDB, Polars, and Parquet
Pydiverse.transform can switch quickly between DuckDB-based and Polars-based execution within a single pipeline:
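from pydiverse.transform.extended import *
import pydiverse.transform as pdt

# Two small in-memory tables; tbl2 is moved to the DuckDB backend right away.
tbl = pdt.Table(dict(x=[1, 2, 3], y=[4, 5, 6]), name="A")
tbl2 = pdt.Table(dict(x=[2, 3], z=["b", "c"]), name="B") >> collect(DuckDb())

out = (
    # Run the join on DuckDB and print the generated SQL query ...
    tbl >> collect(DuckDb()) >> left_join(tbl2, tbl.x == tbl2.x) >> show_query()
    # ... then continue on Polars: compute column z and print the table.
    >> collect(Polars()) >> mutate(z=tbl.x + tbl.y) >> show()
)

# Export the result to Polars with the default target settings ...
df1 = out >> export(Polars())
print(type(df1))

# ... and with lazy=False, which requests an eager (non-lazy) result.
df2 = out >> export(Polars(lazy=False))
print(type(df2))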
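The `>>` chaining above can be hard to picture at first. The following is a toy model only, not pydiverse.transform's implementation: all names (`ToyTable`, `Collect`, `Mutate`) are hypothetical, and the "backends" are plain string labels on Python lists. It merely illustrates one way to get this syntax: each verb is an object applied with `>>`, and a collect-style step hands the same data to a different engine before the pipeline continues.

```python
# Hypothetical toy model of the `table >> verb` pipe style.
# Nothing here is part of pydiverse.transform; backends are just labels.


class ToyTable:
    def __init__(self, columns, backend="polars", log=None):
        self.columns = columns      # column name -> list of values
        self.backend = backend      # pretend execution engine
        self.log = log or []        # record of the steps applied so far

    def __rshift__(self, verb):
        # `tbl >> verb` simply delegates to the verb object.
        return verb(self)


class Collect:
    """Hand the same data to another (pretend) backend."""

    def __init__(self, backend):
        self.backend = backend

    def __call__(self, tbl):
        return ToyTable(dict(tbl.columns), self.backend,
                        tbl.log + [f"collect({self.backend})"])


class Mutate:
    """Add columns; each expression is a function of the column dict,
    a toy stand-in for column expressions like tbl.x + tbl.y."""

    def __init__(self, **exprs):
        self.exprs = exprs

    def __call__(self, tbl):
        cols = dict(tbl.columns)    # copy so the input table is unchanged
        for name, fn in self.exprs.items():
            cols[name] = fn(cols)
        return ToyTable(cols, tbl.backend,
                        tbl.log + [f"mutate on {tbl.backend}"])


tbl = ToyTable({"x": [1, 2, 3], "y": [4, 5, 6]})
out = (
    tbl
    >> Collect("duckdb")
    >> Collect("polars")
    >> Mutate(z=lambda c: [a + b for a, b in zip(c["x"], c["y"])])
)
print(out.backend, out.columns["z"])  # polars [5, 7, 9]
```

Because `>>` is left-associative, each step receives the result of the previous one, so switching engines mid-pipeline is just another step in the chain.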
In the future, both the DuckDB and Polars backends are also intended to support reading and writing Parquet files.