transfer_col_references

pydiverse.transform.transfer_col_references(table, ref_source)

Transfers the column references from ref_source to table.

The returned table contains all selected columns of table, but these columns are now referenced through the columns of ref_source that have the same names. Every column name selected in table must therefore also be present in ref_source.

Parameters:
  • table – The table from which the data is taken.

  • ref_source – The table from which the column references are taken.

Examples

Materialization without breaking the functional flow. Suppose you have a function your_materialization_fn that writes a transform table to a database and returns a transform table again. Then you can define a custom verb:

>>> @verb
... def materialize(table) -> pdt.Table:
...     new = your_materialization_fn(table)
...     return pdt.transfer_col_references(new, table)

With this verb, it is possible to write:

>>> t = pdt.Table(dict(a=[1, 2, 5], b=["x", "y", "z"]), name="t")
>>> t >> filter(t.a >= 2) >> materialize() >> mutate(z=t.a + t.b.str.len())
Table `t` (backend: polars)
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ z   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════╪═════╡
│ 2   ┆ y   ┆ 3   │
│ 5   ┆ z   ┆ 6   │
└─────┴─────┴─────┘

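Without transfer_col_references, it would not be possible to use t.a and t.b in the mutate, because the table returned by your_materialization_fn is a new table and t.a and t.b would no longer refer to its columns. (In practice, you would normally materialize with a SQL backend rather than the polars backend used in this example.)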
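
To make the idea of a column reference more concrete, here is a toy model in plain Python. It is only a sketch and not how pydiverse.transform is implemented: the names Col, ToyTable and toy_transfer_col_references are hypothetical, and the assumption that a reference is identified by a unique id rather than by its name is made purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count()  # source of unique ids for the toy column references


@dataclass(frozen=True)
class Col:
    """Toy column reference: identified by a unique id, not only by its name."""
    name: str
    uid: int = field(default_factory=lambda: next(_ids))


class ToyTable:
    """Hypothetical stand-in for a table: data plus one Col reference per column."""

    def __init__(self, data: dict, cols: Optional[dict] = None):
        self.data = data
        # A freshly created table gets brand-new references for all its columns.
        self.cols = cols if cols is not None else {name: Col(name) for name in data}

    def resolve(self, col: Col) -> list:
        """Return the data for a reference, or fail if it belongs to another table."""
        if self.cols.get(col.name) != col:
            raise KeyError(f"column reference {col.name!r} is not part of this table")
        return self.data[col.name]


def toy_transfer_col_references(table: ToyTable, ref_source: ToyTable) -> ToyTable:
    """Keep the data of `table`, but reuse the references of `ref_source` by name."""
    missing = set(table.cols) - set(ref_source.cols)
    if missing:
        raise ValueError(f"columns missing in ref_source: {sorted(missing)}")
    return ToyTable(table.data, {name: ref_source.cols[name] for name in table.cols})


t = ToyTable({"a": [1, 2, 5], "b": ["x", "y", "z"]})

# Simulated materialization: same column names, but a new table with new references.
new = ToyTable({"a": [2, 5], "b": ["y", "z"]})

try:
    new.resolve(t.cols["a"])
    stale_reference_rejected = False
except KeyError:
    stale_reference_rejected = True  # the old reference does not match the new table

# After the transfer, the references of t resolve against the new data.
fixed = toy_transfer_col_references(new, t)
print(stale_reference_rejected, fixed.resolve(t.cols["a"]))
```

In this toy model, the reference t.cols["a"] plays the role of t.a in the example above: it fails against the freshly created table and works again once the references have been transferred.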