join

pydiverse.transform.join(
right: Table,
on: ColExpr[Bool] | str | list[ColExpr[Bool] | str],
how: Literal['inner', 'left', 'full'],
*,
validate: Literal['1:1', '1:m', 'm:1', 'm:m'] = 'm:m',
suffix: str | None = None,
) Pipeable[source]

Joins two tables on a boolean expression.

The left table in the join comes through the pipe >> operator from the left.

Parameters:
  • right – The right table to join with.

  • on – The join condition. If this is a list, the elements are joined via AND. Strings are interpreted as an equality condition on the columns with that name. See the note below for more information on which expressions are allowed.

  • how – The join type.

  • validate – Only relevant for polars. When set to "m:m", this does nothing. If set to "1:m", it is checked whether each right row matches at most one left row. In case this does not hold, an error is raised. Symmetrically, if set to "m:1" it is checked whether each left row matches at most one right row. If set to "1:1" both "1:m" and "m:1" are checked.

  • suffix – A string that is appended to all column names from the right table. If no suffix is specified and there are no column name collisions, columns will retain their original name. If there are name collisions, the name of the right table is appended to all columns of the right table. If this still does not resolve all name collisions, additionally an integer is appended to the column names of the right table. If the join condition only contains equalities of columns of the same name, only the columns of the right table colliding with a column of the left table are renamed.

Note

Not all backends can handle arbitrary boolean expressions in on with every join type.

polars:

For everything except conjunctions of equalities, it depends on whether polars join_asof can handle the join condition.

postgres:

For full joins, the join condition must be hashable or mergeable. See the postgres documentation for more details.

Tip

Two tables cannot be joined if one is derived from the other. In particular, before a self-join, the alias verb has to be applied to one table.

Examples

>>> t1 = pdt.Table({"a": [3, 1, 4, 1, 5, 9, 4]}, name="t1")
>>> t2 = pdt.Table({"a": [4, 4, 1, 7], "b": ["f", "g", "h", "i"]}, name="t2")
>>> t1 >> join(t2, t1.a == t2.a, how="left") >> show()
shape: (9, 3)
┌─────┬──────┬──────┐
│ a   ┆ a_t2 ┆ b_t2 │
│ --- ┆ ---  ┆ ---  │
│ i64 ┆ i64  ┆ str  │
╞═════╪══════╪══════╡
│ 3   ┆ null ┆ null │
│ 1   ┆ 1    ┆ h    │
│ 4   ┆ 4    ┆ f    │
│ 4   ┆ 4    ┆ g    │
│ 1   ┆ 1    ┆ h    │
│ 5   ┆ null ┆ null │
│ 9   ┆ null ┆ null │
│ 4   ┆ 4    ┆ f    │
│ 4   ┆ 4    ┆ g    │
└─────┴──────┴──────┘