Intro to Snowpark API
Polars users might find the Snowpark/PySpark API verbose.
But the high-level abstractions roughly map: snowpark.Column is like polars.Expr, and snowpark.DataFrame is like polars.LazyFrame.
Caveat: the list of methods on Column is much smaller than on Expr, and there’s no clear namespace separating methods by dtype.
If you don’t see a method for your use case, check the gigantic list under snowpark.functions.
Let’s explore.
First, the imports:
from snowflake import snowpark as sp
from snowflake.snowpark import functions as F
from snowflake.snowpark import types as T
Column class
Accessible via F.col, e.g. lf.select(F.col("A")).
A few notable methods on Column instances:
- cast
- name(alias)
- asc / desc
- is_null
- is_in
- over
- substr
Unlike Polars, there’s no method namespace separation by dtype.
Still, Column feels a lot like polars.Expr.
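As a sketch of how a few of these compose (assumes snowflake-snowpark-python is installed; the table and column names are invented, and session-bound usage is shown in comments):

```python
from snowflake.snowpark import functions as F
from snowflake.snowpark import types as T

# Build expressions client-side; nothing is sent to Snowflake yet.
a_str = F.col("A").cast(T.StringType()).name("A_STR")  # cast + rename, like Expr.cast/alias
b_missing = F.col("B").is_null().name("B_MISSING")     # boolean predicate
b_prefix = F.col("B").substr(1, 3).name("B_PREFIX")    # substring; start position is 1-based

# With an active session (hypothetical table name):
# df = session.table("T1")
# df.select(a_str, b_missing, b_prefix).collect()
```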
F namespace (functions)
The F namespace provides many ways to manipulate Columns.
Note: all of these functions return a Column, not just F.col, e.g.
- F.concat
- F.contains
- F.array_size
- F.year
- F.when(...).otherwise(...)
- F.explode
- F.flatten
Again, no subnamespaces by dtype — everything is dumped into functions.
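For instance, conditional logic mirrors Polars' when/then/otherwise; a minimal sketch (column names invented):

```python
from snowflake.snowpark import functions as F

# CASE WHEN logic, analogous to pl.when(...).then(...).otherwise(...).
bucket = (
    F.when(F.col("SCORE") >= 90, F.lit("high"))
     .when(F.col("SCORE") >= 50, F.lit("mid"))
     .otherwise(F.lit("low"))
     .name("BUCKET")
)
# df.select(bucket)  # requires a DataFrame bound to a session
```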
Window class
Provides a bunch of class methods:
- Window.partition_by
- Window.order_by
- Window.range_between (takes time or int)
And constants like:
Window.CURRENT_ROW
used as bounds in rows_between / range_between; the resulting window spec is then passed to Column's over method.
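Putting the pieces together, a running total per group might look like this (sketch; table and column names are invented):

```python
from snowflake.snowpark import Window
from snowflake.snowpark import functions as F

# Cumulative sum per customer, ordered by date.
w = (
    Window.partition_by("CUSTOMER_ID")
          .order_by("ORDER_DATE")
          .rows_between(Window.UNBOUNDED_PRECEDING, Window.CURRENT_ROW)
)
running_total = F.sum("AMOUNT").over(w).name("RUNNING_TOTAL")
# df.select("CUSTOMER_ID", "ORDER_DATE", running_total)
```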
T (types)
For type hints and casting, use:
- T.ArrayType
- T.DateType
- T.StructType
Caveat: these are classes, so instantiate them, e.g. T.DateType().
DataFrame class
It’s called a DataFrame, but really behaves like a LazyFrame — call .collect() to materialize.
Notable methods (all return a new DataFrame):
- select
- filter
- pivot / unpivot
- join
- union
- with_columns (takes list[str], list[Column])
- with_column
- distinct
- fillna
and the .columns attribute.
Note: after a join, columns that share a name must be aliased explicitly in order to save or view the joined table, e.g. t1["A"].alias("C1"), t2["A"].alias("C2").
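A sketch of that aliasing after a join (assumes an active Snowpark session; table and column names are invented):

```python
# Assumes T1 and T2 both have columns ID and NAME.
t1 = session.table("T1")
t2 = session.table("T2")

joined = t1.join(t2, t1["ID"] == t2["ID"]).select(
    t1["ID"].alias("ID"),
    t1["NAME"].alias("NAME_LEFT"),   # disambiguate the shared column name
    t2["NAME"].alias("NAME_RIGHT"),
)
joined.write.mode("overwrite").save_as_table("T_JOINED", table_type="temporary")
```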
IO
Ways to read data from Snowflake:
- session.table("TABLE_NAME")
- session.sql("SELECT ...")
- session.read.option("infer_schema", False).parquet("@mystage/test.parquet")
Multiple file formats are supported; simply replace parquet with the file extension, e.g. csv.
Ways to push in-memory data to Snowflake:
- session.create_dataframe(df: pd.DataFrame).write.format(...).save(...)
- session.create_dataframe(df: list[tuple | Row]).write.mode("overwrite").save_as_table("my_table", table_type="temporary")
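A sketch of both paths (assumes an active Snowpark session; the table names are invented):

```python
import pandas as pd

# From a pandas DataFrame (assumes an active Snowpark `session`):
pdf = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
session.create_dataframe(pdf).write.mode("overwrite").save_as_table(
    "MY_TABLE", table_type="temporary"
)

# From plain Python data, naming the columns via `schema`:
session.create_dataframe([(1, "x"), (2, "y")], schema=["A", "B"]) \
    .write.mode("overwrite").save_as_table("MY_TABLE", table_type="temporary")
```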
Row class
collect() returns a list of Rows.
The .as_dict() method on Row makes it easy to interoperate with Polars:
just pass a list of dicts to construct a Polars DataFrame.
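A minimal sketch of the round-trip; the helper name rows_to_dicts is mine, and the Polars side is shown in comments:

```python
def rows_to_dicts(rows):
    """Convert Row-like objects (anything with .as_dict()) into plain dicts."""
    return [row.as_dict() for row in rows]

# Usage sketch, assuming snowpark and polars are installed:
# import polars as pl
# rows = df.collect()                       # list[Row]
# pldf = pl.DataFrame(rows_to_dicts(rows))  # list[dict] -> Polars DataFrame
```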
Testing
sp.testing.assert_dataframe_equal
Horizontal ops in Snowpark
Horizontal ops such as a horizontal any (think np.any(..., axis=1)) can be achieved with a chain of logical ORs, e.g. (a | b | c), or with F.when().otherwise(). As the number of conditions/columns grows, I would like something similar to Polars' any_horizontal, or its general-purpose reduce/fold functions for horizontal ops. Snowpark has neither an any_horizontal nor a reduce function, but Python's functools.reduce fills the gap.
from functools import reduce

# OR together the boolean columns listed in col_names
any_expr = reduce(lambda a, b: a | b, map(F.col, col_names))
lf.select(any_expr)
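Since Snowpark Columns overload `|` the same way Python booleans do, this reduce can be packaged as a small generic helper (the name any_horizontal is mine, borrowed from Polars):

```python
from functools import reduce
import operator

def any_horizontal(exprs):
    """OR-reduce a non-empty sequence of boolean expressions or plain bools."""
    return reduce(operator.or_, exprs)

# With Snowpark columns:  lf.select(any_horizontal(map(F.col, col_names)))
# The same helper also works on plain Python booleans:
print(any_horizontal([False, True, False]))  # True
```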
Exceptions
The most common is sp.exceptions.SnowparkSQLException, corresponding to the 13xx SQL errors.
NULL handling
In aggregate functions like sum, avg, etc., NULL values are ignored; if all values in a column are NULL, such an aggregate returns NULL (count is the exception: it returns 0).
In comparison logic (>, <, <>, =, >=, <=), comparing NULL with any value yields NULL. Therefore in filters, NULL must be caught explicitly with A IS NULL or A IS NOT NULL; otherwise the filter WHERE A < 1 silently excludes all records where A is NULL.
Several functions are dedicated to NULL handling, e.g. coalesce, ifnull, nullif, equal_null.
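A sketch of a NULL-safe filter (assumes an active Snowpark session and a table with a numeric column A; names are invented):

```python
from snowflake.snowpark import functions as F

df = session.table("T1")  # hypothetical table

# Keep rows where A < 1 OR A is NULL; a bare `A < 1` would drop the NULL rows.
kept = df.filter((F.col("A") < 1) | F.col("A").is_null())

# Alternatively, substitute a default so the comparison never sees NULL:
kept2 = df.filter(F.coalesce(F.col("A"), F.lit(0)) < 1)
```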