Generate test dataframes with polars (powered by hypothesis)
The sub-module polars.testing.parametric provides tools for generating fake data for testing purposes.
Here is an example showing what can be done with just the dataframes and column functions in this module:
import polars as pl
from polars.testing.parametric import dataframes, column


def generate(size=5):
    # Build a strategy for dataframes with exactly `size` rows.
    return dataframes(
        [
            column("id", dtype=pl.UInt16, unique=True, allow_null=False),
            column("value", dtype=pl.Int16, allow_null=True),
            # Enum categories are given as a list of strings.
            column("cat", dtype=pl.Enum(["X", "Y", "Z"]), allow_null=False),
        ],
        min_size=size,
        max_size=size,
    )


original = generate().example()
The output is random, i.e. every call to the example method generates a new dataframe with the prescribed characteristics (this method is for interactive use only). You can test your data pipelines on fake data with a precise schema and simulated data quality deficiencies (e.g. null values, NaN, inf, etc.).
For unit testing, here is an example from the official docs: