
How do you convert a dataframe to a great_expectations dataset?

I have a pandas or PySpark DataFrame df, already in memory, that I want to run an expectation against. How can I convert it to a great_expectations dataset?

So that I can do, for example:

df.expect_column_to_exist("my_column")
Vincent Claes asked Sep 02 '25 07:09



1 Answer

import great_expectations as ge

For pandas:

df_ge = ge.from_pandas(df)

or

df_ge = ge.dataset.PandasDataset(df)

For PySpark:

df_ge = ge.dataset.SparkDFDataset(df)

Now you can run your expectation:

df_ge.expect_column_to_exist("my_column")

Note that the great_expectations SparkDFDataset does not inherit the methods of the PySpark DataFrame. You can access the original PySpark DataFrame via df_ge.spark_df.

Vincent Claes answered Sep 04 '25 21:09



