Currently, I am validating the table schema with expect_table_columns_to_match_set by passing in a list of columns. However, I also want to validate the data type of each column, such as string. The only Great Expectations rule I have found for this, expect_column_values_to_be_of_type, has to be written separately for each column, which is redundant because it repeats the column names.
Is there a rule I am missing that validates both the column names and their types at the same time?
For example, given the columns a: string, b: int, c: boolean, I want to pass all of that information into one function instead of splitting it into [a, b, c] and then validating `[a], string` separately for each column.
Ideally, it would look something like expect_column_schema([(column_name_a, column_type_a), (column_name_b, column_type_b)]).
You can use expect_column_values_to_match_json_schema (or the regex / pattern expectations, depending on what you are more comfortable with). The full list of available expectations is in the Great Expectations documentation.
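With expect_column_values_to_match_json_schema you can define your schema in JSON format:
schema = {
    "type": "object",
    # Per-column types go under "properties"; top-level keys that are not
    # JSON Schema keywords would be ignored and check nothing.
    "properties": {
        "column_name_a": {"type": "string"},
        "column_name_b": {"type": "integer"},
        "column_name_c": {"type": "boolean"},
    },
}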
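To make concrete what the type entries in such a schema check, here is a plain-Python sketch that applies the same {column: type} mapping to a single record. It does not use Great Expectations or a JSON Schema library; matches_types and JSON_TYPES are hypothetical names chosen for illustration, and the type handling is simplified.

```python
# Plain-Python sketch (not Great Expectations): check one record against a
# {column_name: json_type} mapping like the schema above.

# JSON Schema primitive type names mapped to Python types.
# Simplification: real JSON Schema also treats 1.0 as an "integer".
JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def matches_types(record: dict, column_types: dict) -> bool:
    """Return True if every listed column is present with the expected JSON type."""
    for column, json_type in column_types.items():
        if column not in record:
            return False
        value = record[column]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if json_type in ("integer", "number") and isinstance(value, bool):
            return False
        if not isinstance(value, JSON_TYPES[json_type]):
            return False
    return True


column_types = {
    "column_name_a": "string",
    "column_name_b": "integer",
    "column_name_c": "boolean",
}

print(matches_types({"column_name_a": "x", "column_name_b": 1, "column_name_c": True}, column_types))  # True
print(matches_types({"column_name_a": "x", "column_name_b": True, "column_name_c": True}, column_types))  # False
```

The explicit bool check matters because `isinstance(True, int)` is true in Python, while JSON Schema does not count a boolean as an integer.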
Create a new ExpectColumnValuesToMatchSchema instance:
from great_expectations.expectations.core.expect_column_values_to_match_schema import (
    ExpectColumnValuesToMatchSchema,
)
expectation = ExpectColumnValuesToMatchSchema(schema=schema)
Finally, validate it to get your results: `result = expectation.validate(dataset)`
You will get an ExpectationSuiteValidationResult back, and you can inspect it to see whether the columns you provided match the schema.
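If you stay with the built-in per-column expectations, you can still avoid repeating column names by declaring each column once as a (name, type) pair and generating the expectation configurations from that list. The sketch below assumes the configuration shape used by Great Expectations ("expectation_type" plus "kwargs") and the kwargs column_set, column and type_; build_schema_expectations is a hypothetical helper, and the Spark-style type names are an assumption (valid type names depend on your execution backend).

```python
# Sketch: declare each column once as a (name, type) pair and derive both the
# table-level column-set check and the per-column type checks from that list.
# build_schema_expectations is hypothetical, not part of Great Expectations.
from typing import Dict, List, Tuple


def build_schema_expectations(columns: List[Tuple[str, str]]) -> List[Dict]:
    # One table-level expectation for the set of column names.
    configs = [
        {
            "expectation_type": "expect_table_columns_to_match_set",
            "kwargs": {"column_set": [name for name, _ in columns]},
        }
    ]
    # One type expectation per column, built from the same list.
    for name, type_name in columns:
        configs.append(
            {
                "expectation_type": "expect_column_values_to_be_of_type",
                # The type argument is spelled type_ in Great Expectations.
                "kwargs": {"column": name, "type_": type_name},
            }
        )
    return configs


schema = [("a", "StringType"), ("b", "IntegerType"), ("c", "BooleanType")]
for config in build_schema_expectations(schema):
    print(config)
```

Each dict could then be turned into an expectation configuration and added to a suite (for example `ExpectationConfiguration(**config)` in the 0.x releases); check that step against the Great Expectations version you use.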