My problem is that I have a column of lists and need to find the rows whose list is not empty. When I filter with "is not null", I still get every row back, including the empty ones.
My code looks like this:
...
val csc = new CassandraSQLContext(sc)
val df = csc.sql("SELECT * FROM test").toDF()
val wrapped = df.select("fahrspur_liste")
wrapped.printSchema
The column fahrspur_liste contains the wrapped arrays, and this is the column I have to analyze. When I run the code, I get this schema for the wrapped array and these entries:
root
 |-- fahrspur_liste: array (nullable = true)
 |    |-- element: long (containsNull = true)

+--------------+
|fahrspur_liste|
+--------------+
|            []|
|            []|
|          [56]|
|            []|
|          [36]|
|            []|
|            []|
|          [34]|
|            []|
|            []|
|            []|
|            []|
|            []|
|            []|
|            []|
|         [103]|
|            []|
|         [136]|
|            []|
|          [77]|
+--------------+
only showing top 20 rows
Now I want to filter these rows so that only the entries [56], [36], [34], [103], ... remain.
How can I write a filter that returns only the rows whose array contains a number?
In pandas, you can filter out empty strings in your dataframe like this: df = df[df['str_field'].str.len() > 0]
To filter out the rows of a pandas dataframe that have missing values in the Last_Name column, first call the pandas notnull() function on that column. It returns a boolean Series that is True for non-null values and False for null or missing values, which can then be used to select the rows.
In PySpark, df.filter(condition) returns a new dataframe containing the rows that satisfy the given condition, and df.column_name.isNotNull() selects the rows that are not NULL/None in that column. Note that an empty array is not NULL, so isNotNull() does not remove it.
In pandas, you can use the .str.contains() method to filter rows in a dataframe using regular expressions (regex). For example, to show only records whose Region field ends in "th", pass a pattern anchored at the end of the string, such as th$.
I don't think you need a UDF here.
You can just use the size function and filter out all rows whose array size is 0:
df.filter(""" size(fahrspur_liste) != 0 """)
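The same filter can also be written with Column expressions instead of a SQL string. The following is only a minimal sketch under some assumptions: it uses a local SparkSession (Spark 2.x or later) in place of the CassandraSQLContext from the question, the sample rows are made up to mirror the output above, and the object name NonEmptyArrayFilter and the helper keepNonEmpty are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, size}

object NonEmptyArrayFilter {

  // Hypothetical helper: keeps only rows whose array column has at least one element.
  // Comparing with "> 0" rather than "!= 0" also drops null arrays, whose size
  // Spark reports as -1 or null depending on version and configuration.
  def keepNonEmpty(df: DataFrame, column: String): DataFrame =
    df.filter(size(col(column)) > 0)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the Cassandra-backed context.
    val spark = SparkSession.builder()
      .appName("NonEmptyArrayFilter")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data shaped like the fahrspur_liste column in the question.
    val df = Seq(
      Seq.empty[Long],
      Seq(56L),
      Seq.empty[Long],
      Seq(36L)
    ).toDF("fahrspur_liste")

    // SQL-string form, as in the answer above.
    df.filter("size(fahrspur_liste) != 0").show()

    // Column-expression form via the helper.
    keepNonEmpty(df, "fahrspur_liste").show()

    spark.stop()
  }
}
```

On this sample data both forms keep only the rows [56] and [36]. They can differ when the column contains null arrays, because a size of -1 passes the "!= 0" test but not the "> 0" test, so it may be worth checking how your Spark version treats the size of a null array before choosing one.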