I hope this wasn't asked before; at least I couldn't find it. I'm trying to keep only the rows where the Key column does not contain the value 'sd'. Below is a working example for the case when it does contain it.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

values = [("sd123", "2"), ("kd123", "1")]
columns = ['Key', 'V1']
df2 = spark.createDataFrame(values, columns)
df2.where(F.col('Key').contains('sd')).show()
How do I do the opposite?
In Spark and PySpark, the contains() function matches when a column value contains the given literal string (i.e. it matches on part of the string); it is most commonly used to filter rows of a DataFrame.
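For instance, contains() matches anywhere in the string, not only at the start; a minimal sketch using the df2 defined above:

# 'd12' occurs in the middle of both 'sd123' and 'kd123', so both rows match
df2.where(F.col('Key').contains('d12')).show()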
PySpark's "IS NOT IN" condition is used inside a where() or filter() to exclude rows whose column value matches any of several defined values. In other words, it keeps the rows whose values do not appear in a given list, as in the sketch below.
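A minimal sketch, reusing the df2 from the question (the second literal 'xd999' is just a hypothetical placeholder), combines the negation operator ~ with isin():

# keep rows whose Key is NOT one of the listed literal values
df2.where(~F.col('Key').isin('sd123', 'xd999')).show()
# only the kd123 row remains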
Use ~ as bitwise NOT:
df2.where(~F.col('Key').contains('sd')).show()
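Since ~ negates the whole Column expression, the same pattern should work with other string predicates too, e.g. (assuming the same df2 as above):

# equivalent exclusions with other string matchers
df2.where(~F.col('Key').startswith('sd')).show()   # Key does not start with 'sd'
df2.where(~F.col('Key').like('%sd%')).show()       # Key does not match the LIKE pattern

Both return only the kd123 row here.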