Spark dataframe filter

Tags: scala, apache-spark, apache-spark-sql

val df = sc.parallelize(Seq((1,"Emailab"), (2,"Phoneab"), (3, "Faxab"),(4,"Mail"),(5,"Other"),(6,"MSL12"),(7,"MSL"),(8,"HCP"),(9,"HCP12"))).toDF("c1","c2")

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+

I want to filter out the records whose column 'c2' starts with either 'MSL' or 'HCP' (that is, its first 3 characters).

The output should look like this:

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+

Can anyone please help with this?

I know that df.filter($"c2".rlike("MSL")) selects the matching records, but how do I exclude them?

Versions: Spark 1.6.2, Scala 2.10

asked Mar 22 '17 12:03

Ramesh

1 Answer

This works too. It is concise and reads much like SQL.

df.filter("c2 not like 'MSL%' and c2 not like 'HCP%'").show
+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+
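
For comparison, the same exclusion can probably also be written with the Column API instead of a SQL string. The following is only a sketch, assuming Spark 1.6 with a local master; the object name ExcludePrefixes and the helpers withStartsWith and withRlike are made up for illustration and are not part of Spark. Note that rlike("MSL") on its own matches MSL anywhere in the value, so the regex variant anchors the pattern with ^ to test only the prefix.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, not}

// Hypothetical helper object, named here only for illustration.
object ExcludePrefixes {

  // Keep rows whose c2 does NOT start with "MSL" or "HCP" (prefix test).
  def withStartsWith(df: DataFrame): DataFrame =
    df.filter(not(col("c2").startsWith("MSL") || col("c2").startsWith("HCP")))

  // Same idea with a regex; ^ anchors the match to the start of the value.
  def withRlike(df: DataFrame): DataFrame =
    df.filter(not(col("c2").rlike("^(MSL|HCP)")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local run; in spark-shell, sc and sqlContext already exist.
    val conf = new SparkConf().setAppName("exclude-prefixes").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Sample data copied from the question.
    val df = sc.parallelize(Seq(
      (1, "Emailab"), (2, "Phoneab"), (3, "Faxab"), (4, "Mail"), (5, "Other"),
      (6, "MSL12"), (7, "MSL"), (8, "HCP"), (9, "HCP12")
    )).toDF("c1", "c2")

    withStartsWith(df).show()
    withRlike(df).show()

    sc.stop()
  }
}
```

With the sample data from the question, both helpers should keep rows 1 through 5. The setup uses SQLContext to match Spark 1.6; on Spark 2.x and later, SparkSession is the usual entry point, while the filter expressions themselves stay the same.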

answered Oct 01 '22 09:10

Jegan
