I am trying to improve the accuracy of a logistic regression algorithm implemented in Spark using Java. To do this, I am trying to replace the null or invalid values present in a column with the most frequent value of that column. For example:
Name | Place
-----|------
a    | a1
a    | a2
a    | a2
     | d1
b    | a2
c    | a2
c    |
     |
d    | c1
In this case I want to replace all the NULL values in column "Name" with 'a' and in column "Place" with 'a2'. So far I am only able to extract the most frequent value of a particular column. Can you please help me with the second step: how to replace the null or invalid values with the most frequent value of that column?
Replacing null values is one of the most common DataFrame operations in Spark. In PySpark it is done with DataFrame.fillna() or DataFrameNaFunctions.fill(), which replace NULL/None values in all or selected columns with a constant such as zero, an empty string, or any other literal value. Invalid (non-null) values can additionally be rewritten with the SQL string functions regexp_replace(), translate(), and overlay().
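Since the question mentions invalid values as well as nulls, one option is to normalize the invalid entries to real nulls first, so a single fill pass can handle both. Below is a minimal Scala sketch of that idea; it assumes a DataFrame named df holding the example data above, and it assumes "invalid" means an empty or whitespace-only string (as shown below, this is not the only possible definition):

    import org.apache.spark.sql.functions.{col, lit, trim, when}

    // Sketch under the stated assumptions: convert "invalid" entries
    // (here, empty or blank strings) into real nulls, so that a later
    // na.fill pass replaces them along with the pre-existing nulls.
    val normalized = df.columns.foldLeft(df) { (d, c) =>
      d.withColumn(c, when(trim(col(c)) === "", lit(null)).otherwise(col(c)))
    }

If "invalid" means something else in your data (for example a sentinel string), swap the when() condition accordingly.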
You can use the .na.fill function (it lives in org.apache.spark.sql.DataFrameNaFunctions).
Basically the function you need is: def fill(value: String, cols: Seq[String]): DataFrame
You choose the columns, and you choose the value you want to use to replace the null or NaN entries.
In your case it will be something like:
val df2 = df.na.fill("a", Seq("Name"))
            .na.fill("a2", Seq("Place"))
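If you would rather not hardcode "a" and "a2", here is a minimal sketch that puts both steps together. It assumes the DataFrame df with the string-typed "Name" and "Place" columns from the question; it computes each column's most frequent non-null value (the mode) and feeds it to na.fill:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, count, desc}

    // Sketch: for each column, find its most frequent non-null value
    // and replace that column's nulls with it.
    def fillWithMode(df: DataFrame, cols: Seq[String]): DataFrame =
      cols.foldLeft(df) { (d, c) =>
        val mode = d.filter(col(c).isNotNull)
          .groupBy(c)
          .agg(count(col(c)).as("cnt"))
          .orderBy(desc("cnt"))
          .first()
          .getString(0)           // the group key, i.e. the mode of column c
        d.na.fill(mode, Seq(c))   // step 2: replace nulls in c with its mode
      }

    val df2 = fillWithMode(df, Seq("Name", "Place"))

Note this runs one aggregation per column, which is fine for a few columns; for very wide tables you would want to compute all the modes in a single pass.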