Hello, I have created a Spark DataFrame and am trying to remove duplicates:
df.drop_duplicates(subset='id')
I get the following error:
Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonUtils.toSeq. Trace:
py4j.Py4JException: Method toSeq([class java.lang.String]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:335)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:360)
at py4j.Gateway.invoke(Gateway.java:254)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
I am using OS X 10.11.4 and Spark 1.6.1.
I launched a Jupyter notebook like this:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
Are there any other configurations that I might have missed or got wrong?
dropDuplicates(): The PySpark DataFrame provides a dropDuplicates() function that drops duplicate rows from a DataFrame. The function takes column names as parameters; rows that share the same values in those columns are treated as duplicates.
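For instance, a minimal sketch, assuming the sqlContext that the pyspark shell already provides in Spark 1.6 (the id/value columns and toy data are made up for illustration):
df = sqlContext.createDataFrame(
    [(1, "a"), (1, "b"), (2, "c")], ["id", "value"])
# Keep one row per id; which of the duplicate rows survives is arbitrary
df.dropDuplicates(["id"]).show()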
Note: In SQL, UNION eliminates duplicates while UNION ALL merges two datasets including duplicate records. In PySpark, however, both behave like UNION ALL and keep duplicates, so the recommended approach is to call the DataFrame dropDuplicates() function on the result to remove duplicate rows.
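A quick sketch of that behaviour, reusing the toy df from above (unionAll is the Spark 1.6 name; it was renamed union() in Spark 2.x):
other = sqlContext.createDataFrame([(2, "c"), (3, "d")], ["id", "value"])
# unionAll keeps duplicate rows, unlike SQL's UNION
combined = df.unionAll(other)
# Remove the duplicates afterwards with dropDuplicates()
combined.dropDuplicates().show()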
Duplicate rows can be removed from a Spark SQL DataFrame using the distinct() and dropDuplicates() functions: distinct() removes rows that have the same values in all columns, whereas dropDuplicates() removes rows that have the same values in selected columns.
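An illustrative side-by-side, with the same assumed toy data:
# distinct(): rows must match on every column to be dropped
df.distinct().show()
# dropDuplicates(["id"]): rows only need to match on the listed columns
df.dropDuplicates(["id"]).show()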
The argument to drop_duplicates / dropDuplicates should be a collection of names whose Java equivalent can be converted to a Scala Seq, not a single string. You can use either a list:
df.drop_duplicates(subset=['id'])
or a tuple:
df.drop_duplicates(subset=('id', ))