I have imported data that uses a comma as the decimal separator in its float numbers, and I am wondering how I can 'convert' the commas into dots. I am using a PySpark DataFrame, so I tried this:
commaToDot = udf(lambda x : str(x).replace(',', '.'), FloatType())
myData.withColumn('area',commaToDot(myData.area))
And it definitely does not work. So can we replace it directly in the Spark DataFrame, or should we switch to a numpy type or something else?
Thanks!
In Python, we can replace substrings within a string using the str.replace() method; it returns a new string with the substitution applied, so it can turn a comma into a dot.
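For example, in plain Python:

'1,5'.replace(',', '.')  # returns '1.5'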
For reference, the relevant PySpark SQL types include: DoubleType – a floating-point double value; IntegerType – an integer value; LongType – a long integer value; NullType – a null value; ShortType – a short integer value.
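As an illustration (not part of the original thread), here is a minimal sketch of casting a string column with one of these types; the toy DataFrame and the column name 'value' are assumptions:

from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("CastExample").getOrCreate()
df = spark.createDataFrame([("1.5",)], ["value"])
# cast() accepts a type object (or the equivalent string 'double')
df = df.withColumn("value", df["value"].cast(DoubleType()))
df.printSchema()  # value: double (nullable = true)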
data["column_name"]=data["column_name"]. str. replace(',','. ')
By using the PySpark SQL function regexp_replace() you can replace a column value with another string/substring. regexp_replace() uses Java regex for matching; any part of the value that does not match the regex is left unchanged. For example, you can replace the street-name abbreviation Rd with Road in an address column.
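A minimal sketch of that address example (the column name and sample data are assumptions for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace

spark = SparkSession.builder.appName("RegexpReplaceExample").getOrCreate()
df = spark.createDataFrame([("14 Main Rd",)], ["address"])
# every occurrence matching the regex 'Rd' is replaced with 'Road'
df.withColumn("address", regexp_replace("address", "Rd", "Road")).show()
# +------------+
# |     address|
# +------------+
# |14 Main Road|
# +------------+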
Another way to do it (without using UDFs) is:
from pyspark.sql.functions import regexp_replace

myData = myData.withColumn('area', regexp_replace('area', ',', '.').cast('float'))
I think you are missing
from pyspark.sql.types import FloatType
As Pushkr suggested, a udf with replace will give you back a string column if you don't convert the result to float:
from pyspark.sql import SQLContext
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("ReadCSV")
sc = SparkContext(conf=conf)
sqlctx = SQLContext(sc)

# Read the semicolon-delimited CSV; columns get default names (_c0, _c1, ...)
df = sqlctx.read.option("delimiter", ";").load("test.csv", format="csv")
df.show()

# Replace the comma with a dot, then convert to float so the declared
# return type (FloatType) matches the actual value
commaToDot = udf(lambda x: float(str(x).replace(',', '.')), FloatType())

df2 = df.withColumn('area', commaToDot(df._c0))
df2.printSchema()
df2.show()
I used a single-column file, tested on Spark 2.11 / Python 3.6.
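On Spark 2.0+, SQLContext is superseded by SparkSession; a minimal sketch of the same flow under that assumption (file name and column _c0 carried over from above):

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace

spark = SparkSession.builder.appName("ReadCSV").getOrCreate()

# Default column names (_c0, ...) are assigned when the CSV has no header
df = spark.read.option("delimiter", ";").csv("test.csv")
df2 = df.withColumn("area", regexp_replace(df["_c0"], ",", ".").cast("float"))
df2.printSchema()
df2.show()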