
Does dplyr mutate support runif?

I want to generate a column of random numbers using mutate. I tried runif(), but it throws an error on large-scale data.

extract_grp <- extract_grp %>%
    mutate(rand = runif(sdf_nrow(extract_grp)))
glimpse(extract_grp)

The error I am getting is:

Error: org.apache.spark.sql.AnalysisException: Undefined function: 'RUNIF'. This function is neither a registered temporary function nor a permanent function registered in the database 'temp_data'.; line 1 pos 101
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:999)
    at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:202)
    at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:174)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:897)

Anil Kumar asked Jan 30 '26 20:01



1 Answer

Using Spark SQL's rand() instead of runif() solved my issue to an extent.

extract_grp <- extract_grp %>%
    mutate(rand = rand())
glimpse(extract_grp)

I am now able to generate a random sequence for my Hive table. What I am still stuck on is seeding: set.seed() works in local R, but it has no effect in sparklyr.
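One possible way around the seeding problem, offered as a sketch rather than a confirmed fix: Spark SQL's rand() accepts an optional integer seed, so the seed can be passed inside mutate instead of through set.seed(). The helper name add_seeded_rand below is hypothetical (it is not part of sparklyr), and it assumes the table is a Spark table reference such as extract_grp. Results should repeat only as long as the data and its partitioning stay the same.

```r
# Sketch only: add_seeded_rand() is a hypothetical helper, not part of sparklyr.
# `tbl` is assumed to be a Spark table reference (e.g. from dplyr::tbl(sc, "...")).
add_seeded_rand <- function(tbl, seed = 42L) {
  # Spark's rand() expects an integer literal as its seed, so coerce first.
  seed <- as.integer(seed)
  # `!!` inlines the seed value, so the generated SQL contains rand(42);
  # rand() itself is not an R function and is passed through to Spark SQL.
  dplyr::mutate(tbl, rand = rand(!!seed))
}

# Example usage (requires an active Spark connection):
# extract_grp <- add_seeded_rand(extract_grp, seed = 42L)
# glimpse(extract_grp)
```

If normally distributed values are needed rather than uniform ones, Spark SQL's randn() takes a seed in the same way and could be swapped in for rand().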

Anil Kumar answered Feb 01 '26 12:02



