How can I do the pandas equivalent of pd.concat([df1, df2], axis='columns') using PySpark DataFrames? I searched and couldn't find a good solution.
DF1
var1
3
4
5
DF2
var2 var3
23 31
44 45
52 53
Expected output dataframe
var1 var2 var3
3 23 31
4 44 45
5 52 53
Edited to include expected output
The equivalent of the accepted answer in PySpark would be:
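from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

# In PySpark, builder is an attribute, not a method, so it takes no parentheses
spark = SparkSession.builder.master("local").getOrCreate()
df1 = spark.sparkContext.parallelize([(1, "a"),(2, "b"),(3, "c")]).toDF(["id", "name"])
df2 = spark.sparkContext.parallelize([(7, "x"),(8, "y"),(9, "z")]).toDF(["age", "address"])
# Combined schema: df1's columns followed by df2's columns
schema = StructType(df1.schema.fields + df2.schema.fields)
# zip pairs rows by position; both RDDs need the same number of partitions
# and the same number of rows in each partition, otherwise it fails
df1df2 = df1.rdd.zip(df2.rdd).map(lambda x: x[0]+x[1])
spark.createDataFrame(df1df2, schema).show()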
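To see what the zip and map steps do to each row, here is a minimal plain-Python sketch that needs no Spark. It uses the sample values from the question as lists of tuples; the helper name concat_columns is made up for illustration and is not a PySpark function. Spark Row objects are tuples, which is why x[0]+x[1] in the answer yields one wider row.

```python
# Plain-Python stand-ins for the rows of DF1 and DF2 from the question.
df1_rows = [(3,), (4,), (5,)]
df2_rows = [(23, 31), (44, 45), (52, 53)]


def concat_columns(left_rows, right_rows):
    """Pair rows by position and join each pair into one wider row.

    Hypothetical helper that mirrors rdd.zip(...).map(lambda x: x[0] + x[1]).
    """
    # Positional pairing only makes sense when both sides have equally many rows.
    if len(left_rows) != len(right_rows):
        raise ValueError("both inputs need the same number of rows")
    # Adding two tuples concatenates them, giving one combined row per pair.
    return [left + right for left, right in zip(left_rows, right_rows)]


combined = concat_columns(df1_rows, df2_rows)
for row in combined:
    print(row)
```

The pairing is purely positional, so the result is only meaningful when both inputs are in the intended row order. That is the same assumption the RDD zip approach makes.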