 

Add new rows to a PySpark DataFrame

I am very new to PySpark but familiar with pandas. I have a PySpark DataFrame:

from pyspark.sql import SparkSession

# instantiate Spark
spark = SparkSession.builder.getOrCreate()

# make some test data
columns = ['id', 'dogs', 'cats']
vals = [
     (1, 2, 0),
     (2, 0, 1)
]

# create DataFrame
df = spark.createDataFrame(vals, columns)

I want to add a new row (4, 5, 7) so that it outputs:

df.show()
+---+----+----+
| id|dogs|cats|
+---+----+----+
|  1|   2|   0|
|  2|   0|   1|
|  4|   5|   7|
+---+----+----+
asked Oct 07 '18 by Roushan


People also ask

How do I add rows to a PySpark DataFrame?

To append a row to a DataFrame you can also use the collect() method: collect() converts the DataFrame into a list of rows, you can append your data to that list, and then convert the list back into a DataFrame.
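For illustration, a minimal sketch of that approach, assuming the df and spark from the question above; note that collect() pulls every row to the driver, so this is only sensible for small data:

# collect the existing rows to the driver as plain tuples
rows = [tuple(r) for r in df.collect()]

# append the new row and rebuild the DataFrame with the original schema
rows.append((4, 5, 7))
df2 = spark.createDataFrame(rows, df.schema)
df2.show()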

How do I add a row to a Spark DataFrame?

You can add/insert a new row to a DataFrame from a dict: first create a Python dictionary, then pass it to the append() function with ignore_index=True; omitting ignore_index=True raises an error. (Note that append() belongs to the pandas API, not to Spark DataFrames.)
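A small pandas sketch of that pattern; DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, where pd.concat() is the replacement:

import pandas as pd

pdf = pd.DataFrame({'id': [1, 2], 'dogs': [2, 0], 'cats': [0, 1]})
new_row = {'id': 4, 'dogs': 5, 'cats': 7}

# older pandas: pdf = pdf.append(new_row, ignore_index=True)
pdf = pd.concat([pdf, pd.DataFrame([new_row])], ignore_index=True)
print(pdf)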

How do I create a row in Spark?

To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala. A Row object can be constructed by providing field values.
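In PySpark the equivalent is pyspark.sql.Row; a short sketch:

from pyspark.sql import Row

# a Row can be built positionally or with named fields
r1 = Row(4, 5, 7)
r2 = Row(id=4, dogs=5, cats=7)

print(r2.id, r2['dogs'], r2[2])  # fields are accessible by name or position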


2 Answers

As thebluephantom has already said, union is the way to go. I'm just answering your question to give you a PySpark example:

from pyspark.sql import SparkSession

# if not already created automatically, instantiate a SparkSession
spark = SparkSession.builder.getOrCreate()

columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]

df = spark.createDataFrame(vals, columns)

newRow = spark.createDataFrame([(4, 5, 7)], columns)
appended = df.union(newRow)
appended.show()

Please also have a look at the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html
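For reference, appended.show() prints the table the question asked for; and if you want the appended row to pick up exactly the same column types as df, a small variation (not from the original answer) is to reuse df.schema instead of the column list:

# variation: reuse the existing schema so the types are guaranteed to match
newRow = spark.createDataFrame([(4, 5, 7)], df.schema)
appended = df.union(newRow)
appended.show()
# +---+----+----+
# | id|dogs|cats|
# +---+----+----+
# |  1|   2|   0|
# |  2|   0|   1|
# |  4|   5|   7|
# +---+----+----+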

answered Oct 01 '22 by cronoik


From something I did, using union, showing a partial block of code; you will of course need to adapt it to your own situation:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// start with an empty DataFrame that has the target schema
val dummySchema = StructType(StructField("phrase", StringType, true) :: Nil)
var dfPostsNGrams2 = spark.createDataFrame(sc.emptyRDD[Row], dummySchema)

// union the exploded contents of each n-gram column onto the accumulator
for (i <- i_grams_Cols) {
  val nameCol = col(i)
  dfPostsNGrams2 = dfPostsNGrams2.union(dfPostsNGrams.select(explode(nameCol).as("phrase")))
}

Union of one DataFrame with another is the way to go.
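A rough PySpark sketch of the same pattern, assuming a hypothetical list of array-typed n-gram columns i_grams_cols in a DataFrame df_posts_ngrams (names chosen here to mirror the Scala above):

from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

# start from an empty DataFrame with the target schema
result = spark.createDataFrame([], StructType([StructField('phrase', StringType(), True)]))

# union the exploded contents of each n-gram column onto the accumulator
for c in i_grams_cols:
    result = result.union(df_posts_ngrams.select(F.explode(F.col(c)).alias('phrase')))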

answered Oct 01 '22 by thebluephantom