 

How to read a CSV without a header and name the columns while reading in PySpark?

100000,20160214,93374987
100000,20160214,1925301
100000,20160216,1896542
100000,20160216,84167419
100000,20160216,77273616
100000,20160507,1303015

I want to read a CSV file that has no column names in its first row. How can I read it and name the columns with my specified names at the same time? For now, I just rename the original columns like this:

df = spark.read.csv("user_click_seq.csv",header=False)
df = df.withColumnRenamed("_c0", "member_srl")
df = df.withColumnRenamed("_c1", "click_day")
df = df.withColumnRenamed("_c2", "productid")

Any better way ?

asked Jun 15 '17 by yanachen


3 Answers

You can import the CSV file into a DataFrame with a predefined schema. You define a schema using the StructType and StructField objects. Assuming all of your columns hold integers:

from pyspark.sql.types import StructType, StructField, IntegerType

schema = StructType([
    StructField("member_srl", IntegerType(), True),
    StructField("click_day", IntegerType(), True),
    StructField("productid", IntegerType(), True)
])

df = spark.read.csv("user_click_seq.csv", header=False, schema=schema)

should work.

answered Oct 04 '22 by DavidWayne

For those who would like to do this in Scala and may not want to specify types:

val df = spark.read.format("csv")
                   .option("header","false")
                   .load("hdfs_filepath")
                   .toDF("var0","var1","var2","var3")
answered Oct 04 '22 by Climbs_lika_Spyder


You can read the data with header=False and then assign the column names with toDF, as below:

data = spark.read.csv('data.csv', header=False)
data = data.toDF('name1', 'name2', 'name3')
answered Oct 04 '22 by Mohammad Reza Malekpour