How to show my existing column names from the first row instead of '_c0', '_c1', '_c2', '_c3', '_c4'?

Tags:

apache-spark-sql

pyspark

spark-notebook

azure-databricks

The DataFrame shows _c0, _c1, and so on instead of my original column names, which are in the first row of my CSV.
I want it to show the column names from the first row of the CSV.

    dff = spark.read.csv("abfss://dir@acname.dfs.core.windows.net/diabetes.csv")
    dff:pyspark.sql.dataframe.DataFrame
    _c0:string
    _c1:string
    _c2:string
    _c3:string
    _c4:string
    _c5:string
    _c6:string
    _c7:string
    _c8:string


asked Aug 01 '19 12:08

Gaurav Gangwar

2 Answers

A very simple solution is to pass header=True when you read the file:

    dff = spark.read.csv("abfss://dir@acname.dfs.core.windows.net/diabetes.csv", header=True)


answered Dec 28 '22 13:12

Kishan Vyas

Set the header option to true when loading the CSV file.

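    # Parentheses let the method chain span several lines in Python.
    df = (
        spark.read.format("csv")
        .option("delimiter", ",")       # field separator
        .option("header", "true")       # take column names from the first row
        .option("inferSchema", "true")  # infer column types instead of all strings
        .load("file.csv")
    )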

answered Dec 28 '22 14:12

Aman Sehgal
