Put comments in between the lines of a multi-line statement (with line continuation)

Tags: python, comments, pyspark

When I write the following pyspark command:

# comment 1
df = df.withColumn('explosion', explode(col('col1'))).filter(col('explosion')['sub_col1'] == 'some_string') \
    # comment 2
    .withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2'])) \
    # comment 3
    .withColumn('sub_col3', from_unixtime(col('explosion')['sub_col3']))

I get the following error:

.withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2']))
^
IndentationError: unexpected indent

Is there a way to write comments in between the lines of multiple-line commands in pyspark?

asked Aug 23 '18 14:08

ira

1 Answer

This is not a pyspark issue, but rather a violation of python syntax.

Consider the following example:

a, b, c = range(3)
a +\
# add b
b +\
# add c
c

This results in:

    a +# add b
              ^
SyntaxError: invalid syntax

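The \ is a line-continuation character: Python joins the next physical line onto the current one, so the comment line becomes part of the same logical line and everything after the # is ignored. In the example above the statement therefore ends at "a +", which is invalid syntax. In your case the statement ends right after .filter(...), so the indented .withColumn(...) line that follows is read as the start of a new statement, which is what raises the IndentationError.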
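Both failure modes can be reproduced without Spark. The following is a minimal sketch that feeds small source strings to the built-in compile(); the helper name try_compile and the sample strings are made up for illustration and are not part of the original question.

```python
def try_compile(source):
    """Return the name of the exception raised while compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__
    return None


# A comment line right after a backslash, in the middle of an expression:
# the statement ends at "1 +", which is incomplete.
dangling_operator = (
    "total = 1 + \\\n"
    "# add 2\n"
    "2\n"
)

# The same mistake in a method chain: the first line is already a complete
# statement, so the indented ".lower()" line is an unexpected indent.
indented_chain = (
    "text = 'abc'.upper() \\\n"
    "    # lowercase again\n"
    "    .lower()\n"
)

# Wrapping the chain in parentheses makes the comment legal.
parenthesized_chain = (
    "text = ('abc'.upper()\n"
    "    # lowercase again\n"
    "    .lower())\n"
)

for name, source in [
    ("dangling_operator", dangling_operator),
    ("indented_chain", indented_chain),
    ("parenthesized_chain", parenthesized_chain),
]:
    print(name, "->", try_compile(source))
```

The second string has the same shape as the chain in the question, which is why the question shows an IndentationError while the a + b + c example shows a plain SyntaxError.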

One way around this is to use parentheses instead, because line breaks and comments are allowed inside them:

(a +
# add b
b +
# add c
c)

When assigning to a variable, this would look like:

# do a sum of 3 numbers
addition = (a +
            # add b
            b +
            # add c
            c)
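The same implicit line joining applies inside square brackets and curly braces, not only parentheses. Here is a small runnable sketch of that; the variable names values and labels are only illustrative.

```python
a, b, c = range(3)

# Parentheses: comments may sit on their own lines between the operands.
addition = (
    a
    # add b
    + b
    # add c
    + c
)

# Square brackets and curly braces allow comments between lines as well.
values = [
    a,
    # second element
    b,
    # third element
    c,
]
labels = {
    # maps a name to each value
    "first": a,
    "second": b,
    "third": c,
}

print(addition, values, labels)
```

No backslash is needed in any of these, so there is nothing for a comment line to break.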

Or in your case:

# comment 1
df = (df.withColumn('explosion', explode(col('col1')))
    .filter(col('explosion')['sub_col1'] == 'some_string')
    # comment 2
    .withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2']))
    # comment 3
    .withColumn('sub_col3', from_unixtime(col('explosion')['sub_col3'])))
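If the parenthesized chain feels awkward, another option is to write one statement per step and put an ordinary comment above each. The sketch below uses plain string methods as a stand-in so it runs without Spark; the variable names are hypothetical. With a DataFrame the same shape would be repeated df = df.withColumn(...) assignments, since each call returns a new DataFrame.

```python
raw = "  Some_String,other  "

# Option 1: a single parenthesized chain with comments between the calls.
cleaned_chain = (
    raw.strip()
    # normalize case before splitting
    .lower()
    # split into fields
    .split(",")
)

# Option 2: one statement per step, each with its own comment.
# remove surrounding whitespace
cleaned_steps = raw.strip()
# normalize case before splitting
cleaned_steps = cleaned_steps.lower()
# split into fields
cleaned_steps = cleaned_steps.split(",")

print(cleaned_chain == cleaned_steps)
```

Both forms give the same result; which one reads better is mostly a matter of taste.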

answered Oct 05 '22 17:10

pault
