When I write the following PySpark command:
# comment 1
df = df.withColumn('explosion', explode(col('col1'))).filter(col('explosion')['sub_col1'] == 'some_string') \
    # comment 2
    .withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2'])) \
    # comment 3
    .withColumn('sub_col3', from_unixtime(col('explosion')['sub_col3']))
I get the following error:
.withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2']))
^
IndentationError: unexpected indent
Is there a way to write comments in between the lines of multiple-line commands in pyspark?
Python has no dedicated multi-line comment syntax. To write a block comment, start each consecutive line with #. Languages such as Java, C, and C++ do have a multi-line comment form (/* ... */).
In SQL, single-line comments begin with a double hyphen (--) anywhere on a line and extend to the end of the line. Multi-line comments begin with a slash-asterisk (/*), end with an asterisk-slash (*/), and can span multiple lines.
This is not a PySpark issue, but rather a violation of Python syntax.
Consider the following example:
a, b, c = range(3)
a +\
# add b
b +\
# add c
c
This results in:
a +# add b
^
SyntaxError: invalid syntax
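The \ is a line-continuation character: Python joins the next physical line directly onto the current one. The comment therefore becomes part of the continued line and swallows the rest of it, so the statement ends there and the line after the comment is parsed as the start of a new statement, which causes your error.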
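The effect can be checked without running the statement. The following is only a minimal sketch: it passes two source snippets to the built-in compile() and reports whether each one parses. The names with_comment, without_comment, and compiles are made up for this illustration.

```python
# Raw triple-quoted strings keep the trailing backslash and the newline as-is,
# so each string holds exactly the source text we want to check.
with_comment = r"""
a, b = 1, 2
total = a + \
# add b
b
"""

without_comment = r"""
a, b = 1, 2
total = a + \
b
"""


def compiles(source):
    """Return True if the source text parses, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(with_comment))     # False: the comment ends the continued line
print(compiles(without_comment))  # True: the backslash joins "a +" and "b"
```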
One way around this is to wrap the expression in parentheses instead. Lines inside brackets are joined implicitly, and such lines may carry comments:
(a +
 # add b
 b +
 # add c
 c)
When assigning to a variable, this would look like:
# do a sum of 3 numbers
addition = (a +
            # add b
            b +
            # add c
            c)
Or in your case:
# comment 1
df = (df.withColumn('explosion', explode(col('col1')))
      .filter(col('explosion')['sub_col1'] == 'some_string')
      # comment 2
      .withColumn('sub_col2', from_unixtime(col('explosion')['sub_col2']))
      # comment 3
      .withColumn('sub_col3', from_unixtime(col('explosion')['sub_col3'])))
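The same pattern works for any method chain, not only Spark DataFrames. Here is a minimal sketch that runs without Spark. The Pipeline class is a hypothetical stand-in for a chainable object and is not part of PySpark.

```python
class Pipeline:
    """Hypothetical stand-in for a chainable object such as a DataFrame."""

    def __init__(self, values):
        self.values = list(values)

    def filter(self, predicate):
        # Return a new Pipeline holding only the values that pass the predicate.
        return Pipeline(v for v in self.values if predicate(v))

    def map(self, func):
        # Return a new Pipeline with func applied to every value.
        return Pipeline(func(v) for v in self.values)


# The parentheses allow each step to sit on its own line with a comment above it.
result = (
    Pipeline(range(6))
    # keep even numbers only
    .filter(lambda v: v % 2 == 0)
    # square what is left
    .map(lambda v: v * v)
)

print(result.values)  # [0, 4, 16]
```

Putting the opening parenthesis on its own line keeps every step of the chain at the same indentation, which some people find easier to reorder or comment out.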