Cannot find col function in pyspark

Tags: python, apache-spark, apache-spark-sql, pyspark, pyspark-sql

In PySpark 1.6.2, I can import the `col` function with

```
from pyspark.sql.functions import col
```

but when I look it up in the GitHub source code, I find no `col` function in the `functions.py` file. How can Python import a function that doesn't exist?

389

asked Oct 20 '16 19:10

Bamqf

1 Answer

It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions which require special treatment, are generated automatically using helper methods.

If you check the source carefully, you'll find `col` listed among the other entries in `_functions`. This dictionary is then iterated, and `_create_function` is used to generate a wrapper for each entry. Each generated function is assigned directly to the corresponding name in `globals`.

Finally, `__all__`, which defines the list of items exported from the module, simply exports all `globals` except those contained in the blacklist.

If this mechanism is still not clear, you can create a toy example:

- Create a Python module called `foo.py` with the following content:

  ```
  # Creates a function assigned to the name foo
  globals()["foo"] = lambda x: "foo {0}".format(x)

  # Exports all entries from globals which start with foo
  # (list() takes a snapshot of the keys before iterating)
  __all__ = [x for x in list(globals()) if x.startswith("foo")]
  ```

- Place it somewhere on the Python path (for example, in the working directory).
- Import `foo`:

  ```
  from foo import foo

  foo(1)
  ```

An undesired side effect of this metaprogramming approach is that the generated functions might not be recognized by tools that rely purely on static code analysis. This is not a critical issue and can be safely ignored during development.
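To see why a static tool can miss such a function, here is a rough sketch that compares the names visible in the syntax tree with the names that exist after the code runs. The scan is deliberately naive (it only looks at `def` statements and plain-name assignments) and is an assumption about how a simple tool might work, not a description of any particular IDE or linter. The `bar` function is added only for contrast.

```python
import ast

source = '''
globals()["foo"] = lambda x: "foo {0}".format(x)

def bar(x):
    return "bar {0}".format(x)
'''

# Static view: names bound by def statements or plain-name assignments.
tree = ast.parse(source)
static_names = set()
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        static_names.add(node.name)
    elif isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name):
                static_names.add(target.id)

# Runtime view: execute the source and inspect the resulting namespace.
namespace = {}
exec(source, namespace)
runtime_names = {k for k in namespace if not k.startswith("__")}

print(sorted(static_names))   # ['bar']
print(sorted(runtime_names))  # ['bar', 'foo']
```

The assignment through `globals()["foo"]` is a subscript on a call result, so nothing in the syntax tree binds the name `foo`; it only appears once the code is executed.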

Depending on the IDE, installing type annotations might resolve the problem (see, for example, zero323/pyspark-stubs#172).

115

answered Sep 23 '22 05:09

zero323
