 

TypeError: Invalid argument, not a string or column: <function <lambda> at 0x7f1f357c6160> of type <class 'function'>

I'm using the following snippet which creates a list of all .csv files in a directory in Databricks.

import os

csv_dir = '/my_dir/'
csv_paths = list(filter(lambda x: '.csv' in x, os.listdir(csv_dir)))

However, it yields the following error:

TypeError: Invalid argument, not a string or column: <function <lambda> at 0x7f1f357c6160> of type <class 'function'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.

I'm guessing my pure Python code has been mistaken for PySpark code. I tried using %python at the top of the cell and it still yielded the same result.

Yes, I have used PySpark and Python interchangeably in the notebook but I've never faced this issue when using lambda functions.

Is there a workaround to escape this behavior?

Please advise.

The Singularity asked Feb 26 '26 14:02


1 Answer

As you guessed, your code is most likely calling PySpark's filter function instead of Python's built-in filter. The best practice is to import PySpark functions under an alias, e.g. import pyspark.sql.functions as F, so that they do not shadow built-ins of the same name.
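To make the shadowing concrete, here is a minimal sketch that runs without Spark. Everything in it is an assumption for illustration: fake_functions is a hypothetical stand-in for pyspark.sql.functions, and its filter only imitates the kind of argument check behind the error in the question.

```python
import types

# Hypothetical stand-in for pyspark.sql.functions (NOT the real module).
def _spark_like_filter(col, f):
    # Imitates the check that rejects anything that is not a column name.
    if not isinstance(col, str):
        raise TypeError(f"Invalid argument, not a string or column: {col!r}")
    return f"filter({col})"

fake_functions = types.SimpleNamespace(filter=_spark_like_filter)

def list_csv_with_alias(names):
    # Like `import pyspark.sql.functions as F`: F.filter exists,
    # but the bare name `filter` is still Python's built-in.
    F = fake_functions
    return list(filter(lambda x: x.endswith(".csv"), names))

def list_csv_with_star_import(names):
    # Like `from pyspark.sql.functions import *`: the bare name is rebound.
    filter = fake_functions.filter
    return list(filter(lambda x: x.endswith(".csv"), names))
```

With the alias, list_csv_with_alias(["a.csv", "b.txt"]) returns the CSV names as expected, while list_csv_with_star_import raises a TypeError because the lambda lands in the position where the shadowing function expects a column.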

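But if you have already run from pyspark.sql.functions import *, you can call the built-in filter explicitly as __builtin__.filter. Outside a notebook, in plain Python 3, the name __builtin__ is not defined; there, import builtins and use builtins.filter instead.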

csv_paths = list(__builtin__.filter(lambda x: '.csv' in x, os.listdir(csv_dir)))
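A portable variant of the same idea, as a sketch: it uses the standard builtins module so it also works outside a notebook. The helper name list_csv_files and the file names are made up for illustration, and it matches on the file extension with endswith rather than '.csv' in x, which is a small change from the original snippet.

```python
import builtins
import os
import tempfile

def list_csv_files(directory):
    """Return the sorted names of entries in `directory` ending in .csv."""
    # builtins.filter is always Python's own filter, regardless of what
    # the bare name `filter` is currently bound to in the notebook.
    matches = builtins.filter(
        lambda name: name.endswith(".csv"), os.listdir(directory)
    )
    return sorted(matches)

# Usage with a throwaway directory (file names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.txt", "c.csv"):
        open(os.path.join(tmp, name), "w").close()
    csv_paths = list_csv_files(tmp)
```

A list comprehension such as [n for n in os.listdir(csv_dir) if n.endswith('.csv')] sidesteps the name filter entirely.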
AdibP answered Mar 01 '26 02:03


