Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get first example element from filtered aggregation pySpark

Table:

+---------+---------+---------+---------+
|id       |path     |error    |message  |
+---------+---------+---------+---------+
|      1  |    a.a.a|   true  |   "aaa" |
+---------+---------+---------+---------+
|      2  |    a.a.a|   true  |   "bbb" |
+---------+---------+---------+---------+
|      2  |    a.a.a|   true  |   "bbc" |
+---------+---------+---------+---------+
|      2  |    a.a.b|   false |   "ccc" |
+---------+---------+---------+---------+

I have pySpark query:

data.groupBy('id', 'path')\
.agg(
    sum(when(col('error') == 'true', 1).otherwise(0)).alias('count'),
).show()

How can I add a new column with first example element that have col('error') == 'true'? I want a table with elements id, path, count, exampleItem.
Function first().alias('exampleItem') works, but returns elements that do not necessarily match the condition above.

like image 506
B_Osipiuk Avatar asked May 12 '26 11:05

B_Osipiuk


1 Answers

What you can do is define a column that only contains the message in the case where the error is true, and null otherwise using the when function. Then, the first function used with ignorenulls set to true will give you what you expect.

d = [(1, "a.a.a", True, "aaa"), (2, "a.a.a", True, "bbb"),
     (2, "a.a.a", True, "bbc"), (2, "a.a.b", False, "ccc")]
data = spark.createDataFrame(d, ['id', 'path', 'error', 'message'])

data\
    .groupBy('id', 'path')\
    .agg(F.sum(F.when(F.col('error') == 'true', 1).otherwise(0)).alias('count'),
         F.first(F.when(F.col('error'), F.col('message')), ignorenulls=True).alias('exampleItem'))\
    .show()
+---+-----+-----+-----------+
| id| path|count|exampleItem|
+---+-----+-----+-----------+
|  2|a.a.a|    2|        bbb|
|  1|a.a.a|    1|        aaa|
|  2|a.a.b|    0|       null|
+---+-----+-----+-----------+

For the last row, the null value is due to the fact that no message meet the desired requirement.

like image 161
Oli Avatar answered May 15 '26 03:05

Oli