I've the following database structure:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
5 2 136
6 2 128
7 2 10
8 2 1
9 3 561
10 3 560
11 3 10
12 3 1
13 4 561
14 4 560
15 4 10
16 4 1
17 5 234
18 5 120
19 5 1
20 6 234
21 6 120
22 6 1
Here are the details:
The table refers idproperty with different geographic location. For example:
idgbs
1 refers to United States
10 refers to Alabama with parentid 1 (United States)
128 refers to Alabama Gulf Coast with parentid 10 (Alabama)
136 Dauphin Island with parentid 128 (Alabama Gulf Coast)
So, the structure is:
United States > Alabama > Alabama Gulf Coast > Dauphin Island
I want to delete all entries for idproperty EXCEPT the first with the set of idgbs 136, 128, 10, 1 i.e. leave atleast 1 property in all GBS and delete others.
Also, sometimes it is 4 level of geographic entries, sometimes it is 3 level.
Please share the logic & SQL query to delete all entries except one in every unique GBS.
GBS 1, 10, 128, 136 is one unique, so database should only contain 1 property id with these GBS.
After the query, the table would look like this:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
9 3 561
10 3 560
11 3 10
12 3 1
17 5 234
18 5 120
19 5 1
Rephrasing the question:
I want to keep properties in every root level GBS i.e. there should be only ONE property in Dauphin Island.
Whew... I think I understand what you are after now. I couldn't let this one go ;-)
I had to realize that in the question, you wanted property 2 deleted, because it shared a hierarchy with property 1. Once I realized that, I got the following idea. Basically, we join to an aggregated version of self twice: the first one tells us what our "gbs hierarchy path" is, and the second one matches any previous properties with the same hierarchy. Rows which find that there are no "previous" properties that share their hierarchy are spared, the rest with that hierarchy are deleted. It's possible that this could be further tweaked, but I wanted to share this now. I have tested it with the data you showed, and I got the results you posted.
DELETE
each_row.*
FROM property_gbs AS each_row
JOIN ( SELECT
idproperty,
GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/") AS idgbs_path
FROM property_gbs
GROUP BY idproperty
) AS mypath
USING(idproperty)
LEFT JOIN ( SELECT
idproperty,
GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/") AS idgbs_path
FROM property_gbs
GROUP BY idproperty
) AS previous_property
ON mypath.idgbs_path = previous_property.idgbs_path
AND previous_property.idproperty < each_row.idproperty
WHERE previous_property.idproperty
Note that the last line is not a typo, we are just checking if there is a previous property with the same path. If there is, then delete the currently-evaluated row.
Cheers!
The thought here is to associate every row with it's hierarchy, even if it's a row which represents somewhere in the middle of the hierarchy (such as row: {2, 1, 128} in the question). With the first join to the aggregate, each row now "knows" what it's path is (so that row would get "136/128/10/1"). We can then use that value in the second join to find other properties with the same path, but only if they have a LOWER property id. This allows us to check for the existence of a lower-ID property with the same "path", and delete any row which represents a property which does have such a "lower-order path-sibling".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With