I am running Postgres 9.1.3 32-bit on Windows 7 x64. (Have to use 32 bit because there is no Windows PostGIS release compatible with 64 bit Postgres.) (EDIT: As of PostGIS 2.0, it is compatible with Postgres 64 bit on windows.)
I have a query that left joins a table (consistent.master
) with a temporary table, then inserts the resulting data into a third table (consistent.masternew
).
Since this is a left join
, the resulting table should have the same number of rows as the left table in the query. However, if I run this:
SELECT count(*)
FROM consistent.master
I get 2085343
. But if I run this:
SELECT count(*)
FROM consistent.masternew
I get 2085703
.
How can masternew
have more rows than master
? Shouldn't masternew
have the same number of rows as master
, the left table in the query?
Below is the query. The master
and masternew
tables should be identically-structured.
--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
SELECT citation_id,
rank() OVER (ORDER BY offense_timestamp,
defendant_dl,
offense_street_number,
offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction=1
)
--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom)
--Here's the select query that the insert statement is using.
SELECT stops.stop,
master.citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id
In case it matters, I have run a VACUUM FULL ANALYZE
and reindexed both tables. (Not sure of exact commands; did it through pgAdmin III.)
A left join does not necessarily have the same number of rows as the number of rows in the left table. Basically, it is like a normal join, except rows of the left table that would not appear in the normal join are also added. So, if you have more than one row in the right table that matches one row in the left table, you can have more rows in your results than the number of rows of the left table.
In order to do what you want to do, you should use a group by, and a count to detect multiples.
select citation_id
from stops join master on stops.citation_id = master.citation_id
group by citation_id
having count(*) > 1
Sometimes you know there are multiples, but don't care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON
:
FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id
Where citation_id
is the column that you want to take the first (any) row for each match.
You might want to ensure this is deterministic and use ORDER BY
with some other orderable column:
SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With