I have a field that contains Windows file paths, like so:
\\fs1\foo\bar\snafu.txt
c:\this\is\why\i\drink\snafu.txt
\\fs2\bippity\baz.zip
\\fs3\boppity\boo\baz.zip
c:\users\chris\donut.c
What I need to do is find the number of duplicated file names (regardless of what directory they are in). So I want to find "snafu.txt" and "baz.zip", but not "donut.c".
Is there a way in PostgreSQL (8.4) to find the last part of a file path? If I can do that, then I can use count/group to find my problem children.
You can easily strip the path up to the last directory separator with an expression like
regexp_replace(path, '^.+[/\\]', '')
This will also match the occasional forward slashes produced by some software. Then you just count the remaining file names like
WITH files AS (
    SELECT regexp_replace(my_path, '^.+[/\\]', '') AS filename
    FROM my_table
)
SELECT filename, count(*) AS count
FROM files
GROUP BY filename
HAVING count(*) >= 2;
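To try this without a real table, here is a minimal self-contained sketch that runs the same query over the sample paths from the question. The inline `paths` rows and the column name `path` are illustrative assumptions, not part of the original schema. It uses E'' escape strings so the backslashes are read the same way whether standard_conforming_strings is on or off (it was off by default before PostgreSQL 9.1, and with it off a plain '\\' literal collapses to a single backslash before the regex engine sees it).

```sql
-- Sample rows from the question, inlined for illustration (not a real table).
WITH paths(path) AS (
    VALUES (E'\\\\fs1\\foo\\bar\\snafu.txt'),
           (E'c:\\this\\is\\why\\i\\drink\\snafu.txt'),
           (E'\\\\fs2\\bippity\\baz.zip'),
           (E'\\\\fs3\\boppity\\boo\\baz.zip'),
           (E'c:\\users\\chris\\donut.c')
), files AS (
    -- E'^.+[/\\\\]' reaches the regex engine as ^.+[/\\]:
    -- greedily match everything up to the last / or \ and remove it.
    SELECT regexp_replace(path, E'^.+[/\\\\]', '') AS filename
    FROM paths
)
SELECT filename, count(*) AS count
FROM files
GROUP BY filename
HAVING count(*) >= 2
ORDER BY filename;
```

With these sample rows the query should report baz.zip and snafu.txt, each with a count of 2, and leave out donut.c.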
select regexp_replace(path_field, '.+/', '') from files_table;
CREATE OR REPLACE FUNCTION basename(text) RETURNS text
AS $basename$
declare
    FILE_PATH alias for $1;
    ret text;
begin
    ret := regexp_replace(FILE_PATH, '^.+[/\\]', '');
    return ret;
end;
$basename$ LANGUAGE plpgsql;
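Once a function like this exists, the duplicate check can call it directly. A short sketch, assuming the basename(text) function above has been created and reusing the placeholder names my_table and my_path from the earlier query:

```sql
-- Assumes basename(text) from above is already defined.
-- my_table and my_path are placeholder names, not a real schema.
SELECT basename(my_path) AS filename, count(*) AS count
FROM my_table
GROUP BY basename(my_path)
HAVING count(*) >= 2;
```

Grouping on the function call repeats the expression from the select list, so the query does not depend on referring to the output column by its alias.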