So this is essentially a follow-up question on Finding duplicate records.
We perform data imports from text files everyday and we ended up importing 10163 records spread across 182 files twice. On running the query mentioned above to find duplicates, the total count of records we got is 10174, which is 11 records more than what are contained in the files. I assumed about the posibility of 2 records that are exactly the same and are valid ones being accounted for as well in the query. So I thought it would be best to use a timestamp field and simply find all the records that ran today (and hence ended up adding duplicate rows). I used ORA_ROWSCN using the following query:
select count(*) from my_table
where TRUNC(SCN_TO_TIMESTAMP(ORA_ROWSCN)) = '01-MAR-2012'
;
However, the count is still more i.e. 10168. Now, I am pretty sure that the total lines in the file is 10163 by running the following command in the folder that contains all the files. wc -l *.txt
.
Is it possible to find out which rows are actually inserted twice?
By default, ORA_ROWSCN
is stored at the block level, not at the row level. It is only stored at the row level if the table was originally built with ROWDEPENDENCIES
enabled. Assuming that you can fit many rows of your table in a single block and that you're not using the APPEND
hint to insert the new data above the existing high water mark of the table, you are likely inserting new data into blocks that already have some existing data in them. By default, that is going to change the ORA_ROWSCN
of every row in the block causing your query to count more rows than were actually inserted.
Since ORA_ROWSCN
is only guaranteed to be an upper-bound on the last time there was DML on a row, it would be much more common to determine how many rows were inserted today by adding a CREATE_DATE
column to the table that defaults to SYSDATE
or to rely on SQL%ROWCOUNT
after your INSERT
ran (assuming, of course, that you are using a single INSERT
statement to insert all the rows).
Generally, using the ORA_ROWSCN
and the SCN_TO_TIMESTAMP
function is going to be a problematic way to identify when a row was inserted even if the table is built with ROWDEPENDENCIES
. ORA_ROWSCN
returns an Oracle SCN which is a System Change Number. This is a unique identifier for a particular change (i.e. a transaction). As such, there is no direct link between a SCN and a time-- my database might be generating SCN's a million times more quickly than yours and my SCN 1 may be years different from your SCN 1. The Oracle background process SMON
maintains a table that maps SCN values to approximate timestamps but it only maintains that data for a limited period of time-- otherwise, your database would end up with a multi-billion row table that was just storing SCN to timestamp mappings. If the row was inserted more than, say, a week ago (and the exact limit depends on the database and database version), SCN_TO_TIMESTAMP
won't be able to convert the SCN to a timestamp and will return an error.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With