i am pulling urls from my database with a perl script where i employ fetchrow_array to pull URL from the database which worked fine until i encountered a very long URL georgelog24.blog.iskreni.net/?bid=6744d9dcf85991ed2e4b8a258153a1ab&lid=ff9963b9a798ea335b75b5f7c0c295d1
then it started to give me this error.
DBD::ODBC::st fetchrow_array failed: st_fetch/SQLFetch (long truncated DBI attribute LongTruncOk not set and/or LongReadLen too small) (SQL-HY000) [state was HY000 now 01004]
[Microsoft][ODBC SQL Server Driver]String data, right truncation (SQL-01004) at C:\test\multihashtest2.pl line 44.
I believe this is on the database side as the code i have been using to pull URL has worked before. The database that i am using is MSSQL server 2005.
the URL column in the database uses text type currently, but i have tried changing it to varchar(max) and nvarchar(max) but the error still stands.
After a bit of trial and error i found that the maximum length of the url then i could query successfully with fetchrow_array was 81 characters. And since URLs can span ridiculous lengths sometimes, i cannot put a restriction on URL length.
Can anybody help me understand and suggest a fix for this?
FYI: line 44 is the first line in my code below
while (($myid,$url) = $statement_handle->fetchrow_array()) { # executes as many threads as there are jobs to do
my $thread = threads->create(\&webcrawl); #initiate thread
my $tid = $thread->tid;
print " - Thread $tid started\n"; #obtain thread no. and print
push (@Threads, $thread); #push thread into array for "housekeeping" later on
}
Try with:
#not anymore errors if content is truncated - you don't necessarily want this
$statement_handle->{'LongTruncOk'} = 1;
#nice, hard coded constant for the length of data to be read from Longs
$statement_handle->{'LongReadLen'} = 20000;
while (($myid,$url) = $statement_handle->fetchrow_array()) { # executes as many threads as there are jobs to do
my $thread = threads->create(\&webcrawl); #initiate thread
my $tid = $thread->tid;
print " - Thread $tid started\n"; #obtain thread no. and print
push (@Threads, $thread); #push thread into array for "housekeeping" later on
}
Also, I'd recommend you to try Parallel::ForkManager
for parallelizing jobs - I find it much more intuitive and easy to use than threads
Please look at the DBI attributes LongTruncOk and LongReadlen
You will NEED to either accept truncation or set a max size as text and varchar(max) columns can be massive so if it was left to the DBD it would have no choice but to allocate massive amounts of memory in case the column is the max size of that column.
Important point: you need to set the LongReadLen and/or LongTruncOk attributes on the database handle prior to preparing the statement, as noted here.
Attempting to set it on the prepared statement handle prior to fetching data will have no effect on truncation of the returned data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With