index and rindex work great on short strings:
$ perl -e '$a=("x"x((2<<30)-2))."b";print index($a,"b")'
2147483646
$ perl -e '$a=("x"x((2<<30)-2))."b";print rindex($a,"b")'
2147483646
However, on long strings they behave quite differently:
$ perl -e '$a=("x"x((2<<30)-1))."b";print index($a,"b")'
Segmentation fault (core dumped)
$ perl -e '$a=("x"x((2<<30)-1))."b";print rindex($a,"b")'
-1
Is this expected? Is there a nice workaround?
$ perl -V
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
Platform:
osname=linux, osvers=3.2.0-58-generic, archname=x86_64-linux-gnu-thread-multi
uname='linux brownie 3.2.0-58-generic #88-ubuntu smp tue dec 3 17:37:58 utc 2013 x86_64 x86_64 x86_64 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.18.2 -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.8.2', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=, so=so, useshrplib=true, libperl=libperl.so.5.18.2
gnulibc_version='2.19'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'
Characteristics of this binary (from libperl):
Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL
USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
USE_REENTRANT_API
Locally applied patches:
DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
DEBPKG:fixes/respect_umask - Respect umask during installation
DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/702096 Point users to Debian packages of deprecated core modules
DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.18.2-2ubuntu1 in patchlevel.h
DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD
DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
DEBPKG:fixes/net_ftp_failed_command - [rt.cpan.org #37700] http://bugs.debian.org/491062 Net::FTP: cope gracefully with a failed command
DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time
DEBPKG:fixes/module_metadata_security_doc - [68cdd4b] CVE-2013-1437 documentation fix
DEBPKG:fixes/module_metadata_taint_fix - [bff978f] http://bugs.debian.org/722210 [rt.cpan.org #88576] untaint version, if needed, in Module::Metadata
DEBPKG:fixes/IPC-SysV-spelling - http://bugs.debian.org/730558 [rt.cpan.org #86736] Fix spelling of IPC_CREAT in IPC-SysV documentation
DEBPKG:fixes/fix-undef-source -
Built under linux
Compiled at Mar 27 2014 18:30:28
%ENV:
PERL_MB_OPT="--install_base "/home/tange/perl5""
PERL_MM_OPT="INSTALL_BASE=/home/tange/perl5"
@INC:
/etc/perl
/usr/local/lib/perl/5.18.2
/usr/local/share/perl/5.18.2
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.18
/usr/share/perl/5.18
/usr/local/lib/site_perl
.
This bug was fixed and will be in 5.22.0 which should be out in May. It didn't make it into the 5.20.x series. That patch is simple and you should be able to apply it to the 5.20.2 or 5.18.2 code base and recompile.
Regexes have the same problem, and it's also been fixed in 5.21.
The work around is to not have 2 gig strings in memory, that's good practice in general. If it's read from a file, perhaps read
it in blocks and use index on each block.
If you must have a 2 gig string, use substr() to check it in blocks. Unfortunately this means you have to copy the string in pieces. The code below deliberately initializes $substr
outside the loop so Perl doesn't reallocate the memory multiple times.
use v5.18;
use strict;
use warnings;
sub big_index {
my $ref = shift;
my $match = shift;
state $block_size = 2**29;
my $strlen = length($$ref);
my $matchlen = length($match);
# No point in doing extra work if we don't need to.
return index($$ref, $match) if $strlen < $block_size;
my $substr = '';
my $offset = 0;
for(
my $offset = 0;
$offset < $strlen;
$offset += ($block_size - $matchlen - 1)
) {
$substr = substr($$ref, $offset, $block_size);
my $ret = index $substr, $match;
return $ret + $offset if $ret != -1;
}
return -1;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With