Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to search string in javascript

Tags:

javascript

I have a hidden field on my page that stores space separated list of emails. I can have maximum 500 emails in that field.

What will be the fastest way to search if a given email already exists in that list? I need to search multiple emails in a loop

  1. use RegEx to find a match

  2. use indexOf()

  3. convert the list to a javascript dictionary and then search

If this is an exact duplicate, please let me know the other question. Thanks

EDIT: Thanks everyone for your valuable comments and answers. Basically my user has a list of emails(0-500) in db. User is presented with his own contact list. User can then choose one\more emails from his contact list to add to the list. I want to ensure at client side that he is not adding duplicate emails. Whole operation is driven by ajax, so jsvascript is required.

like image 793
kheya Avatar asked Jun 12 '11 11:06

kheya


People also ask

Is regex fast JavaScript?

A regular expression (also called regex for short) is a fast way to work with strings of text. By formulating a regular expression with a special syntax, you can: search for text in a string.

How do you search a string for a pattern in JavaScript?

Using test() The test() method is a RegExp expression method. It searches a string for a pattern, and returns true or false, depending on the result.

How do I search for a word in JavaScript?

JavaScript Code: function search_word(text, word){ var x = 0, y=0; for (i=0;i< text. length;i++) { if(text[i] == word[0]) { for(j=i;j< i+word. length;j++) { if(text[j]==word[j-i]) { y++; } if (y==word. length){ x++; } } y=0; } } return "'"+word+"' was found "+x+" times.

Which method is used to search for a substring in JavaScript?

JavaScript String indexOf() Method.


3 Answers

The answer is: It depends.

  • It depends on what you actually want to measure.
  • It depends on the relationship between how many you're searching for vs. how many you're searching.
  • It depends on the JavaScript implementation. Different implementations usually have radically different performance characteristics. This is one of the many reasons why the rule "Don't optimize prematurely" applies especially to cross-implementation JavaScript.

...but provided you're looking for a lot fewer than you have in total, it's probably String#indexOf unless you can create the dictionary once and reuse it (not just this one loop of looking for X entries, but every loop looking for X entries, which I tend to doubt is your use-case), in which case that's hands-down faster to build the 500-key dictionary and use that.

I put together a test case on jsperf comparing the results of looking for five strings buried in a string containing 500 space-delimited, unique entries. Note that that jsperf page compares some apples and oranges (cases where we can ignore setup and what kind of setup we're ignoring), but jsperf was being a pain about splitting it and I decided to leave that as an exercise for the reader.

In my tests of what I actually think you're doing, Chrome, Firefox, IE6, IE7 and IE9 did String#indexOf fastest. Opera did RegExp alternation fastest. (Note that IE6 and IE7 don't have Array#indexOf; the others do.) If you can ignore dictionary setup time, then using a dictionary is the hands-down winner.

Here's the prep code:

// ==== Main Setup
var toFind = ["aaaaa100@zzzzz", "aaaaa200@zzzzz", "aaaaa300@zzzzz", "aaaaa400@zzzzz", "aaaaa500@zzzzz"];
var theString = (function() {
 var m, n;

 m = [];
 for (n = 1; n <= 500; ++n) {
  m.push("aaaaa" + n + "@zzzzz");
 }
 return m.join(" ");
})();

// ==== String#indexOf (and RegExp) setup for when we can ignore setup
var preppedString = " " + theString + " ";

// ==== RegExp setup for test case ignoring RegExp setup time
var theRegExp = new RegExp(" (?:" + toFind.join("|") + ") ", "g");

// ==== Dictionary setup for test case ignoring Dictionary setup time
var theDictionary = (function() {
 var dict = {};
 var index;
 var values = theString.split(" ");
 for (index = 0; index < values.length; ++index) {
  dict[values[index]] = true;
 }
 return dict;
})();

// ==== Array setup time for test cases where we ignore array setup time
var theArray = theString.split(" ");

The String#indexOf test:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (theString.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

The String#indexOf (ignore setup) test, in which we ignore the (small) overhead of putting spaces at either end of the big string:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (preppedString.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

The RegExp alternation test:

// Note: In real life, you'd have to escape the values from toFind
// to make sure they didn't have special regexp chars in them
var regexp = new RegExp(" (?:" + toFind.join("|") + ") ", "g");
var match, counter = 0;
var str = " " + theString + " ";
for (match = regexp.exec(str); match; match = regexp.exec(str)) {
 ++counter;
}
if (counter != 5) {
 throw "Error";
}

The RegExp alternation (ignore setup) test, where we ignore the time it takes to set up the RegExp object and putting spaces at either end of the big string (I don't think this applies to your situation, the addresses you're looking for would be static):

var match, counter = 0;
for (match = theRegExp.exec(preppedString); match; match = theRegExp.exec(preppedString)) {
 ++counter;
}
if (counter != 5) {
 throw "Error";
}

The Dictionary test:

var dict = {};
var index;
var values = theString.split(" ");
for (index = 0; index < values.length; ++index) {
 dict[values[index]] = true;
}
for (index = 0; index < toFind.length; ++index) {
 if (!(toFind[index] in dict)) {
  throw "Error";
 }
}

The Dictionary (ignore setup) test, where we don't worry about the setup time for the dictionary; note that this is different than the RegExp alternation (ignore setup) test because it assumes the overall list is invariant:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (!(toFind[index] in theDictionary)) {
  throw "Error";
 }
}

The Array#indexOf test (note that some very old implementations of JavaScript may not have Array#indexOf):

var values = theString.split(" ");
var index;
for (index = 0; index < toFind.length; ++index) {
 if (values.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

The Array#indexOf (ignore setup) test, which like Dictionary (ignore setup) assumes the overall list is invariant:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (theArray.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}
like image 66
T.J. Crowder Avatar answered Oct 26 '22 15:10

T.J. Crowder


Instead of looking for the fastest solution, you first need to make sure that you’re actually having a correct solution. Because there are four cases an e-mail address can appear and a naive search can fail:

  1. Alone: [email protected]
  2. At the begin: [email protected] ...
  3. At the end: ... [email protected]
  4. In between: ... [email protected] ...

Now let’s analyze each variant:

  1. To allow arbitrary input, you will need to escape the input properly. You can use the following method to do so:

    RegExp.quote = function(str) {
        return str.toString().replace(/(?=[.?*+^$[\]\\(){}-])/g, "\\");
    };
    

    To match all four cases, you can use the following pattern:

    /(?:^|\ )user@example\.com(?![^\ ])/
    

    Thus:

    var inList = new RegExp("(?:^| )" + RegExp.quote(needle) + "(?![^ ])").test(haystack);
    
  2. Using indexOf is a little more complex as you need to check the boundaries manually:

    var pos = haystack.indexOf(needle);
    if (pos != -1 && (pos != 0 && haystack.charAt(pos-1) !== " " || haystack.length < (pos+needle.length) && haystack.charAt(pos+needle.length) !== " ")) {
        pos = -1;
    }
    var inList = pos != -1;
    
  3. This one is rather quite simple:

    var dict = {};
    haystack.match(/[^\ ]+/g).map(function(match) { dict[match] = true; });
    var inList = dict.hasOwnProperty(haystack);
    

Now to test what variant is the fastest, you can do that at jsPerf.

like image 44
Gumbo Avatar answered Oct 26 '22 16:10

Gumbo


indexOf() is most probably the fastest just keep in mind you need to search for two possible cases:

var existingEmails = "email1, email2, ...";
var newEmail = "[email protected]";
var exists = (existingEmails.indexOf(newEmail + " ") >= 0) || (existingEmails.indexOf(" " + newEmail ) > 0);
like image 26
Shadow Wizard Hates Omicron Avatar answered Oct 26 '22 16:10

Shadow Wizard Hates Omicron