Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way/algorithm to find out if a string consists of only a given set of characters

I was asked this in an interview, If you to find out if a string consists of only a given set of characters. For example, let the set of strings be all strings over {0,1,2,3,4,5,6,7,8,9} ie all "numeric" strings. Among this, if the set of strings over {3,8,5} are only valid ones, how do I check if the string consists of only valid characters. Say:

Input 8888338385
     Output VALID
Input 887837348234 
Output : Invalid

The way that I suggested was brute force, that required checking every character in the given string against a list of invalid characters. If any one of the character was invalid, I'd skip checking all other characters and display the failure message. However, as suggested here, there may be better algorithms. Please help.

like image 292
uyetch Avatar asked Feb 02 '12 13:02

uyetch


People also ask

How do you check if a string contains a sequence of characters?

The contains() method checks whether a string contains a sequence of characters. Returns true if the characters exist and false if not.

How do you check if a string contains all the characters of another string?

You can use contains(), indexOf() and lastIndexOf() method to check if one String contains another String in Java or not. If a String contains another String then it's known as a substring. The indexOf() method accepts a String and returns the starting position of the string if it exists, otherwise, it will return -1.

Which method is used to determine number of characters in a string?

String Length() Method in Java: How to find with Example This function is used to get the length of string in Java. The string length method returns the number of characters written in the String.

How do you check if a string has only digits in it?

Check if String Contains Only Numbers using isdigit() method Python String isdigit() method returns “True” if all characters in the string are digits, Otherwise, It returns “False”.


2 Answers

EDIT: Thanks to Luc Touraille for a vast improvement to the original algorithm.

Create an array a[10] of booleans. For each expected digit e, set a[e] = true.

Now for each digit d in your input, check if a[d] is true. If it's not, return false. If they all succeed, return true.

You can generalise this to all ASCII characters with a 256-element array.

If your input string is length N, your comparison string is length M, and the number of letters in your alphabet is A, then the complexity is O(N+M) (to scan your two strings) plus O(A) (to initialise the boolean array). So unless your string length is close to or greater than your alphabet size, this might not be optimal.

It's worth pointing out, with respect to Niklas Baumstark's excellent performance comparison, that our two solutions are actually the same. The boolean array constructed here is identical to the transition table you'd build in a two-state DFA accepting [c1c2...]*. I'd imagine the only difference is that Java's implementation, being much more general, carries a lot more overhead.

like image 157
Nick Barnes Avatar answered Nov 15 '22 22:11

Nick Barnes


DISCLAIMER: Against my assumptions, Java seems to suck at optimizing the regular expression used here, which results in unperformant code. Even Javascript's regular expressions seem to be faster than this. The benchmark also shows that Nick's solution is really fast.

This is definitely a task for a regular expression. In Java:

public boolean isValidString(String str) {
  return str.matches("[358]*");
}

This should be O(n) worst case, and it can't be better than that, because every character has to be looked at.

If performance is critical, you probably want to cache the pre-compiled pattern matcher:

import java.util.regex.Pattern;

public class Matcher {
  private Pattern pattern;

  public Matcher() {
    this.pattern = Pattern.compile("[358]*");
  }

  public isValid(String str) {
    return pattern.matcher(str).matches();
  }
}
like image 24
Niklas B. Avatar answered Nov 15 '22 22:11

Niklas B.