Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interview Question: Finding Next and Previous characters in a given string?

Tags:

algorithm

We have a language X, which has one byte and two byte characters. This language has following characteristics.

  1. Single byte character value will always be less than or equal to 127.
  2. In Two byte character, First byte will always be greater than 127 and second byte value can be anything.

The problem is, We are given an arbitrary length string and pointer pointing to some byte in the string and we have to find out what is the previous character and what is the next character.

One simple approach will be start from beginning of the string, checking the value of the byte and comparing the pointers until we reach the given pointer. But in the worst case that is, if the given pointer pointing to last byte in the given string, we have to loop through all the charaters.

I would like to know is there any better algorithm which will give the result in constant time irrespective of the length of the string?

like image 274
raj Avatar asked Aug 31 '09 14:08

raj


4 Answers

No, constant time is impossible as the worst case is, as Olexiy states, that almost the whole string is top-bit-set bytes and you need to backtrack right to the start to find out which is the first top-bit-set byte in the first two-byte sequence.

Hopefully this pathological case is rare and you can just step back a byte at a time until you meet any low byte, in which case you can be sure that the byte after it is the start of a character. You can then walk forwards again until you meet your original pointer.

like image 141
bobince Avatar answered Nov 09 '22 19:11

bobince


It seems in the worst case, we need to go through the whole string. Just consider chars A = 100 and B = 200, C = 201 and the following strings of length N:

S1 = ABCBCBC...BC

S2 = BBCBCBC...BC

like image 23
Olexiy Avatar answered Nov 09 '22 18:11

Olexiy


Scan backwards until you either hit two consecutive bytes less than 127 or you hit the beginning of the string. You can now count characters up to where you are and on to one after the current character.

like image 1
stonemetal Avatar answered Nov 09 '22 20:11

stonemetal


Let's see... You're already pointing to a character which can be a single byte or double byte. In the latter case, you could be on the first or second byte of this character. So first you need to know if you're at the start of a character or not. To make matters worse, the second byte of a two-bytes character can be any value, thus there's a chance that all bytes are greater than 127! That makes it nasty to find the previous character. But first, determine if you're at the start of a new character or in the middle of one.

  • Determining character start: Go back one byte until you find a byte that doesn't have it's highest bit set. (Or the beginning of the string.) Based on the number of bytes with the highest bit set, you will know if you're at the beginning or not. An odd number of high bits set before the current byte means you have to go backwards one byte for the current character.

  • Determining previous character: Go back two bytes. If the highest bit is set, you found it. If not, go one byte forwards.

  • Determining next character: Check the highest bit of the current byte. If set, go two bytes forwards. Otherwise, just one.

  • Determine number of characters means you go to the start of the string and check the highest bit of every byte. If set, add one to your count and skip one byte. If not set, just add one to your count. Unfortunately, you will have to go through the whole string.

I do assume that you have some way to indicate the start and end of the string. If not, there's no indication of where it starts and stops. (Unless you use a null byte to indicate start/stop in which case you just can't skip bytes because you might skip such a null byte.)

Is there a faster way? Well, if you know start/end locations then you can calculate the number of bytes in this string. The number of characters would be the number of bytes in this string with their highest bits not set. So only cound the number of bytes lower than 128!

like image 1
Wim ten Brink Avatar answered Nov 09 '22 19:11

Wim ten Brink