
Is it a bad idea to convert byte arrays to strings then parse with regular expressions? [closed]

Tags: c#, regex

Here's the scenario: I've recently been tasked with writing an RS-232 serial device communication interface for our existing application. The application already has base classes in place that do the actual communication. Basically, all I do is accept a byte array into my class and then process it.

Part of the issue is that each byte array delivered can be no more than 1000 bytes, yet more data belonging to the same transaction may still be waiting to come in. So I have no idea whether what was delivered to me is complete. What I am doing is converting each byte array (up to 1000 bytes) into a string and appending it to a buffer. I then run a regex against the buffer to see whether the newly added data completes a transaction. I know it's complete if it matches a particular signature (basically a series of control codes at the beginning and end). The buffer accepts at most 3 appends before giving up if no match is found, in case garbage data is coming in and no match is ever possible. This isn't a high-volume device, so I don't expect tons of data to come pouring in constantly, and the regular expression is only ever executed on, at most, 3000 characters.
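For readers who want something concrete, the buffering scheme described above could look roughly like the following. This is only a sketch under stated assumptions: the question does not name the control codes, so STX (0x02) and ETX (0x03) stand in for the real signature, Latin-1 is assumed for the byte-to-char conversion because it maps every byte to the char with the same value, and `TransactionBuffer` and `TryAppend` are hypothetical names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch: accumulates chunks and looks for one complete frame.
// STX (0x02) and ETX (0x03) are placeholder control codes, not the device's real ones.
public sealed class TransactionBuffer
{
    private const int MaxAppends = 3;

    // Latin-1 maps bytes 0x00-0xFF one-to-one onto chars, so no byte is altered.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Lazy match of a single frame; Singleline lets '.' match newline characters too.
    private static readonly Regex Frame = new Regex(
        "\u0002(?<body>.*?)\u0003",
        RegexOptions.Singleline | RegexOptions.Compiled);

    private readonly StringBuilder _buffer = new StringBuilder();
    private int _appends;

    // Returns true and sets 'body' when the buffer holds a complete transaction.
    public bool TryAppend(byte[] chunk, out string body)
    {
        body = string.Empty;
        _buffer.Append(Latin1.GetString(chunk));
        _appends++;

        Match m = Frame.Match(_buffer.ToString());
        if (m.Success)
        {
            body = m.Groups["body"].Value;
            // Drop the frame and anything before it; keep any trailing data.
            _buffer.Remove(0, m.Index + m.Length);
            _appends = 0;
            return true;
        }

        if (_appends >= MaxAppends)
        {
            // Give up after three chunks without a match: treat the data as garbage.
            _buffer.Clear();
            _appends = 0;
        }
        return false;
    }
}
```

A caller would feed each delivered byte array to `TryAppend` and act on `body` only when the method returns true. Note that a single-byte encoding matters here: a default ASCII or UTF-8 decode can replace bytes above 0x7F, which would corrupt binary payloads before the regex ever sees them.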

So far it works pretty well, but my question is: are regular expressions terrible for this? Are there any ramifications in regards to performance for what I'm using them for? My understanding is that regular expressions are typically bad for large volumes of data but I feel this is quite small.

Jake Shakesworth asked Sep 22 '14 21:09


2 Answers

> are regular expressions terrible for this?

On the contrary, regular expressions are great for matching patterns in data sequences.

> Are there any ramifications in regards to performance for what I'm using them for?

Regular expressions can be written in really inefficient ways, but that is usually a problem with a particular regular expression, not with regular expressions as a technique.

> My understanding is that regular expressions are typically bad for large volumes of data but I feel this is quite small.

There is no universal definition of "large" and "small". Depending on the regex engine, your expression is usually translated into a state machine. Such machines are very efficient at what they do, so the data block can be quite large without causing problems. On the other hand, one could write a regex with a lot of backtracking, causing unacceptable performance even on input strings of a hundred characters or fewer.
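To illustrate the backtracking point, here is a minimal sketch that compares a pattern with nested quantifiers against an equivalent flat one on a near-miss input. It relies on the match timeout that .NET's `Regex` API supports; the helper name `TryIsMatch` and the class name are hypothetical, and whether the nested pattern actually hits the timeout depends on the engine version and its optimizations.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Returns true or false for a completed match attempt,
    // or null if the engine exceeded the given timeout.
    public static bool? TryIsMatch(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine was still exploring alternatives when time ran out.
            return null;
        }
    }
}
```

For example, with `string input = new string('a', 40) + "!"` and a timeout of 200 ms, `TryIsMatch(input, "^(a+)+$", timeout)` may return null on a backtracking engine because the nested `+` lets it try a huge number of ways to split the run of a's, while `TryIsMatch(input, "^a+$", timeout)` accepts the same set of strings and fails quickly. Setting a timeout is also a cheap safeguard when the input comes from a device that might send garbage.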

Sergey Kalinichenko answered Oct 03 '22 08:10


Nothing about what you're doing is raising any red flags.

Some things to keep in mind:

  • Don't preoccupy yourself with performance. Design your program first and optimize afterwards, and only if you actually have a performance problem.

  • Some tasks are unsuitable for regular expressions. Regular expressions can't parse XML very well, and they also can't match patterns like X^n Y^n (some number of X's followed by exactly the same number of Y's). Without knowing specifically what you're trying to match with your regex, I can't really say whether it's suitable for your problem. Just be careful that you don't have any odd edge cases.

  • I haven't heard before that regex is bad for large amounts of data, and after looking around online I'm still not finding much warning against it.

  • Normally, the simplest solution is the best one. If you can think of a simpler, more straightforward solution to your problem, then go ahead with that. If not, then don't worry too much.
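As one possible example of a simpler alternative, if the signature really is just a start marker and an end marker, the frame can be found directly in the byte array without converting to a string at all. This is a sketch under assumptions: single-byte STX (0x02) and ETX (0x03) markers are placeholders, whereas the question mentions a series of control codes, and `FrameScanner` and `TryExtract` are hypothetical names.

```csharp
using System;

public static class FrameScanner
{
    private const byte Stx = 0x02; // hypothetical start marker
    private const byte Etx = 0x03; // hypothetical end marker

    // Looks for the first STX..ETX frame within data[0..count) and
    // copies the bytes between the two markers into 'payload'.
    public static bool TryExtract(byte[] data, int count, out byte[] payload)
    {
        payload = Array.Empty<byte>();

        int start = Array.IndexOf(data, Stx, 0, count);
        if (start < 0) return false;

        // Search only the valid bytes that follow the start marker.
        int end = Array.IndexOf(data, Etx, start + 1, count - start - 1);
        if (end < 0) return false;

        payload = new byte[end - start - 1];
        Array.Copy(data, start + 1, payload, 0, payload.Length);
        return true;
    }
}
```

Whether this is actually simpler than the regex depends on how complicated the real signature is; for multi-byte or variable markers the regex may well remain the more readable choice.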

Sam I am says Reinstate Monica answered Oct 03 '22 07:10