Check if a string is hexadecimal

Tags:

python

hex

I know the easiest way is using a regular expression, but I wonder if there are other ways to do this check.

Why do I need this? I am writing a Python script that reads text messages (SMS) from a SIM card. In some situations, hex messages arrive and I need to do some processing on them, so I need to check whether a received message is hexadecimal.

When I send the following SMS:

Hello world!

And my script receives

00480065006C006C006F00200077006F0072006C00640021

But in some situations, I receive normal text messages (not hex), so I need to check whether a message is hex before processing it.

I am using Python 2.6.5.

UPDATE:

The reason for the problem is that (somehow) messages I send are received as hex, while messages sent by the operator (info messages and ads) are received as normal strings. So I decided to add a check to ensure that I have the message in the correct string format.

Some extra details: I am using a Huawei 3G modem and PyHumod to read data from the SIM card.

Possible best solution to my situation:

The best way to handle such strings is to use a2b_hex (a.k.a. unhexlify) and then decode the result as UTF-16 big endian (as @JonasWielicki mentioned):

from binascii import unhexlify  # unhexlify is another name for a2b_hex

mystr = "00480065006C006C006F00200077006F0072006C00640021"
unhexlify(mystr).decode("utf-16-be")  # decode (not encode): raw bytes -> unicode text
>> u'Hello world!'
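
As a rough illustration of how the check and the decoding could be combined, the sketch below tries the hex route first and falls back to the raw text. It is written for Python 3 (an assumption; the question targets Python 2.6.5), and `decode_sms` is a hypothetical helper name, not part of PyHumod.

```python
from binascii import unhexlify


def decode_sms(message):
    """Return the decoded text if message is hex-encoded UTF-16-BE, else message itself.

    Hypothetical helper, for illustration only.
    """
    try:
        # unhexlify raises binascii.Error (a ValueError subclass) on
        # non-hex characters or an odd number of digits.
        raw = unhexlify(message)
        # decode raises UnicodeDecodeError (also a ValueError subclass)
        # if the bytes are not valid UTF-16-BE.
        return raw.decode("utf-16-be")
    except ValueError:
        return message


print(decode_sms("00480065006C006C006F00200077006F0072006C00640021"))
print(decode_sms("Your balance is low."))
```

One caveat with this approach: a short plain-text message that happens to consist only of hex digits and decodes cleanly (for example "cafe") would still be treated as hex, so some extra heuristic may be needed if such messages are plausible.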

210

asked Jul 21 '12 12:07

FallenAngel

4 Answers

(1) Using int() works nicely for this, and Python does all the checking for you :)

int('00480065006C006C006F00200077006F0072006C00640021', 16)
6896377547970387516320582441726837832153446723333914657L

will work. In case of failure you will receive a ValueError exception.

Short example:

int('af', 16)
175

int('ah', 16)
 ...
ValueError: invalid literal for int() with base 16: 'ah'

(2) An alternative would be to traverse the data and make sure all characters fall within the range of 0..9 and a-f/A-F. string.hexdigits ('0123456789abcdefABCDEF') is useful for this as it contains both upper and lower case digits.

import string
all(c in string.hexdigits for c in s)

will return either True or False based on the validity of your data in string s.

Short example:

s = 'af'
all(c in string.hexdigits for c in s)
True

s = 'ah'
all(c in string.hexdigits for c in s)
False

Notes:

As @ScottGriffiths notes correctly in a comment below, the int() approach will work if your string contains 0x at the start, while the character-by-character check will fail with this. Also, checking against a set of characters is faster than checking against a string of characters, but it is doubtful this will matter with short SMS strings unless you process many (many!) of them in sequence, in which case you could convert string.hexdigits to a set with set(string.hexdigits).
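
To make these differences concrete, here is a small sketch (Python 3 assumed) of inputs on which the two approaches disagree. `is_hex_int` and `is_hex_chars` are illustrative names, not taken from the answer.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)  # set membership is O(1) per character


def is_hex_int(s):
    """int()-based check: lenient about a 0x prefix, a sign and surrounding whitespace."""
    try:
        int(s, 16)
        return True
    except ValueError:
        return False


def is_hex_chars(s):
    """Character-based check: only hex digits are allowed (an empty string passes)."""
    return all(c in HEX_DIGITS for c in s)


# Print both verdicts side by side for a few edge cases.
for sample in ["af", "0xaf", " af ", "-af", "", "ah"]:
    print(repr(sample), is_hex_int(sample), is_hex_chars(sample))
```

For SMS payloads, where neither a prefix nor whitespace is expected, the stricter character check combined with an explicit non-empty test is probably the safer choice.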

180

answered Oct 27 '22 16:10

Levon

You can:

1. test whether the string contains only hexadecimal digits (0…9, A…F), or
2. try to convert the string to an integer and see whether it fails.

Here is the code:

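import string

# Option 1: check every character against the set of hex digits.
def is_hex(s):
    hex_digits = set(string.hexdigits)
    # if s is long, then it is faster to check against a set
    return all(c in hex_digits for c in s)

# Option 2: let int() do the parsing. Keep only one of the two definitions,
# since they share a name and the later one replaces the earlier one.
def is_hex(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False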

answered Oct 27 '22 16:10

eumiro

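I know the OP mentioned regular expressions, but I wanted to contribute such a solution for completeness' sake:

import re  # re.fullmatch requires Python 3.4 or later

def is_hex(s):
    # fullmatch already anchors at both ends; the + allows more than one digit
    return re.fullmatch(r"[0-9a-fA-F]+", s or "") is not None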

Performance

In order to evaluate the performance of the different solutions proposed here, I used Python's timeit module. The input strings are generated randomly for three different lengths, 10, 100, 1000:

import random
s = ''.join(random.choice('0123456789abcdef') for _ in range(10))

Levon's solutions:

# int(s, 16)
  10: 0.257451018987922
 100: 0.40081690801889636
1000: 1.8926858339982573

# all(_ in string.hexdigits for _ in s)
  10:  1.2884491360164247
 100: 10.047717947978526
1000: 94.35805322701344

Other answers are variations of these two. Using a regular expression:

# re.fullmatch(r'^[0-9a-fA-F]$', s or '')
  10: 0.725040541990893
 100: 0.7184272820013575
1000: 0.7190397029917222

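Picking the right solution thus depends on the length of the input string and whether exceptions can be handled safely. int() is the winner for shorter strings, and the character-by-character check is the slowest at every length. The regular expression returns a match or None rather than raising a ValueError, but its timings should be read with care: the pattern as timed has no quantifier, so fullmatch rejects any input longer than one character almost immediately, which is why its time stays flat as the input grows.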
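
The measurements above could be reproduced with a harness along these lines. This is only a sketch for Python 3.6 or later: the function names, the `benchmark` helper and the iteration count are assumptions, and the regular expression uses a `+` quantifier so that it validates the whole string, which differs from the pattern shown in the timing comment above. Absolute numbers will vary by machine and Python version.

```python
import random
import re
import string
import timeit

HEX_RE = re.compile(r"[0-9a-fA-F]+")  # note the +, so the whole string is scanned


def by_int(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False


def by_chars(s):
    return all(c in string.hexdigits for c in s)


def by_regex(s):
    return HEX_RE.fullmatch(s) is not None


def benchmark(lengths=(10, 100, 1000), number=1000):
    """Return {(function name, length): seconds taken for `number` calls}."""
    results = {}
    for n in lengths:
        # One random lowercase hex string per length, shared by all candidates.
        s = "".join(random.choice("0123456789abcdef") for _ in range(n))
        for func in (by_int, by_chars, by_regex):
            results[(func.__name__, n)] = timeit.timeit(lambda: func(s), number=number)
    return results


if __name__ == "__main__":
    for (name, n), seconds in sorted(benchmark().items()):
        print(f"{name:9s} {n:5d}: {seconds:.4f}")
```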

answered Oct 27 '22 15:10

Jens

One more simple and short solution based on transformation of string to set and checking for subset (doesn't check for '0x' prefix):

import string
def is_hex_str(s):
    return set(s).issubset(string.hexdigits)
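
One edge case worth knowing: the empty set is a subset of every set, so the subset test accepts an empty string. A minimal sketch of a variant that rejects empty input (the name `is_nonempty_hex_str` is made up for this example):

```python
import string


def is_nonempty_hex_str(s):
    # Same subset test as above, plus a guard so that "" is rejected.
    # issubset() accepts any iterable, so string.hexdigits can be passed as-is.
    return bool(s) and set(s).issubset(string.hexdigits)


print(set("").issubset(string.hexdigits))  # the unguarded test on an empty string
print(is_nonempty_hex_str(""))
print(is_nonempty_hex_str("00AF"))
```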


answered Oct 27 '22 17:10

Roman
