example_strings = ["10.43 KB", "11 GB", "343.1 MB"]
I want to convert all of these strings into bytes. So far I have come up with this:
def parseSize(size):
    if size.endswith(" B"):
        size = int(size.rstrip(" B"))
    elif size.endswith(" KB"):
        size = float(size.rstrip(" KB")) * 1000
    elif size.endswith(" MB"):
        size = float(size.rstrip(" MB")) * 1000000
    elif size.endswith(" GB"):
        size = float(size.rstrip(" GB")) * 10000000000
    elif size.endswith(" TB"):
        size = float(size.rstrip(" TB")) * 10000000000000
    return int(size)
But I don't like it and also I don't think it works. I could find only modules that do the opposite thing.
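For what it's worth, the attempt above is close. The GB and TB multipliers each have one zero too many (10**10 and 10**13 rather than 10**9 and 10**12). Also, rstrip(" KB") removes any trailing run of the characters ' ', 'K' and 'B' rather than the literal suffix; it only works here because a number never ends in one of those characters. Below is a minimal corrected sketch, assuming decimal (power-of-1000) units and exactly one space before the unit. The names parse_size_fixed and SUFFIXES are made up for illustration.

```python
# Decimal (SI) multipliers; this table is an assumption, swap in 2**10 etc. if needed.
SUFFIXES = [(" B", 1), (" KB", 10**3), (" MB", 10**6), (" GB", 10**9), (" TB", 10**12)]

def parse_size_fixed(size):
    """Convert a string such as '10.43 KB' to a whole number of bytes."""
    for suffix, factor in SUFFIXES:
        if size.endswith(suffix):
            # Slice off the literal suffix instead of relying on rstrip().
            return int(float(size[:-len(suffix)]) * factor)
    raise ValueError("unrecognised size: %r" % size)

for s in ["10.43 KB", "11 GB", "343.1 MB"]:
    print(s, parse_size_fixed(s))
```

Unlike the original, an unknown unit raises ValueError instead of falling through to int(size) on the unparsed string.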
To answer the OP's question, there does seem to be a module for this, humanfriendly:
pip install humanfriendly
then,
>>> import humanfriendly
>>> user_input = raw_input("Enter a readable file size: ")
Enter a readable file size: 16G
>>> num_bytes = humanfriendly.parse_size(user_input)
>>> print num_bytes
16000000000
>>> print "You entered:", humanfriendly.format_size(num_bytes)
You entered: 16 GB
>>> print "You entered:", humanfriendly.format_size(num_bytes, binary=True)
You entered: 14.9 GiB
I liked Denziloe's answer compared to everything else that came up in Google, but its KB was 1000 instead of 1024, etc. (Kudos to mlissner for trying to point that out years ago. Maybe our assumptions are too old school, but I don't see most software catching up to the new assumptions either.) So I tweaked it into this:
import re
# based on https://stackoverflow.com/a/42865957/2002471
units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}
def parse_size(size):
    size = size.upper()
    #print("parsing size ", size)
    # Insert a space before the unit if the string does not contain one yet.
    if not re.search(r' ', size):
        size = re.sub(r'([KMGT]?B)', r' \1', size)
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])
example_strings = ["1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb", "11 gb", "343.1 mb", "10.43kb", "11gb", "343.1mb"]
for example_string in example_strings:
    print(example_string, parse_size(example_string))
which we can verify by checking the output:
$ python humansize.py
('1024b', 1024)
('10.43 KB', 10680)
('11 GB', 11811160064)
('343.1 MB', 359766425)
('10.43KB', 10680)
('11GB', 11811160064)
('343.1MB', 359766425)
('10.43 kb', 10680)
('11 gb', 11811160064)
('343.1 mb', 359766425)
('10.43kb', 10680)
('11gb', 11811160064)
('343.1mb', 359766425)
Here's a slightly prettier version. There's probably no module for this; just define the function inline. It's very small and readable.
units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
# Alternative unit definitions, notably used by Windows:
# units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}
def parse_size(size):
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])
example_strings = ["10.43 KB", "11 GB", "343.1 MB"]
for example_string in example_strings:
    print(parse_size(example_string))
10430
11000000000
343100000
(Note that different places use slightly different conventions for the definitions of KB, MB, etc.: either powers of 10**3 = 1000 or powers of 2**10 = 1024. If your context is Windows, you will want to use the latter. If your context is Mac OS, you will want to use the former.)
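One way to make the convention explicit is to accept the IEC names (KiB, MiB, ...) as always binary, and let the caller decide how the ambiguous KB/MB/... names are read. The following is only a sketch of that idea: the function name parse_size_explicit, the binary flag and the two tables are my own illustrative choices, not part of any library.

```python
import re

# SI names are powers of 1000; IEC names (KiB, MiB, ...) are powers of 1024.
SI_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
IEC_UNITS = {"KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40}

# A number, optional whitespace, then B, KB/MB/GB/TB or KiB/MiB/GiB/TiB.
SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*((?:[KMGT]I?)?B)\s*")

def parse_size_explicit(size, binary=False):
    """Parse '11 GB' or '11GiB' to bytes; binary=True reads KB as 1024 bytes."""
    m = SIZE_RE.fullmatch(size.upper())
    if m is None:
        raise ValueError("unrecognised size: %r" % size)
    number, unit = m.groups()
    if unit in IEC_UNITS:
        factor = IEC_UNITS[unit]            # KiB etc. are always binary
    elif binary and unit != "B":
        factor = IEC_UNITS[unit[0] + "IB"]  # treat KB as KiB on request
    else:
        factor = SI_UNITS[unit]
    return int(float(number) * factor)

print(parse_size_explicit("11 GB"))               # decimal reading
print(parse_size_explicit("11 GB", binary=True))  # binary reading
print(parse_size_explicit("11 GiB"))              # always binary
```

Defaulting to decimal keeps plain "GB" consistent with the first units table above; pass binary=True for the Windows-style reading.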
Based on chicks' answer, but using only a regular expression to parse the size, and also accepting a size that is already an integer.
import re

UNITS = {None: 1, "B": 1, "KB": 2 ** 10, "MB": 2 ** 20, "GB": 2 ** 30, "TB": 2 ** 40}

def parse_human_size(size):
    """
    >>> examples = [12345, "123214", "1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb"]
    >>> for s in examples:
    ...     print('[', s, ']', parse_human_size(s))
    """
    # Integers are taken to be a byte count already.
    if isinstance(size, int):
        return size
    # A number, optional whitespace, then an optional unit (None means bytes).
    m = re.match(r'^(\d+(?:\.\d+)?)\s*([KMGT]?B)?$', size.upper())
    if m:
        number, unit = m.groups()
        return int(float(number) * UNITS[unit])
    raise ValueError("Invalid human size")