Parse human-readable filesizes into bytes

Question

example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

I want to convert all these strings into bytes. So far I've come up with this:

def parseSize(size):
    if size.endswith(" B"):
        size = int(size.rstrip(" B"))
    elif size.endswith(" KB"):
        size = float(size.rstrip(" KB")) * 1000
    elif size.endswith(" MB"):
        size = float(size.rstrip(" MB")) * 1000000
    elif size.endswith(" GB"):
        size = float(size.rstrip(" GB")) * 1000000000
    elif size.endswith(" TB"):
        size = float(size.rstrip(" TB")) * 1000000000000
    return int(size)

But I don't like it, and I also don't think it works. The only modules I could find do the opposite conversion (bytes to a human-readable string).

shellcat_zero · Accepted Answer

To answer the OP's question, there does seem to be a module for this: humanfriendly.

pip install humanfriendly

then,

>>> import humanfriendly
>>> user_input = input("Enter a readable file size: ")
Enter a readable file size: 16G
>>> num_bytes = humanfriendly.parse_size(user_input)
>>> print(num_bytes)
16000000000
>>> print("You entered:", humanfriendly.format_size(num_bytes))
You entered: 16 GB
>>> print("You entered:", humanfriendly.format_size(num_bytes, binary=True))
You entered: 14.9 GiB
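The two `format_size` results differ only in the divisor: the decimal form divides by 10**9 per GB, the binary form by 2**30 per GiB. The arithmetic behind the "14.9 GiB" line can be checked with plain Python, no library needed (this sketch just redoes the division; it does not call humanfriendly):

```python
num_bytes = 16 * 10**9  # "16G" parsed with decimal (SI) units, as in the session above

gb = num_bytes / 10**9   # decimal gigabytes: 1 GB = 1,000,000,000 bytes
gib = num_bytes / 2**30  # binary gibibytes: 1 GiB = 1,073,741,824 bytes

print(f"{gb:g} GB")      # 16 GB
print(f"{gib:.1f} GiB")  # 14.9 GiB
```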

chicks · Answer

I liked Denziloe's answer better than anything else that came up on Google, but it:

- required spaces between the number and units
- didn't handle lowercase units
- assumed a KB was 1000 bytes instead of 1024, etc. (Kudos to mlissner for trying to point that out years ago. Maybe our assumptions are too old-school, but I don't see most software catching up to the new assumptions either.)

So I tweaked it into this:

import re

# based on https://stackoverflow.com/a/42865957/2002471
units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(size):
    size = size.upper()
    # insert a space before the unit if there isn't one ("10.43KB" -> "10.43 KB")
    if " " not in size:
        size = re.sub(r'([KMGT]?B)', r' \1', size)
    number, unit = size.split()
    return int(float(number)*units[unit])

example_strings = ["1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb", "11 gb", "343.1 mb", "10.43kb", "11gb", "343.1mb"]

for example_string in example_strings:
    print(example_string, parse_size(example_string))

which we can verify by checking the output:

$ python3 humansize.py
1024b 1024
10.43 KB 10680
11 GB 11811160064
343.1 MB 359766425
10.43KB 10680
11GB 11811160064
343.1MB 359766425
10.43 kb 10680
11 gb 11811160064
343.1 mb 359766425
10.43kb 10680
11gb 11811160064
343.1mb 359766425
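The step that makes the spaced and unspaced forms interchangeable is the `re.sub` call, which puts a space in front of the unit so that `split()` always yields a number and a unit. A minimal sketch of that step on its own (the helper name `normalize` is hypothetical, not part of the answer):

```python
import re

def normalize(size):
    """Upper-case the input and make sure a space separates number and unit."""
    size = size.upper()
    if " " not in size:
        # "10.43KB" -> "10.43 KB"; \1 re-inserts the matched unit after a space
        size = re.sub(r"([KMGT]?B)", r" \1", size)
    return size

for s in ["1024b", "10.43KB", "343.1 mb"]:
    print(repr(normalize(s)), normalize(s).split())
```

Even without the `if` guard, an input that already contains a space would only gain a second one (`"10.43  KB"`), which `split()` with no arguments tolerates, so the guard mainly keeps the intermediate string tidy.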

Denziloe · Answer

Here's a slightly prettier version. There's probably no module for this, so just define the function yourself; it's small and readable.

units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Alternative unit definitions, notably used by Windows:
# units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(size):
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])


example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

for example_string in example_strings:
    print(parse_size(example_string))

10430
11000000000
343100000

(Note that different places use different conventions for KB, MB, etc.: either powers of 10**3 = 1000 or powers of 2**10 = 1024. If your context is Windows, you will want the latter; if it is Mac OS, the former.)
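Rather than swapping the `units` dictionary by hand, the convention can be made an argument. This is a minimal sketch assuming the same "number, space, unit" input format as the answer; the `binary` keyword and the `UNIT_ORDER` list are illustrative names, not from the answer:

```python
UNIT_ORDER = ["B", "KB", "MB", "GB", "TB"]

def parse_size(size, binary=False):
    """Parse strings like "10.43 KB"; binary=True uses 1024 per step instead of 1000."""
    base = 1024 if binary else 1000
    # B -> base**0, KB -> base**1, MB -> base**2, ...
    units = {unit: base ** power for power, unit in enumerate(UNIT_ORDER)}
    number, unit = size.split()
    return int(float(number) * units[unit])

print(parse_size("10.43 KB"))               # 10430
print(parse_size("10.43 KB", binary=True))  # 10680
```

Building the table from powers of a single base keeps the two conventions consistent and avoids the kind of slip (a missing or extra zero) that is easy to make when every multiplier is written out literally.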

Jruv · Answer

Based on chicks' answer, this version uses only a regular expression to parse the size, and it also accepts sizes that are already integers.

import re

UNITS = {None: 1, "B": 1, "KB": 2 ** 10, "MB": 2 ** 20, "GB": 2 ** 30, "TB": 2 ** 40}


def parse_human_size(size):
    """Convert an int, or a string like "10.43 KB", to a number of bytes.

    >>> examples = [12345, "123214", "1024b", "10.43 KB", "11GB", "343.1 mb"]
    >>> [parse_human_size(s) for s in examples]
    [12345, 123214, 1024, 10680, 11811160064, 359766425]
    """
    if isinstance(size, int):
        return size
    m = re.match(r'^(\d+(?:\.\d+)?)\s*([KMGT]?B)?$', size.upper())
    if m:
        number, unit = m.groups()  # unit is None when no suffix was given
        return int(float(number) * UNITS[unit])
    raise ValueError(f"Invalid human size: {size!r}")
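This pattern still rejects some common spellings, such as `16G` (the input used with humanfriendly in the accepted answer) or `1.5 GiB`. The sketch below is one hypothetical way to loosen it: it keeps the answer's binary (1024-based) multipliers and makes the `i` and the trailing `B` optional. Treating `KiB` and `KB` as the same value follows the answer's binary convention and is an assumption; it differs from humanfriendly, which by default reads `16G` as 16000000000 bytes.

```python
import re

# Binary multipliers, as in the answer; the empty prefix means plain bytes.
MULTIPLIERS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

# number, optional spaces, optional K/M/G/T prefix (optionally followed by I), optional B
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:([KMGT])I?)?B?")

def parse_human_size(size):
    """Hypothetical variant accepting "16G", "16 GB", "16GiB", "1024" and "1024 B"."""
    if isinstance(size, int):
        return size
    m = SIZE_RE.fullmatch(size.strip().upper())
    if not m:
        raise ValueError(f"Invalid human size: {size!r}")
    number, prefix = m.groups()  # prefix is None for plain bytes
    return int(float(number) * MULTIPLIERS[prefix or ""])
```

Because `fullmatch` must consume the whole string, the `^` and `$` anchors are unnecessary, and inputs with trailing junk (`"16GBX"`) or a stray `i` without a prefix (`"10 iB"`) raise `ValueError`.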

Tags:

python

Hyperion

4 Answers
