
Parse human-readable filesizes into bytes

Tags:

python

example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

I want to convert all of these strings into bytes. So far I have come up with this:

def parseSize(size):
    if size.endswith(" B"):
        size = int(size.rstrip(" B"))
    elif size.endswith(" KB"):
        size = float(size.rstrip(" KB")) * 1000
    elif size.endswith(" MB"):
        size = float(size.rstrip(" MB")) * 1000000
    elif size.endswith(" GB"):
        size = float(size.rstrip(" GB")) * 10000000000
    elif size.endswith(" TB"):
        size = float(size.rstrip(" TB")) * 10000000000000
    return int(size)

But I don't like it, and I don't think it works either. I could only find modules that do the opposite.

Hyperion asked Mar 17 '17 19:03


4 Answers

To answer the OP's question, there does seem to be a module for this, humanfriendly:

pip install humanfriendly

then,

>>> import humanfriendly
>>> user_input = input("Enter a readable file size: ")
Enter a readable file size: 16G
>>> num_bytes = humanfriendly.parse_size(user_input)
>>> print(num_bytes)
16000000000
>>> print("You entered:", humanfriendly.format_size(num_bytes))
You entered: 16 GB
>>> print("You entered:", humanfriendly.format_size(num_bytes, binary=True))
You entered: 14.9 GiB
shellcat_zero answered Nov 18 '22 01:11


I liked Denziloe's answer better than everything else that came up on Google, but it

  • required a space between the number and the unit
  • didn't handle lower-case units
  • assumed a KB was 1000 bytes instead of 1024, etc. (Kudos to mlissner for trying to point that out years ago. Maybe our assumptions are too old school, but I don't see most software catching up to the new assumptions either.)

So I tweaked it into this:

import re

# based on https://stackoverflow.com/a/42865957/2002471
units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(size):
    size = size.upper()
    # Insert a space before the unit when the input has none, e.g. "11GB" -> "11 GB".
    # (re.match only looks at the start of the string, so test membership instead.)
    if " " not in size:
        size = re.sub(r'([KMGT]?B)', r' \1', size)
    number, unit = size.split()
    return int(float(number)*units[unit])

example_strings = ["1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb", "11 gb", "343.1 mb", "10.43kb", "11gb", "343.1mb"]

for example_string in example_strings:
    print(example_string, parse_size(example_string))

which we can verify by checking the output:

$ python humansize.py 
('1024b', 1024)
('10.43 KB', 10680)
('11 GB', 11811160064)
('343.1 MB', 359766425)
('10.43KB', 10680)
('11GB', 11811160064)
('343.1MB', 359766425)
('10.43 kb', 10680)
('11 gb', 11811160064)
('343.1 mb', 359766425)
('10.43kb', 10680)
('11gb', 11811160064)
('343.1mb', 359766425)
like image 40
chicks Avatar answered Nov 18 '22 00:11

chicks


Here's a slightly prettier version. There's probably no module for this; just define the function inline. It's very small and readable.

units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Alternative unit definitions, notably used by Windows:
# units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(size):
    # split() already discards surrounding whitespace, so no extra strip() is needed
    number, unit = size.split()
    return int(float(number)*units[unit])


example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

for example_string in example_strings:
    print(parse_size(example_string))

10430
11000000000
343100000

(Note that different places use slightly different conventions for the definitions of KB, MB, etc. -- either powers of 10**3 = 1000 or powers of 2**10 = 1024. If your context is Windows, you will want the latter. If your context is macOS, you will want the former.)
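
If you need both conventions in the same program, one option is to pick the table with a keyword argument instead of commenting one definition out. This is only a minimal sketch: the names DECIMAL_UNITS, BINARY_UNITS, parse_size_with_convention and the binary flag are made up for illustration, and it keeps the same "number, space, unit" input format as the function above.

```python
# Hypothetical names, chosen for this sketch only.
DECIMAL_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY_UNITS = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}


def parse_size_with_convention(size, binary=False):
    """Parse strings such as "10.43 KB" into bytes.

    binary=False uses powers of 1000, binary=True uses powers of 1024.
    """
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    number, unit = size.split()
    return int(float(number) * units[unit])


for example in ["10.43 KB", "11 GB", "343.1 MB"]:
    print(example,
          parse_size_with_convention(example),
          parse_size_with_convention(example, binary=True))
```

Defaulting to the decimal table is an arbitrary choice here; flip the default if most of your input comes from tools that count in 1024s.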

Denziloe answered Nov 18 '22 01:11


Based on chicks' answer, but using only a regular expression to parse the size, and also accepting a size that is already an integer.

import re

UNITS = {None: 1, "B": 1, "KB": 2 ** 10, "MB": 2 ** 20, "GB": 2 ** 30, "TB": 2 ** 40}


def parse_human_size(size):
    """
    Parse a size such as 12345, "123214", "1024b" or "10.43 KB" into bytes.

    >>> parse_human_size(12345)
    12345
    >>> parse_human_size("123214")
    123214
    >>> parse_human_size("10.43 KB")
    10680
    >>> parse_human_size("11gb")
    11811160064
    """
    # Integers are taken to be a byte count already.
    if isinstance(size, int):
        return size
    # Number, optional whitespace, optional unit; a missing unit maps to UNITS[None].
    m = re.match(r'^(\d+(?:\.\d+)?)\s*([KMGT]?B)?$', size.upper())
    if m:
        number, unit = m.groups()
        return int(float(number) * UNITS[unit])
    raise ValueError("Invalid human size")
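
The same regular-expression idea can probably be stretched to tell the two conventions apart by suffix, so that "KiB"-style units mean powers of 1024 and "KB" or a bare "K" mean powers of 1000. That mapping is my assumption, not something the answers above agree on, and parse_size_iec is a hypothetical name. A rough sketch:

```python
import re

# Hypothetical helper for illustration only.
# Assumption: "K", "KB" -> 1000**1; "KiB" -> 1024**1; likewise for M, G, T.
_PREFIXES = "KMGT"
_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\s*(?:([KMGT])(I)?)?B?$")


def parse_size_iec(size):
    """Parse strings such as "16G", "16 GiB" or "1024b" into bytes."""
    match = _PATTERN.match(size.strip().upper())
    if not match:
        raise ValueError("Invalid human size: %r" % (size,))
    number, prefix, iec = match.groups()
    # No prefix means plain bytes; otherwise K=1, M=2, G=3, T=4.
    exponent = _PREFIXES.index(prefix) + 1 if prefix else 0
    base = 1024 if iec else 1000
    return int(float(number) * base ** exponent)


for example in ["16G", "16 GiB", "1.5 KiB", "1024b", "512"]:
    print(example, parse_size_iec(example))
```

Anything the pattern does not recognise, such as "10 XB", raises ValueError rather than being guessed at.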
Jruv answered Nov 18 '22 01:11