My 8 year old niece had a lesson on Morse code in school yesterday, and her assignment was to convert various phrases to Morse code. One of the phrases included her age, and instead of writing <code>---..</code>, she wrote <code>3-2.</code> because (in her words), "it's less writing that way." This rudimentary "compression algorithm" sparked my curiosity, so I wrote a bit of code to implement it. However, we made a few changes along the way. I pointed out to her that if you wrote just <code>.....-----</code>, there isn't any way to tell if the writer meant <code>50</code> or <code>eeeeettttt</code>. In reality, there is a pause in between each letter of each word and every word so this isn't a problem, but our scheme didn't have that. I pulled out some graph paper and suggested padding the Morse code for each symbol with another symbol to facilitate coding and remove ambiguity from the scheme. My nice suggested using <code>+</code> because "no one ever writes those in sentences." (Ouch, I recently graduated with a math degree, but fair enough.) Since some of us do write with <code>+</code>, and we all use hyphens and periods/dots, which would conflict with our standard definition of Morse code, these symbols are replaced with <code>p</code>, <code>h</code>, and <code>d</code>, respectively. Of course, this brings us to the problem of what to do with symbols that aren't defined in our extended Morse code. My niece wanted to simply ignore them, so that's what we did. For the sake of case sensitive preservation of the textual message, upper case letters aren't lower cased in the code; they're just carried through as is and padded with <code>+</code>. <h3>Summary of the algorithm:</h3> <ol> <li>Morse codes are right-padded to 5 characters with <code>+</code> </li> <li>We extended Morse code to substitute <code>p</code> for <code>+</code>, <code>d</code> for <code>.</code>, and <code>h</code> for <code>-</code>.</li> <li>Symbols that aren't defined in our "extended" Morse code are passed through intact.</li> <li>Runs of symbols are replaced with unless only one consecutive character occurs, in which case the number is omitted.</li> </ol> <h3>Potential pitfalls:</h3> <ol> <li>My padding scheme probably reduces the effectiveness of the compression.</li> <li>Compression might be improved by using blocks larger than 5 characters</li> <li>If either my niece or I knew anything about compression algorithms, we could probably use that makes them successful.</li> <li>This obviously isn't suitable for production, but since there are numerous efficient compression algorithms for such purposes, I'm ignoring that problem for the time being.</li> <li>???</li> </ol> <h3>Example:</h3> In our algorithm, "Hello, World" translates to <pre class="prettyprint"><code>H++++.++++.-..+.-..+---++,+++++++++W++++---++.-.++.-..+-..++ </code></pre> and compresses to <pre class="prettyprint"><code>H4+.4+.-2.+.-2.+3-2+,9+W4+3-2+.-.2+.-2.+-2.2+ </code></pre> Here is the Python code I threw together: <pre class="prettyprint"><code>#!/usr/bin/python3 import itertools import string class MorseString(str): def __init__(self, string): # or, pad values during iteration but this seems neater self._char_morse_map = {"a":".-+++", "b":"-...+", "c":"-.-.+", "d":"-..++", \ "e":".++++", "f":"..-.+", "g":"--.++", "h":"....+", \ "i":"..+++", "j":".---+", "k":"-.-++", "l":".-..+", \ "m":"--+++", "n":"-.+++", "o":"---++", "p":".--.+", \ "q":"--.-+", "r":".-.++", "s":"...++", "t":"-++++", \ "u":"..-++", "v":"...-+", "w":".--++", "x":"-..-+", \ "y":"-.--+", "z":"--..+", "1":".----", "2":"..---", \ "3":"...--", "4":"....-", "5":".....", "6":"-....", \ "7":"--...", "8":"---..", "9":"----.", "0":"-----", " ":"+++++", ".":"d++++", "+":"p++++", "-":"h++++"} self._morse_char_map = dict() for letter, code in self._char_morse_map.items(): self._morse_char_map[code] = letter self._string = string # convert the string to "Morse code". Could also use c.lower() self._morse_string = "".join([self._char_morse_map.get(c, c.ljust(5, "+")) for c in self._string]) def compress(self): def grouper(n, k): return str(n) + k if n > 1 else k # could just use lambda return "".join([grouper(len(list(g)), k) for k, g in itertools.groupby(self._morse_string)]) def decompress(self): i = 0 start = 0 chars = list() sc = self.compress() while i < len(sc): curr = sc[i] i += 1 if not(curr in string.digits): num = 1 if start + 1 == i else int(sc[start:i-1]) chars.append("".join(curr * num)) start = i code = "".join(chars) chars = list() for i in range(0, len(code), 5): piece = "".join(code[i:i+5]) chars.append(self._morse_char_map.get(piece, piece[0])) return "".join(chars) def main(): s0 = "Hello, World" ms0 = MorseString(s0) print(ms0._morse_string) print(ms0.compress()) assert(s0 == ms0.decompress()) s1 = "Hello 2 world." ms1 = MorseString(s1) assert(s1 == ms1.decompress()) s2 = "The quick brown fox jumped over the lazy dog." ms2 = MorseString(s2) assert(s2 == ms2.decompress()) s3 = "abc -.+" ms3 = MorseString(s3) ms3 assert(s3 == ms3.decompress()) if __name__ == "__main__": main() </code></pre> What are some simple methods that would a) improve our algorithm, and b) be relatively easy to explain to my 8-year old niece? While the last point is clearly subjective, I'm nevertheless trying to indulge her curiosity as much as possible. I welcome any improvements to the code as well since it's not structured terribly well (I'm fairly certain it's structured quite poorly, actually, but it's quick and dirty), although that's strictly for my benefit since I haven't gotten my niece to use Python (YET). <h3>Update</h3> Here is an updated version of the code, which attempts to incorporate both user1884905's modifications to the algorithm and Karl's improvements to the code itself. <pre class="prettyprint"><code>import itertools import string _char_morse_map = {"a":".-", "b":"-...", "c":"-.-.", "d":"-..", \ "e":".", "f":"..-.", "g":"--.", "h":"....", \ "i":"..", "j":".---", "k":"-.-", "l":".-..", \ "m":"--", "n":"-.", "o":"---", "p":".--.", \ "q":"--.-", "r":".-.", "s":"...", "t":"-", \ "u":"..-", "v":"...-", "w":".--", "x":"-..-", \ "y":"-.--", "z":"--..", "1":".----", "2":"..---", \ "3":"...--", "4":"....-", "5":".....", "6":"-....", \ "7":"--...", "8":"---..", "9":"----.", "0":"-----", " ":"", ".":"d", "+":"p", "-":"h"} _morse_char_map = { code: letter for letter, code in _char_morse_map.items() } def encode(s): return "".join(_char_morse_map.get(c, c) + "+" for c in s) def decode(encoded): return "".join(decode_gen(encoded)) def decode_gen(encoded): word = "" for c in encoded: if c != "+": word += c else: yield _morse_char_map.get(word, word) if word != "" else " " word = "" def compress(s): def grouper(n, k): return str(n) + k if n > 1 else k return "".join(grouper(len(list(g)), k) for k, g in itertools.groupby(s)) def decompress(compressed): return "".join(decompress_gen(compressed)) def decompress_gen(compressed): digits = "" for c in compressed: if c in string.digits: digits += c else: number = int(digits) if digits else 1 yield "".join(c * number) digits = "" </code></pre>

A simple change that can be made to compress your output (at least in most cases) is to keep the idea of a pause between two letters and to let your <code>+</code>-signs denote this pause (i.e. the pluses are used as a new-character-character instead of using them as padding). This will make all numbers <code>0-9</code> one character longer but make quite a few commonly occurring letters shorter. For example <code>a</code>, <code>e</code>, <code>i</code> and <code>t</code> will become <code>.-+</code>, <code>.+</code>, <code>..+</code> and <code>-+</code> instead of <code>.-+++</code>, <code>.++++</code>, <code>..+++</code> and <code>-++++</code>. (And <code>space</code> can be denoted <code>+</code> instead of <code>+++++</code>) So your <code>Hello, World</code> example would become: <pre class="prettyprint"><code>H+.+.-..+.-..+---+,++W+---+.-.+.-..+-..+ </code></pre> instead of <pre class="prettyprint"><code>H++++.++++.-..+.-..+---++,+++++++++W++++---++.-.++.-..+-..++ </code></pre> and compress to <pre class="prettyprint"><code>H+.+.-2.+.-2.+3-+,2+W+3-+.-.+.-2.+-2.+ </code></pre> instead of <pre class="prettyprint"><code>H4+.4+.-2.+.-2.+3-2+,9+W4+3-2+.-.2+.-2.+-2.2+ </code></pre> Understanding this reasoning, Huffman encoding will only seem like the natural next step, where the basic idea is to make the most common characters take up as little amount of space as possible. Edit: See also this wikipedia-image of a dichotomatic search table showing the most commonly occurring characters close to the top of the tree. <img src="https://i.stack.imgur.com/AvjtQ.png" alt="Morse code tree">

How can I analyze or improve my niece's simple compression algorithm that is based on Morse code?

Tags:

python

algorithm

python-3.x

morse-code

My 8 year old niece had a lesson on Morse code in school yesterday, and her assignment was to convert various phrases to Morse code. One of the phrases included her age, and instead of writing ---.., she wrote 3-2. because (in her words), "it's less writing that way." This rudimentary "compression algorithm" sparked my curiosity, so I wrote a bit of code to implement it.

However, we made a few changes along the way. I pointed out to her that if you wrote just .....-----, there isn't any way to tell if the writer meant 50 or eeeeettttt. In reality, there is a pause in between each letter of each word and every word so this isn't a problem, but our scheme didn't have that. I pulled out some graph paper and suggested padding the Morse code for each symbol with another symbol to facilitate coding and remove ambiguity from the scheme. My nice suggested using + because "no one ever writes those in sentences." (Ouch, I recently graduated with a math degree, but fair enough.)

Since some of us do write with +, and we all use hyphens and periods/dots, which would conflict with our standard definition of Morse code, these symbols are replaced with p, h, and d, respectively. Of course, this brings us to the problem of what to do with symbols that aren't defined in our extended Morse code. My niece wanted to simply ignore them, so that's what we did. For the sake of case sensitive preservation of the textual message, upper case letters aren't lower cased in the code; they're just carried through as is and padded with +.

Summary of the algorithm:

Morse codes are right-padded to 5 characters with +
We extended Morse code to substitute p for +, d for ., and h for -.
Symbols that aren't defined in our "extended" Morse code are passed through intact.
Runs of symbols are replaced with unless only one consecutive character occurs, in which case the number is omitted.

Potential pitfalls:

My padding scheme probably reduces the effectiveness of the compression.
Compression might be improved by using blocks larger than 5 characters
If either my niece or I knew anything about compression algorithms, we could probably use that makes them successful.
This obviously isn't suitable for production, but since there are numerous efficient compression algorithms for such purposes, I'm ignoring that problem for the time being.
???

Example:

In our algorithm, "Hello, World" translates to

H++++.++++.-..+.-..+---++,+++++++++W++++---++.-.++.-..+-..++

and compresses to

H4+.4+.-2.+.-2.+3-2+,9+W4+3-2+.-.2+.-2.+-2.2+

Here is the Python code I threw together:

#!/usr/bin/python3

import itertools
import string

class MorseString(str):
    def __init__(self, string):
        # or, pad values during iteration but this seems neater
        self._char_morse_map = {"a":".-+++", "b":"-...+", "c":"-.-.+", "d":"-..++", \
                                "e":".++++", "f":"..-.+", "g":"--.++", "h":"....+", \
                                "i":"..+++", "j":".---+", "k":"-.-++", "l":".-..+", \
                                "m":"--+++", "n":"-.+++", "o":"---++", "p":".--.+", \
                                "q":"--.-+", "r":".-.++", "s":"...++", "t":"-++++", \
                                "u":"..-++", "v":"...-+", "w":".--++", "x":"-..-+", \
                                "y":"-.--+", "z":"--..+", "1":".----", "2":"..---", \
                                "3":"...--", "4":"....-", "5":".....", "6":"-....", \
                                "7":"--...", "8":"---..", "9":"----.", "0":"-----",
                                " ":"+++++", ".":"d++++", "+":"p++++", "-":"h++++"}
        
        self._morse_char_map = dict()
        for letter, code in self._char_morse_map.items():
            self._morse_char_map[code] = letter

        self._string = string

        # convert the string to "Morse code". Could also use c.lower()
        self._morse_string = "".join([self._char_morse_map.get(c, c.ljust(5, "+")) for c in self._string])

    def compress(self):
        def grouper(n, k):
            return str(n) + k if n > 1 else k

        # could just use lambda
        return "".join([grouper(len(list(g)), k) for k, g in itertools.groupby(self._morse_string)])

    def decompress(self):
        i = 0
        start = 0
        chars = list()
        sc = self.compress()
        while i < len(sc):
            curr = sc[i]
            i += 1
            if not(curr in string.digits):
                num = 1 if start + 1 == i else int(sc[start:i-1])
                chars.append("".join(curr * num))
                start = i

        code = "".join(chars)
        chars = list()

        for i in range(0, len(code), 5):
            piece = "".join(code[i:i+5])
            chars.append(self._morse_char_map.get(piece, piece[0]))
            
        return "".join(chars)


def main():
    s0 = "Hello, World"
    ms0 = MorseString(s0)
    print(ms0._morse_string)
    print(ms0.compress())
    assert(s0 == ms0.decompress())
    
    s1 = "Hello  2 world."
    ms1 = MorseString(s1)
    assert(s1 == ms1.decompress())

    s2 = "The quick brown fox jumped over the lazy dog."
    ms2 = MorseString(s2)
    assert(s2 == ms2.decompress())

    s3 = "abc  -.+"
    ms3 = MorseString(s3)
    ms3
    assert(s3 == ms3.decompress())

if __name__ == "__main__":
    main()

What are some simple methods that would a) improve our algorithm, and b) be relatively easy to explain to my 8-year old niece? While the last point is clearly subjective, I'm nevertheless trying to indulge her curiosity as much as possible.

I welcome any improvements to the code as well since it's not structured terribly well (I'm fairly certain it's structured quite poorly, actually, but it's quick and dirty), although that's strictly for my benefit since I haven't gotten my niece to use Python (YET).

Update

Here is an updated version of the code, which attempts to incorporate both user1884905's modifications to the algorithm and Karl's improvements to the code itself.

import itertools
import string

_char_morse_map = {"a":".-", "b":"-...", "c":"-.-.", "d":"-..", \
                   "e":".", "f":"..-.", "g":"--.", "h":"....", \
                   "i":"..", "j":".---", "k":"-.-", "l":".-..", \
                   "m":"--", "n":"-.", "o":"---", "p":".--.", \
                   "q":"--.-", "r":".-.", "s":"...", "t":"-", \
                   "u":"..-", "v":"...-", "w":".--", "x":"-..-", \
                   "y":"-.--", "z":"--..", "1":".----", "2":"..---", \
                   "3":"...--", "4":"....-", "5":".....", "6":"-....", \
                   "7":"--...", "8":"---..", "9":"----.", "0":"-----",
                   " ":"", ".":"d", "+":"p", "-":"h"}

_morse_char_map = {
    code: letter
    for letter, code in _char_morse_map.items()
}


def encode(s):
    return "".join(_char_morse_map.get(c, c) + "+" for c in s)

def decode(encoded):
    return "".join(decode_gen(encoded))

def decode_gen(encoded):
    word = ""
    for c in encoded:
        if c != "+":
            word += c
        else:
            yield _morse_char_map.get(word, word) if word != "" else " "
            word = ""

def compress(s):
    def grouper(n, k):
        return str(n) + k if n > 1 else k

    return "".join(grouper(len(list(g)), k) for k, g in itertools.groupby(s))

def decompress(compressed):
    return "".join(decompress_gen(compressed))

def decompress_gen(compressed):
    digits = ""
    for c in compressed:
        if c in string.digits:
            digits += c
        else:
            number = int(digits) if digits else 1
            yield "".join(c * number)
            digits = ""

690

asked Mar 28 '13 03:03

John Bensin

1 Answers

A simple change that can be made to compress your output (at least in most cases) is to keep the idea of a pause between two letters and to let your +-signs denote this pause (i.e. the pluses are used as a new-character-character instead of using them as padding).

This will make all numbers 0-9 one character longer but make quite a few commonly occurring letters shorter.

For example a, e, i and t will become .-+, .+, ..+ and -+ instead of .-+++, .++++, ..+++ and -++++. (And space can be denoted + instead of +++++)

So your Hello, World example would become:

H+.+.-..+.-..+---+,++W+---+.-.+.-..+-..+

instead of

H++++.++++.-..+.-..+---++,+++++++++W++++---++.-.++.-..+-..++

and compress to

H+.+.-2.+.-2.+3-+,2+W+3-+.-.+.-2.+-2.+

instead of

H4+.4+.-2.+.-2.+3-2+,9+W4+3-2+.-.2+.-2.+-2.2+

Understanding this reasoning, Huffman encoding will only seem like the natural next step, where the basic idea is to make the most common characters take up as little amount of space as possible.

Edit:

See also this wikipedia-image of a dichotomatic search table showing the most commonly occurring characters close to the top of the tree.

Morse code tree

105

answered Oct 27 '22 14:10

user1884905

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I analyze or improve my niece's simple compression algorithm that is based on Morse code?

Tags:

python

algorithm

python-3.x

morse-code

Summary of the algorithm:

Potential pitfalls:

Example:

Update

John Bensin

People also ask

1 Answers

user1884905

Recent Activity

Donate For Us

How can I analyze or improve my niece's simple compression algorithm that is based on Morse code?

Tags:

python

algorithm

python-3.x

morse-code

Summary of the algorithm:

Potential pitfalls:

Example:

Update

John Bensin

People also ask

1 Answers

user1884905

Related questions

Recent Activity

Donate For Us