
How would you write a test for the `Iconv.new("UTF8//IGNORE", ...)` idiom?

This Iconv idiom re-encodes a string as UTF-8 and drops anything that can't be converted (in practice, byte sequences that are not valid UTF-8):

require "iconv"

def normalize(text)
  Iconv.new('UTF-8//IGNORE', 'UTF-8').iconv(text.dup)
end

How would you actually write a test for this?

Edit: I simplified the question after realizing that trying to test this in a Rails spec file with a `# encoding: utf-8` magic comment was complicating the issue. The bounty is now a bit silly, but I'll still award it if someone can show a test I can work from.

danneu asked Oct 06 '22 01:10


1 Answer

You can construct a String from an array of byte values using Array#pack. That makes it easy to generate a string containing an invalid byte sequence and use it in a test.

Example:

describe "#normalize" do
  it "should remove/ignore invalid characters" do
    # These bytes spell "Mandados de busca do caso Megaupload considerados inv\xE1lidos - Tecnologia - Sol".
    # Byte 225 (0xE1) is a Latin-1 "á", which is not a valid UTF-8 sequence on its own.
    # 'C*' packs each value as an unsigned 8-bit byte.
    bad_string = [77, 97, 110, 100, 97, 100, 111, 115, 32, 100, 101, 32, 98, 117, 115, 99, 97, 32, 100, 111, 32, 99, 97, 115, 111, 32, 77, 101, 103, 97, 117, 112, 108, 111, 97, 100, 32, 99, 111, 110, 115, 105, 100, 101, 114, 97, 100, 111, 115, 32, 105, 110, 118, 225, 108, 105, 100, 111, 115, 32, 45, 32, 84, 101, 99, 110, 111, 108, 111, 103, 105, 97, 32, 45, 32, 83, 111, 108].pack('C*').force_encoding('UTF-8')

    normalize(bad_string).should == 'Mandados de busca do caso Megaupload considerados invlidos - Tecnologia - Sol'
  end
end

(Sorry for the rather long test string; I couldn't find a shorter example in my code.)
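
For reference, a shorter invalid string can be built the same way. The sketch below is only an illustration: it keeps the single offending byte (0xE1) from the example and uses String#valid_encoding? to confirm the input really is invalid UTF-8. Because Iconv is no longer bundled with recent Ruby versions, the last step uses String#scrub (Ruby 2.1+) as a stand-in to show the expected result of dropping the bad byte. That is a swapped-in technique, not the Iconv idiom from the question, and the variable names are made up for this example.

```ruby
# Build a short string with one byte (0xE1) that is not valid UTF-8 on its own.
bytes = "inv".bytes + [0xE1] + "lidos".bytes
bad_string = bytes.pack('C*').force_encoding('UTF-8')

# The same string written with an escape sequence instead of a byte array.
same_string = "inv\xE1lidos".force_encoding('UTF-8')

# Sanity check: the test input must actually be invalid UTF-8,
# otherwise the test would not exercise the //IGNORE behaviour.
input_is_invalid = !bad_string.valid_encoding?

# Stand-in for normalize (assumption: String#scrub, not Iconv):
# replace each invalid byte sequence with an empty string.
scrubbed = bad_string.scrub('')
```

In a spec, `bad_string` would be passed to `normalize` and the result compared against `'invlidos'`, exactly as in the longer example.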

severin answered Oct 13 '22 10:10