
Generating a fake ISBN from book title? (Or: How to hash a string into a 6-digit numeric ID)

Short version: How can I turn an arbitrary string into a 6-digit number with minimal collisions?

Long version:

I'm working with a small library that has a bunch of books with no ISBNs. These are usually older, out-of-print titles from tiny publishers that never got an ISBN to begin with, and I'd like to generate fake ISBNs for them to help with barcode scanning and loans.

Technically, real ISBNs are controlled by commercial entities, but it is possible to use the format to assign numbers that belong to no real publisher (and so shouldn't cause any collisions).

The format is:

978-0-01-######-?

This gives you 6 digits to work with, from 000000 to 999999, with the ? at the end being a checksum digit.

Would it be possible to turn an arbitrary book title into a 6-digit number in this scheme with minimal chance of collisions?

i-know-nothing asked Nov 13 '22 20:11


1 Answer

After using code snippets for making a fixed-length hash and calculating the ISBN-13 checksum, I managed to create really ugly C# code that seems to work. It'll take an arbitrary string and convert it into a valid (but fake) ISBN-13:

       // Number of available IDs: six digits, 000000 to 999999.
       private const uint MUST_BE_LESS_THAN = 1000000;

       public int GetStableHash(string s)
       {
           uint hash = 0;
           // Jenkins one-at-a-time hash over the UTF-16 bytes of the string.
           // If you care, this can be done much faster with unsafe code,
           // using a fixed char* reinterpreted as a byte*.
           foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
           {
               hash += b;
               hash += (hash << 10);
               hash ^= (hash >> 6);
           }
           // final avalanche
           hash += (hash << 3);
           hash ^= (hash >> 11);
           hash += (hash << 15);
           // The modulo keeps the result in 0..MUST_BE_LESS_THAN-1, so the
           // cast to int is safe. The reduction is slightly non-uniform.
           return (int)(hash % MUST_BE_LESS_THAN);
       }

       public int CalculateChecksumDigit(ulong n)
       {
           string sTemp = n.ToString();
           int iSum = 0;
           int iDigit = 0;

           // Walk the digits right to left; i is the 1-based position from the left.
           for (int i = sTemp.Length; i >= 1; i--)
           {
               iDigit = Convert.ToInt32(sTemp.Substring(i - 1, 1));
               // ISBN-13/EAN-13 weights the 12 data digits 1,3,1,3,...
               // from the left, so even positions get weight 3.
               // This assumes n has exactly 12 digits.
               if (i % 2 == 0)
               { // even position
                   iSum += iDigit * 3;
               }
               else
               { // odd position
                   iSum += iDigit * 1;
               }
           }
           }
           return (10 - (iSum % 10)) % 10;
       }


       private void generateISBN()
       {
           string titlehash = GetStableHash(BookTitle.Text).ToString("D6");
           string fakeisbn = "978001" + titlehash;
           string check = CalculateChecksumDigit(Convert.ToUInt64(fakeisbn)).ToString();

           SixDigitID.Text = fakeisbn + check;
       }
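
Hashing alone cannot rule out collisions. By the birthday bound, a uniformly distributed 6-digit hash has about a 50% chance of at least one collision once roughly 1,200 titles have been hashed. One possible way to deal with that, sketched below, is to remember which numbers are already taken and step to the next free one on a clash. The class name `FakeIsbnAllocator`, the method names, and the linear-probing strategy are assumptions for illustration, not part of the code above:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: hashes a title into the 978-0-01 block and
// resolves collisions by moving to the next unused 6-digit number.
public static class FakeIsbnAllocator
{
    private const uint Slots = 1000000;      // 000000..999999
    private const string Prefix = "978001";  // 978-0-01 block from the question

    // Same one-at-a-time style hash as above, reduced to 0..999999.
    public static uint HashTitle(string title)
    {
        unchecked // uint overflow is intended here
        {
            uint hash = 0;
            foreach (byte b in Encoding.Unicode.GetBytes(title))
            {
                hash += b;
                hash += hash << 10;
                hash ^= hash >> 6;
            }
            hash += hash << 3;
            hash ^= hash >> 11;
            hash += hash << 15;
            return hash % Slots;
        }
    }

    // ISBN-13 check digit for the first 12 digits (weights 1,3,1,3,... from the left).
    public static int CheckDigit(string first12)
    {
        int sum = 0;
        for (int i = 0; i < first12.Length; i++)
        {
            int digit = first12[i] - '0';
            sum += (i % 2 == 0) ? digit : digit * 3;
        }
        return (10 - sum % 10) % 10;
    }

    // Returns a 13-digit fake ISBN; 'used' tracks the 6-digit numbers already taken.
    public static string Assign(string title, HashSet<uint> used)
    {
        if (used.Count >= Slots)
            throw new InvalidOperationException("No free numbers left.");

        uint slot = HashTitle(title);
        while (!used.Add(slot))          // Add returns false if the slot is taken
            slot = (slot + 1) % Slots;   // linear probing to the next number

        string first12 = Prefix + slot.ToString("D6");
        return first12 + CheckDigit(first12);
    }
}
```

With this approach the number a title ends up with can depend on the order in which titles were added, so the assigned ISBN should be stored with the book record rather than recomputed from the title later. As a sanity check on the checksum, `CheckDigit("978001000000")` returns 9, giving 978-0-01-000000-9.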

2 revs answered Dec 18 '22 05:12