
How to get book metadata?

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement: sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.

To clarify, I do not need to search the entire database of books, only a limited subset that has been entered, as in a book collection. The database would simply allow me to tag the entered books with the necessary metadata to enable search on that subset of books. So scale is not the issue here; getting the metadata is.

The options I have tried are:

  1. Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
  2. Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
  3. ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
  4. Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service require.
  5. Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.

I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata, so any pointers are greatly appreciated.

Saketh asked Jul 20 '10 06:07




1 Answer

Since it is unlikely that you have to retrieve the same 500 books every day, store the data retrieved from isbndb.com in a database and fill it up book by book, so that each book only has to be fetched once.
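A minimal sketch of that idea, assuming SQLite as the local database (the answer does not name one). `BookCache`, `normalize_isbn`, and the `fetch` callable are hypothetical names introduced for illustration; `fetch` stands in for whatever code actually calls isbndb.com, which is not shown here.

```python
import json
import sqlite3
from typing import Callable, Optional


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so one book always maps to one cache key."""
    return isbn.replace("-", "").replace(" ", "").upper()


class BookCache:
    """Local store that calls the remote service only for ISBNs not seen before."""

    def __init__(self, path: str, fetch: Callable[[str], Optional[dict]]):
        # `fetch` is an assumption: any function that takes an ISBN and
        # returns a metadata dict, or None when the book is unknown.
        self.fetch = fetch
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS books ("
            "isbn TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def get(self, isbn: str) -> Optional[dict]:
        key = normalize_isbn(isbn)
        row = self.db.execute(
            "SELECT metadata FROM books WHERE isbn = ?", (key,)
        ).fetchone()
        if row is not None:
            # Cache hit: no remote call, so no usage cost.
            return json.loads(row[0])

        # Cache miss: ask the remote service once and remember the result.
        metadata = self.fetch(key)
        if metadata is not None:
            self.db.execute(
                "INSERT INTO books (isbn, metadata) VALUES (?, ?)",
                (key, json.dumps(metadata)),
            )
            self.db.commit()
        return metadata
```

With this shape, only books that have never been looked up count against the paid quota; repeated lookups of the same ISBN are served from the local file. Storing the metadata as a JSON string keeps the sketch short; a real schema would probably use separate columns for title and author so the collection can be searched by them.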

akira answered Sep 25 '22 01:09
