
Statistics about "Microformat vs HTML+RDFa" adoption

Are there any recent, reliable statistics on the web usage of these standards (the number of web pages using one standard or the other)?

Or a specific statistic on the scope of use of vCard (person and/or organization) data?

Statistics only: this question is not about "which is the best idea?" or "how do I use it?". I am looking for numbers that compare Microformats adoption with the adoption of (any kind of) RDFa in HTML.

For the purpose of "counting pages" statistics, we can consider Microdata a kind of RDFa-HTML.


NOTES

Explaining the context

In the "Microdata vs Microformats" debate, RDFa Lite is the only W3C Recommendation, and Microdata maps more closely to RDFa Lite. HTML5 became a W3C Recommendation on 2014-10-28, and neither Microdata nor Microformats was blessed by the W3C. As I understand it, schema.org is the best way to adopt RDFa (by reusing community schemas).

On the other hand, Microformats is older and the simplest, so perhaps it is the most used on the Web (is it?).

About "vCard data statistics"

If we need some scope for the statistics, let's use vCard as scope:

  • Microformats' hCard and h-card are standards for displaying vCards in (any) HTML, and are used for people and organizations.

  • schema.org's Person and Organization encode vCard information with (standard) RDFa Lite or Microdata.
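To make the "counting pages" idea concrete, here is a minimal Python sketch of how one page could be classified by the syntax it uses. It is only a heuristic under stated assumptions: the marker sets and the names `SyntaxDetector` and `detect_syntaxes` are invented for illustration, and a real survey would rely on a full extractor rather than attribute sniffing.

```python
from html.parser import HTMLParser

# Heuristic markers (assumptions for illustration, not a full extractor).
MICROFORMAT_CLASSES = {"vcard", "h-card"}   # classic hCard root and microformats2 h-card root
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"vocab", "typeof", "property"}


class SyntaxDetector(HTMLParser):
    """Records which structured-data syntaxes appear in one HTML page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "class" and value:
                if MICROFORMAT_CLASSES & set(value.split()):
                    self.found.add("microformats")
            elif name in MICRODATA_ATTRS:
                self.found.add("microdata")
            elif name in RDFA_ATTRS:
                self.found.add("rdfa")


def detect_syntaxes(html):
    """Return the set of syntaxes detected in an HTML string."""
    detector = SyntaxDetector()
    detector.feed(html)
    detector.close()
    return detector.found


if __name__ == "__main__":
    page = '<div vocab="http://schema.org/" typeof="Person"><span property="name">Ada</span></div>'
    print(detect_syntaxes(page))
```

Running this over a crawl and tallying the returned sets per URL or per domain would give the kind of counts asked about here, with the caveat that `property` also matches Open Graph `<meta>` tags.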

Other notes

Wikipedia makes an old (2012) and unverifiable assertion (no source!): "Microformats such as hCard, however, continue to be published more than schema and others on the web". Webdatacommons is a mess, with no statistical report.

(edit) Wikipedia's citation error is now fixed.


(edit after @sashoalm comment) Note for those who disagree that this question is valid.

This question is a software problem, not a "request for off-site resource"...

PROBLEM: to decide which library, framework, data model, etc. to use in a project, we need tools that are in use today and will still be in use over the next few years. To make project decisions in software development, we need statistics about user trends, framework adoption, etc.

PS: here on Stack Overflow there are a lot of discussions about language statistics, which belong to the same "set of problems". Examples: 1, 2, 3, 4, 5, 6. See also the questions tagged with [usage-statistics].

Peter Krauss asked Feb 19 '15 14:02


2 Answers

Now I see that there are some statistics (!!). The Wikipedia link had been lost, and I corrected it. The data is not up to date: it is from "Winter 2013" (collected ~1.5 to 2 years ago), but it shows the reality and the trends.

http://webdatacommons.org/structureddata/index.html#toc2

These are the charts from the report (showing RDFa+HTML dominance!):

[two chart images from the report]

Interpreting:

  • Section 5, "Extraction Process", says that "on each page, we run our RDF extractor based on the Anything To Triples (Any23) library", so everything (RDFa and Microformats alike) ends up as "triples", not only the RDF data.

  • The idea behind "per domain" statistics is that a domain applies a uniform policy to all of its pages. I think this uniformity is false: only a few pages per domain adopt "semantic markup". Counting domains is no less biased than counting URLs; it is only another picture. Anyway, the outcome was roughly a dead heat, ~57% vs 43%.

  • Only 21% of the "semantic markup URLs" of 2013 were Microformats; all the others were RDFa-HTML (counting Microdata as a kind of RDFa-HTML).

  • Using the average of the percentages for domains (Ds) and URLs (Us), (Ds+Us)/2, the outcome is ~60% for RDFa-HTML and ~40% for Microformats.

  • Before 2013 Microformats were dominant, so the strong growth of "RDFa-HTML" since 2011 is evident. The trend is clear.

  • If we adopt the arithmetic mean of the "per domain" and "per URL" counts, Microformats and RDFa-HTML are near each other, with somewhat less Microformats (and a strong tendency for RDFa-HTML to keep growing in 2014).
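The (Ds+Us)/2 arithmetic can be checked with a few lines of Python. Note one assumption: the answer does not say which format holds the ~57% per-domain share. Assigning it to Microformats is an inference, chosen because it is the only assignment that reproduces the ~60% / ~40% result quoted above. The names `per_domain`, `per_url` and `mean_share` are illustrative.

```python
# Shares in percent, rounded as quoted in the answer.
# ASSUMPTION: the per-domain split (Microformats 57, RDFa-HTML 43) is inferred,
# not stated explicitly in the source.
per_domain = {"microformats": 57.0, "rdfa_html": 43.0}
per_url = {"microformats": 21.0, "rdfa_html": 79.0}


def mean_share(domains, urls):
    """Arithmetic mean of the per-domain and per-URL share for each format."""
    return {fmt: (domains[fmt] + urls[fmt]) / 2 for fmt in domains}


combined = mean_share(per_domain, per_url)
print(combined)  # {'microformats': 39.0, 'rdfa_html': 61.0}
```

With the opposite assignment the mean for RDFa-HTML would be (57 + 79) / 2 = 68%, which does not match the ~60% figure.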

Here is a table for the discussion with @sashoalm, showing the percentages and totals:

[table image]


NOTE1: HTML5 was released only on 2014-10-28, so we will only be able to check the real (definitive) impact of the new standard on the Web around 2015-10. One important expected impact: Microdata was not blessed by HTML5, so the only standard is HTML+RDFa (which recommends RDFa Lite). In the future there will perhaps be less Microdata and more schema.org.

NOTE2: there is a methodological problem in counting web pages, namely boilerplate text carrying heavily cloned "semantic markup". I think the "next generation" of statistics could use some "per domain analysis" to build URL sub-statistics (sampling) of the diversity of semantically marked pages. Ideally the boilerplate would be weighted, e.g. by counting non-clones once and using 1+SQRT(count) for clones.
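As a rough sketch of the weighting suggested in NOTE2, the Python below counts each unique page once and gives a group of n cloned pages a weight of 1 + sqrt(n). This is only one possible reading of "1+SQRT(count)". The function name `weighted_page_count` and the idea of a per-page "markup fingerprint" (any hashable value that is identical for cloned markup) are assumptions made for illustration.

```python
import math
from collections import Counter


def weighted_page_count(fingerprints):
    """Weighted count of pages, damping the effect of cloned markup.

    fingerprints: iterable with one hashable markup fingerprint per page
    (hypothetical input; pages with equal fingerprints are treated as clones).
    """
    group_sizes = Counter(fingerprints)
    total = 0.0
    for n in group_sizes.values():
        if n == 1:
            total += 1.0                  # non-clone: counted once
        else:
            total += 1.0 + math.sqrt(n)   # clone group: 1 + sqrt(count)
    return total


if __name__ == "__main__":
    # One unique page plus four clones of the same boilerplate markup.
    print(weighted_page_count(["a", "b", "b", "b", "b"]))  # 4.0
```

Under this scheme five raw pages count as 4.0, and a site with 10,000 cloned pages would count as 101 instead of 10,000.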

Conclusion

Today some people may still use Microformats, but more pages on the Web use RDFa-HTML (Microdata, RDFa, RDFa Lite, etc.), and that number tends to grow.

If your project is meant for the coming years, the statistics say to use RDFa.


NOTE

Another interesting count for RDFa is not its use but the reuse of vocabularies (!). See Linked Open Vocabularies (LOV).

5 revs answered May 16 '23 07:05


The latest statistics from WebDataCommons are as follows:

Source: http://webdatacommons.org/structureddata/2016-10/stats/stats.html

Number of domains parsed: 34 million pay-level domains
Number of domains with RDFa, Microdata and Microformats: 5.63 million (16.5%)

Popularity of different formats: [chart image]

alperovich answered May 16 '23 09:05