For music data in audio format, there is the Million Song Dataset (http://labrosa.ee.columbia.edu/millionsong/), for example. Is there a similar dataset for music in symbolic form (that is, where the notes, not the sound, are stored)? Any format, such as MIDI or MusicXML, would be fine.
I'm not aware of a "standard" dataset. However, these are the places I know of that host music scores in symbolic form:
- […] .mscz format, but support many others. [Added December 2019]
- Wikifonia, a repository of lead sheets for songs. A lead sheet is a simplified music score: perhaps enough to sing from at a piano with friends, but not enough to publish as a vocal score. They used MusicXML as their standard format, and I estimated they had over 4000 scores. Interestingly, they had an arrangement to pay royalties for the music they hosted. This was probably the best home for re-typeset scores of non-free/libre music. [This site was in operation in January 2012, when the answer was first written, but had ceased operation by December 2019, when this edit was made; the site now announces that it has closed. Since the question is also old and closed, it's worth leaving this legacy entry in the answer.]
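Because MusicXML stores the notes themselves, a collection of such files can be turned into note sequences with very little code. The sketch below is only a rough illustration using Python's standard library: the sample fragment, the function name `extract_notes`, and the MIDI-number convention (C4 = 60) are my own assumptions, and it ignores ties, voices, time-wise scores, and compressed `.mxl` files.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the natural note names relative to C.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Hypothetical hand-written score-partwise fragment used only for illustration.
SAMPLE = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Melody</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
      <note><pitch><step>B</step><alter>-1</alter><octave>3</octave></pitch><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""


def extract_notes(musicxml_text):
    """Return (midi_number or None, duration in divisions) for every <note>.

    Notes from all parts are listed in document order; chord members
    (marked with <chord/>) appear as separate entries.
    """
    root = ET.fromstring(musicxml_text)
    events = []
    for note in root.iter("note"):
        # Grace notes have no <duration>; report them as 0.
        duration = int(note.findtext("duration", default="0"))
        pitch = note.find("pitch")
        if pitch is None:
            # No <pitch> child: a rest (or an unpitched percussion note).
            events.append((None, duration))
            continue
        step = pitch.findtext("step")
        # Fractional (microtonal) alters are truncated to whole semitones.
        alter = int(float(pitch.findtext("alter", default="0")))
        octave = int(pitch.findtext("octave"))
        midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
        events.append((midi, duration))
    return events


print(extract_notes(SAMPLE))  # [(60, 1), (66, 1), (None, 1), (58, 1)]
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring`; a dedicated toolkit is likely a better choice once ties, voices, and multiple parts matter.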
You can find a list of sites with sheet music in MusicXML and MusicXML-compatible formats at:
http://www.recordare.com/musicxml/music
Many of those sites include MIDI files and other formats as well.
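For the MIDI files, a similarly rough sketch can sanity-check each download before it goes into a larger pipeline. It follows the Standard MIDI File chunk layout (an `MThd` header chunk followed by `MTrk` track chunks); the function name `read_midi_summary` and the synthetic test bytes are hypothetical, and it does not decode the events inside the tracks.

```python
import struct


def read_midi_summary(data: bytes) -> dict:
    """Parse the MThd header and count the MTrk chunks of a Standard MIDI File."""
    if len(data) < 14:
        raise ValueError("too short to be a Standard MIDI File")
    # Big-endian: chunk type, chunk length, format, number of tracks, division.
    chunk_type, length, fmt, ntrks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk_type != b"MThd" or length < 6:
        raise ValueError("missing MThd header chunk")
    pos = 8 + length  # skip the whole header chunk, even if it is longer than 6 bytes
    track_chunks = 0
    while pos + 8 <= len(data):
        ctype, clen = struct.unpack(">4sI", data[pos:pos + 8])
        if ctype == b"MTrk":
            track_chunks += 1
        pos += 8 + clen  # chunks of unknown type are simply skipped
    return {
        "format": fmt,
        "declared_tracks": ntrks,
        "track_chunks_found": track_chunks,
        # Top bit clear: ticks per quarter note; top bit set: SMPTE-based timing.
        "ticks_per_quarter": None if division & 0x8000 else division,
    }


# Synthetic file: format 1, two empty tracks, 480 ticks per quarter note.
END_OF_TRACK = b"\x00\xff\x2f\x00"  # delta-time 0, meta event 0x2F, length 0
track = b"MTrk" + struct.pack(">I", len(END_OF_TRACK)) + END_OF_TRACK
sample = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480) + track + track
print(read_midi_summary(sample))
```

Comparing `declared_tracks` with `track_chunks_found` is a cheap way to flag truncated or corrupted files in a scraped collection.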