Out of personal interest, I am trying to build a simulated AI that combines what it has already learned with internet searches, so that it can give more detail than the system itself knows.
I took a child as my model: a newborn has to learn everything, hears a lot, and then proposes answers. The child's mom or dad says whether each answer is suitable or not.
To do that, I want to store a large number of chat conversations in a Hadoop system and parse them all to determine which answers are given most frequently. From that I want to build a neural database containing conversation types together with the answers determined for them.
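To make the idea concrete, here is a minimal sketch of the "most frequent answer" step in plain Python rather than Hadoop. It assumes the conversations have already been split into (prompt, reply) pairs; the names `normalize` and `most_frequent_answers` and the sample data are made up for illustration only.

```python
from collections import Counter, defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so trivially different
    # spellings of the same sentence are counted together.
    return " ".join(text.lower().split())


def most_frequent_answers(pairs):
    # For each prompt, count how often each reply was given.
    counts = defaultdict(Counter)
    for prompt, reply in pairs:
        counts[normalize(prompt)][normalize(reply)] += 1
    # Keep only the most common reply for each prompt.
    return {
        prompt: replies.most_common(1)[0][0]
        for prompt, replies in counts.items()
    }


# Hypothetical sample data, not taken from any real dataset.
sample_pairs = [
    ("How are you?", "Fine, thanks."),
    ("How are you?", "fine, thanks."),
    ("How are you?", "Not bad."),
    ("What time is it?", "No idea."),
]

best = most_frequent_answers(sample_pairs)
print(best)
```

On a real corpus the same counting could probably be expressed as a map/reduce job, with the normalized prompt as the key, but that part is not shown here.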
So my question is: where can I legally find one or more chat/conversation datasets on the internet, in any format (file, database, CSV, ...)?
The more data I have, the better my chances of determining the answers correctly ;)
Thanks for the help and cheers, Frédéric
PS: English is not my mother tongue
There is a collection of conversational datasets. Most of them are collected from publicly available sources. For you, the most interesting ones could be the Santa Barbara corpus (although it consists of transcripts of spoken conversations) or the movie dialog dataset.