I'm looking at Neo4j, and the 32 billion relationship limit has me worried (imagine 40 million users who each upload 500 photos, have 500 friends, make 500 comments etc. and before you know it you are past 32 billion). So I have some concerns and have to make sure I'm making the best choice on which database to use.
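To make that back-of-envelope estimate concrete, here is a rough sketch of the arithmetic. The 32 billion figure is the limit as quoted above, and the assumption that every photo, friendship and comment costs exactly one relationship per user is a simplification for illustration (a real model might count a mutual friendship once, or add extra relationships per comment). The function names are hypothetical.

```python
# Back-of-envelope estimate of total relationships in a social graph.
# Assumption: each photo, friend and comment is stored as exactly one
# relationship per user; the limit is the figure quoted in the question.
RELATIONSHIP_LIMIT = 32_000_000_000


def total_relationships(users, per_user_counts):
    """Total relationships if every user has the given per-user counts."""
    return users * sum(per_user_counts.values())


def max_users_under_limit(per_user_counts, limit=RELATIONSHIP_LIMIT):
    """Largest whole number of users that stays at or under the limit."""
    return limit // sum(per_user_counts.values())


per_user = {"photos": 500, "friends": 500, "comments": 500}
total = total_relationships(40_000_000, per_user)

print(f"total relationships: {total:,}")
print(f"exceeds limit: {total > RELATIONSHIP_LIMIT}")
print(f"users before hitting the limit: {max_users_under_limit(per_user):,}")
```

Under these assumptions the limit would be reached well before 40 million users, which is what prompts the concern.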
I'm not looking for subjective answers or debate here (i.e. which one is better) - rather, since I'm betting a startup's future on which graph database it uses, I need to know the risks the different databases present, such as Neo4j not supporting more than 32 billion relationships.
Now, several companies have called their graph databases the "leading graph database", but let's look past the hype - which one has the most financial backing? Which db enjoys large community support? Which one has a solid company behind it for commercial support?
Which one is most likely to be mature enough that, if you wanted, you could easily create Facebook with minimal effort?
It's easy to choose a graph database on technical features or familiarity - but I'm looking for more than that - I want to make sure that a few years from now the company is still around. I want to make sure I'm not choosing to go with Neo4j based on hype and the momentum it currently (temporarily?) has...
And what other graph databases can contend with Neo4j to create a full-fledged social network similar to Facebook (again, not looking for better, just looking for a solid competitor)?
Please don't let this turn into a subjective Neo vs Dex debate - just facts and solid answers please.
Neo4j is the leading graph database technology that drives innovation and competitive advantage at Airbus, Comcast, eBay, NASA, UBS and more.
Graphs and graph databases provide graph models to represent relationships. They allow users to apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.
Graph databases are not as useful for operational use cases because they are not efficient at processing high volumes of transactions and they are not good at handling queries that span the entire database.
Disclaimer: I work for/with Neo4j
Just talking about the maturity here (not technicalities) - Neo Technology, as a company with more than 50 employees, $25M in funding and a thriving user base (half a million downloads, 30k new databases running each month and an active community), won't go away. You can also check the SO questions to see the community activity.
We have a healthy set of customers in many domains from big ones like Adobe (runs creative cloud on Neo4j), Cisco (Org-Management, MDM), social networks like Viadeo and many Job search companies (GlassDoor, and others) to startups like fiftythree who published the popular "Paper" app on iOS.
Our community site neo4j.org should be a good place to get started: you'll find introductory content there, as well as information on programming languages, drivers and deployments.
Emil, Ian and Jim wrote an introductory book about "graph databases" with O'Reilly which is currently available as a free ebook download.
So you see we're not just taking care of our own product but also the bigger graph ecosystem, with many conference talks, meetup groups (41 worldwide) and support of the open source ecosystem.
Hope that helps you decide.
P.S. Regarding your concerns: The size limits (which are artificial anyway) will be increased this year.
So I've tested and been working with graph databases for the last year. I think only you know your data well enough to be able to make an educated guess as to whether your graph is going to need more than 32 billion relationships. I would argue there are not a lot of use cases right now, for most people, where this is a limitation. But that's not absolute.
Neo4j is a brilliant product. Well documented and with folks like maxdemarzi writing excellent blog posts - such as: http://maxdemarzi.com/ - which will bring anyone up to speed on the power and sophistication of neo4j pretty quickly. (Plus he's a nice guy who'll answer your questions if you have them)
If scale is an issue I'd also recommend you take a look at Titan - http://thinkaurelius.github.com/titan/. The guys behind this are brilliant and it's intended for massive scale. It's not as established in the market as Neo4j but it has a lot of power and gives you some flexibility on priorities by letting you select between Cassandra, HBase and BerkeleyDB for underlying storage.
Neo4j is a well backed, well funded company with real revenues. It isn't going anywhere. Titan is smaller but I think is on a rapid upward curve.
The truth is, though, it's all a new space. You're not getting anything as established as Postgres or MySQL, or with the corporate strength of Oracle. Let's not kid ourselves.
However the graph database community is relatively small, friendly and helpful. It runs great events - I was at Neo4j's GraphCon event which was awesome, and I've been to some talks by the Titan guys which were great. Ultimately if you want to be Facebook though, whatever you start with you'll end up building your own infrastructure. There's scale and then there's you-need-to-own-datacenters-the-size-of-small-countries scale.
One final thought. The problem of 40 million users and the underlying infrastructure challenges is a problem for a well-established, well-funded company. You don't get to 40 million users without attracting the funding or generating the revenue necessary to finance building out your own infrastructure. You can plan now for when you're at 40 million users, absolutely. Go for it. That's the fun of the early stages of a startup. But your bigger problem is getting to your first million, or even ten million. For that, use whichever of these databases gets you to market fastest with a solid product.