 

Graph Data Structures with millions of nodes (Social network)

In the context of designing a social network using a graph data structure, where you can perform a BFS to find a connection from one person to another, I have some questions.

If there are a million users, the topology would be much more complicated and interconnected than the graphs we normally design, and I am trying to understand how these problems could be solved.

  1. In the real world, servers fail. How does this affect you?

  2. How could you take advantage of caching?

  3. Do you search until the end of the graph (infinite)? How do you decide when to give up?

  4. In real life, some people have more friends of friends than others, and are therefore more likely to lie on a path between you and someone else. How could you use this data to choose where to start the traversal?
Legolas asked Oct 23 '11 03:10



1 Answer

Your questions are interesting ones :)

1) Well... of course, data is stored on disks, not in RAM, and disks have mechanisms to survive failure, RAID-5 for example. Redundancy is the key: if one system fails, another is ready to take its place. Redundancy is also combined with workload sharing: two machines work in parallel and split the load, but if one stops, the survivor takes the full workload alone.

In places like Google or Facebook the redundancy factor is not 2, it is more like 1,200,000,000 :) Consider also that the data does not live in a single server farm; Google runs several datacenters connected together, so if one building goes down, another takes its place.
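A minimal sketch of the failover idea in Python; the replica names, the `DOWN` set, and the `fetch_profile` helper are all invented for illustration, not how any real service is built:

```python
# Hedged sketch: simulate reading from replicas, skipping failed nodes.
DOWN = {"db-1"}  # pretend this replica has failed

def fetch_profile(server: str, user_id: int) -> dict:
    """Simulate querying one replica; unreachable nodes raise an error."""
    if server in DOWN:
        raise ConnectionError(f"{server} is unreachable")
    return {"user_id": user_id, "served_by": server}

def read_with_failover(replicas, user_id):
    """Try each replica in turn; a single node failure stays invisible."""
    last_error = None
    for server in replicas:
        try:
            return fetch_profile(server, user_id)
        except ConnectionError as err:
            last_error = err  # this copy is down, move on to the next one
    raise RuntimeError("all replicas are down") from last_error

print(read_with_failover(["db-1", "db-2", "db-3"], user_id=42))
# -> {'user_id': 42, 'served_by': 'db-2'}
```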

2) Not an easy question at all, but usually these systems put large caches in front of the disk arrays too, so reading and writing data there is faster than on our laptops :) Data can also be processed in parallel by many concurrent machines, and that is the key to the speed of services like Facebook.
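To see the caching idea at a small scale, here is a hedged sketch using Python's standard `functools.lru_cache`; the `FRIENDS_ON_DISK` dictionary is a stand-in for a real, slow storage layer:

```python
from functools import lru_cache

# Fake "disk" data standing in for a real database; names are illustrative.
FRIENDS_ON_DISK = {1: frozenset({2, 3}), 2: frozenset({1}), 3: frozenset({1})}

@lru_cache(maxsize=100_000)          # keep the hottest adjacency lists in RAM
def friends_of(user_id: int) -> frozenset:
    # The first call pays the slow storage read; repeats hit the cache.
    return FRIENDS_ON_DISK.get(user_id, frozenset())

friends_of(1)   # slow path: reads from "disk"
friends_of(1)   # fast path: served from the in-memory cache
```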

3) The graph is not infinite: it has a finite number of nodes and edges, and a search that marks nodes as visited never revisits them, so it is guaranteed to terminate. A full traversal is therefore possible with current technology.

The computational complexity of exploring all connections and all nodes of a graph with BFS is O(n + m), where n is the number of vertices and m the number of edges. In other words, it is linear in the number of registered users and in the number of connections between them. And RAM these days is very cheap.
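As a sketch of both points, here is a plain BFS in Python with a visited set (so it terminates on any finite graph) and a depth cutoff (one possible "when to give up" policy); the tiny example graph is, of course, made up:

```python
from collections import deque

def bfs_path(graph, start, goal, max_depth=6):
    """BFS with a visited set (terminates on any finite graph) and a depth
    cutoff (one way to decide when to give up). Worst case is O(n + m)."""
    visited = {start}
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path                     # shortest connection found
        if len(path) - 1 >= max_depth:
            continue                        # max_depth hops out: give up here
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:     # each node is enqueued at most once
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None                             # no connection within max_depth hops

# A made-up adjacency-list graph keyed by user id.
graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(bfs_path(graph, 1, 5))   # -> [1, 2, 4, 5]
```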

Because the growth is linear, it is easy to add resources when needed: just add more machines as the service grows :)

Consider also that no one really searches the whole graph for every query; everything on Facebook is quite "local". You can view the direct friends of a person, not the friends of friends of friends... that would not be useful.

Getting the vertices directly connected to a given vertex is very easy and fast if the data structure is well designed. In SQL it would be a simple SELECT, and if the tables are well indexed it will be very fast, and also not very dependent on the total number of users (see the concept of hash tables).
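A small illustration of that "simple select" using Python's built-in sqlite3; the friendships table, the index, and the rows are hypothetical:

```python
import sqlite3

# Hedged sketch: table, index, and data are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)")
db.execute("CREATE INDEX idx_by_user ON friendships (user_id)")
db.executemany("INSERT INTO friendships VALUES (?, ?)",
               [(1, 2), (1, 3), (2, 1), (3, 1)])

# With the index, the lookup touches only the matching rows, so its cost
# tracks the user's friend count rather than the total number of users.
rows = db.execute("SELECT friend_id FROM friendships WHERE user_id = ?", (1,))
print([friend for (friend,) in rows])   # -> [2, 3]
```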

Salvatore Previti answered Sep 28 '22 05:09