[Reading time: 9 minutes]
Graphs are everywhere. They are used in social networks, the world wide web, biological networks, semantic web, product recommendation engines, mapping services, blockchains, and Bitcoin flow analyses. Furthermore, they’re used to define the flow of computation of software programs, to represent communication networks in distributed systems, and to represent data relationships in large organizations.
Overview Graph Applications
These are the most important graph applications:
- Social networks: A social network is by definition, well, a network. And graphs are special cases of networks, with only a single type of edge between vertices.
- Web graphs: The web is a huge collection of documents pointing to each other via hyperlinks. In other words, the web is another massive graph data set.
- Biological networks: The (biological) environment is actually one of the largest sources of real-world graphs. Examples are brain networks, protein interaction networks, food networks.
- Knowledge graphs: The knowledge of the world is inherently graph-structured. Information A is connected to information B if A stands in relation to B in some specific way. Consider the following types of information:
- Product recommendation graphs: If you buy a product, Amazon recommends you buying similar products. These recommended products are based on what other users have already bought. For example, you buy a book about Python; Amazon recommends you to buy a book about Scrum. At the heart of these systems are huge bipartite graphs.
- Neural networks: Artificial neural networks are huge graphs connecting neurons via artificial synapses. There are many different types of neural networks. The main difference between these types is the architecture of the graphs.
- Road networks: Apps like Maze, Google Maps, Apple Maps, and Uber are installed on every smartphone. Do you have studied a subject related to computer science? Then you know that navigational problems are inherently modeled as graph problems. Think about the traveling salesman problem, shortest path problems, Hammington paths, etc.
- Blockchains: The Blockchain is a large graph. The vertices are blocks, each storing many transactions. The edges connect subsequent blocks. The largest branch initiating from the first block (THE block-chain) is the currently valid state of historical transactions.
- Bitcoin transaction graphs: The Blockchain is an interesting graph that is often analyzed in the cryptocurrency space. Another insightful graph arises when you use Bitcoin wallets as vertices and transactions between wallets as edges. The resulting graph reflects the money flow between Bitcoin wallets. This graph is critical to learning about global money flow patterns.
This article dives into these 9 applications. At the end of the article, you will find awesome resources to download graph data sets. Plus, I will answer the question about the largest graphs in the world.
You can also watch the explainer video from Finxter founder Chris if you like.
What’s a Graph?
A graph is a basic data structure in computer science. It models relationships between data items. Using graphs to model real-world phenomena is not a new idea. In 1736, Leonhard Euler has invented the graph data structure to solve the problem of “seven bridges of Königsberg”. Graphs existed way before the first computer was even an idea. In fact, as we will see in this article, graphs helped to make the computer possible. Without graphs, there wouldn’t be a computer as we now know it today.
What Are Real-World Applications of Graphs?
You may ask this question because you are a computer science student who needs to write a report on the topic of graphs. Or you may think about working, or even doing research, in the area of graph theory. Or you have learned about graphs in an introduction to computer science lecture.
Graphs are everywhere. So let’s dive into a list of motivating use cases for graph data and graph algorithms.
A social network is by definition, well, a network. And graphs are special cases of networks, with only a single type of edge between vertices.
How many friends do you have on Facebook? How many friends do your friends have? And their friends? Facebook can answer this question with surprising accuracy. They store on their servers an internal graph representation of the human social network.
How does that look like? A digitized social network describes each user as a graph vertex and each friendship relation as an edge between two users. The hundreds of billions of friendship relations in the Facebook social network together build a graph data structure of massive scale.
Each time you use Facebook Graph Search, Facebook runs an algorithm on this graph data structure.
Each time a client of Facebook wants to learn about who is an influencer in a certain field, they run a graph algorithm (simple or complex). Such a graph algorithm can determine your position in the network. It can use the graph structure to predict the flow of information (“how many people would buy the new MacBook, if you recommended it?”).
The first important application scenario for graphs is social network analysis. And this topic is so important that you would not even have to look further to find a solid motivation for graph data.
But, of course, we will do exactly this.
The World Wide Web
The web is a huge collection of documents pointing to each other via hyperlinks. In other words, the web is another massive graph data set.
You can model the web as a graph by treating each web page as a graph vertex and each hyperlink as a graph edge.
Many business models of large companies (such as Google) revolve around analyzing the massive web graph.
Think about it: what does the Google search really do? Given your information need, Google needs to present you with the most relevant piece of content from this huge web graph. If they do a good job (they do), they satisfy your information need. Being satisfied with the service causes you to reuse it again and again.
The problem is that it’s still difficult for machines to differentiate between good and bad content. 99% of the web content you would consider as trash. How does the Google bot know which content has high quality?
They use a self-improving process that already undergoes all our activities: referrals. If many people believe that content is good, they will share it with others. Technically, they share content on the web by exchanging hyperlinks: Most likely, you already know the
<a href="www.example.com"> Awesome Web Page Here </a> HTML tags. The conventional view is that the more links pointing to a resource, the higher its quality. To rank web pages for quality, you could simply sort them by in-degree.
However, not all links are created equal. Which link source carries more value: Wikipedia.org or franks-cute-cat-videos.wordpress.com? Simply put, if Wikipedia links to your content, it’s more likely that it has high quality.
This idea is the basis of Google’s famous PageRank algorithm. It is an iterative graph algorithm that determines the rank of a web page (higher is better) based on the summed ranks of web pages linking to them. How do we know their rank? We look into the summed ranks of web pages linking to them. As you can see, this is an iterative procedure that refines the ranks of all the web pages one step at a time. After converging, the PageRank algorithm gives a pretty accurate measure of how trustworthy and renowned a web page is.
Ok, let’s go back to the question: what are application scenarios or use cases of graphs in computer science? The business models of the largest companies in the world, with billions of dollars yearly profits, center around efficient processing and information retrieval from large graphs.
Let’s move on to another application domain of graph theory: biological networks. The (biological) environment is actually one of the largest sources of real-world graphs. Let’s explore some biological networks in the following collection.
Brain networks. Neuron A connects to neuron B via the synapsis (A,B). A single human brain contains 100 billion neurons (source). The neurons of the brain of a child are interconnected by more than 1 quadrillion (1,000,000,000,000,000) synapses (source).
Protein interaction networks. Protein A is connected to protein B if they have interacted within a certain time frame. “Protein-protein interactions (PPIs) are the physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by electrostatic forces including the hydrophobic effect” (source). The number of such interactions can be huge. You may ask: why do we need these networks? The reason is that they are critical to study the efficiency of evolutionary processes. A recent study showed: “[…], it has been discovered that proteins with high degrees of connectedness are more likely to be essential for survival than proteins with lesser degrees” (source).
DNA-Protein interaction networks. DNA (or RNA) A is connected to Protein B if they have interacted within a certain time frame. These interaction networks are also called “Gene Regulatory Networks”. They are important points of study in the field of morphogenesis (= the process of creating body structures).
Food networks. This bloody biological network describes one of the most natural processes in the world. Species A is connected to species B if A eats B. The stability of these networks over time is an important area of research in ecology. The network is not trivial: there are more than 7.6 million species in the world. And many species eat hundreds of different species. Such food networks are full of valuable insights into why certain species die out.
There are myriads of similar biological interaction networks. If you want to learn more about biological networks, check out this excellent Wikipedia article.
The knowledge of the world is inherently graph-structured. Information A is connected to information B if A stands in relation to B in some specific way. Consider the following types of information:
- Donald Trump follows Barack Obama on Facebook.
- Barack Obama has an account on Youtube.
- Barack Obama has an account on Facebook.
- Donald Trump eats steak.
There are hundreds of trillions such relations between two entities on the web. Among others, Google collects all these relations and merges them into a massive knowledge graph. This huge graph enables machines to automatically infer new knowledge from the graph. For example, Donald Trump lives in Texas; Many people living in Texas like steaks; Hence, it is likely that Donald Trump likes steaks as well.
Few people know that it’s possible to access this knowledge graph gathered by Google. Google provides an API that you can easily use and play around with.
Think about the opportunities of massive knowledge graphs that are shared among devices all over the world! The emerging collective intelligence makes applications smarter – even without high computing power. Every dumb device has access to the world’s wisdom. Now that sounds a bit scary, I know. But this article is only a compilation of graph applications and practical use cases. It’s neither more nor less.
Product Recommendation Graphs
In the last sections, you have learned that one of the largest company in the world centers around processing massive graph data. Next, you will learn about how the company owned by the richest man on the earth (Jeff Bezos’ net worth is 141 Billion US-$ in 2018, source) uses graphs to boost its revenue. Amazon. If you buy a product, Amazon recommends you buying similar products. These recommended products are based on what other users have already bought. For example, you buy a book about Python; Amazon recommends you to buy a book about Scrum.
How do these recommender systems work? At the heart of these systems are huge bipartite graphs.
Here is a quick reminder of what bipartite graphs are:
“In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets U and V such that every edge connects a vertex in U to one in V. Vertex sets U and V are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles” (source).
Amazon knows for each user, which products he has bought (and liked). It’s a bipartite graph. On the left side are all the users. On the right side are all the products. If a user has bought (and liked) a product, there is a connection from the user to the product. The graph is bipartite as there can not be a connection between two users or two products. (Neither can a user buy another user nor can a product buy another product.)
The graph problem Amazon has to solve now is the following. Given the existing user-product graph. What are some likely edges from users and products? We can do this by clustering users together that bought the same products. It is likely that users will buy similar products in the future as well. Those are the recommendations provided by Amazon. While the algorithms are much more sophisticated, this remains the main idea.
Artificial Neural Networks
We have already seen that artificial neural networks are huge graphs connecting neurons via artificial synapses. There are many different types of neural networks. The main difference between these types is the architecture of the graphs. If you need visual examples, check out the following awesome cheat sheet.
Road and Map Networks
You are using an important graph application every couple of days. Navigation has become the killer application in mobile scenarios. Apps like Maze, Google Maps, Apple Maps, and Uber are installed on every smartphone. Do you have studied a subject related to computer science? Then you know that navigational problems are inherently modeled as graph problems. Think about the traveling salesman problem, shortest path problems, Hammington paths, etc. Do you want to know the fastest route from point A to point B? Your navigation app makes a graph problem out of it. What is the underlying graph? Create a vertex for each location. And create a connection between two locations A and B if there is a direct road between locations A and B. Finally, you annotate the road with the traveling time from point A to point B.
Do you possess Bitcoins? How did you get them? In any case, you created a Bitcoin wallet and transferred funds into your own wallet. The transaction of moving Bitcoins into your wallet was stored for all times in the Bitcoin Blockchain.
What is the Blockchain?
It is “an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way” (source).
In other words, the Blockchain is a large graph. The vertices are blocks, each storing many transactions. The edges connect subsequent blocks. The largest branch initiating from the first block (THE block-chain) is the currently valid state of historical transactions.
Bitcoin Transaction Graph
The Blockchain is an interesting graph that is often analyzed in the cryptocurrency space. Another insightful graph arises when you use Bitcoin wallets as vertices and transactions between wallets as edges. The resulting graph reflects the money flow between Bitcoin wallets. This graph is critical to learning about global money flow patterns.
Circuit and Computer Chip Placement
Graph algorithms are also important in many other domains and use cases. For instance, the field of computer chip design relies on resource-efficient ways to place signals on a single chip. The numerous signals must be embedded onto the two-dimensional plane. Here is an interesting paper about chip design.
What Are The Largest Graphs?
Here are three of the largest graphs that are currently available in the world:
- The Facebook social network has 2.27 billion monthly active users (source). Each user has 155 friends on average (source). So the total number of friendships (edges) is 2.27 billion * 155 = 351 billion.
- The worldwide web contains 2 billion web pages (source). If you assume between 10 and 100 hyperlinks per page, the web graph has between 20 and 200 billion hyperlinks. I know, this estimate is unreliable – probably it is a huge understatement. But it may give you a feeling of how large the web really is (and how hard Google’s job is managing this kind of data).
- The human brain has 100 billion neurons (source). The neurons of the brain of a child are interconnected by more than 1 quadrillion (1,000,000,000,000,000) synapses (source). So the brain graph is larger than any other graph that was ever (EVER) processed by any research institute in the world. It is just too large. Assuming two 16 byte integers per edge, we need 1 quadrillion * 2 * 16 bytes = 32 quadrillion = 32 million Gigabytes only to store the edge data of the graph.
Where to Find Graph Data Sets?
- Snap: A collection of graph data sets including small parts of the Facebook social network. The collection is maintained by a research group from Stanford.
- The graph challenge benchmark from MIT has a large collection of graphs.
- This beautiful resource for visualization and analytics of large graphs has helped me many times during my graph processing research. Check it out, there are hundreds of high-quality graph resources!
- This collection of resources is well categorized. You can check it out as a resource for further research if you didn’t find a graph at above three resources.
- The German research institute from the University of Koblenz has collected a large number of graphs. This link is highly valuable as it contains one of the largest graphs you can find on the web (Twitter) with 1.4 billion edges.
Thanks for reading through this article! If you have any additional graph applications (or graph resources), please let me know by replying to any of our emails (register by clicking the button below and follow the steps).
Want to dive deeper into computer science with Python? Register for the “Coffee Break Python” newsletter! It’s a 100% **FREE** Python email course (BONUS: 5 Python cheat sheets for absolute beginners).