Google processes huge data sets. For example, the search engine needs to know how often words occur in web documents.
These data sets are too large for a single computer. A single machine has only a handful of CPU cores (say, eight), so processing all the data on it would take far too long. Hence, Google uses hundreds of computers in parallel.
But writing parallel programs is hard. You must tell each of the hundreds of computers exactly what data to process and what to send to the other computers. Writing such parallel programs again and again is expensive. What would you do if each human expert cost you more than $100k per year?
Right, you would fix it. And so did Google. They developed a parallel processing system called MapReduce and described it publicly in 2004. With MapReduce, writing parallel programs is simple. You define two functions, map() and reduce(). The MapReduce system then takes care of distributing the data and the function executions across the computers. It is even resilient against the crash of many computers.
The map() function transforms input data into output data. For example, it transforms a web document into a collection of (word, frequency) tuples. A document that contains the word “Google” ten times is mapped to the tuple (“Google”, 10).
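As a minimal sketch of such a map() function in Python, assume that a document is a plain string and that words are separated by whitespace. The name map_document is made up for this illustration and is not part of any MapReduce API.

```python
from collections import Counter

def map_document(document):
    """Map one document to a list of (word, frequency) tuples."""
    # Assumption: words are separated by whitespace; no punctuation handling.
    counts = Counter(document.split())
    return list(counts.items())

# A tiny example document standing in for a web page.
tuples = map_document("Google is Google")
print(tuples)
```

Each document is mapped independently of all others, which is why this step can be spread over many computers.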
The reduce() function takes all tuples with the same key and combines them into a single new value. For example, it takes all (“Google”, x) tuples and creates a new tuple that sums over all values x. In other words, it reduces all tuples for the keyword “Google” to a single value. This single value is the number of occurrences of the word “Google” in all web documents.
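The reduce step can be sketched in Python as follows. This is a single-machine illustration under simple assumptions, not Google's implementation: the helper names group_by_key and reduce_word are hypothetical, and the grouping that a real MapReduce system performs across computers is simulated here with a dictionary.

```python
from collections import defaultdict

def group_by_key(tuples):
    """Collect the values of all tuples that share the same key."""
    groups = defaultdict(list)
    for word, frequency in tuples:
        groups[word].append(frequency)
    return groups

def reduce_word(word, frequencies):
    """Reduce all frequencies of one word to a single (word, total) tuple."""
    return (word, sum(frequencies))

# Assumed output of the map step for several documents.
mapped = [("Google", 10), ("search", 3), ("Google", 5)]

totals = dict(reduce_word(word, freqs)
              for word, freqs in group_by_key(mapped).items())
print(totals)
```

Because each key is reduced on its own, different words can be handled by different computers at the same time.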
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He is the author of the popular programming book Python One-Liners (No Starch Press, 2020), coauthor of the Coffee Break Python series of self-published books, a computer science enthusiast, a freelancer, and the owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them boost their skills.