[2-min CS Papers] A Short Overview of the Apache Spark System

Large companies analyze massive amounts of data coming from various sources such as social nets, weblogs, or customers. An important class of data analytics concerns large-scale set operations. Suppose you have two customer data sets A and B. Set A contains all customers who bought in 2017. Set B contains all customers who bought in 2018. Your boss asks you …

[2-min CS Papers] A Short Introduction to the TensorFlow System

Machine Learning (ML) is a sought-after skill in today’s automated world. Google is one of the key players in the Machine Learning space. With the growing scale and popularity of deep learning, the limitations of a single machine become more and more pronounced. Google’s response to this challenge is the distributed TensorFlow system. TensorFlow is a Github project published in …

[2-min Computer Science Papers] A Short Introduction into the MapReduce System

Google processes huge data sets. For example, the search engine needs to know how often words occur in web documents. The data sets are too large for a single computer. As a single computer has only eight CPUs, processing the data takes forever. Hence, Google uses hundreds of computers in parallel. But writing parallel programs is hard. You must tell …