Calculating Total Duration of Top K Watched Shows in Python

March 3, 2024 by Emily Rosemary Collins

💡 Problem Formulation: You are given a list of television shows, each paired with a duration in minutes. Your task is to write a Python program that calculates the total duration of the ‘k’ most-watched shows, which every method below interprets as the ‘k’ shows with the largest durations. For example, if the input is [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)] and k=2, the desired output is the sum of the two largest durations: 700 + 500 = 1200 minutes.

Method 1: Using Sort and Slice

This method involves sorting the list of shows by duration in descending order and then selecting the first ‘k’ shows to sum their durations. The sorted() function is used to sort the data, and slicing allows us to pick the top ‘k’ elements from the list.

Here’s an example:

shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
k = 2
sorted_shows = sorted(shows, key=lambda show: show[1], reverse=True)
total_duration = sum(duration for name, duration in sorted_shows[:k])
print(f"Total duration: {total_duration} minutes")

Output: Total duration: 1200 minutes

In this code, we sort the shows by their duration and then slice the list to get the first ‘k’ elements. We then calculate the total duration by summing the durations using a generator expression.
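The slice-based approach also behaves predictably at the edges, which the following minimal sketch illustrates. The function name total_top_k is a hypothetical helper introduced here for illustration, and the check for a negative ‘k’ is an added assumption about how invalid input should be handled.

```python
def total_top_k(shows, k):
    """Sum the durations of the k shows with the largest durations.

    total_top_k is a hypothetical helper name used only for this sketch.
    """
    if k < 0:
        # A negative k would make the slice drop items from the end instead.
        raise ValueError("k must be non-negative")
    ranked = sorted(shows, key=lambda show: show[1], reverse=True)
    # Slicing never raises: if k exceeds len(ranked), every show is summed.
    return sum(duration for _, duration in ranked[:k])


shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
print(total_top_k(shows, 2))   # 1200
print(total_top_k(shows, 10))  # 1500, because only three shows exist
print(total_top_k(shows, 0))   # 0
```

Because sorted() returns a new list, the original shows list is left in its input order.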

Method 2: Using heapq.nlargest

The heapq.nlargest() function is designed to find the ‘n’ largest items in a dataset. It is preferable when ‘k’ is small compared to the size of the list: instead of sorting the entire list, it keeps only the ‘k’ largest items seen so far in a small heap, which takes roughly O(n log k) time rather than the O(n log n) of a full sort. As ‘k’ approaches the length of the list this advantage disappears, and sorted() becomes the better choice.

Here’s an example:

import heapq

shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
k = 2
largest_shows = heapq.nlargest(k, shows, key=lambda show: show[1])
total_duration = sum(duration for name, duration in largest_shows)
print(f"Total duration: {total_duration} minutes")

Output: Total duration: 1200 minutes

This code snippet uses the heapq.nlargest() function to fetch the ‘k’ shows with the largest duration without sorting the entire shows list, leading to potentially better performance, and then sums their durations.
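To make the heap idea concrete, here is a rough sketch of the same technique written by hand with heapq.heappush() and heapq.heapreplace(). It illustrates the general size-‘k’ min-heap approach and is not meant to mirror the standard library's actual implementation; the name total_top_k_heap is hypothetical.

```python
import heapq


def total_top_k_heap(shows, k):
    """Hypothetical helper: sum the k largest durations using a size-k min-heap."""
    if k <= 0:
        return 0
    heap = []  # min-heap holding the k largest durations seen so far
    for _, duration in shows:
        if len(heap) < k:
            heapq.heappush(heap, duration)
        elif duration > heap[0]:
            # heap[0] is the smallest of the current top k, so replace it.
            heapq.heapreplace(heap, duration)
    return sum(heap)


shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
print(total_top_k_heap(shows, 2))  # 1200
```

The heap never grows beyond ‘k’ entries, which is why this approach stays cheap when ‘k’ is much smaller than the list.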

Method 3: Using Loop and Conditional

After sorting the list by duration, an explicit loop can walk through it and add up the durations of the first ‘k’ shows. This method requires no imports and makes every step visible, but it still sorts the whole list, so it is no faster than Method 1.

Here’s an example:

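shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
k = 2
# Sort a copy so the original list keeps its input order.
ranked = sorted(shows, key=lambda show: show[1], reverse=True)
total_duration = 0
for index, (name, duration) in enumerate(ranked):
    # Stop once k shows have been counted; this also avoids an
    # IndexError when k is larger than the number of shows.
    if index >= k:
        break
    total_duration += duration
print(f"Total duration: {total_duration} minutes")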

Output: Total duration: 1200 minutes

This code manually iterates over the first ‘k’ elements of the sorted list to calculate the total duration. It is more verbose than calling sum() on a slice and offers no speed advantage, because sorting the list dominates the cost.
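The heading mentions a conditional, and a loop can also find the top ‘k’ durations without sorting at all. The sketch below is one possible way to do that using only built-ins; the name total_top_k_loop is hypothetical. Each new duration is compared against the smallest value kept so far, which costs about O(n·k) and is therefore only sensible for small ‘k’.

```python
def total_top_k_loop(shows, k):
    """Hypothetical helper: sum the k largest durations with a loop and a conditional."""
    if k <= 0:
        return 0
    top = []  # durations of the (at most) k longest shows seen so far
    for _, duration in shows:
        if len(top) < k:
            top.append(duration)
        else:
            smallest = min(top)
            if duration > smallest:
                # Replace the current smallest kept value with the larger one.
                top[top.index(smallest)] = duration
    return sum(top)


shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
print(total_top_k_loop(shows, 2))  # 1200
```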

Method 4: Using pandas DataFrame

For larger datasets or situations requiring further analysis, pandas can be used. A DataFrame is created from the show data, and its nlargest method is called with the ‘Duration’ column to obtain the top ‘k’ rows. This method can be advantageous when additional data manipulation and analysis are required.

Here’s an example:

import pandas as pd

data = {"Shows": ["Friends", "Breaking Bad", "Game of Thrones"], "Duration": [500, 300, 700]}
k = 2
df = pd.DataFrame(data)
total_duration = df.nlargest(k, 'Duration')['Duration'].sum()
print(f"Total duration: {total_duration} minutes")

Output: Total duration: 1200 minutes

This code uses pandas to create a DataFrame and then calculates the sum of the ‘k’ largest values in the ‘Duration’ column. While powerful, this method introduces a significant dependency if you’re not already using pandas for other tasks.

Bonus One-Liner Method 5: Using List Comprehension and sorted

A concise one-liner can be used combining list comprehension with the sorted() function to achieve the same result, potentially sacrificing code clarity.

Here’s an example:

shows = [("Friends", 500), ("Breaking Bad", 300), ("Game of Thrones", 700)]
k = 2
print(f"Total duration: {sum([duration for name, duration in sorted(shows, key=lambda show: show[1], reverse=True)[:k]])} minutes")

Output: Total duration: 1200 minutes

This one-liner is a compact version of Method 1, suited for cases where code brevity is preferred over readability and maintainability.

Summary/Discussion

Method 1: Using Sort and Slice. Strengths: Simple to understand and write. Weaknesses: Sorts the entire list in O(n log n) time even when only a few items are needed, which is wasteful for large lists and small ‘k’.
Method 2: Using heapq.nlargest. Strengths: More efficient for large lists and small ‘k’. Weaknesses: Slightly less familiar, requires a standard-library import, and loses its advantage as ‘k’ approaches the list length.
Method 3: Using Loop and Conditional. Strengths: Explicit control over the operation. Weaknesses: Verbose, and no faster than Method 1 because it still sorts the list.
Method 4: Using pandas DataFrame. Strengths: Good for additional data manipulation, clean syntax. Weaknesses: Overkill for simple tasks, requires pandas dependency.
Bonus One-Liner Method 5: Using List Comprehension and sorted. Strengths: Extremely concise. Weaknesses: Potentially difficult to read and maintain.
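
The efficiency claims above are easy to check on your own machine. The following rough timing sketch compares Methods 1 and 2 on synthetic data; the data set, its size, the value of ‘k’, and the function names by_sort and by_heap are all assumptions made for illustration, and the timings will vary with hardware, Python version, list size and ‘k’.

```python
import heapq
import random
import timeit

random.seed(0)  # fixed seed so the synthetic data is reproducible
# Synthetic data: 100,000 made-up shows with random durations (illustrative only).
shows = [(f"Show {i}", random.randint(1, 10_000)) for i in range(100_000)]
k = 10


def by_sort():
    """Method 1: sort everything, then sum the first k durations."""
    ranked = sorted(shows, key=lambda show: show[1], reverse=True)
    return sum(duration for _, duration in ranked[:k])


def by_heap():
    """Method 2: let heapq.nlargest pick the k largest entries."""
    top = heapq.nlargest(k, shows, key=lambda show: show[1])
    return sum(duration for _, duration in top)


# Both approaches must agree on the total before timing them.
assert by_sort() == by_heap()
print("sorted + slice:", timeit.timeit(by_sort, number=5))
print("heapq.nlargest:", timeit.timeit(by_heap, number=5))
```

Try raising ‘k’ toward the length of the list to see how the gap between the two approaches changes.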