Solution
Reservoir Sampling is a technique used to randomly select \(k \) samples from a
potentially large or infinite stream of data, where the total number of elements is
unknown in advance. It ensures that each item has an equal probability of being selected,
even when the dataset is too large to store in memory.
How Does it Work :
The algorithm works by maintaining a "reservoir" (a list of size \(k\)) that holds the selected samples. It processes elements from stream one by one and decides whether to replace an existing element in the reservoir based on probability.
-
Initialize a reservoir with the first \(k \) elements from the stream.
-
For each new incoming element (starting from index \(k \)+1):
-
Generate a random index \(j\) between \(0\) and the current index \(i \).
-
if \(j\) is less than \(k\), replace the reservoir element at index \(j\) with the new element.
-
Continue until all elements are processed.
-
The final reservoir contains \(k\) randomly selected elements with equal probability
import random
from typing import List
class Solution:
def reservoir_sampling(self, stream:List, k:int):
reservoir = []
for i, element in enumerate(stream):
if i < k: # Fill the reservoir with first k elements
reservoir.append(element)
else:
j = random.randint(0, i) # Pick a random index from 0 to i
if j < k:
reservoir[j] = element # Replace an element in the reservoir
return reservoir