设计一个计数器

作者：ideahub

添加时间：2022-02-27 22:24:25

来源：

打印本文小中大

无

添加时间：2022-02-27 22:24:25

评分：

防伪码：CMSEASYc8F1WNG3uYwbjkb909

“设计命中计数器”问题最近被包括 Dropbox 在内的许多公司提出，这个问题比看起来更难。它包括几个主题，如基本数据结构设计、各种优化、并发和分布式计数器。

它应该支持以下两个操作：hit和getHits。

hit(timestamp) – 显示给定时间戳的命中。

getHits(timestamp) – 返回过去 5 分钟（300 秒）内收到的命中数（来自 currentTimestamp）。

每个函数都接受一个时间戳参数（以秒为单位），您可以假设调用系统是按时间顺序进行的（即时间戳是单调递增的）。您可以假设最早的时间戳从 1 开始。

例子：

HitCounter 计数器 = new HitCounter();

// 在时间戳 1 处命中。

反击（1）；

// 在时间戳 2 处命中。

反击（2）；

// 在时间戳 3 处命中。

反击（3）；

// 在时间戳 4 处获得命中，应该返回 3。

counter.getHits(4);

// 在时间戳 300 处命中。

反击（300）；

// 在时间戳 300 处获得命中，应返回 4。

counter.getHits(300);

// 在时间戳 301 处获得命中，应返回 3。

counter.getHits(301);

问：微软、亚马逊、Dropbox 和更多公司。

1.简单的解决方案（蛮力方法）：

我们可以使用一个向量来存储所有的命中。这两个功能是自我解释的。

vector<int> v;

/* Record a hit.

@param timestamp - The current timestamp (in

seconds granularity). */

void hit(int timestamp)

{

v.push_back(timestamp);

}

// Time Complexity : O(1)

/** Return the number of hits in the past 5 minutes.

@param timestamp - The current timestamp (in

seconds granularity). */

int getHits(int timestamp)

{

int i, j;

for (i = 0; i < v.size(); ++i) {

if (v[i] > timestamp - 300) {

break;

}

return v.size() - i;

}

// Time Complexity : O(n)

2、空间优化方案：

我们可以使用队列来存储命中并删除队列中无用的条目。它将节省我们的空间。

我们正在从队列中删除多余的元素。

queue<int> q;

/** Record a hit.

@param timestamp - The current timestamp

(in seconds granularity). */

void hit(int timestamp)

{

q.push(timestamp);

}

// Time Complexity : O(1)

/** Return the number of hits in the past 5 minutes.

@param timestamp - The current timestamp (in seconds

granularity). */

int getHits(int timestamp)

{

while (!q.empty() && timestamp - q.front() >= 300) {

q.pop();

}

return q.size();

}

// Time Complexity : O(n)

3.最优化的解决方案：

如果数据是无序的并且多个命中带有相同的时间戳怎么办。

由于队列方法在没有有序数据的情况下无法工作，因此这次使用数组来存储每个时间单位的命中计数。

如果我们以 300 秒为单位跟踪过去 5 分钟内的命中，则创建 2 个大小为 300 的数组。

int[] hits = new int[300];

时间戳[] 次 = 新时间戳[300]; // 最后计数的命中

的时间戳给定一个传入，将其时间戳修改 300 以查看它在命中数组中的位置。

int idx = 时间戳 % 300; => hits[idx] 保持命中计数发生在这一秒

但在我们将 idx 的命中计数增加 1 之前，时间戳确实属于 hits[idx] 正在跟踪的第二个。

timestamp[i] 存储最后计数命中的时间戳。

如果 timestamp[i] > timestamp，这个命中应该被丢弃，因为它在过去 5 分钟内没有发生。

如果 timestamp[i] == timestamp，则 hits[i] 增加 1。

如果 timestamp[i] currentTime – 300。

vector<int> times, hits;

times.resize(300);

hits.resize(300);

/** Record a hit.

@param timestamp - The current timestamp

(in seconds granularity). */

void hit(int timestamp)

{

int idx = timestamp % 300;

if (times[idx] != timestamp) {

times[idx] = timestamp;

hits[idx] = 1;

}

else {

++hits[idx];

}

// Time Complexity : O(1)

/** Return the number of hits in the past 5 minutes.

@param timestamp - The current timestamp (in

seconds granularity). */

int getHits(int timestamp)

{

int res = 0;

for (int i = 0; i < 300; ++i) {

if (timestamp - times[i] < 300) {

res += hits[i];

}

return res;

}

// Time Complexity : O(300) == O(1)

如何处理并发请求？

当两个请求同时更新列表时，可能会出现竞争条件。最先更新列表的请求可能最终不会被包含在内。

最常见的解决方案是使用锁来保护列表。每当有人想要更新列表（通过添加新元素或删除尾部）时，都会在容器上放置一个锁。操作完成后，列表将被解锁。

当您没有大量请求或性能不是问题时，这非常有效。有时放置锁的成本可能很高，当并发请求过多时，锁可能会阻塞系统并成为性能瓶颈。

分发计数器

当单台机器获得太多流量并且性能成为问题时，正是考虑分布式解决方案的最佳时机。分布式系统通过将系统扩展到多个节点，显着减少了单台机器的负担，但同时增加了复杂性。

假设我们将访问请求平均分配给多台机器。我想首先强调平等分配的重要性。如果特定机器比其他机器获得更多的流量，则系统无法充分利用，在设计系统时考虑到这一点非常重要。在我们的例子中，我们可以获得用户电子邮件的哈希值并通过哈希值进行分发（直接使用电子邮件不是一个好主意，因为某些字母可能比其他字母出现得更频繁）。

为了计算数量，每台机器独立工作，从过去一分钟开始计算自己的用户。当我们请求全局编号时，我们只需要将所有计数器相加即可。

上一篇无

下一篇无

联系我们