Real Hashing

Fix . Let have size . Consider the following problem:

There is a circular race track of circumference . people, with (distinct) speeds start at the start of the race track, and run around at their speed, until a random time . After this, the runners are at locations . I then cut the race track into contiguous chunks, each of length and count how many runners are in each chunk. Let , the “maxload”, denote the number of runners in the fullest chunk. This is a random variable depending on the time when I stop the runners. In this document I will analyze .

Relation to prior work

TFAE:

(by rescaling distances).
( for all , so are very unlikely to be in different buckets).

In other words, the problem considered in this post, which I’ll call “real hashing” is equivalent to a special case of the linear hashing problem. I think that real hashing is basically just as interesting as the normal linear hashing problem, and am quite happy to study it.

Because real hashing is a special case of the normal linear hashing problem, Knudsen’s argument here shows that . Getting stronger bounds on is my favorite open math problem. Ultimately, I believe a bound of . In this document, I’ll endeavor to get something better than for real hashing.

Defining triple collisions

Define to be the probability that all end up in the same bin.

Observe:

In what follows, I’ll write instead of to denote the bin size for notational convenience and clarity.

The triple collision lemma

Lemma Let . Then,

Proof: By rescaling the length of the circle we have:

Integers are more favorable than arbitrary reals, so we can switch to taking an integer:

Now, here’s a helpful fact from number theory:

The equation has a unique solution in .

There are relevant equations and relevant equations . Overall we get:

Remark: Intuitively, you’d have guessed . However, this isn’t always true. For instance, .

The “all numbers are sufficiently coprime” lemma

Lemma:

Proof: By the triple collision lemma:

The term gives us .

We’ll now analyze the second term.

I think you can do better than the analysis that I’m about to do, using a technique from Alek’s paper on LH. But the below method suffices for our needs.

Partition dyadically into sets .

We have:

Fix any . Let , . Because we dyadically partitioned things, , .

We have:

There are at most many with . So . Also, trivially . Recall the fact that for any number , the number of is at most .

Thus,

as desired.

Overall, this means that we’ve bounded the second and third terms of the big sum that we care about for the lemma by .

To finish the lemma we bound the fourth term.

We use a similar set of tricks to the ones that we used in bounding the second term. We dyadically partition in the same way. The fourth term is then at most

Fix any , and define . Then,

This last expression is bounded by

A famous result in number theory is that

This is obviously way overkill, but anyways, applying this we bound the expression that we’re working on by

And, we’re done.

A simple bound on maxload

Theorem:

Proof Clearly, is a lower bound on the total number of triple collisions. By Jensen’s inequality we have:

Recalling that we are basically ok to replace with we get:

Applying the “all numbers are sufficiently coprime” lemma and setting we get:

Author’s note

I’m pretty sure that the above proof goes through, using the “quantum dinosaur” technique for normal linear hashing of the problem. But it’s probably at least a moderate amount of additional work. (The quantum dinos technique is just expertly using the fact that any can be expressed as where ).

The path forward

Ultimately, we’d like to prove some stronger bounds on . It might be easier to instead prove sth like , and it’s roughly equally cool, so I’ll focus on proving this slightly weaker statement.

Here’s the rough plan for how to prove this stronger bound:

Eat of the times around each fraction with denominator at most .
Figure out the quadruple collision lemma, and do some vibe checks on it. It’s going to be much trickier than I originally thought. It’s going to need to take into account that we ate some stuff.
Show that the expected number of quadruple collisions, over times that were not eaten is at most , by using the quadruple collision lemma and basic number theory.

Trying to prove a quadruple collision lemma

Let’s assume --- bc idk how to solve even given this assumption, and simpler problems are nice --- that is a set of prime numbers and .

Let be the set of such that is at least away from any fraction with denominator smaller than . We aim to bound:

Naively, you’d hope that . If you’re a bit more skeptical, you’ll notice that is quite large, and you might instead conjecture sth more like

Honestly, idk if I even expect this thing to be true.

Whatever happens, it’s all going to crucially depend on the fact that we’re taking .

Note that the weaker conjecture with all the stuff going on would be sufficient to cary the day, given our assumption that consists solely of primes (and it’d probably be fine regardless, but no point taking chances).

It feels very productive to write

for some . Then,

I’d like to argue that in the square , there are at most many points such that

This would be enough to win.

Note that banning is certainly going to be necessary, and it’s not clear that it’s going to suffice.

The remainder of the proof is left as an exercise to the reader.

By which I mean, I’ll think about this more next weekend probably.

🐱 Skyspace3.0

Explorer