
Showing posts from August, 2024

Computing the probability of generating conflicting random strings

At work today, I was faced with the problem of generating unique, random identifiers for wire transfers that would fit inside an ACH descriptor field. An ACH descriptor may contain at most 10 upper-case characters, and for the random identifiers we wished to restrict ourselves to letters only. With only 26 letters to choose from at each of 10 positions, the risk of generating conflicting IDs seemed significant, so I did some napkin math and quickly found that there was about a 50% chance of generating a conflict within 10 million identifiers. Not really a long-term scalable solution. I was particularly curious whether adding digits to the IDs would measurably lower the collision probability.

After getting home, I did the math more rigorously on a notepad. First, preliminaries: there are \(n = 26^{10}\) possible strings we can generate. We consider a trial of length \(k\) to be a sequence of events where we select \(k - 1\) distinct...
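To put rough numbers on this, here is a minimal Python sketch built on the standard birthday-problem formulas, which may differ from the derivation the post goes on to develop. The probability that \(k\) IDs drawn uniformly at random from \(n\) strings contain at least one duplicate is \(1 - \prod_{i=0}^{k-1}(1 - i/n) \approx 1 - e^{-k(k-1)/(2n)}\). The function names are hypothetical, and reading "adding digits" as a 36-symbol alphabet (\(n = 36^{10}\)) is an assumption. Under the approximation, 10 million letters-only IDs collide with probability of roughly 30%, and even odds arrive near 14 million IDs, the same order of magnitude as the napkin estimate. With digits allowed, the even-odds point moves past 70 million.

```python
import math


def collision_probability(n: int, k: int) -> float:
    """Approximate P(at least one duplicate) among k IDs drawn uniformly
    at random, with replacement, from n possible strings.

    Birthday approximation 1 - exp(-k(k-1) / (2n)); very accurate when k << n.
    expm1 avoids losing precision when the probability is tiny."""
    return -math.expm1(-k * (k - 1) / (2 * n))


def exact_collision_probability(n: int, k: int) -> float:
    """Exact birthday formula 1 - prod_{i=0}^{k-1} (1 - i/n).

    Summed in log space with fsum/log1p for accuracy; O(k), so only
    practical for moderate k (useful for checking the approximation)."""
    if k > n:
        return 1.0  # pigeonhole: a duplicate is guaranteed
    log_no_collision = math.fsum(math.log1p(-i / n) for i in range(k))
    return -math.expm1(log_no_collision)


def ids_for_probability(n: int, p: float) -> float:
    """Approximate number of IDs at which the collision probability reaches p,
    from inverting the approximation: k ~ sqrt(2n * ln(1 / (1 - p)))."""
    return math.sqrt(2 * n * -math.log1p(-p))


if __name__ == "__main__":
    # 10 upper-case letters, as in the post.
    letters_only = 26 ** 10
    # Assumption: "adding digits" means a 36-symbol alphabet (A-Z plus 0-9).
    letters_and_digits = 36 ** 10
    for label, n in [("A-Z", letters_only), ("A-Z0-9", letters_and_digits)]:
        p = collision_probability(n, 10_000_000)
        half = ids_for_probability(n, 0.5)
        print(f"{label}: P(collision among 10M IDs) = {p:.3f}; "
              f"50% reached near {half:,.0f} IDs")
```

The exact product is only there as a cross-check: looping over tens of millions of terms in Python is slow, while the closed-form approximation is effectively exact at these scales because \(k\) is tiny compared with \(n\).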