What Does This Code do?

A colleague opened a pull requests with this code in it a some months ago ago.

self.nextId = function() {
    // a uuid generator found on the interwebs
    return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
        (c ^ window.crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
    );
};

I reviewed the code and left this comment

The word next is misleading because to me that implies some ordering and the
uuids generated are random. Also, is there a reason we need true uuids instead
of doing something like sirepo-lattice.js? If there is a reason I think adding a
comment about it in the code would be helpful.

While the comment was fine, I hope, I did not actually understand this code which is a problem since I reviewed it. So, here is my attempt to go back and figure out what the code actually does.

This is just a stream of consciousness of me solving the problem. It bounces around. For better or worse that is how I think.

Breaking down the code

There is quite a bit of this code I don't understand so let's dive right in.

I think 1e7 is scientific notation.

> 1e7
10000000

The REPL confirms that it is.

Why is 1e7 an array and what does it mean to add the other values to it?

> [1] + 2
'12'
> x = [1] + 2
'12'
> typeof x
'string'

WAT?? (If you do nothing else with this article please stop and watch this talk.)

JS is always full of surprises. Some days I kick myself wishing I could remember all of the gotchas. Other days I'm just happy if I remember to zip up my pants after going to the bathroom. On those days I can not be bothered by remembering details like this. What I do try and remember is that JS has strangeness like this. When doing any operation to two things of different types it is a good idea to slow down and make sure I get it right.

> [1e7]+-1e3+-4e3+-8e3+-1e11
'10000000-1000-4000-8000-100000000000'

Ok, we have a string of some numbers separated by dashes. Looks something like a UUID.

At this point I get distracted and think "what happens when I just search 'js uuid generate'."

The first result is, as expected, a link to StackOverflow.

The top answer mentions the UUID module

In the simple case (all we need) that module just calls crypto.randomUUID(). Why can't we just call that? Off to caniuse.com.

Ugh, we don't have the strictest requirements for supporting old browsers but if the latest Safari (15.1) does not support it then it won't work for us.

This makes me wonder how JS implementations implement it. A quick search yields the W3C spec. The description of the spec does not look much like the code we are using. I then go to the Node.js docs. Why don't the docs link to the source code? That is something ruby-doc.org really got right. Anyways, the code is not that hard to find. Again, this looks like the spec and not much like our code.

From reading the spec and looking at our code I think our code is probably taking some short cuts. Reading through the SO comments this seems to be the case. Again, I don't feel it is that big of a deal because the likelihood of a collision seems pretty small and fairly harmless (the UUID's are for one simulation for one user).

Back to the code...

'10000000-1000-4000-8000-100000000000'
> ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c => console.log(c));
1
0
0
<snip>
> ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c => 'e');
'eeeeeeee-eeee-4eee-eeee-eeeeeeeeeeee'

This is replacing '0', '1', and '8' with the result of the callback.

Now I need to understand the structure of a UUID.

The '4' that is not replaced is the version of the UUID spec. Version 4 means random.

Why is the '8' getting replaced though? That is the "variant" and has some meaning I don't yet fully understand. Maybe the c ^ ... does some mask operation that includes it?

There are still some missing pieces but I'm starting to see that we are mostly just replacing values in the original string with randomly generated values.

Now, let's understand some about the random generation.

> crypto.getRandomValues(new Uint8Array(1))[0]
Uncaught TypeError: crypto.getRandomValues is not a function

Gr888. Maybe my version of node is out of date? Yes, but also there is this handy comment in the code saying I can just use randomFillSync

> crypto.randomFillSync(new Uint8Array(1))[0]
124
> crypto.randomFillSync(new Uint8Array(1))[0]
91
> crypto.randomFillSync(new Uint8Array(2))
Uint8Array(2) [ 39, 53 ]

Alright, we are getting one 8 bit unsigned integer. But, now I'm starting to get scared. Putting together this whole line is going to be tricky.

(c ^ window.crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)

There is more bitwise operations going on than I am comfortable with. I need to remember operator precedence rules (does the >> happen before or after the & or does it even matter)? And, I need to understand why we do each operation as well as what the magic numbers 15, 4, and 16 all mean.

Back to the REPL.

> c = 1
1
> c ^ crypto.randomFillSync()[0]
9
> a = new Uint8Array(1)
Uint8Array(1) [ 0 ]
> a[0] = 34
34
> x = a[0]
34
> c ^ x
35
> 0 ^ x
34
> x = 254
254
> 1 ^ x
255
> 0 ^ x
254
> 1 ^ 1
0
> a = new Uint8Array(1)
Uint8Array(1) [ 0 ]
> a[0] = 260
260
> a[0]
4
<snip>

I did a lot more messing around. I realized I was mistaking ^ with |. Clearly I don't do much bit twiddling. ^ is XOR.

I am still not quite sure what this step is doing but I think it does not matter for the 1's or 0's. It has to do with the 8.

Looking at other SO answers this seems to be the case. That answer does some masking in the 8 case but not in the other cases.

Things are still unclear and I realize I could be thrown off by operator precedence. It turns out I am getting it wrong

So the division happens first

> 1/4
0.25
> 0/4
0
> 8/4
2

Then the bit shift

> 15 >> 1 / 4
15
> 15 >> 0 / 4
15
> 15 >> 8 / 4
3

Then the bitwise AND

> 0b1111
15
> 0b0011
3

15 and 3 are the two possible values for the result of the &.

0b1111 means that any bit that is set in the number being & with will be present in the result. So, once again the 0 vs 1 is confirmed to not mean anything because the original number is just being passed through.

0b0011 means that the leading two bits will be 0 in the output of the &. Why is this necessary? Let's look at the ^

8 is 0b1000. Then we are doing a ^ with some number that is in the form 0b00xx (the x's mean could be 1 or 0 since we did an & with a 1 in that position). So this results in some number of the form Ob10xx.

Remember that the 8 is the "variant" bit in the UUID? Looking at the Wikipedia page for UUID's I see that a variant of the form 0b10xx is a variant that conforms to DCE 1.1, ISO/IEC 11578:1996. That is the variant for the spec we are after!

The final bit of code is the toString(16). I know that UUID's are made up of hex characters and that hex is base 16. Looking at the JS docs I see that specifying a number in toString sets the radix (aka the base). So, setting it to 16 makes it a base 16 (hex) string.

Holy moly! There were moments in there I wasn't sure I was going to be able to figure all of this out. Putting it all together, the code creates a string of the right shape as a UUID with a few special characters (4 and 8). Everything besides the special characters are replaced with a random character in the hex range. The 4 is not replaced at all. The 8 is replaced with a value that conforms to 0b10xx. That's all that's needed to create the UUID we are after.

Thoughts on the code

Now looking at this code it makes sense (yay:)) but honestly it looks to me like code golf.

Using scientific notation (ex 1e7) is a cool way to make something with 8 characters but throws the reader off in a couple of important ways. First, the number is going to be converted to a string so it is confusing to use a number. Second, the number is made up of a 1 followed by n 0's. When I read code I'm looking for patterns. This pattern is actually totally not important. The 1's vs 0's make no difference in the final result (only the field with the leading 4 and 8 mattered).

Using the syntax [1e7]+-1e3 is using bizarre (IMHO) type rules in JS. You get a string, and the - isn't for creating a negative number but for the hyphen separating the parts of the UUID.

c ^ window.crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4 is a fairly hard to understand line too. It should be read from right to left (the opposite of the way code is normally read) because of operator precedence. Operator precedence is something developers need to know but also isn't something they will know off of the top of their head. Especially for operators that don't appear anywhere else in our code (^, &, >>). I think too many ideas are happening in one line. The terse vs readable balance has swung too far in the terse direction. In addition, there are the magic numbers of 15 and 4. They make sense once you break the code down but having a function/named const I think would be better.

Conclusion

Ultimately, I think the code is too terse but maybe that is a matter of taste and it is an exercise in futility to make it clearer. It is pretty unclear to the untrained eye what it does. But, I never know if the eye should become more trained (read: I should become a better programmer) or if the code should be re-written to be clearer.

The code is in a small part of our codebase, isn't mission critical, can be recovered from if it is buggy, and is constrained neatly within a function call. So, it is probably ok as it is but I still can't help but try and re-write it.

Here is my attempt

// Generate a version 4 (random) variant DCE 1.1, ISO/IEC 11578:1996 UUID
// Adapted from https://stackoverflow.com/a/2117523
// TODO(e-carlin): When more browsers support crypto.randomUUID() use it instead
self.randomUuid = function() {
    function randomHexValues(count, prefix) {
        return (prefix || '') + [...Array(count)].map(
            _ => toHexString(randomUint4())
        ).join('');
    }

    function randomUint4() {
        return window.crypto.getRandomValues(new Uint8Array(1))[0] >> 4;
    }

    function toHexString(uint4) {
        return uint4.toString(16);
    }

    function variantHexValue() {
        // Variant must begin with 0b10xx
        // See https://en.wikipedia.org/wiki/Universally_unique_identifier#Variants
        let v = randomUint4();
        v = v | 0b1000;
        v = v & 0b1011;
        return toHexString(v) + randomHexValues(3);
    }

    const VERSION = '4';

    return [
        randomHexValues(8),
        randomHexValues(4),
        randomHexValues(3, VERSION),
        variantHexValue(),
        randomHexValues(12),
    ].join('-')
};

This is much less concise. But, since our code rarely handles things like this I think being less concise here is better. By being less concise there are many breadcrumbs sprinkled throughout that I think help explain what is going on. In addition, I think it is possible it will make developers smarter by reading it. If you don't know about UUID's (I didn't before writing this) then you at least learn they have this specific layout and there are versions and variants embedded in their structure. Ultimately, when browsers support randomUUID we'll be using that and leave UUID generation up to the pro's.

Now, off to forget everything I just learned about UUID's, operator precedence, and bitwise operators.