A colleague opened a pull requests with this code in it a some months ago ago.
self.nextId = function() {
// a uuid generator found on the interwebs
return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
(c ^ window.crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
);
};
I reviewed the code and left this comment
The word next is misleading because to me that implies some ordering and the
uuids generated are random. Also, is there a reason we need true uuids instead
of doing something like sirepo-lattice.js? If there is a reason I think adding a
comment about it in the code would be helpful.
While the comment was fine, I hope, I did not actually understand this code which is a problem since I reviewed it. So, here is my attempt to go back and figure out what the code actually does.
This is just a stream of consciousness of me solving the problem. It bounces around. For better or worse that is how I think.
Breaking down the code
There is quite a bit of this code I don't understand so let's dive right in.
I think 1e7
is scientific notation.
> 1e7
10000000
The REPL confirms that it is.
Why is 1e7
an array and what does it mean to add the other values to
it?
> [1] + 2
'12'
> x = [1] + 2
'12'
> typeof x
'string'
WAT?? (If you do nothing else with this article please stop and watch this talk.)
JS is always full of surprises. Some days I kick myself wishing I could remember all of the gotchas. Other days I'm just happy if I remember to zip up my pants after going to the bathroom. On those days I can not be bothered by remembering details like this. What I do try and remember is that JS has strangeness like this. When doing any operation to two things of different types it is a good idea to slow down and make sure I get it right.
> [1e7]+-1e3+-4e3+-8e3+-1e11
'10000000-1000-4000-8000-100000000000'
Ok, we have a string of some numbers separated by dashes. Looks something like a UUID.
At this point I get distracted and think "what happens when I just search 'js uuid generate'."
The first result is, as expected, a link to StackOverflow.
The top answer mentions the UUID module
In the simple case (all we need) that module just calls
crypto.randomUUID()
. Why can't we just call that? Off to
caniuse.com.
Ugh, we don't have the strictest requirements for supporting old browsers but if the latest Safari (15.1) does not support it then it won't work for us.
This makes me wonder how JS implementations implement it. A quick search yields the W3C spec. The description of the spec does not look much like the code we are using. I then go to the Node.js docs. Why don't the docs link to the source code? That is something ruby-doc.org really got right. Anyways, the code is not that hard to find. Again, this looks like the spec and not much like our code.
From reading the spec and looking at our code I think our code is probably taking some short cuts. Reading through the SO comments this seems to be the case. Again, I don't feel it is that big of a deal because the likelihood of a collision seems pretty small and fairly harmless (the UUID's are for one simulation for one user).
Back to the code...
'10000000-1000-4000-8000-100000000000'
> ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c => console.log(c));
1
0
0
<snip>
> ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c => 'e');
'eeeeeeee-eeee-4eee-eeee-eeeeeeeeeeee'
This is replacing '0', '1', and '8' with the result of the callback.
Now I need to understand the structure of a UUID.
The '4' that is not replaced is the version of the UUID spec. Version 4 means random.
Why is the '8' getting replaced though? That is the
"variant" and has
some meaning I don't yet fully understand. Maybe the c ^ ...
does
some mask operation that includes it?
There are still some missing pieces but I'm starting to see that we are mostly just replacing values in the original string with randomly generated values.
Now, let's understand some about the random generation.
> crypto.getRandomValues(new Uint8Array(1))[0]
Uncaught TypeError: crypto.getRandomValues is not a function
Gr888. Maybe my version of node is out of date? Yes, but also there is
this handy
comment
in the code saying I can just use randomFillSync
> crypto.randomFillSync(new Uint8Array(1))[0]
124
> crypto.randomFillSync(new Uint8Array(1))[0]
91
> crypto.randomFillSync(new Uint8Array(2))
Uint8Array(2) [ 39, 53 ]
Alright, we are getting one 8 bit unsigned integer. But, now I'm starting to get scared. Putting together this whole line is going to be tricky.
(c ^ window.crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
There is more bitwise operations going on than I am comfortable
with. I need to remember operator precedence rules (does the >>
happen before or after the &
or does it even matter)? And, I need to
understand why we do each operation as well as what the magic numbers
15, 4, and 16 all mean.
Back to the REPL.
> c = 1
1
> c ^ crypto.randomFillSync()[0]
9
> a = new Uint8Array(1)
Uint8Array(1) [ 0 ]
> a[0] = 34
34
> x = a[0]
34
> c ^ x
35
> 0 ^ x
34
> x = 254
254
> 1 ^ x
255
> 0 ^ x
254
> 1 ^ 1
0
> a = new Uint8Array(1)
Uint8Array(1) [ 0 ]
> a[0] = 260
260
> a[0]
4
<snip>
I did a lot more messing around. I realized I was mistaking ^
with
|
. Clearly I don't do much bit twiddling. ^
is
XOR.
I am still not quite sure what this step is doing but I think it does not matter for the 1's or 0's. It has to do with the 8.
Looking at other SO answers this seems to be the case. That answer does some masking in the 8 case but not in the other cases.
Things are still unclear and I realize I could be thrown off by operator precedence. It turns out I am getting it wrong
So the division happens first
> 1/4
0.25
> 0/4
0
> 8/4
2
Then the bit shift
> 15 >> 1 / 4
15
> 15 >> 0 / 4
15
> 15 >> 8 / 4
3
Then the bitwise AND
> 0b1111
15
> 0b0011
3
15 and 3 are the two possible values for the result of the &
.
0b1111
means that any bit that is set in the number being &
with
will be present in the result. So, once again the 0 vs 1 is confirmed
to not mean anything because the original number is just being passed
through.
0b0011
means that the leading two bits will be 0 in the output of
the &
. Why is this necessary? Let's look at the ^
8 is 0b1000
. Then we are doing a ^
with some number that is in the
form 0b00xx
(the x's mean could be 1 or 0 since we did an &
with a
1 in that position). So this results in some number of the form
Ob10xx
.
Remember that the 8 is the "variant" bit in the UUID? Looking at the
Wikipedia
page
for UUID's I see that a variant of the form 0b10xx
is a variant that
conforms to DCE 1.1, ISO/IEC 11578:1996. That is the variant for the
spec we are after!
The final bit of code is the toString(16)
. I know that UUID's are
made up of hex characters and that hex is base 16. Looking at the JS
docs
I see that specifying a number in toString sets the radix (aka the
base). So, setting it to 16 makes it a base 16 (hex) string.
Holy moly! There were moments in there I wasn't sure I was going to be
able to figure all of this out. Putting it all together, the code
creates a string of the right shape as a UUID with a few special
characters (4 and 8). Everything besides the special characters are
replaced with a random character in the hex range. The 4 is not
replaced at all. The 8 is replaced with a value that conforms to
0b10xx
. That's all that's needed to create the UUID we are after.
Thoughts on the code
Now looking at this code it makes sense (yay:)) but honestly it looks to me like code golf.
Using scientific notation (ex 1e7
) is a cool way to make something
with 8 characters but throws the reader off in a couple of important
ways. First, the number is going to be converted to a string so it is
confusing to use a number. Second, the number is made up of a 1
followed by n 0's. When I read code I'm looking for patterns. This
pattern is actually totally not important. The 1's vs 0's make no
difference in the final result (only the field with the leading 4 and
8 mattered).
Using the syntax [1e7]+-1e3
is using bizarre (IMHO) type rules in
JS. You get a string, and the -
isn't for creating a negative number
but for the hyphen separating the parts of the UUID.
c ^ window.crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c /
4
is a fairly hard to understand line too. It should be read from
right to left (the opposite of the way code is normally read) because
of operator precedence. Operator precedence is something developers
need to know but also isn't something they will know off of the top of
their head. Especially for operators that don't appear anywhere else
in our code (^
, &
, >>
). I think too many ideas are happening in
one line. The terse vs readable balance has swung too far in the terse
direction. In addition, there are the magic numbers of 15 and 4. They
make sense once you break the code down but having a function/named
const I think would be better.
Conclusion
Ultimately, I think the code is too terse but maybe that is a matter of taste and it is an exercise in futility to make it clearer. It is pretty unclear to the untrained eye what it does. But, I never know if the eye should become more trained (read: I should become a better programmer) or if the code should be re-written to be clearer.
The code is in a small part of our codebase, isn't mission critical, can be recovered from if it is buggy, and is constrained neatly within a function call. So, it is probably ok as it is but I still can't help but try and re-write it.
Here is my attempt
// Generate a version 4 (random) variant DCE 1.1, ISO/IEC 11578:1996 UUID
// Adapted from https://stackoverflow.com/a/2117523
// TODO(e-carlin): When more browsers support crypto.randomUUID() use it instead
self.randomUuid = function() {
function randomHexValues(count, prefix) {
return (prefix || '') + [...Array(count)].map(
_ => toHexString(randomUint4())
).join('');
}
function randomUint4() {
return window.crypto.getRandomValues(new Uint8Array(1))[0] >> 4;
}
function toHexString(uint4) {
return uint4.toString(16);
}
function variantHexValue() {
// Variant must begin with 0b10xx
// See https://en.wikipedia.org/wiki/Universally_unique_identifier#Variants
let v = randomUint4();
v = v | 0b1000;
v = v & 0b1011;
return toHexString(v) + randomHexValues(3);
}
const VERSION = '4';
return [
randomHexValues(8),
randomHexValues(4),
randomHexValues(3, VERSION),
variantHexValue(),
randomHexValues(12),
].join('-')
};
This is much less concise. But, since our code rarely handles things
like this I think being less concise here is better. By being less
concise there are many breadcrumbs sprinkled throughout that I think
help explain what is going on. In addition, I think it is possible it
will make developers smarter by reading it. If you don't know about
UUID's (I didn't before writing this) then you at least learn they
have this specific layout and there are versions and variants embedded
in their structure. Ultimately, when browsers support randomUUID
we'll be using that and leave UUID generation up to the pro's.
Now, off to forget everything I just learned about UUID's, operator precedence, and bitwise operators.