(Replying to PARENT post)
Does it make the URL in the URL bar longer? Yeah, but does that matter?
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
[0] https://github.com/jetpack-io/typeid [1] https://github.com/ulid/spec [2] https://github.com/segmentio/ksuid
(Replying to PARENT post)
The good thing about UUID is that it's omnipresent. From what I've heard, it's this lengthy (128 bits) because it was hard to guarantee uniqueness when it was conceived in the telecom industry. The length is overkill, and per se, that's fine, but the fact that it hampers communication is awful.
That all said, since posting this, I've come to terms with accepting that it's part of life ¯\_(ツ)_/¯
P.S. Exposing a second, human-friendly ID to end users is an alternative adopted by some projects. However, most projects don't bother, and in practice most good IDs you might want to share with people would make the UUID unnecessary anyway.
(Replying to PARENT post)
Yes, UUIDs are overkill for most applications. But CPUs and hard drives are, relatively speaking, cheap. Using an existing, battle-tested unique ID library implementation has advantages. The 8 bytes per record you'd save by using bigserial instead are, for most use cases, negligible. 1,000,000 rows? You'll save 8 MB by switching away from UUIDs.
Most databases won't be that large. Use a UUID if you want; pretend you'll have Really Big Data some day if it makes you happy. Render it using a special function if the hyphens are too ugly.
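For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch in Python. It only counts the raw key bytes (16 for a UUID, 8 for a bigserial) and ignores index and per-row overhead, which is an assumption on my part; `extra_bytes` is just a name I made up.

```python
UUID_BYTES = 16       # a UUID is 128 bits
BIGSERIAL_BYTES = 8   # a PostgreSQL bigserial is a 64-bit integer


def extra_bytes(rows: int) -> int:
    """Extra raw key storage from using UUIDs instead of bigserial.

    Ignores index pages, alignment and per-row overhead.
    """
    return rows * (UUID_BYTES - BIGSERIAL_BYTES)


if __name__ == "__main__":
    for rows in (1_000_000, 100_000_000):
        print(rows, "rows:", extra_bytes(rows) / 1_000_000, "MB extra")
```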
(Replying to PARENT post)
The argumentation in this article is pretty poor from my experience. A UUID isn’t meant to be handled by the non-technical end user. The end user usually doesn’t and shouldn’t care about the URL. I can assure you there are bigger architectural problems in your design if your user has to care about accessible internal IDs.
(Replying to PARENT post)
Both of those solutions typically make it hard for a DB if you write new entries, assuming you have an index on the ID.
In addition it might be more calming to actually be sure that a particular ID is not in use without doing a round-trip.
Is it practical to pre-allocate empty entries and reserve a set of them?
(Replying to PARENT post)
Nitpick: Google isn't concerned about the discoverability of private videos; those can only be viewed when granted access. You're thinking of unlisted videos.
(Replying to PARENT post)
Don’t forget: UUIDs are not dash-separated strings. They’re integers. You can render them differently if those dashes are sucky.
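To make that concrete, here is a small Python sketch using only the standard `uuid` and `base64` modules. The sample UUID is an arbitrary illustrative value, and which rendering you pick is entirely up to you.

```python
import base64
import uuid

# Arbitrary example value, used only for illustration.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_int = u.int   # the underlying 128-bit integer
as_hex = u.hex   # 32 hex characters, no dashes
# Base32 and URL-safe base64 renderings of the same 16 bytes, padding stripped.
as_b32 = base64.b32encode(u.bytes).decode("ascii").rstrip("=")
as_b64 = base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")

if __name__ == "__main__":
    print(str(u))   # canonical dashed form
    print(as_hex)
    print(as_b32)
    print(as_b64)
    # All renderings describe the same value.
    assert uuid.UUID(int=as_int) == uuid.UUID(hex=as_hex) == u
```

The dashed form is 36 characters, the hex form 32, base32 26 and base64 22, all for the same 128 bits.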
(Replying to PARENT post)
I like its simplicity. It is sequential. It has a very low probability of collision. And it is of a more reasonable length.
Because it encodes the time, theoretically you could use it to grab the CreatedDate of a record without a need for another field.
(Replying to PARENT post)
Databases in particular (where the space UUIDs take was a big concern) have, over time, largely switched to a packed binary format that makes the size of UUIDs a non-issue for all practical purposes.
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
https://en.wikipedia.org/wiki/Snowflake_ID
They fit within 64 bits, allow for more than enough processes to handle 10k+ transactions per second, give enough timestamp headroom for decades into the future, and let ID generation be isolated to each process.
They don't work well for anything related to archival work, but you might as well use a regular ID for that anyway, unless you're also actively scraping terabytes of data off of the Internet every second, in which case UUIDv5's good enough for your extreme edge case.
...But at that point you might as well just roll your own 128-bit version of a Snowflake ID.
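For readers who haven't seen one, here is a minimal Python sketch of a Snowflake-style generator. It assumes the commonly described layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); the epoch constant and the class name are my own hypothetical choices, not anything from a specific implementation.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (September 2020)
WORKER_BITS = 10              # up to 1024 independent generators
SEQ_BITS = 12                 # up to 4096 IDs per millisecond per generator


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Sketch of a Snowflake-style ID generator; no coordination between workers."""

    def __init__(self, worker_id: int, clock=_now_ms):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms  # clock stepped backwards: reuse last timestamp
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (timestamp << (WORKER_BITS + SEQ_BITS)) | (self.worker_id << SEQ_BITS) | self._seq


if __name__ == "__main__":
    gen = SnowflakeGenerator(worker_id=5)
    print([gen.next_id() for _ in range(3)])
```

With 12 sequence bits, each worker can hand out up to 4096 IDs per millisecond, and 41 timestamp bits last roughly 69 years from whatever epoch you pick.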
(Replying to PARENT post)
They will be in a deterministic order, but will appear semi-random to the end user.
For things like product IDs or user IDs, etc you don't actually need them to be random. But perhaps you don't want them to simply start counting sequentially.
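One cheap way to get that effect (not necessarily what anyone in this thread uses) is to multiply a plain counter by an odd constant modulo 2^32. The mapping is a permutation, so uniqueness is preserved and it can be reversed. The constant and function names below are my own picks, and this is obfuscation only: anyone who knows the multiplier can undo it.

```python
MODULUS = 2**32
# Any odd multiplier is invertible modulo 2**32; this one is a commonly used
# multiplicative-hashing constant, chosen here only for illustration.
MULTIPLIER = 2654435761
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)


def scramble(n: int) -> int:
    """Map a sequential ID to a scattered-looking one (a bijection on 0..2**32-1)."""
    return (n * MULTIPLIER) % MODULUS


def unscramble(s: int) -> int:
    """Recover the original sequential ID."""
    return (s * INVERSE) % MODULUS


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "->", scramble(n))
```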
(Replying to PARENT post)
1. Compact sequential is used, obviously, where order matters. It has the drawback of requiring a coordination with a singleton. (This can be sharded/vectorized, of course.) Aside from that, it also leaks the number of objects/transactions, just by looking for the highest number available. Can be varint-encoded very nicely.
2. Compact non-sequential is used where I need a small identifier, but not leak the number of objects. Since it's compact, I must still guarantee uniqueness as in (1). This is currently implemented using a block cipher on top of the compact sequential ID generator. The drawback of this is that the domain may still be in guessable territory, depending on how much is generated. A 64-bit integer filled with 4B only requires 4B guesses to hit a collision. I don't use this much. The key can never be rotated: a key number would eat up precious bits.
3. Sparse random, used where non-guessability is important, aside from not leaking rates/counts. Take a Google Docs sharable link as an example. I doubt YouTube cares about this. This is where something like a 128-bit number like UUID or ULID shines. The space is large enough that uniqueness is assumed, given a decent PRNG.
Sure, I try to use the nicest one at any given point (e.g. using a compact sequential instead of non-sequential during debugging). But the fact is that sparse random just ticks more boxes.
4. Sparse, human readable. For "vouchers". They are bearer tokens that give requests more powers, e.g. to create an account or act as admin. These should be reasonably human readable, so they can be spoken. They obviously need to be sparse and hard to guess, which requires a trade-off in length.
I present them in three ways: base32, English words and QR code. Pick one; they all do the same. For copy-pasting, base32 might be best (or base58, by all means).
For shouting to a colleague, the sequence of English words might be better. I'd add other languages as needed: it's just a fixed list of words. The nice thing is it can encode the sequence in base-500 or base-1000 without being obnoxious. (The Matrix protocol and others use emoji lists, but it's the same idea. [1]) Finally, if you have a phone in your pocket or a camera on your computer, perhaps the QR code is the easiest way to use the voucher code.
[1] Actually, IIRC, Matrix only uses 64 emojis, which feels a bit wasteful.
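A rough Python sketch of the voucher idea in (4): one random token, shown as base32 and as a word sequence. The 16-word list is a toy stand-in I made up so the example stays short; a real list would have the 500 to 1000 words mentioned above, and the QR rendering is left out.

```python
import base64
import secrets

# Hypothetical toy word list; a real one would hold 500-1000 words.
WORDS = ["apple", "brick", "cloud", "delta", "eagle", "flame", "grape", "house",
         "ivory", "jelly", "koala", "lemon", "mango", "night", "ocean", "piano"]


def word_count(n_bytes: int) -> int:
    """Number of words needed to represent any token of n_bytes bytes."""
    count = 0
    while len(WORDS) ** count < 256 ** n_bytes:
        count += 1
    return count


def to_words(token: bytes) -> str:
    """Encode the token as a fixed-length sequence of words (base len(WORDS))."""
    value = int.from_bytes(token, "big")
    out = []
    for _ in range(word_count(len(token))):
        value, digit = divmod(value, len(WORDS))
        out.append(WORDS[digit])
    return " ".join(reversed(out))


def from_words(phrase: str, n_bytes: int) -> bytes:
    """Decode a word sequence back into the original token bytes."""
    value = 0
    for word in phrase.split():
        value = value * len(WORDS) + WORDS.index(word)
    return value.to_bytes(n_bytes, "big")


token = secrets.token_bytes(10)                      # 80 random bits
as_base32 = base64.b32encode(token).decode("ascii")  # 16 characters, no padding
as_words = to_words(token)

if __name__ == "__main__":
    print(as_base32)
    print(as_words)
```

With a 1000-word list the same 80 bits would need only 9 words instead of the 20 this toy list produces.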
(Replying to PARENT post)
(Replying to PARENT post)
https://colab.research.google.com/drive/1ec4n7Ex9bnkl_c45EUl...
(Replying to PARENT post)
- Don't invent your own ID datatype (especially an 11-byte one); this is almost guaranteed to cause dangerous bugs, because each integration will have to carefully implement/hack it.
- Use UUID (preferably the new v7). 128-bit UUID is implemented, for you, pretty much everywhere.
- Serial integers still work too, but you should choose them consciously to fit the data model.
- Implement "natural keys" if you want pretty/memorable/Cool URLs. Never use non-standard PKs to store custom semantic data, because inevitably you will get garbage PKs that need to be fixed, and migrating a PK value is extremely risky.
(Replying to PARENT post)
We could encode the 128-bit UUID integer in base58 as well if needed. Its textual representation won't conform to the standard, but we'd get the same number of bits. That would be 122 or 121 bits of uniqueness (2^122 or 2^121 possible values), not 128, if it's a proper UUID.
Eleven base58 characters certainly don't carry 122 bits. So we could decide separately whether to reduce the number of bits needed and/or use a different encoding.
> If you click and buy any of these from Amazon after visiting the links above, I might get a commission from their Affiliate program.
:-)
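For the bit counts above, a quick Python check: each base58 character carries log2(58), about 5.86 bits. The helper names are mine.

```python
import math


def bits_in(chars: int, alphabet_size: int = 58) -> float:
    """Information capacity, in bits, of a string of `chars` symbols."""
    return chars * math.log2(alphabet_size)


def chars_needed(bits: int, alphabet_size: int = 58) -> int:
    """Minimum number of symbols needed to represent `bits` bits."""
    return math.ceil(bits / math.log2(alphabet_size))


if __name__ == "__main__":
    print(round(bits_in(11), 1))   # capacity of an 11-character base58 ID
    print(chars_needed(122))       # characters for the random part of a v4 UUID
    print(chars_needed(128))       # characters for a full 128-bit value
```

So 11 characters hold a bit over 64 bits, while a full 128-bit value needs 22 base58 characters.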
(Replying to PARENT post)
A UUID can also be encoded in any form, it doesn't need to be represented as the dashed string notation which is common, you can just as easily use the base58 alphabet suggested in the post.
But the code in the post doesn't encode to base58 correctly. You need to treat the input as one big number and repeatedly divide by 58, mapping each remainder to one character in the alphabet. You can't just mod each byte of the input by the alphabet length and use the corresponding alphabet element.
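For comparison, here is a sketch of the usual big-integer approach to base58 in Python. I'm assuming the Bitcoin-style alphabet and its convention of turning leading zero bytes into leading '1' characters; the post may use a different alphabet.

```python
import uuid

# Bitcoin-style base58 alphabet (an assumption): no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58 by repeated division of the whole value by 58."""
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, remainder = divmod(n, 58)
        out.append(ALPHABET[remainder])
    # Leading zero bytes would vanish in the integer, so keep them as '1's.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * pad + "".join(reversed(out))


def b58decode(text: str) -> bytes:
    """Inverse of b58encode."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip(ALPHABET[0]))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


if __name__ == "__main__":
    u = uuid.uuid4()
    encoded = b58encode(u.bytes)
    print(u, "->", encoded)
    assert uuid.UUID(bytes=b58decode(encoded)) == u
```

Unlike the per-byte modulo approach, this is reversible, which is the whole point of an encoding.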
(Replying to PARENT post)
[0] https://trustoverip.github.io/tswg-cesr-specification/draft-... [1] https://keri.one/keri-resources/ [2] https://trustoverip.github.io/tswg-keri-specification/draft-... [3] https://github.com/WebOfTrust/keripy/ [4] https://trustoverip.github.io/tswg-acdc-specification/draft-...
(Replying to PARENT post)
1. When dealing with user friendly IDs, it's often important to make sure there are no ambiguous characters. This is a UX requirement for anyone that may need to read an ID and type it in at any point. For example, this means removing certain confusing characters like "0oIi1l5sS", etc. You end up with a much smaller set of characters. In my case, young children were required to type in teacher codes. Needless to say, I had to limit a lot of possible confusing characters.
2. You will have collisions. How do you handle them (ie, retry N times until you get a valid one)? What happens if you can't get a valid one? This happened in a product of mine. We had ~8 character codes that were human readable and type-able and we... ran out of codes.
3. How can you embed more information into the code, such as versioning? It's often useful to prefix a code with some info in the event you need to modify it later (such as expanding the character set or length).
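The three points above can be sketched together in Python: a reduced alphabet, bounded retries on collision, and a version prefix. The alphabet, the prefix, the code length and the in-memory `taken` set are all assumptions for illustration; a real system would check uniqueness against its database.

```python
import secrets

# Alphabet with look-alike characters (0/O, 1/I/L, 5/S) removed; an assumption,
# tune it to your audience.
ALPHABET = "ABCDEFGHJKMNPQRTUVWXYZ2346789"
VERSION_PREFIX = "A"   # hypothetical scheme version, lets you change format later
CODE_LENGTH = 8        # total length including the prefix
MAX_ATTEMPTS = 10


class CodeSpaceExhausted(RuntimeError):
    """Raised when no free code was found within MAX_ATTEMPTS tries."""


def new_code(taken: set) -> str:
    """Generate a code not already in `taken`, retrying on collisions."""
    for _ in range(MAX_ATTEMPTS):
        body = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH - 1))
        code = VERSION_PREFIX + body
        if code not in taken:
            taken.add(code)
            return code
    # Repeated collisions suggest the space is filling up: time for a longer
    # code or a new version prefix rather than more retries.
    raise CodeSpaceExhausted("too many collisions; code space may be nearly full")


if __name__ == "__main__":
    used = set()
    print([new_code(used) for _ in range(3)])
```

Treating the exhausted case as an explicit error at least turns "we ran out of codes" into something you can alert on before it hits users.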
(Replying to PARENT post)
(Replying to PARENT post)
But besides human friendly slugs, which usually have an ID mapping behind the scenes in my experience, it seems like there might be more work than value for startups in many cases.
(Replying to PARENT post)
Databases and lots of other things also have native support for them which makes them even more appealing to use.
Of course, there are plenty of reasons to avoid them like the article discusses. I'd almost certainly not criticize someone for choosing smaller IDs as long as security isn't an issue.
(Replying to PARENT post)
I like using GUID specifically because there's usually a built-in implementation everywhere, and because it's so stupidly huge, the likelihood of a conflict is statistically zero, meaning I don't need to bother synchronizing against a server.
Yes, it's technically possible I could save a few bytes of bandwidth or something by using some kind of base encoding, and maybe in some kind of embedded case that might matter, but for a vast majority of cases, GUID is absolutely fine, and since it's used everywhere, it's also very thoroughly tested by multi-billion-dollar corporations meaning that I don't have to worry much about any issues.
So no, you don't need GUID; you don't need a lot of stuff in the software world. I could poll bytes from /dev/urandom and probably get something that works well enough, I could turn off garbage collection and just pre-allocate all my memory beforehand, I could avoid an OS entirely and write straight to the metal, I could do a lot of things, but I don't because I value my time. GUID solves a specific problem pretty well.
EDIT:
The reason I'm specifying this is because I think the title would be better if it was something like "You (probably) don't need UUID".
(Replying to PARENT post)
That's a terrible way to use it is all.
(Replying to PARENT post)
We have different ideas what 'microservice architecture' means, I guess.
One of the key points of UUIDs is to be able to generate (probably) non-conflicting values without coordination.
(Replying to PARENT post)
For example, you can update a postgres integer 1000s of times per second, sequentially, and 100s of 1000s of times per second if you allow for gaps or unordered sequencing. The reason to choose a UUID is because it doesn't require a db. Totally fair. But you probably already have a db lying around if you're generating UUIDs to begin with?
Or because you need an unguessable token, which a UUID is great for.
But if you have a postgres db around somewhere, consider just having a counters table and using that. It'll be fast enough for almost anyone, it'll return smaller tokens, and it'll return tokens that are sequentially allocated (in range). These are nice properties to have!
(Replying to PARENT post)
UUIDs represent a trivially easy way of implementing a non-incrementing ID with a ridiculously unwieldy address space that makes it supremely unrealistic for the vast majority of users to mess with. They’re just going to give up long before the first successful hit.
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
id UUID DEFAULT gen_random_uuid()
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
If you need something user friendly, use the first 6 or 8 characters.
If you need to worry about index performance due to enormous index sizes, you probably know it.
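If you do truncate, it's worth checking how quickly prefixes collide. Here is a birthday-approximation sketch in Python. It assumes the leading characters are uniformly random, which holds for v4 hex prefixes but not for time-ordered variants such as v7, where the leading bits are a timestamp.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n_ids random
    values of `bits` bits are equal."""
    exponent = -n_ids * (n_ids - 1) / (2 * 2**bits)
    return -math.expm1(exponent)  # 1 - e^exponent, accurate for tiny values


if __name__ == "__main__":
    # 6 hex characters = 24 bits, 8 hex characters = 32 bits.
    for chars in (6, 8):
        for n in (1_000, 10_000, 100_000):
            p = collision_probability(n, chars * 4)
            print(f"{chars} hex chars, {n} ids: {p:.2%}")
```

At 8 characters the chance of at least one clash is roughly 1% with 10,000 IDs and well over half with 100,000, so short prefixes are for display, not for uniqueness.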
(Replying to PARENT post)
(Replying to PARENT post)
Small correction here, but it is not entirely 128 random bits: in a v4 UUID, 6 bits are reserved for the version and variant markers.
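You can see the fixed bits with Python's standard `uuid` module. A small sketch; the shifts follow the RFC 4122 field layout, with 4 version bits at the top of the third field and 2 variant bits at the top of the fourth.

```python
import uuid

u = uuid.uuid4()

version_bits = (u.int >> 76) & 0xF    # 4 bits, always 0b0100 for v4
variant_bits = (u.int >> 62) & 0b11   # 2 bits, always 0b10 for RFC 4122 UUIDs
random_bits = 128 - 4 - 2             # what is left for randomness

if __name__ == "__main__":
    print(u)
    print("version:", version_bits, "variant:", bin(variant_bits), "random bits:", random_bits)
```

In the dashed string form, that is why the third group always starts with 4 and the fourth group starts with 8, 9, a or b.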
(Replying to PARENT post)
How readable is base58 in Arabic or Chinese? I've concluded that only the standard numbers 0123456789, algebra symbols +-/*= and phone symbols #* are universal enough for a global encoding. Any other insights are welcomed!
(Replying to PARENT post)
For instance, UUIDs allow mobile apps to create entities while offline.
(Replying to PARENT post)
0-9 and a-f are unambiguous and widely understood. They’re more “human readable”.
(Replying to PARENT post)
(Replying to PARENT post)
Until another standard similar to what the article is suggesting becomes widely implemented in standard libraries, UUID isn't going anywhere, although in principle I agree with many of the arguments presented.
(Replying to PARENT post)
(Replying to PARENT post)
(Replying to PARENT post)
AKA I will make broad sweeping stupid headlines
(Replying to PARENT post)
(Replying to PARENT post)
...because I am going to keep using uuid
(Replying to PARENT post)
I would challenge the premise we appear to be starting from, that the average end user cares to be dealing with any random string of letters and digits. GUIDs work well, they’re implemented everywhere, and you won’t find out long after you go into production that you made some mistake that forces you to migrate away from them.