You Don't Need UUID

👤popcalc🕑2y🔼129🗨️195

(Replying to PARENT post)

> A simple ID like 3c6n63N is more than enough to represent any product while keeping it readable and making communication easier. A UUID alternative like a73ba12d-1d8b-2516-3aee-4b15e563a835 is just wasteful from a user’s perspective.

I would challenge the premise we appear to be starting from, that the average end user cares to be dealing with any random string of letters and digits. GUIDs work well, they’re implemented everywhere, and you won’t find out long after going into production that you made a mistake that forces you to migrate away from them.

👤emodendroket🕑2y🔼0🗨️0

(Replying to PARENT post)

The crux of this argument seems to be that UUIDs are too long? Which I disagree with. I can't memorize them, no, and it would be cumbersome to try to say one aloud, but these aren't situations I've ever found myself in.

Does it make the URL in the URL bar longer? Yeah, but does that matter?

👤iaaan🕑2y🔼0🗨️0

(Replying to PARENT post)

When it comes to random identifiers, the advantage of uuid4 is mainly that it's a standard thing that everyone understands well, and finding a lib that generates one securely won't require any thought. It's never the ideal solution, but it's often good enough.

👤hot_gril🕑2y🔼0🗨️0

(Replying to PARENT post)

There are so many nuances and edge cases not mentioned in the article. The author needs to do several projects in real life before making global pronouncements like this.

👤swader999🕑2y🔼0🗨️0

(Replying to PARENT post)

IMO, a good middle ground is using schemes like TypeID[0], ulid[1], or KSUID[2] that provide a more compact and readable (base32) representation and better database locality (K-sortable).

[0] https://github.com/jetpack-io/typeid [1] https://github.com/ulid/spec [2] https://github.com/segmentio/ksuid

👤ekojs🕑2y🔼0🗨️0

(Replying to PARENT post)

Author here. I posted this because I've witnessed many systems in companies I've worked for where our end-users needed UUID to communicate with it (technical support, customer ID, etc.) in a way that makes communication harder. We could've used another shorter ID scheme, which would be fine.

The good thing about UUID is that it's omnipresent. From what I've heard, it's this lengthy (128 bits, written as 32 hex digits) because it was hard to guarantee uniqueness when it was conceived in the telecom industry. The length is overkill, and per se, that's fine, but the fact that it hampers communication is awful.

That all said, since posting this, I've come to terms with accepting that it's part of life ¯\_(ツ)_/¯

P.S. Using a second human-friendly ID to end-users is an alternative adopted by some projects. However, most projects don't bother, and also, most good IDs you might want to share with people would make UUID unnecessary anyways (in practice).

👤henvic🕑2y🔼0🗨️0

(Replying to PARENT post)

It's amusing that the UUIDs are considered 'ugly' while the Amazon and YouTube IDs are not. Beauty is in the eye of the beholder. I don't find UUIDs pretty, at all, but the others are even uglier to my taste.

Yes, UUIDs are overkill for most applications. But CPUs and hard drives are, relatively speaking, cheap. Using an existing, battle-tested unique ID library implementation has advantages. The 8 bytes per record you'd save by using bigserial instead are, for most use cases, negligible. 1,000,000 rows? You'll save 8 MB by switching away from UUIDs.
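As a quick check of that arithmetic, here is a minimal sketch. It assumes a 16-byte UUID column versus an 8-byte bigserial column and ignores index and row overhead; the function name is made up for illustration.

```python
UUID_BYTES = 16       # a UUID is 128 bits
BIGSERIAL_BYTES = 8   # PostgreSQL bigserial is a 64-bit integer


def extra_bytes(rows: int) -> int:
    """Extra key storage for UUID keys versus bigserial keys (key column only)."""
    return rows * (UUID_BYTES - BIGSERIAL_BYTES)


print(extra_bytes(1_000_000) / 1_000_000, "MB")  # prints 8.0 MB
```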

Most databases won't be that large. Use a UUID if you want; pretend you'll have Really Big Data some day if it makes you happy. Render it using a special function if the hyphens are too ugly.

👤jherskovic🕑2y🔼0🗨️0

(Replying to PARENT post)

> A UUID alternative like a73ba12d-1d8b-2516-3aee-4b15e563a835 is just wasteful from an user’s perspective.

The argumentation in this article is pretty poor from my experience. A UUID isn’t meant to be handled by the non-technical end user. The end user usually doesn’t and shouldn’t care about the URL. I can assure you there are bigger architectural problems in your design if your user has to care about accessible internal ids.

👤siva7🕑2y🔼0🗨️0

(Replying to PARENT post)

An issue that is not solved by either UUIDv4 or the proposed solution (random base58 strings) is indexing performance.

Both of those solutions typically make it hard for a DB if you write new entries, assuming you have an index on the ID.

In addition it might be more calming to actually be sure that a particular ID is not in use without doing a round-trip.

Is it practical to pre-allocate empty entries and reserve a set of them?

👤dgb23🕑2y🔼0🗨️0

(Replying to PARENT post)

> As Tom Scott shows in his video, 11 base58-encoded characters are enough for YouTube to serve content even when considering that private videos should be undiscoverable.

Nitpick: Google isn't concerned about the discoverability of private videos; those can only be viewed when granted access. You're thinking of unlisted videos.

👤xcdzvyn🕑2y🔼0🗨️0

(Replying to PARENT post)

I think this is an “engineer thinks harder about UX” kind of problem. UUIDs are fine for what they’re good at. Try to avoid making them user-facing because, yeah, they’re ugly. I also wouldn’t hand 128-bit integers to the user if I could avoid it.

Don’t forget: UUIDs are not dash-separated strings. They’re integers. You can render them differently if those dashes are sucky.
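A minimal sketch of that point, using Python's standard `uuid` module. The base58 alphabet below is the common Bitcoin-style one and is an assumption here, as are the helper names `to_base58` and `from_base58`.

```python
import uuid

# Assumed alphabet: digits and letters without 0, O, I and l.
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def to_base58(n: int) -> str:
    """Render a non-negative integer in base58 by repeated division."""
    if n == 0:
        return BASE58[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(BASE58[rem])
    return "".join(reversed(digits))


def from_base58(text: str) -> int:
    """Inverse of to_base58."""
    n = 0
    for ch in text:
        n = n * 58 + BASE58.index(ch)
    return n


u = uuid.UUID("a73ba12d-1d8b-2516-3aee-4b15e563a835")
print(u.int)             # the same 128 bits as a plain integer
print(u.hex)             # 32 hex digits, no dashes
print(to_base58(u.int))  # at most 22 characters for any 128-bit value
```

The stored value stays a 128-bit integer in every case; only the rendering changes.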

👤Waterluvian🕑2y🔼0🗨️0

(Replying to PARENT post)

I recently found ULID.

I like its simplicity. It is sequential. It has a very low probability of collision. And it is of a more reasonable length.

Because it encodes the time, theoretically you could use it to grab the CreatedDate of a record without a need for another field.
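A rough sketch of how that could look, assuming the layout from the ULID spec: 26 Crockford base32 characters, of which the first 10 encode a 48-bit Unix timestamp in milliseconds. The function names are made up for illustration.

```python
from datetime import datetime, timezone

# Crockford base32 alphabet (no I, L, O or U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the millisecond timestamp from the first 10 characters of a ULID."""
    if len(ulid) != 26:
        raise ValueError("a ULID is 26 characters long")
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms


def ulid_created_at(ulid: str) -> datetime:
    """Return the embedded timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ulid_timestamp_ms(ulid) / 1000, tz=timezone.utc)
```

Note that this only tells you when the ID was generated, which may differ from when the row was actually written.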

https://github.com/ulid/spec

👤nwah1🕑2y🔼0🗨️0

(Replying to PARENT post)

Isn't auto-generating UUIDs "fast enough" in most situations that it's negligible?

Databases in particular (where UUIDs taking space was a big concern) have largely switched to a packed binary format that makes the size of UUIDs over time a non-issue for all practical purposes.

👤no_wizard🕑2y🔼0🗨️0

(Replying to PARENT post)

Sure, you don't need them, but why re-invent the wheel? At least UUIDs are not some one-off function in a helpers file that may need to be amended many times before you get it right, but are backed by a specification, are time-sortable, etc.

👤ddoolin🕑2y🔼0🗨️0

(Replying to PARENT post)

I completely agree. UUIDs are almost always evidence of overengineering, especially when user-facing. 64 bits is "enough" of an address space for almost any purpose, even global. And while it seems like speaking an ID over the phone or having to scan it manually is something you'd never have to do, in practice it happens all the time. Cut-and-paste is not always an option on all platforms.

👤saulpw🕑2y🔼0🗨️0

(Replying to PARENT post)

No comments on this page about Snowflake IDs.

https://en.wikipedia.org/wiki/Snowflake_ID

They fit within 64 bits, allow for more than enough processes to handle 10k+ transactions per second, give enough timestamp headroom for decades into the future, and let ID generation be isolated to each process.
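For illustration, a sketch of the bit packing, assuming the commonly described layout of 41 timestamp bits, 10 worker bits and 12 sequence bits. The epoch constant and the function names are assumptions, and a real generator would also need a clock and a per-worker sequence counter.

```python
ASSUMED_EPOCH_MS = 1288834974657  # custom epoch; any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12


def pack_snowflake(unix_ms: int, worker_id: int, sequence: int,
                   epoch_ms: int = ASSUMED_EPOCH_MS) -> int:
    """Pack timestamp, worker id and per-millisecond sequence into one integer."""
    elapsed = unix_ms - epoch_ms
    if not 0 <= elapsed < (1 << 41):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def unpack_snowflake(snowflake: int, epoch_ms: int = ASSUMED_EPOCH_MS) -> tuple:
    """Return (unix_ms, worker_id, sequence) from a packed ID."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    unix_ms = (snowflake >> (WORKER_BITS + SEQUENCE_BITS)) + epoch_ms
    return unix_ms, worker_id, sequence
```

With 12 sequence bits each worker can issue 4096 IDs per millisecond, and 41 bits of milliseconds last roughly 69 years from the chosen epoch.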

They don't work well for anything related to archival work, but you might as well use a regular ID for that anyway, unless you're also actively scraping terabytes of data off of the Internet every second, in which case UUIDv5's good enough for your extreme edge case.

...But at that point you might as well just roll your own 128-bit version of a Snowflake ID.

👤x-complexity🕑2y🔼0🗨️0

(Replying to PARENT post)

Use sequential IDs, but run them through a maximal linear feedback shift register.

They will be in a deterministic order, but will appear semi-random to the end user.

For things like product IDs or user IDs, etc you don't actually need them to be random. But perhaps you don't want them to simply start counting sequentially.
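A small sketch of the idea with a 16-bit Galois LFSR. The mask `0xB400` corresponds to a well-known maximal-length polynomial for 16 bits; the seed and function names are arbitrary choices for illustration, and a real system would pick a wider register.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Galois LFSR by one step (maximal-length taps 0xB400)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def shuffled_ids(count: int, seed: int = 0xACE1):
    """Yield `count` IDs in LFSR order; the seed must be non-zero."""
    state = seed
    for _ in range(count):
        yield state
        state = lfsr16_step(state)


print(list(shuffled_ids(5)))  # deterministic, but not visibly sequential
```

The sequence visits every non-zero 16-bit value exactly once before repeating, so uniqueness is preserved. It is obfuscation only: anyone who knows the taps can predict the next ID.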

👤ars🕑2y🔼0🗨️0

(Replying to PARENT post)

I'm building an object database. In different places, it uses four different categories of IDs, depending on requirements:

1. Compact sequential is used, obviously, where order matters. It has the drawback of requiring a coordination with a singleton. (This can be sharded/vectorized, of course.) Aside from that, it also leaks the number of objects/transactions, just by looking for the highest number available. Can be varint-encoded very nicely.

2. Compact non-sequential is used where I need a small identifier, but not leak the number of objects. Since it's compact, I must still guarantee uniqueness as in (1). This is currently implemented using a block cipher on top of the compact sequential ID generator. The drawback of this is that the domain may still be in guessable territory, depending on how much is generated. A 64-bit integer filled with 4B only requires 4B guesses to hit a collision. I don't use this much. The key can never be rotated: a key number would eat up precious bits.

3. Sparse random, used where non-guessability is important, aside from not leaking rates/counts. Take a Google Docs sharable link as an example. I doubt YouTube cares about this. This is where something like a 128-bit number like UUID or ULID shines. The space is large enough that uniqueness is assumed, given a decent PRNG.

Sure, I try to use the nicest one at any given point (e.g. using a compact sequential instead of non-sequential during debugging.) But the fact is that sparse random just ticks more boxes.

4. Sparse, human readable. For "vouchers". They are bearer tokens that give requests more powers, e.g. to create an account or act as admin. These should be reasonably human readable, so they can be spoken. They obviously need to be sparse and hard to guess, which requires a trade-off in length.

I present them in three ways: base32, English words, and QR code. Pick one; they all do the same. For copy-pasting, base32 might be best (or base58, by all means.)

For shouting to a colleague, the sequence of English words might be better. I'd add other languages as needed: it's just a fixed list of words. The nice thing is it can encode the sequence in base-500 or base-1000 without being obnoxious. (The Matrix protocol and others use emoji lists, but it's the same idea. [1]) Finally, if you have a phone in your pocket or a camera on your computer, perhaps the QR code is the easiest way to use the voucher code.

[1] Actually, IIRC, Matrix only uses 64 emojis, which feels a bit wasteful.
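The "block cipher on top of a sequential ID" idea from item 2 could be sketched as below. This is a toy Feistel network built on SHA-256, not the commenter's implementation and not a vetted cipher; the names `encrypt_id` and `decrypt_id` and the round count are assumptions.

```python
import hashlib


def _round(half: int, key: bytes, i: int) -> int:
    """Keyed round function: 32 bits in, 32 bits out."""
    data = key + bytes([i]) + half.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def encrypt_id(n: int, key: bytes, rounds: int = 4) -> int:
    """Map a 64-bit sequential ID to a scrambled 64-bit ID (a permutation)."""
    left, right = n >> 32, n & 0xFFFFFFFF
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 32) | right


def decrypt_id(c: int, key: bytes, rounds: int = 4) -> int:
    """Invert encrypt_id by running the rounds backwards."""
    left, right = c >> 32, c & 0xFFFFFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 32) | right
```

Because a Feistel network is a bijection, distinct sequential IDs always map to distinct public IDs, so no collision check is needed. As the comment notes, the key then has to stay fixed for the lifetime of the IDs.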

👤tommiegannert🕑2y🔼0🗨️0

(Replying to PARENT post)

I usually go for Nano Id for new projects https://github.com/ai/nanoid

👤erlendellingsen🕑2y🔼0🗨️0

(Replying to PARENT post)

In case it helps anyone, I just went down the rabbit hole of calculating the amount of random bits required to avoid collisions. The traditional Birthday Paradox formula gives us a probability but it's not very intuitive to understand what the probability means in terms we are used to as developers, so I tried a different approach: expected time period for a collision to happen.

https://colab.research.google.com/drive/1ec4n7Ex9bnkl_c45EUl...
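The notebook itself is not reproduced here. As an independent, rough sketch of the same question, the expected number of uniformly random IDs drawn before the first collision is approximately sqrt(pi/2 * N) for a space of N values, which can be turned into a time at a given generation rate. The function names are made up for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_ids_before_collision(bits: int) -> float:
    """Approximate expected number of random IDs generated until the first collision."""
    return math.sqrt(math.pi / 2 * 2.0 ** bits)


def expected_years_to_collision(bits: int, ids_per_second: float) -> float:
    """Expected time until the first collision at a constant generation rate."""
    return expected_ids_before_collision(bits) / ids_per_second / SECONDS_PER_YEAR


for bits in (64, 122):
    print(bits, "bits:", expected_years_to_collision(bits, 1000), "years at 1000 IDs/s")
```

This assumes a good random source; a weak or badly seeded generator will collide far sooner than the formula suggests.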

👤olalonde🕑2y🔼0🗨️0

(Replying to PARENT post)

You Don't Need UUID, But You Probably Want It:

- Don't invent your own ID datatype (especially 11 byte one), this is almost guaranteed to cause dangerous bugs, because each integration will have to carefully implement/hack it.

- Use UUID (preferably the new v7). 128-bit UUID is implemented, for you, pretty much everywhere.

- Serial integers still work too, but you should choose them consciously to fit the data model.

- Implement "natural keys" if you want pretty/memorable/Cool URLs. Never use non-standard PKs to store custom semantic data, because inevitably you will get garbage PKs that need to be fixed, and migrating a PK value is extremely risky.
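For readers unfamiliar with v7, here is a hand-rolled sketch of its layout as described in RFC 9562: a 48-bit Unix millisecond timestamp, then version and variant bits, with the rest random. It is for illustration only; prefer a library implementation where one is available. The function name `uuid7_sketch` is made up.

```python
import secrets
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the v7 layout: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80   # 48-bit timestamp in the top bits
    value |= 0x7 << 76                          # 4-bit version field
    value |= secrets.randbits(12) << 64         # 12 random bits (rand_a)
    value |= 0b10 << 62                         # 2-bit variant field
    value |= secrets.randbits(62)               # 62 random bits (rand_b)
    return uuid.UUID(int=value)


print(uuid7_sketch())
```

Since the timestamp occupies the most significant bits, IDs created in later milliseconds sort after earlier ones, which is what makes v7 friendlier to B-tree indexes than v4.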

👤pphysch🕑2y🔼0🗨️0

(Replying to PARENT post)

> This solution uses the human-readable base58 encoding scheme.

We could encode the 128-bit UUID integer in base58 as well if needed. Its textual representation won't conform to the standard, but we'd get the same number of bits, which would be 122 or 121 bits of uniqueness (about 2^122 possible values), not 128, if it's a proper UUID.

11 base58 characters certainly don't carry 122 bits. So we could decide separately whether to reduce the number of bits needed and/or use a different encoding.
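The gap can be put in numbers with a couple of lines; the helper names are made up for illustration.

```python
import math


def bits_of_entropy(alphabet_size: int, length: int) -> float:
    """Bits carried by `length` uniformly random characters from the alphabet."""
    return length * math.log2(alphabet_size)


def chars_needed(alphabet_size: int, bits: int) -> int:
    """Characters required to carry at least `bits` bits."""
    return math.ceil(bits / math.log2(alphabet_size))


print(bits_of_entropy(58, 11))  # roughly 64.4 bits for an 11-character base58 ID
print(chars_needed(58, 122))    # characters for the random part of a UUIDv4
print(chars_needed(58, 128))    # characters for a full 128-bit value
```

So an 11-character base58 ID is closer to a random 64-bit integer than to a UUID, and a base58-rendered UUID would still be about twice as long.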

> If you click and buy any of these from Amazon after visiting the links above, I might get a commission from their Affiliate program.

:-)

👤rdtsc🕑2y🔼0🗨️0

(Replying to PARENT post)

A UUID is 128 bits, or 16 bytes. This code suggests IDs which are 11 bytes. That's not really comparable.

A UUID can also be encoded in any form, it doesn't need to be represented as the dashed string notation which is common, you can just as easily use the base58 alphabet suggested in the post.

But the code in the post doesn't encode to base58 correctly. A base58 character carries about 5.86 bits (log2 of 58), so a real encoder has to treat the input as one large number and repeatedly divide it by 58, emitting one alphabet character per remainder. You can't just mod each byte of the input by the alphabet length and use the corresponding alphabet element.
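A short sketch of why the per-byte modulo approach is a problem, assuming the usual Bitcoin-style alphabet. Since 256 is not a multiple of 58, some characters come up more often than others, and the mapping cannot be reversed because several byte values share a character.

```python
from collections import Counter

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def byte_mod_counts() -> Counter:
    """Count how many of the 256 byte values map to each character via b % 58."""
    return Counter(BASE58[b % 58] for b in range(256))


counts = byte_mod_counts()
print(counts.most_common(3))                       # characters hit by 5 byte values
print(min(counts.values()), max(counts.values()))  # prints 4 5
```

The first 24 characters of the alphabet are each 25% more likely than the remaining 34, so the result has slightly less entropy than 11 truly uniform base58 characters would.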

👤kiitos🕑2y🔼0🗨️0

(Replying to PARENT post)

Pivoting to the related subject of compact textual representations of _typed_ data, check out CESR[0]. Through some careful choices of code prefixes and payload lengths that won't require Base64 padding, it provides a novel encoding scheme for JSON plus cryptographic signatures (all while enabling true concatenative composability). It is at the heart of KERI[1][2][3], a decentralized identity management scheme, and ACDC[4], a mechanism for verified credentials.

[0] https://trustoverip.github.io/tswg-cesr-specification/draft-... [1] https://keri.one/keri-resources/ [2] https://trustoverip.github.io/tswg-keri-specification/draft-... [3] https://github.com/WebOfTrust/keripy/ [4] https://trustoverip.github.io/tswg-acdc-specification/draft-...

👤pdmccormick🕑2y🔼0🗨️0

(Replying to PARENT post)

I have created a custom unique identifier scheme before. This post doesn't address some of the challenges you come across.

1. When dealing with user friendly IDs, it's often important to make sure there are no ambiguous characters. This is a UX requirement for anyone that may need to read an ID and type it in at any point. For example, this means removing certain confusing characters like "0oIi1l5sS", etc. You end up with a much smaller set of characters. In my case, young children were required to type in teacher codes. Needless to say, I had to limit a lot of possible confusing characters.

2. You will have collisions. How do you handle them (ie, retry N times until you get a valid one)? What happens if you can't get a valid one? This happened in a product of mine. We had ~8 character codes that were human readable and type-able and we... ran out of codes.

3. How can you embed more information into the code, such as versioning? It's often useful to prefix a code with some info in the event you need to modify it later (such as expanding the character set or length).
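Points 1 and 2 above can be sketched together: draw codes from an alphabet with the listed confusable characters removed, and retry on collision with a hard limit. The in-memory `taken` set stands in for whatever uniqueness check a real system would do in its database, and the names are made up for illustration.

```python
import secrets
import string

AMBIGUOUS = set("0oIi1l5sS")  # the confusable characters listed above
ALPHABET = "".join(ch for ch in string.ascii_letters + string.digits
                   if ch not in AMBIGUOUS)


def new_code(taken: set, length: int = 8, max_attempts: int = 10) -> str:
    """Generate an unused code, retrying a bounded number of times on collision."""
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in taken:
            taken.add(code)
            return code
    # Repeated collisions suggest the code space is filling up.
    raise RuntimeError("could not find a free code; consider longer codes")


issued = set()
print(new_code(issued))
```

Failing loudly after a few attempts gives an early warning that the space is running out, rather than letting generation slow down silently.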

👤consoomer🕑2y🔼0🗨️0

(Replying to PARENT post)

If your users need to communicate an ID orally to you, you already failed in numerous ways.

👤BiteCode_dev🕑2y🔼0🗨️0

(Replying to PARENT post)

Is there any existing standardized middle ground between UUIDs, random bits of length n, and serial IDs? I totally buy that UUIDs are unpleasant in URLs in many cases. I can also imagine that in many cases UUIDs for small objects might be a huge % of on the wire payload size.

But besides human friendly slugs, which usually have an ID mapping behind the scenes in my experience, it seems like there might be more work than value for startups in many cases.

👤Glyptodon🕑2y🔼0🗨️0

(Replying to PARENT post)

I like UUIDs aesthetically. They're also an instant cue that something is an "entity" and familiar to people for that purpose.

Databases and lots of other things also have native support for them which makes them even more appealing to use.

Of course, there are plenty of reasons to avoid them like the article discusses. I'd almost certainly not criticize someone for choosing smaller IDs as long as security isn't an issue.

👤Ameo🕑2y🔼0🗨️0

(Replying to PARENT post)

I don't know that I really agree with the logic here; certainly for some cases having a pretty URL matters (e.g. URL shorteners or even something like YouTube), but if your goal is to minimize URL size, and if you're centrally synchronizing anyway to avoid conflicts, why not just use integers or something? If you use a GMP-style bigint, I can virtually guarantee it'll be faster than any random string concat glue you come up with, and you're never going to run out of integers.

I like using GUID specifically because there's usually a built-in implementation everywhere, and because it's so stupidly huge, the likelihood of a conflict is statistically zero, meaning I don't need to bother synchronizing against a server.

Yes, it's technically possible I could save a few bytes of bandwidth or something by using some kind of base encoding, and maybe in some kind of embedded case that might matter, but for a vast majority of cases, GUID is absolutely fine, and since it's used everywhere, it's also very thoroughly tested by multi-billion-dollar corporations meaning that I don't have to worry much about any issues.

So no, you don't need GUID, you don't need a lot of stuff in the software world. I could poll bytes from /dev/urandom and probably get something that works well enough, I could turn off garbage collection and just pre-allocate all my memory beforehand, I could avoid an OS entirely and write straight to the metal, I could do a lot of things, but I don't because I value my time. GUID solves a specific problem pretty well.

EDIT:

The reason I'm specifying this is because I think the title would be better if it was something like "You (probably) don't need UUID".

👤tombert🕑2y🔼0🗨️0

(Replying to PARENT post)

I mean, don't use UUID for user-facing things, sure, but that doesn't mean it's not useful in and of itself.

That's a terrible way to use it is all.

👤mattbeck🕑2y🔼0🗨️0

(Replying to PARENT post)

> If you’re talking about a web system composed of microservice architecture all running on the same datacenter, perhaps sharing the same database

We have different ideas what 'microservice architecture' means, I guess.

One of the key points of UUIDs is to be able to generate (probably) non-conflicting values without coordination.

👤CiaranMcNulty🕑2y🔼0🗨️0

(Replying to PARENT post)

I think there's a better "You Don't Need UUID", which is that you probably aren't going to generate so many integers per second that you need to do so in a way that's perfectly collision free.

For example, you can update a postgres integer 1000s of times per second, sequentially, and 100s of 1000s of times per second if you allow for gaps/ unordered sequencing. The reason to choose a UUID is because it doesn't require a db. Totally fair. But you probably already have a db lying around if you're generating UUIDs to begin with?

Or because you need an unguessable token, which a UUID is great for.

But if you have a postgres db around somewhere, consider just having a counters table and using that. It'll be fast enough for almost anyone, it'll return smaller tokens, and it'll return tokens that are sequentially allocated (in range). These are nice properties to have!

👤insanitybit🕑2y🔼0🗨️0

(Replying to PARENT post)

For the web, sequential ID attacks represent a significant problem in terms of a Resource Enumeration Attack. It gets worse when auth isn’t even in play yet (the user isn’t logged in, or doesn’t even have an account yet, or this info is meant to be available outside of auth but to only one specific user), but you need to display only the current item, and not any others. Having a sequential ID of any kind allows the user to trivially hack their way into any other item they wish by just incrementing or decrementing that ID.

UUIDs represent a trivially easy way of implementing a non-incrementing ID with a ridiculously unwieldy address space that makes it supremely unrealistic for the vast majority of users to mess with. They’re just going to give up long before the first successful hit.

👤rekabis🕑2y🔼0🗨️0

(Replying to PARENT post)

Best part is watching the video and seeing the guy dance at the end over doing it in a single take.

👤mccrory🕑2y🔼0🗨️0

(Replying to PARENT post)

Doesn’t UUID have mechanisms to prevent collisions even if you start two processes simultaneously with the same initial seed? There are human readable tweaks you can make to them that preserve all of the good properties without limiting the pool much though.

👤jackblemming🕑2y🔼0🗨️0

(Replying to PARENT post)

I’m not sure if this is accurate any longer, but it was bad practice to use UUIDs as PKs in PostgreSQL, for example, due to indexing and performance. Is this still true? It’s really nice in PostgreSQL to just do:

    id UUID DEFAULT gen_random_uuid()

👤nodesocket🕑2y🔼0🗨️0

(Replying to PARENT post)

We need them tho. Especially UUIDv4 and UUIDv5. We have a distributed system and we manage permission identities with that. Each item that's supposed to be restricted has either of these ids. UUIDv4 for global things like workspaces and users, UUIDv5 for workspace-specific stuff (the workspace UUIDv4 is given as a namespace for the v5) and it's quite beautiful and prevents clashes. So it really depends on your usage. We still have long ids that are exposed to the client tho.
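The scheme described above maps directly onto Python's standard `uuid` module. The item names and the helper `workspace_item_id` below are hypothetical; only `uuid4` and `uuid5` are real library calls.

```python
import uuid


def workspace_item_id(workspace_id: uuid.UUID, item_name: str) -> uuid.UUID:
    """Derive a deterministic UUIDv5 for an item, namespaced by its workspace."""
    return uuid.uuid5(workspace_id, item_name)


workspace_a = uuid.uuid4()  # global identity: random UUIDv4
workspace_b = uuid.uuid4()

# The same name in the same workspace always yields the same ID;
# the same name in another workspace yields a different one.
print(workspace_item_id(workspace_a, "billing-settings"))
print(workspace_item_id(workspace_b, "billing-settings"))
```

Because v5 is a hash of namespace plus name, any node can recompute an item's ID without a lookup, as long as it knows the workspace ID and the name.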

👤RamblingCTO🕑2y🔼0🗨️0

(Replying to PARENT post)

My experience: If you're storing data in a database keyed by a GUID, you get a nicer random distribution in an index, avoiding hot partitions when your workload is seemingly random.

👤wnolens🕑2y🔼0🗨️0

(Replying to PARENT post)

You don't "need" them, but they're almost universally understood and supported, which makes them super convenient and a good starting point for many use cases. The point at which the "wasteful" nature of a UUID really bites you is so far down the road that it is almost not even worth thinking about until you get there. Chances are you won't be dealing in petabytes at all and the 64-bit savings you are fretting over doesn't really matter.

👤kolanos🕑2y🔼0🗨️0

(Replying to PARENT post)

Just use UUID.

If you need something user friendly, use the first 6 or 8 characters.

If you need to worry about index performance due to enormous index sizes, you probably know it.

👤konschubert🕑2y🔼0🗨️0

(Replying to PARENT post)

I see ulid and nanoid being recommended here. I like cuid2 based on input from its README [0] and discussion [1]

[0] https://github.com/paralleldrive/cuid2

[1] https://github.com/paralleldrive/cuid2/issues/7

👤fuzzythinker🕑2y🔼0🗨️0

(Replying to PARENT post)

> Version 4 is completely randomly generated (hence, it has more entropy) and is what most web systems seem to use. It has 16^32 = 2^128 bits that guarantee uniqueness and has an insignificant risk of collision.

Small correction here, but it is not entirely 128 random bits: 6 bits are reserved for the version and variant markers.
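This is easy to confirm with Python's standard `uuid` module: every v4 UUID carries a fixed 4-bit version field and a fixed 2-bit variant field, leaving 122 random bits. The helper name is made up for illustration.

```python
import uuid


def fixed_fields(u: uuid.UUID) -> tuple:
    """Return the (version, variant) fields, which are not random."""
    return u.version, u.variant


u = uuid.uuid4()
print(fixed_fields(u))  # version 4 and the RFC 4122 variant, every time
print(128 - 4 - 2)      # prints 122, the number of random bits
```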

👤alwaysbeconsing🕑2y🔼0🗨️0

(Replying to PARENT post)

> This solution uses the human-readable base58 encoding scheme.

How readable is base58 in Arabic or Chinese? I've concluded that only the standard numbers 0123456789, algebra symbols +-/*= and phone symbols #* are universal enough for a global encoding. Any other insights are welcomed!

👤irq-1🕑2y🔼0🗨️0

(Replying to PARENT post)

UUIDs are great for when you have a distributed system where connectivity between the nodes is intermittent, and every node needs to be able to create new entities without ever having ID collisions.

For instance, UUIDs allow mobile apps to create entities while offline.

👤tcoff91🕑2y🔼0🗨️0

(Replying to PARENT post)

It’s a sales pitch for base58 that I’m not buying at all. According to the code segment, they include numeral “1” but exclude lowercase “l”?

0-9 and a-f are unambiguous and widely understood. They’re more “human readable”.

👤afgrant🕑2y🔼0🗨️0

(Replying to PARENT post)

I wonder how Stripe generates their IDs. They look quite nice.

👤olalonde🕑2y🔼0🗨️0

(Replying to PARENT post)

uuids are ubiquitous and easy and implemented in many languages in the standard library - therefore a dev can get near-guaranteed uniqueness without thinking too hard about it.

Until another standard similar to what the article is suggesting becomes widely implemented in standard libraries then uuid isn't going anywhere, although in principle I agree with many of the arguments presented.

👤stayfrosty420🕑2y🔼0🗨️0

(Replying to PARENT post)

Read this as I was about to use uuid for id :/

👤srameshc🕑2y🔼0🗨️0

(Replying to PARENT post)

Everyone wanted UUIDs and we had IRIs all along...

👤tanepiper🕑2y🔼0🗨️0

(Replying to PARENT post)

YoU dOnT nEeD _____

AKA I will make broad sweeping stupid headlines

👤Exuma🕑2y🔼0🗨️0

(Replying to PARENT post)

This is such a HN article.

👤0xbadcafebee🕑2y🔼0🗨️0

(Replying to PARENT post)

You don't need to tell me I don't need UUID

...because I am going to keep using uuid

👤lifechoseme123🕑2y🔼0🗨️0