(Replying to PARENT post)

Am I seeing the same thing you are in those two images you linked to?

The first one, generated by AI, has you looking at the camera. The other one has you looking at an instrument. They don't look much like each other to me, in terms of pose or anything else.

The first image suggested by Stable Attribution looks a lot more like the AI image to me, in terms of pose and everything else.

πŸ‘€codetrotterπŸ•‘2yπŸ”Ό0πŸ—¨οΈ0

(Replying to PARENT post)

I think you're missing the point. How does Stable Diffusion know what saurik looks like? The answer, of course, is that it has seen saurik's profile pic in its training data. Stable Attribution is not showing that.

As another comment[1] points out:

> This appears to be just looking for the nearest neighbors of the image in embedding space and calling those the source data.

Stylistically similar images are not the same as source images.

[1] https://news.ycombinator.com/item?id=34670483
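
If that reading is right, the core of such a system is just a cosine-similarity lookup over precomputed CLIP image embeddings. Here's a minimal sketch of that idea (the `nearest_neighbors` helper and array names are mine, assumed for illustration; this is not Stable Attribution's actual code):

```python
import numpy as np

def nearest_neighbors(query_embedding, dataset_embeddings, k=5):
    """Return indices of the k dataset images most cosine-similar to the query.

    query_embedding:    (D,)   CLIP embedding of the generated image (assumed precomputed)
    dataset_embeddings: (N, D) CLIP embeddings of the training-set images (assumed precomputed)
    """
    # Normalize so a dot product equals cosine similarity.
    q = query_embedding / np.linalg.norm(query_embedding)
    d = dataset_embeddings / np.linalg.norm(dataset_embeddings, axis=1, keepdims=True)
    sims = d @ q                        # similarity against every dataset image
    return np.argsort(sims)[-k:][::-1]  # top-k indices, most similar first
```

Note that nothing in this lookup consults what the model actually drew on during generation; it only surfaces images that happen to land near the output in embedding space, which is exactly why stylistic look-alikes get labeled as "sources."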

πŸ‘€Retr0idπŸ•‘2yπŸ”Ό0πŸ—¨οΈ0

(Replying to PARENT post)

The slight adjustment of where I'm looking is minor. The instrument is uncommon and honestly difficult to get Stable Diffusion to generate correctly at all; though, from my experience playing with this (I spent a lot of time trying to figure out why it knew who I was before discovering the CLIP database browser), I'm going to argue that the hand shows up in that position because of my hand holding the violin there. Frankly, we might as well also start nitpicking that the suit-like thing worn in the generated image is not a tuxedo and that the generated person looks a tad rounder and has more hair ;P.

However, the key thing here is: do you really think "avatar for saurik" is going to come up with something like this out of the blue--out of all the numerous random images of people who have avatar/profile photos, wearing lots of different clothing and posed in different orientations--such that we can even be talking about silly things like props and gaze direction? I will assert that would be ridiculous. Clearly, seeing at least one photo of me (and AFAIK it was only trained on thousands of copies of that single photo) was absolutely crucial to the construction of this image, and yet this website isn't finding any such photo to show, as it isn't really doing what it claims to do (and it isn't clear to me how it could without the original prompt).

From there, SD is upscaling from memory a lot and filling in a ton of details (as almost all of the copies of my photo it was trained on are very small, embedded in screenshots of tweets), but the cornerstone of that construction is clearly my Twitter profile photo... and then, once you've done that, you can go back and rationalize "these were the photos most similar to the photo we generated, so I guess those were what the algorithm used to generate this image"; but--at least in this case--it is pretty obvious that that isn't how this worked, as there's no way you start with "avatar of saurik" and arbitrarily pluck an image this similar to the one photo of saurik you have.

πŸ‘€saurikπŸ•‘2yπŸ”Ό0πŸ—¨οΈ0

(Replying to PARENT post)

That's the point. The website is finding images that look similar, not the images the algorithm actually used to generate that picture for the prompt "avatar for saurik".
πŸ‘€csande17πŸ•‘2yπŸ”Ό0πŸ—¨οΈ0