(Replying to PARENT post)

> Big neural nets trained with SGD are unlikely to memorize something if they only see it once over the course of one million training steps

I am not so sure about that. Have you seen this thread: https://www.reddit.com/r/MachineLearning/comments/dfky70/dis...

Apparently lots of sentence fragments were memorized by GPT-2 (including real-world URLs, entire conversations, usernames/emails, and other PII).
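
For what it's worth, this is easy to poke at yourself. Here's a rough sketch (mine, not from that thread) that probes the public GPT-2 checkpoint via HuggingFace transformers: feed it a prefix and check whether greedy decoding reproduces the continuation verbatim. The prefix/continuation strings are made-up placeholders, not known memorized examples.

    # Probe a public GPT-2 checkpoint for verbatim memorization: give it a
    # prefix and see if greedy decoding reproduces an expected continuation.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prefix = "To file a complaint, email us at"   # hypothetical probe prefix
    expected = " support@example.com"             # hypothetical continuation

    input_ids = tokenizer.encode(prefix, return_tensors="pt")
    with torch.no_grad():
        # Greedy decoding: memorized text tends to come back verbatim,
        # not just as a plausible paraphrase.
        output_ids = model.generate(
            input_ids,
            max_new_tokens=20,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    continuation = tokenizer.decode(output_ids[0][input_ids.shape[1]:])
    print("model continuation:", continuation)
    print("verbatim match:", continuation.startswith(expected))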

👤throwaway_bad 🕑6y 🔼0 🗨️0

(Replying to PARENT post)

It actually can be more pernicious than that: https://arxiv.org/abs/1802.08232
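
For context, that paper quantifies memorization with an "exposure" metric: insert a randomly generated canary into the training data, then rank its log-perplexity under the trained model against every other string that fits the same template; exposure is roughly log2(#candidates) - log2(rank of the canary). A toy sketch of the metric (the scorer below is a fake stand-in for a trained model, not a real result):

    # Toy sketch of the exposure metric from arXiv:1802.08232.
    import math
    import random

    def exposure(log_perplexity, canary, candidates):
        """log2 |candidates| - log2 rank(canary), ranked by log-perplexity."""
        ranked = sorted(candidates, key=log_perplexity)
        rank = ranked.index(canary) + 1   # rank 1 = lowest perplexity
        return math.log2(len(candidates)) - math.log2(rank)

    # Fake scorer that strongly favors the inserted canary, standing in for a
    # model that memorized it; a non-memorizing model would score ~0 exposure.
    template = "my pin is {:04d}"
    canary = template.format(random.randrange(10_000))
    candidates = [template.format(i) for i in range(10_000)]
    fake_scorer = lambda s: 0.0 if s == canary else random.uniform(1.0, 10.0)

    print("exposure:", exposure(fake_scorer, canary, candidates))  # ~13.3 bits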

However, note that the dataset used to train GPT-2 is roughly 20x smaller than C4 (WebText is about 40GB of text vs. ~750GB for C4). I'm not 100% sure how many times the training set was repeated over the course of GPT-2's training, but it was likely many times. I stand by my statement (that memorization is unlikely with SGD and no repetition of training data), but I'd be happy to be proven wrong.

👤craffel 🕑6y 🔼0 🗨️0