(Replying to PARENT post)

I like the idea of having the agent be attracted to the unpredictable, but I guess there should be something to ensure that unpredictability doesn't dominate which action is selected. For an interesting/funny example, check out their two videos: "Agent in a maze without a noisy TV" and "Agent in a maze with a noisy TV".
👤magoghm 🕑7y 🔼0 🗨️0

(Replying to PARENT post)

How about an approach where the agent's reward is not the predictability itself but its first derivative? That way the agent is attracted to the parts of the environment where it can still improve, and it avoids white-noise parts, since its model of the world doesn't generalize to those.

Juergen Schmidhuber (a co-author of the original LSTM paper) had a very similar idea: http://people.idsia.ch/~juergen/driven2009.pdf

"This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and (since 1990) artificial systems."

👤moosinho 🕑7y 🔼0 🗨️0

(Replying to PARENT post)

If you read the paper, that's exactly what they address. They say they handle the noisy-TV problem, and the video shows why they needed to address it.

> These choices make RND immune to the noisy-TV problem.
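The mechanism behind that claim can be shown with a toy numpy version of RND's intrinsic reward (the paper uses neural networks; this sketch uses linear maps and plain SGD, and all names here are illustrative). The key point is that the prediction target is a *fixed deterministic* function of the observation, so even a noisy-TV state carries no irreducible error: the predictor can eventually fit the target, and the reward decays.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim = 8, 4

W_target = rng.normal(size=(feat_dim, obs_dim))  # fixed random "target net"
W_pred = np.zeros((feat_dim, obs_dim))           # trained "predictor net"

def intrinsic_reward(obs, lr=0.01):
    """MSE between predictor and fixed random target features.
    Because the target is deterministic given the observation, the
    error is fully reducible -- unlike predicting the next frame of TV
    static, where the error can never go to zero."""
    global W_pred
    target = W_target @ obs
    err = W_pred @ obs - target
    reward = float(err @ err) / feat_dim
    # One SGD step pulling the predictor toward the target features.
    W_pred -= lr * np.outer(err, obs)
    return reward
```

Run it on the same observation repeatedly and the reward shrinks toward zero, which is exactly the "novelty wears off" behavior that makes the noisy TV uninteresting.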

👤mooneater 🕑7y 🔼0 🗨️0

(Replying to PARENT post)

I would imagine they would need some kind of breakaway factor that lets the agent decide that, despite the unpredictability, what it's trying to explore might not be worth the effort, or that there is no reward behind it.
👤aeleos 🕑7y 🔼0 🗨️0