(Replying to PARENT post)
How about an approach where the agent's reward is not the predictability itself but its first derivative? This way the agent will be attracted to the parts of the environment where it can still improve, and will avoid white-noise parts, since its model of the world doesn't generalize to those.
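A rough sketch of what that could look like, assuming a toy scalar prediction error in place of a real world model (the class name, the window size, and the moving-average scheme are all invented for illustration):

```python
import numpy as np

class LearningProgressReward:
    """Toy curiosity reward: the recent *decrease* in prediction error,
    i.e. the first derivative of how well the world model is doing."""
    def __init__(self, window=5):
        self.errors = []
        self.window = window

    def reward(self, prediction_error):
        self.errors.append(prediction_error)
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough history to estimate a trend yet
        recent = np.mean(self.errors[-self.window:])
        older = np.mean(self.errors[-2 * self.window:-self.window])
        return older - recent  # positive only while the model improves

rng = np.random.default_rng(0)

# Learnable dynamics: error shrinks as the model fits, so progress > 0.
lp = LearningProgressReward()
learnable = [lp.reward(1.0 / (t + 1)) for t in range(30)]

# A "noisy TV": error never shrinks on average, so progress stays near 0
# and the agent has no lasting incentive to keep watching.
lp_noise = LearningProgressReward()
noisy = [lp_noise.reward(rng.uniform(0.5, 1.5)) for _ in range(30)]
```

On the learnable stream the summed reward is positive; on pure noise the window means fluctuate around the same level, so the rewards roughly cancel out.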
Juergen Schmidhuber (co-author of the original LSTM paper) had a very similar idea: http://people.idsia.ch/~juergen/driven2009.pdf
"This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and (since 1990) artificial systems."
👤 moosinho · 🕐 7y · 🔼 0 · 🗨️ 0
(Replying to PARENT post)
If you read it, that's exactly what they address here.
They say they address the noisy-TV problem, and the video shows why they needed to address it.
> These choices make RND immune to the noisy-TV problem.
👤 mooneater · 🕐 7y · 🔼 0 · 🗨️ 0
(Replying to PARENT post)
I would imagine they would need some kind of breakaway factor that allows the agent to decide that, despite the unpredictability, what it's trying to explore might not be worth the effort, or that there is no reward behind it.
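One possible reading of that "breakaway factor", sketched with invented names (`BreakawayCuriosity`, the per-region `interest` multiplier): damp the curiosity bonus for a region the longer it yields no extrinsic reward, so the agent eventually gives up on it.

```python
class BreakawayCuriosity:
    """Hypothetical sketch: scale the curiosity bonus by a per-region
    'interest' weight that decays while the region pays no extrinsic
    reward, so an unpredictable-but-useless region is abandoned."""
    def __init__(self, decay=0.5):
        self.decay = decay
        self.interest = {}  # region -> multiplier in [0, 1]

    def bonus(self, region, prediction_error, extrinsic_reward):
        w = self.interest.get(region, 1.0)
        if extrinsic_reward > 0:
            self.interest[region] = 1.0      # reward found: stay interested
        else:
            self.interest[region] = w * self.decay  # lose interest
        return w * prediction_error

c = BreakawayCuriosity(decay=0.5)
# A noisy-TV region: always unpredictable, never rewarding.
bonuses = [c.bonus("tv", prediction_error=1.0, extrinsic_reward=0.0)
           for _ in range(6)]
# bonuses -> [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
```

The bonus halves on every unrewarded visit, so the agent "breaks away" even though the region stays maximally unpredictable.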
👤 aeleos · 🕐 7y · 🔼 0 · 🗨️ 0
(Replying to PARENT post)