(Replying to PARENT post)

I read this story in the most chilling way: the same tactics used for this analysis will eventually be used to unmask people who post anonymously now but later write something that can be linked back to them. To phrase it another way, we have come to a point where your very prose is a digital signature.

👤SOLAR_FIELDS🕑6y🔼0🗨️0

(Replying to PARENT post)

This is already a thing. In fact the rumour is that the US govt discovered the identity of Bitcoin's Satoshi using this.

https://medium.com/cryptomuse/how-the-nsa-caught-satoshi-nak...

👤_nedR🕑6y🔼0🗨️0

(Replying to PARENT post)

I keep forgetting the general public is not aware of these things.

Data is like nuclear waste. Everything you do online leaves a pattern of behavior that is unique to you. Your only saving grace is no one cares about you specifically, until they do.

👤ggggtez🕑6y🔼0🗨️0

(Replying to PARENT post)

It was already shown that a simple Markov chain could detect that another author had written a chapter inside a book. It was in 2003, I think; unfortunately I cannot find a reference for it. My point is just that Markov chains are a very basic and old ML method that is quite effective for this kind of task.
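To make the idea concrete, here is a minimal sketch of how such a check could work. It is not the method from the study I half remember: the function names, the character-level model, and the add-one smoothing are all my own assumptions for illustration. Each known author gets a table of character transitions, and the unknown text is assigned to whichever table makes it most probable.

```python
import math
from collections import Counter, defaultdict


def train(text, order=1):
    """Count character transitions: context of `order` chars -> next char."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts


def avg_log_likelihood(model, text, order=1, alphabet_size=256):
    """Average per-character log-probability of `text` under `model`.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    transitions from producing log(0).
    """
    total, n = 0.0, 0
    for i in range(len(text) - order):
        ctx, nxt = text[i:i + order], text[i + order]
        seen = model.get(ctx, Counter())
        total += math.log((seen[nxt] + 1) / (sum(seen.values()) + alphabet_size))
        n += 1
    return total / n if n else float("-inf")


def most_likely_author(samples, unknown, order=1):
    """samples: dict of author name -> known text. Returns the best-scoring name."""
    models = {name: train(text, order) for name, text in samples.items()}
    return max(models, key=lambda name: avg_log_likelihood(models[name], unknown, order))
```

To flag a suspicious chapter, you would presumably train on the rest of the book and on candidate authors, then compare scores chapter by chapter. Real texts need far more training data than a toy example, and a higher `order` only helps once there is enough text to fill the table.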

👤yogsototh🕑6y🔼0🗨️0

(Replying to PARENT post)

This sort of analysis is older than Tolkien. It takes pretty substantial processing to do at scale, and it's pretty inaccurate. In the future, people who say controversial things will keep their statements a short sentence long to render this sort of analysis useless.

👤TheOperator🕑6y🔼0🗨️0

(Replying to PARENT post)

There are rephrasing services available, presumably for helping users plagiarise. Some are laughably bad but possibly helpful, while others are quite good. E.g.: https://quillbot.com/app

https://paraphrasing-tool.com/

👤lostlogin🕑6y🔼0🗨️0

(Replying to PARENT post)

Isn't this the reason grep was created? It was used to determine which parts of the Federalist Papers were written by which author.[0]

Considering this occurred in 1974, I can only imagine that techniques for de-anonymizing authors have gotten much better due to how much written text individuals post on social media sites, like hn. Uh oh.

[0] https://en.wikipedia.org/wiki/Grep

👤mmcwilliams🕑6y🔼0🗨️0

(Replying to PARENT post)

It's already a thing; see how the FBI caught the Silk Road admin. Although that wasn't done in the automated fashion you suggest, I am pretty sure the algos are already in use.

👤cocochanel🕑6y🔼0🗨️0

(Replying to PARENT post)

https://github.com/psal/anonymouth

👤Xelbair🕑6y🔼0🗨️0

(Replying to PARENT post)

Aren't college students everywhere already exposed to this with every text, due to "plagiarism detection" software?

👤kzrdude🕑6y🔼0🗨️0

(Replying to PARENT post)

Possible solution: run your text through a different machine translator for each account. Make minor corrections for cohesiveness.

👤salutonmundo🕑6y🔼0🗨️0

(Replying to PARENT post)

But the whole cat and mouse game hasn't really started yet. Once people find out what the algo looks at they can try to game it. Eg if you know it looks for the same phrases like "first of all" you can stop using them. Or if it looks at specific errors you can sprinkle them into one text but not another.
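As a rough sketch of what "looking at phrases" could mean, assuming the algo does nothing smarter than compare how often a fixed list of marker phrases shows up (the `MARKERS` list and both function names are made up for illustration, not taken from any real tool):

```python
import re

# Hypothetical marker list; a real system would presumably learn its features from data.
MARKERS = ["first of all", "in fact", "to be honest"]


def marker_rates(text, markers=MARKERS):
    """Occurrences of each marker phrase per 1000 words of `text`."""
    # Lowercase and drop punctuation so "First of all," still matches.
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    n_words = max(len(normalized.split()), 1)
    return {
        m: 1000 * len(re.findall(r"\b" + re.escape(m) + r"\b", normalized)) / n_words
        for m in markers
    }


def distance(a, b):
    """Sum of absolute differences between two rate profiles (smaller = more alike)."""
    return sum(abs(a[m] - b[m]) for m in a)
```

Two accounts with a small `distance` between their profiles would look like the same writer to this toy model. Dropping a pet phrase from one account pushes the profiles apart, which is exactly the kind of gaming I mean.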

👤lordnacho🕑6y🔼0🗨️0

(Replying to PARENT post)

Wasn't that how they caught the Unabomber? I saw a documentary about the guy who caught him by using this sort of analysis, although his method was quite analog (scanning through written letters and the Unabomber's correspondence with the press).

👤metastew🕑6y🔼0🗨️0

(Replying to PARENT post)

Only a matter of time before an AI-powered 'prose anonymiser' is developed.

Then you can just run all your naughty words through the Russian styliser.

👤Simple_Guy🕑6y🔼0🗨️0

(Replying to PARENT post)

Simple & obscure solution: Google Translate to another language and back to the original.

👤viko-h🕑6y🔼0🗨️0