>>17252410
Zip compression and indexing of repetitive text can be surprisingly efficient.
https://www.youtube.com/watch?v=efPrtcLdcdM
That dude used a couple-grand computer to do it, so it ain't that expensive.
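A rough way to check the compression claim yourself. This is only a sketch: the sample line and the repeat count are made up for illustration, and real archives are less uniform than this, so expect smaller ratios in practice.

```python
import zlib

# Hypothetical sample: one forum-style line repeated many times.
# Real post data is messier, so this is a best-case illustration.
line = b"Anonymous 01/01/24 No.12345678 >>12345677 same post, same boilerplate\n"
raw = line * 10_000

# Level 9 is the slowest, strongest DEFLATE setting zlib offers.
packed = zlib.compress(raw, 9)
ratio = len(raw) / len(packed)

print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, ratio={ratio:.0f}x")
```

The more the text repeats itself (headers, quotes, signatures, reposts), the closer you get to this kind of ratio.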
The NSA runs a datacenter out of Utah, which is where they archive the internet. Specifically, they store encrypted communications for extended periods so they can crack them later, and no doubt they have huge underground datacenters built to crack encryption as needed. That is a much more expensive storage problem, because encrypted data can't be semantically symbolized and summarized the way plain, unencrypted data can. You can distill an HTTPS session to a clickmap using a packet capture, then compress and index it along with the website, but if it's all encrypted you're storing everything at full size, and even assuming you have the key, decrypting that volume of data has a significant computational cost.
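To see why ciphertext is the expensive case, here is a minimal sketch comparing how well repetitive plaintext and ciphertext-like bytes compress. Assumptions: `os.urandom` output stands in for ciphertext (good encryption should look like random bytes to a compressor), and the HTTP request and `example.org` host are hypothetical.

```python
import os
import zlib

def compressed_fraction(data: bytes) -> float:
    """Return the compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

# Hypothetical repetitive plaintext traffic.
plaintext = b"GET /board/thread/12345 HTTP/1.1\r\nHost: example.org\r\n\r\n" * 5_000

# Stand-in for ciphertext: random bytes of the same length (an assumption,
# not real captured traffic).
ciphertext_like = os.urandom(len(plaintext))

plain_frac = compressed_fraction(plaintext)
cipher_frac = compressed_fraction(ciphertext_like)

print(f"plaintext compresses to {plain_frac:.2%} of its size")
print(f"ciphertext-like data compresses to {cipher_frac:.2%} of its size")
```

The plaintext shrinks to a tiny fraction, while the random bytes stay at roughly full size, which is the storage penalty described above.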
While the data might be out there for a good long while, attribution, meaning who is doing what, is really hard to establish reliably without some kind of meta-identity database table. IP addresses are unreliable sources of identity. You need to identify individual computer users, and even if you target specific individuals, a lot of the posts on this site come from large government-affiliated botnets using zero-days to temporarily hijack cellphones and PCs. Identity brokers are building the infrastructure, but people view surveillance and forced identity as stalking, which is a precursor to crime and violence, so they build adblockers to poison identity brokers' data. Ergo, while the attribution data today might be useful for targeting malware and commercially viable for advertising, it is not forensically sound enough for the courts.
So post away you insufferable cunts. This is your tax dollars at work.