The latest OpenAI personal data ‘scandal’ (Google’s researchers obtained responses that expose the ‘raw’ data used to train the GPT-series models) invariably made headlines about ‘privacy’, copyright and so on. These issues are just as invariably discussed without taking into account some plain facts that defuse the hype and once again expose the consequences of the ‘freebie’ economy, ‘loneliness capitalism’ and the inertia of the supervisory authorities – by Andrea Monti – Initially published in Italian on Strategikon – a La Repubblica-Italian Tech blog.
Let’s start with a neutral observation: the information used to build AI datasets is public, i.e. made available by its creators as essays, columns, posts, stories and everything else that feeds the frantic urge to get the famous ’15 minutes of fame’ (supposedly) predicted by Andy Warhol.
That said, we could wade into the molasses of complex legal reasoning to warn that the publication of content does not – per se – confer the right to process it for purposes other than personal use. This is the case, for example, with press articles, which are often marked ‘all rights reserved’; or, to some extent, with the personal data we make available online (for which, in fact, there is no absolute prohibition on re-use). The next step is to venture into the maze of legal interpretation to understand whether we are facing a ‘fair use’ exception, as some (interested) players in the AI industry claim, or whether the balance of conflicting interests tips the scales towards industry or towards citizens. To find a way out, we can ask ourselves why we (and the regulators before us) have never complained about search engines, whose crawlers behave exactly like those of the AI companies.
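The comparison is easy to verify: both kinds of crawler are expected to honour the same voluntary opt-out mechanism, the robots.txt file, which is about the only practical lever a publisher has today. A minimal sketch in Python, using only the standard library, shows how a well-behaved crawler checks it before fetching a page (the domain and path are purely illustrative; ‘Googlebot’ and ‘GPTBot’ are the user-agent names published by Google and OpenAI respectively):

```python
from urllib import robotparser

# Load the site's robots.txt (the domain here is purely illustrative).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A well-behaved crawler asks permission per user-agent before fetching.
# A publisher can allow the search crawler and block the AI one, or vice versa.
for agent in ("Googlebot", "GPTBot"):
    allowed = rp.can_fetch(agent, "https://example.com/articles/some-post")
    print(f"{agent} may fetch: {allowed}")
```

Whether the AI companies’ crawlers actually honour these directives is, of course, precisely the kind of enforcement question that has been left open.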
While we try to find our way through this growing (and this time artificial) complexity, the fact nevertheless remains: to paraphrase Metastasio, dati dalla rete fuggiti, più richiamar non vale – data that has escaped onto the network cannot be called back. In other words, at the moment there is no practicable legal remedy that would actually allow anyone to obtain the deletion of their own data or content included in a dataset collected without the knowledge of the persons, companies and institutions concerned, or fair compensation for its use by Big Tech and start-ups. Once again: legal injunctions, class actions and, perhaps, criminal investigations are conceivable in the abstract; but against whom? In what timeframe? At what cost and, above all, with what outcome? One only has to look at the geopolitics and geo-economics of high tech to realise that grand statements of law are destined to remain ink blots on sheets of paper locked in a drawer.
It is clear that the – perhaps – only really effective measure to prevent the sacking of our data would be to stop making it available. Everything, in other words, should end up behind paywalls or ‘closed circles’, and everyone should decide whether, what, how much and how to make public, taking responsibility for the choice.
If this consideration is correct, it has but one consequence: the umpteenth confirmation of the end of the romantic illusion of a ‘free’ network in which information can circulate freely.
‘Information wants to be free’, we used to tell each other when, as kids, we lulled ourselves in the utopia of the non-existent ‘cyberspace’ while waiting for the handshake of a 2400-baud modem. But today the illusion has dematerialised, and it is clear that information is not (and never has been) free, either in terms of cost or of availability.
The only thing that remains to be understood – today – is who pays the bill, in what currency, and into which bank account or crypto wallet.