Weaponizing generated content

A couple of weeks ago, to much fanfare, the researchers at OpenAI published a “language model”, GPT-2.  “Published” is actually too strong a word – they announced that they had created an AI “too dangerous to release!”, and put out a description of the algorithm and data (rather than runnable code for the full model), along with a fully released reduced-power version.

Without throwing around a bunch of fairly obscure mathematical expressions, a language model tells you, given a certain input, what words are likely to follow.  One seeds it with particular input, and it generates a stream of following words, and eventually a magic “I’m done” signifier.  Ars Technica has good examples of this in practice.
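To make “what words are likely to follow” concrete, here is a minimal sketch of the idea using a toy bigram model that simply counts which word follows which.  This is emphatically not how GPT-2 works internally (it is a large neural network conditioned on far more than one preceding word); the three-sentence corpus, the `<end>` marker, and every function name here are made up for illustration.

```python
import random
from collections import Counter, defaultdict

END = "<end>"  # hypothetical "I'm done" marker appended to every training sentence


def train_bigram(sentences):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn raw follower counts into a probability for each candidate next word."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def generate(counts, seed, rng, max_words=20):
    """Seed the model with one word, then sample followers until END or the cap."""
    words = [seed]
    while len(words) < max_words:
        followers = counts.get(words[-1])
        if not followers:
            break  # word never seen in training: nothing to predict
        choices, weights = zip(*followers.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == END:
            break  # the model says it is done
        words.append(nxt)
    return " ".join(words)


# Toy corpus, invented for this example.
corpus = ["the cat sat on the mat", "the dog sat on the rug", "the cat ate"]
model = train_bigram(corpus)

if __name__ == "__main__":
    print(next_word_probs(model, "cat"))
    print(generate(model, "sat", random.Random(0)))
```

The real thing replaces the counting table with a learned function of the entire preceding text, but the generate loop (predict, sample, append, stop at the end marker) has the same shape.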

The secret sauce wasn’t so much the algorithm they used, which was a clear extrapolation of previous, fully-published work.  Instead, like most machine learning developments over the past decade, it was driven by their custom dataset – every outbound link from Reddit over several years with at least three upvotes, run through a couple of plain-text filters to grab usable language from the resulting websites (instead of the kind of weird markup you get if you “view source” on a page).  Since everyone tweaks their models for their particular tasks anyway, and they gave a description sufficient for an expert to write models in the same family fairly easily, the unreleased dataset was the main item of interest.
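As a rough sketch of what that dataset recipe amounts to, the snippet below keeps only links that clear an upvote threshold and strips markup down to readable text.  It assumes the posts have already been collected as dictionaries and the pages already downloaded; the field names, the threshold constant, and the crude tag-stripping extractor are all stand-ins of mine, not the filters OpenAI actually used.

```python
from html.parser import HTMLParser

MIN_UPVOTES = 3  # threshold as described in the post above


def select_links(posts, min_upvotes=MIN_UPVOTES):
    """Keep the outbound URLs of posts that meet the upvote threshold.

    `posts` is a hypothetical list of {"url": ..., "upvotes": ...} dicts.
    """
    return [p["url"] for p in posts if p["upvotes"] >= min_upvotes]


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Reduce an HTML page to its plain-text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    posts = [
        {"url": "https://example.com/a", "upvotes": 5},
        {"url": "https://example.com/b", "upvotes": 2},
    ]
    print(select_links(posts))
    print(html_to_text("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

None of this is hard, which is rather the point: the code is trivial, and the value is in having crawled and cleaned several years of links at scale.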

This is broadly related to the idea of “style transfer”, which is how one generates stylistically correct, but synthetic, examples of fake porn, fake portraits, and fake waifus.  Delving into all the subtleties of the specific algorithms and how they relate to each other is a bit out of scope, but suffice it to say that shitpoasting text, images, and to some extent video (full fake video does not currently exist; swapping objects or faces does, and cartoons are probably coming soon) is now semi-automatic.

So what are the implications of this?  And why didn’t they publish the whole thing?

The concerns raised in the blog post were unconvincing, to put it mildly.  “Fake news”, “impersonation”, “spam”.  These are not things that are currently undersupplied, excessively dangerous, or limited by availability of content.  It’s also a bit silly to publish on the one thing that worked and then pat yourself on the back for your discretion.  If it’s 1941 and you know in principle the Wikipedia-level description of an A-bomb, you’ve saved yourself half the effort of building one, because you know where to focus your efforts.

The surface explanation is that the OpenAI guys were trying to “raise awareness” of the problem in general, fitting with their concern over “AI risk”.  However, they also thanked Google for their computing contributions, which means the US deep state ipso facto has access to their full research output.  Glancing at their staff, and that of Google, it’s fair to say the Chinese do as well.  If one is genuinely concerned about AI being misused, these actors having access is roughly akin, to continue the analogy, to borrowing some physicists from Werner Heisenberg to work on your A-bomb.

So assuming that at least the de facto outcome is to give various deep states a running start on this particular subfield, and presumably encourage other authors to exercise the same “discretion” – what do you actually use it for if you are the deep state?  How do you weaponize generated content?

  • It’s astonishing there has been no “n-word tape”, or something similar, of a major US politician.  On the other hand, 2020 is just around the corner.
  • There are acknowledged groups whose purpose is to derail conversations online.  Remember when Trump “touched the orb”?  That was at a Saudi shitpoasting lab.  This becomes somewhat easier if you are able to generate content that realistically refers to other generated content.
  • The language models that have been published on are just the source material – they try to generate plausible language, but it’s not directed to particular tasks.  (The main point of the published paper was that if you throw enough data at these models, they end up learning some tasks anyway, but they are not metaphysically “trying to”.)  Scott Alexander, of all people, has an interesting fictional description of what is in principle possible if you layer in, for instance, a social objective function.  Given that the source Reddit dataset actually includes upvotes and downvotes, attempting to make that story real is a project taking maybe a man-month or so.
  • There’s a whole other field of research on correlating visual and text descriptions – automated captioning, and automated image generation from captions.  One of the interesting phenomena of the last decade or so has been the information overload generated by events like 9/11, which makes in-depth, consistent analysis very difficult.  Primary sources are nonexistent – it’s screenshots of archives of YouTube videos of network broadcasts of footage from a camera that doesn’t exist anymore.  Contemporaneously layering in fake footage, images, and text analysis during another such major news event would make it much easier to control or derail narratives as they are created.

This is just me spitballing what I’d do if I put on my black hat and had a dozen ethically unencumbered PhDs at my disposal.  None of it presupposes an existing portfolio of fully resourced social experiments, at varying levels of complexity and success, whose progress would become far easier with these tools – but such things probably do exist.

I don’t have good countermeasures to suggest other than that it’s a good idea to become skilled at recognizing the deployment of generated content, which gets a lot easier if one becomes familiar with generating it.
