The Stochastic Parrot Squawks!

Squawk!  Anything I say is probabilistic at best!  Squawk!

I was winding my way through mock data for travelectable recently and encountered a problem that I thought would be easy enough for my robot friend to fix but turned out to be borderline intractable (or non-deterministic, for those mathematicians reading who demand exacting language).

While asking Llama-3 to collect hotel examples from various cities, I switched to the smaller model (only 8 billion parameters instead of 70 billion) in order to save on cost.

In doing so, I lost some accuracy in its responses.  This isn't necessarily catastrophic.  It was still returning realistic results, but it was messing up the formatting even when explicitly instructed not to deviate.  This, in itself, is yet another reason to view LLMs' capabilities skeptically.  But I've beaten that horse enough, so I'll move on (at least until the end of the article).

Luckily, when it did fail, it failed in a handful of deterministic (predictable) ways that I could easily fix in code.  The first major error occurred when I asked it to return data in the format 

[{data},{data},{data},...]

It would often leave off the closing bracket, like so:

[{data},{data},{data},...

Computers really hate it when things aren't matchy-matchy and will refuse to process the data.  This was an easy enough fix, though.  If the output didn't end with a bracket, I added one.
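The fix looked something like this (a minimal sketch - close_bracket and the sample data are hypothetical stand-ins for my actual plumbing):

import json

def close_bracket(raw_output):
    # The model sometimes drops the final ']'; add it back if it's missing.
    cleaned = raw_output.strip()
    if cleaned.startswith('[') and not cleaned.endswith(']'):
        cleaned += ']'
    return cleaned

raw_output = '[{"name": "Hotel A"}, {"name": "Hotel B"}'  # truncated response
hotels = json.loads(close_bracket(raw_output))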

[Aside:  Those of you who are software engineers will notice that I'm doing the bare minimum of input checking here.  If I were running a customer-facing site, where inconsistency could drive customers away, I'd need to be more careful.  But here, it's easy enough to cover most use cases and fire off a new request if the last one fails.  My advice is to know the space you're working in.  Attention to detail is important, but you don't need tolerances of 1/100th of an inch when you're building a house.]

I started up my program again and encountered another error with enough consistency (it was failing 1/3 to 1/2 of the time) that I decided it was time to fix it.

Again, it was a specific problem, so I thought it'd be easy enough to tackle.  When grabbing each hotel's address, the data should have been returned as

"address": "300 W. Elm Anytown, OH USA"

but would erroneously come back as 

"address": 300 W. Elm Anytown, OH USA" 
#Note the missing double quote at the beginning of the address.
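Feeding that line to Python's json parser makes the failure concrete (a quick sketch - the wrapping braces are mine, just to make each line a parseable record):

import json

good = '{"address": "300 W. Elm Anytown, OH USA"}'
bad = '{"address": 300 W. Elm Anytown, OH USA"}'   # missing opening quote

print(json.loads(good))          # {'address': '300 W. Elm Anytown, OH USA'}
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print(err)                   # the parser gives up partway through the value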

I asked the code generator to provide code to fix it and, predictably, it gave me code with a regular expression in it.  For those of you who aren't technical, a regular expression is a construct that finds text matching certain patterns and, if instructed, updates it.

While the concept is simple, getting the details right so you don't capture something you didn't intend to or capture too much input is surprisingly difficult.

There's an old adage in software that, if you use regular expressions to solve a problem, now you've got two problems.  That apparently goes for LLMs' use of regular expressions as well.

It suggested the following expression:

re.sub(r'"address":\s*([^"])',r'"address": "\1',<my hotel data>)

Again, for those who aren't technical, re.sub means "call the substitution function of Python's regular expression module: replace anything in <my hotel data> that matches the first expression (the part before the first comma) with the second expression."

This effectively just means - if there isn't a double quote after the colon, put it there.  But computers are so literal you have to be very precise (this is why programming can get difficult).

So, the first expression means

  • Find the literal text "address": 
  • Find any amount of whitespace after it (\s matches a whitespace character and * means 0 or more occurrences)
  • Find one character that is not a double quote - inside the square brackets, ^ means "not," so [^"] matches any single character except a double quote.  The parentheses group that character so I can reference it later.
    • So, ([^"]) means: capture the first character that appears after "address": and any whitespace, and keep track of it.

The second expression means: if you've found something that matches the first expression, rewrite it as "address": "\1, where \1 refers to Group 1, the character captured above (again - essentially just put the double quote where it should be and keep the character that followed it).

Seems reasonable enough.  Except when I ran it, it would always add a double quote, even if there already was one.
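Here's a stripped-down reproduction (the hotel record is a hypothetical stand-in; the pattern and replacement are the model's verbatim):

import re

pattern = r'"address":\s*([^"])'
replacement = r'"address": "\1'

good = '"address": "300 W. Elm Anytown, OH USA"'
print(re.sub(pattern, replacement, good))
# "address": " "300 W. Elm Anytown, OH USA" - an extra quote appears anyway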

I was able to find a workaround that met my needs (so, in other words, I didn't follow my own advice in my aside - I searched for a general solution when a more specific one was perfectly acceptable), but it wasn't until I started this write up that I finally zeroed in on the problem.

[Aside #2: ...and that's the power of writing.  For all the grumbling about too-long emails, when people don't want to spend 4 minutes reading something and will hold a 30- or 60-minute meeting instead, writing helps you slow down and organize your thoughts.  I'll save my specific rant on meetings and corporate illiteracy for another day.]

The actual expression that works is 

r'"address":\s*([^"\s])'

Regular expression quantifiers, like \s*, are greedy by default.  That means they try to consume as much of the input as possible while still matching.  If we have four spaces - '    ' - the operator will consume all four.
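A one-liner confirms it (the sample string is mine):

import re

match = re.match(r'(\s*)', '    four spaces, then text')
print(len(match.group(1)))   # 4 - the greedy \s* swallows all four spaces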

In the case of our broken input, this works as expected.  Given '    7', the expression matches all four spaces, sees a character that is not a double quote, and rewrites the text as ' "7' (remember that our replacement contains exactly one space regardless of how many appeared in the original input - hence the space compression).

However, when we see correct input, '    "7' for example, the matcher suddenly seems to turn non-greedy (I'd prefer altruistic, but that's not a term in regular expression math).  My first thought was that this was unexpected behavior, maybe even a bug in the library.  But, like I said, regular expressions are hard, and further investigation showed it's actually standard, documented behavior.

[Aside #3: Bugs in well-known software, like Python's regular expression module, are very rare given the number of eyes on the code before it's released.  Software engineers often claim a library has a bug when it simply doesn't behave the way they expected.  Greater than 99% of the time, it's you, not the software.  Sure enough, this time it was me.]

In this "good" case the \s* operator only picks the first 3 spaces and then notices that the remaining input is ' "7'.  Since the next part of the regular expression is [^"] (remember - anything but a double quote), and the next character is a space, then the pattern matches and it throws in an extra double quote.  The regular expression is doing something called backtracking, which essentially means that it will back up its spot in the input and re-examine the input for other, more specific patterns.  

This is where my intuition was off.  I had assumed the \s* operator would always keep every space it could, like an insatiable cookie monster, and that the engine would only give characters back if I used a non-greedy operator like \s*?.  In reality, greedy just means "try the longest match first" - the engine will happily backtrack from there to salvage a match.  That's why excluding whitespace from the character class works: when the backtrack hands [^"\s] a space, it fails too, and correct input is left alone.

The Stochastic Parrot Squawks

I'll mostly leave aside the discussion of what a stochastic parrot is, other than to say that LLMs like ChatGPT have been labeled as such by well-informed researchers (the term comes from a 2021 paper by Emily Bender and colleagues), and that it essentially means they're just probabilistically repeating words they've seen before, with no insight behind those words.

When I asked Llama-3 if it was a stochastic parrot, it understood the origin of the term but answered (predictably) that it was capable of so much more.  I disagreed and said it was a stochastic parrot.  It immediately acquiesced in order to please.  So much for reasoned discourse.

In the case of the regular expression above, it supplied me with the original code, and, when I indicated it didn't work, began building more and more convoluted solutions.  

When I asked, separately, whether the original code should work, it agreed that it should, but offered no suggestion beyond "it may be a bug," as any inexperienced developer would when looking to shift the workload.

At no point did it identify the backtracking behind the failure or come up with the simpler workaround that I constructed (after 2-3 hours of research and testing).

And here, again, is my typical, belabored point - LLMs are simply unable to take over even the tasks the human mind handles routinely.  The so-called software engineering replacement tools would just slap band-aid upon band-aid on the problem because they lack critical thinking skills.  The code would certainly become unmaintainable by any human.

Of course, say the AI hypemen, it doesn't matter.  We won't need to code when the machines are advanced enough, because they'll handle it all.  But there are two problems with that: we still need to know how to code during the bridge phase, and we're assuming that, once the code becomes inscrutable to us, the machines themselves will be able to maintain the systems.

Obviously, humans are far from perfect: we have a horrible grasp of probability and often should let software override our faulty intuitions.  But we've also built accommodating systems in a world that's evolved for billions of years, and, as usual, we don't realize that we've injected our own biases and over-corrections into the machines we've built - machines that have already demonstrated they can cause massive damage, even before we let them loose.

Our brains are built to survive on the planet that created us.  Machines have no such evolutionary grounding unless we train it into them.  To a purely logical system, the response to being attacked by a tiger leaping out of the bushes is to kill all tigers so it can never happen again.  Machines are quite literally sociopaths - they have no consciences (much less consciousness) - and, if we don't find ways to keep them tamped down at critical moments, they'll follow the narrowest logical constraints without an inherent ethical background to guide them.

Is that a parrot you want to listen to?

Until next time, my human and robot friends.
