Bitmaps and negation

Just had another one of those moments where I really wasn’t thinking, then had to wonder why my code wasn’t working again. I had a very simple case where I needed a regex to match a character that was neither a word character nor a space character. My fingers quickly typed:

[\W\S]

And I then wondered why it didn’t work. For anyone not following, I was creating a character class made up of \W (non-word chars) and \S (non-space chars). How is that different from what does work?

==
[^\w\s]
==

After all, on the face of it, using the capitalized escape (\W instead of \w) is just a negation, the same as what a leading ^ does inside a char class.
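To see the difference in behavior, here is a quick check sketched in Python’s re module. Python is my choice for illustration only; the sample string is made up, and for plain ASCII input like this other Perl-style engines should agree.

```python
import re

# Hypothetical sample: three word chars, two whitespace chars, three punctuation chars.
sample = "a1_ \t!?-"

# Union of "non-word" and "non-space": every character is at least one of those.
union_of_negations = re.findall(r"[\W\S]", sample)

# Negation of the union of "word" and "space": only characters in neither set.
negated_union = re.findall(r"[^\w\s]", sample)

print(union_of_negations)
print(negated_union)
```

The first class returns every character in the sample, while the second returns only the three punctuation characters.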

The answer lies in how the regex engine defines a character class. What it’s really doing inside is creating a bitmap for that token. While building it, \W is expanded on the spot into the bits for everything not in the word set. ^\w, however, puts in the bits for the word chars and then says "not them" for the class as a whole. With just one of these the result is the same either way, but when I added the second set, it broke. Why?

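When the bitmap was built from all the non-word chars plus all the non-space chars, the two sets were simply merged. Every character is either not a word char or not a space char (nothing is both), so the merged set covered everything, including the word and space chars I was trying to exclude. Negating each piece and then combining them is not the same as combining them and then negating the result.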
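The bitmap idea can be sketched as a toy model in Python. This is not how any particular engine is implemented; the 128-bit ASCII-only bitmap and the names bitmap, ALL, WORD, SPACE, and members are all assumptions made up for the illustration.

```python
import string

def bitmap(chars):
    """Return an int with one bit set per ASCII character in chars."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits

ALL = (1 << 128) - 1  # every ASCII character
WORD = bitmap(string.ascii_letters + string.digits + "_")
SPACE = bitmap(" \t\n\r\f\v")

# [\W\S]: each escape is complemented first, then the bits are OR-ed together.
wrong = (ALL & ~WORD) | (ALL & ~SPACE)

# [^\w\s]: the positive sets are OR-ed first, then the class is inverted once.
right = ALL & ~(WORD | SPACE)

def members(bits):
    """List the characters whose bits are set."""
    return [chr(i) for i in range(128) if bits >> i & 1]

print(wrong == ALL)             # the union of the two complements is everything
print("a" in members(right))    # word chars are excluded
print("!" in members(right))    # punctuation is kept
```

Because no character is both a word char and a space char, OR-ing the two complements sets every bit, which is just De Morgan’s law: (not A) or (not B) equals not (A and B), and here A and B is empty.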

Thankfully I caught the problem quickly, but I could see how somebody could get rather confused about that.