Great!
Big problem, though:
If the string ends in anything but a tag, it crashes. The last character could be a letter, number, punctuation mark, or symbol (even an angle bracket, if it's not part of a tag). A quick fix might be to append a dummy tag or a <br> at the end.
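A minimal sketch of that quick fix, assuming the crash is triggered only by trailing non-tag text. The helper name ensure_trailing_tag is made up for illustration, and the check is a rough one: it only looks at whether the string ends in something shaped like <...>.

```php
<?php
// Hypothetical helper (not from the original post): append a <br> when the
// string does not already end in something that looks like a tag.
function ensure_trailing_tag($string)
{
    // Roughly "a tag at the very end": '<', no angle brackets inside, '>',
    // then optional trailing whitespace.
    if (!preg_match('/<[^<>]+>\s*$/', $string)) {
        $string .= '<br>';
    }
    return $string;
}
```

Called on '<p>Hello</p> world', this should hand back the same string with <br> tacked on. A string that already ends in a tag, such as '<b>x</b>', comes back unchanged.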
As for the issue with quotes and attributes, my app was already stripping that stuff out. I am taking job descriptions that people pasted from Word (which creates a really nasty tag soup) and sanitizing them for basic display. I want to keep breaks, paragraphs, bold and such, but everything else can go. Here's the regex to remove all tag attributes:
// [^>]* stops at the tag's own closing bracket, so text after the tag is left alone.
$string = preg_replace('/<([^\s>]*)(\s[^>]*)>/', '<$1>', $string);
Also, if anybody wants it: Word makes a bunch of capitalized tags... for shame... let's fix that:
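// Callback form: the /e modifier no longer exists as of PHP 7.
$string = preg_replace_callback('/(<\/?)(\w+)([^>]*>)/', function ($m) { return $m[1] . strtolower($m[2]) . $m[3]; }, $string);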
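For anyone who wants the whole cleanup in one place, here is a rough sketch of how the steps could be chained. The function name sanitize_word_html and the list of allowed tags are my own assumptions, and so is the final strip_tags() call as the way to make "everything else" go. The attribute pattern uses [^>]* so that each match ends at the tag's own closing bracket.

```php
<?php
// Hypothetical wrapper; the name and the allowed-tag list are assumptions.
function sanitize_word_html($string)
{
    // 1. Lowercase tag names (callback form, since /e is gone in PHP 7+).
    $string = preg_replace_callback(
        '/(<\/?)(\w+)([^>]*>)/',
        function ($m) {
            return $m[1] . strtolower($m[2]) . $m[3];
        },
        $string
    );

    // 2. Drop everything after the tag name, i.e. all attributes.
    $string = preg_replace('/<([^\s>]*)(\s[^>]*)>/', '<$1>', $string);

    // 3. Keep only basic formatting tags; strip_tags() removes the other
    //    tags but leaves their text content in place.
    return strip_tags($string, '<br><p><b><strong><i><em><ul><ol><li>');
}
```

Fed something like '<P CLASS="MsoNormal"><SPAN STYLE="color:red">Hi</SPAN> <B>there</B></P>', this should come out as '<p>Hi <b>there</b></p>'. Lowercasing first means the allowed list only has to be written in one case.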