Great! Big problem

Great!

Big problem, though:

If the string ends in anything but a tag, it crashes. That could be a letter, number, punctuation mark, or symbol (even an angle bracket, if it is not part of a tag). A quick fix might be to append a dummy tag or a <br> at the end.
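Here is a minimal sketch of that quick fix, assuming the crash only happens when the string does not end in a tag. The helper name ensure_trailing_tag is made up for illustration and is not part of the original function:

```php
<?php
// Hypothetical helper: append a <br> when the string does not already end in a tag.
function ensure_trailing_tag($string)
{
    // A trailing tag looks like "<...>" at the very end (trailing whitespace ignored).
    // A lone ">" with no opening "<" before it does not count as a tag.
    if (preg_match('/<[^<>]+>\s*$/', $string)) {
        return $string;
    }
    return $string . '<br>';
}
```

You would call it on the input before passing the string on, e.g. $string = ensure_trailing_tag($string);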

As for the issue with quotes and attributes, my app was already stripping that stuff out. I am taking job descriptions that people pasted from Word (which creates a really nasty tag soup) and sanitizing them for basic display. I want to keep breaks, paragraphs, bold and such, but everything else can go. Here's the regex to remove all tag attributes:

$string = preg_replace('/<([^\s>]+)\s[^>]*>/',"<\\1>",$string);

Also, if anybody wants it: Word produces a bunch of capitalized tags (for shame), so let's fix that:

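// The /e modifier was removed in PHP 7, so this uses preg_replace_callback instead.
$string = preg_replace_callback('/(<\/?)(\w+)([^>]*>)/', function ($m) { return $m[1].strtolower($m[2]).$m[3]; }, $string);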
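
For anyone who wants the whole cleanup in one place, here is a rough sketch that chains both steps and then drops every tag that is not on a whitelist, using PHP's built-in strip_tags. The function name sanitize_word_html and the exact whitelist are my assumptions, so adjust them to taste:

```php
<?php
// Hypothetical wrapper; the name and the tag whitelist are assumptions.
function sanitize_word_html($string)
{
    // 1. Remove all attributes: keep the tag name, drop everything up to the first ">".
    $string = preg_replace('/<([^\s>]+)\s[^>]*>/', '<$1>', $string);

    // 2. Lowercase tag names (callback form, since the /e modifier is gone in PHP 7).
    $string = preg_replace_callback(
        '/(<\/?)(\w+)([^>]*>)/',
        function ($m) {
            return $m[1] . strtolower($m[2]) . $m[3];
        },
        $string
    );

    // 3. Keep only basic formatting tags; strip_tags removes the rest but keeps their text.
    return strip_tags($string, '<br><p><b><i><ul><ol><li>');
}
```

Feeding it something like <P CLASS="MsoNormal">Senior <B STYLE="x">Developer</B><SPAN LANG="EN">!</SPAN></P> should give back <p>Senior <b>Developer</b>!</p>. Note that an attribute value containing a literal > will still confuse the first regex, so this is only suitable for basic display cleanup, not as a security filter.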
