python - How do I stop extracting href tags using BeautifulSoup when I encounter a comment in HTML?
03420 <a href="/kegg-bin/show_pathway?ban03420">nucleotide excision repair</a><br> 03430 <a href="/kegg-bin/show_pathway?ban03430">mismatch repair</a><br> 03440 <a href="/kegg-bin/show_pathway?ban03440">homologous recombination</a><br> </ul> </ul> <!-- --> <b>environmental information processing</b> <ul> membrane transport <ul> 02010 <a href="/kegg-bin/show_pathway?ban02010">abc transporters</a><br>
I need to extract the pathway codes (e.g. 03420, 03430, etc.) from a webpage using Python, which I've done using BeautifulSoup. I want to stop before "environmental information processing", and I'm looking for a distinct tag here that I can use. The <!-- --> comment is at the perfect position, but I can't figure out how to stop at that point. Can anyone tell me if/how I can use it to stop extracting the codes before the comment? (I'm new to Python and HTML and am jumping straight into web parsing, so please bear with me.)
In HTML, XHTML, and XML, <!-- starts a comment and --> ends it. A comment does not affect what the browser renders; it only adds bytes to the response.
<!-- Comment text; it can span multiple lines. Compatible with HTML, XHTML, and XML. -->
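For the question itself, one possible approach is to walk the parsed document in order and stop at the first comment node. BeautifulSoup exposes HTML comments as Comment objects, so they can be detected with isinstance. The sketch below is only an illustration under some assumptions: the function name codes_before_first_comment and the sample_html string are made up for this example, the built-in "html.parser" backend is used, and the code is assumed to be the trailing digits of each href.

```python
import re

from bs4 import BeautifulSoup, Comment, Tag

# Hypothetical sample, shortened from the markup in the question.
sample_html = """
03420 <a href="/kegg-bin/show_pathway?ban03420">nucleotide excision repair</a><br>
03430 <a href="/kegg-bin/show_pathway?ban03430">mismatch repair</a><br>
03440 <a href="/kegg-bin/show_pathway?ban03440">homologous recombination</a><br>
<!-- -->
<b>environmental information processing</b>
02010 <a href="/kegg-bin/show_pathway?ban02010">abc transporters</a><br>
"""


def codes_before_first_comment(markup):
    """Collect pathway codes from <a href=...> tags until the first comment."""
    soup = BeautifulSoup(markup, "html.parser")
    codes = []
    # .descendants yields every node (tags, text, comments) in document order.
    for node in soup.descendants:
        if isinstance(node, Comment):
            break  # the <!-- --> marker: stop collecting here
        if isinstance(node, Tag) and node.name == "a" and node.get("href"):
            # Assumption: the code is the run of digits at the end of the href.
            match = re.search(r"(\d+)$", node["href"])
            if match:
                codes.append(match.group(1))
    return codes


print(codes_before_first_comment(sample_html))
# prints ['03420', '03430', '03440']
```

If the real page has other comments earlier in the document, this stops at the first one it meets, so you may need to start the walk from a narrower element than the whole soup.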
Other languages have other comment syntaxes, for example:
/* A comment in C, C++, C#, Java, JavaScript, CSS, etc. It can span multiple lines. */
// A single-line comment in C, C++, C#, Java, JavaScript, etc. It cannot span multiple lines.
See this link for more on comments, if you want.