python - How do I stop extracting href tags using BeautifulSoup when I encounter a comment in HTML?

    03420&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban03420">nucleotide excision repair</a><br>
    03430&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban03430">mismatch repair</a><br>
    03440&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban03440">homologous recombination</a><br>
      </ul>
    </ul>
    <!-- -->
    <b>environmental information processing</b>
    <ul>
      membrane transport
      <ul>
    02010&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban02010">abc transporters</a><br>

I need to extract pathway codes (e.g. 03420, 03430, etc.) from a webpage using Python, which I've done using BeautifulSoup. I want to stop before "environmental information processing", and I'm looking for a distinct tag here that I can use. The <!-- --> is at the perfect position, but I can't figure out how to stop at that point. Can anyone tell me if/how I can use it to stop extracting codes before the comment? (I'm new to Python and HTML and am jumping straight into web parsing, so bear with me please.)

In HTML, XHTML, and XML, <!-- starts a comment and --> finishes it. It is a comment and does not affect the result in the browser; it only adds bytes to the response.

<!-- Comment text;
     it can break across lines.
     Compatible with HTML, XHTML, and XML. -->

Other languages have other comment syntaxes, for example:

/* Comment in C, C++, C#, Java, JavaScript, CSS, etc.
   It can break across lines. */

// Single-line comment in C, C++, C#, Java, JavaScript... It can't break across lines here.

See more about comments, if you want, at this link.
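Coming back to the original question: BeautifulSoup exposes an HTML comment as a Comment node in the parse tree, so one possible way to stop at it is to walk the nodes in document order and break on the first Comment. This is only a minimal sketch. It assumes the pathway code is the run of digits at the end of each href and that the first comment in the page is the right stopping point. The function name codes_before_first_comment is made up for illustration, and the markup is a shortened copy of the snippet in the question with the missing opening and closing ul tags added.

```python
import re

from bs4 import BeautifulSoup, Comment, Tag

# Shortened copy of the markup from the question (assumed structure).
html = """
<ul><ul>
03420&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban03420">nucleotide excision repair</a><br>
03430&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban03430">mismatch repair</a><br>
03440&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban03440">homologous recombination</a><br>
</ul></ul>
<!-- -->
<b>environmental information processing</b>
<ul>membrane transport<ul>
02010&nbsp;&nbsp;<a href="/kegg-bin/show_pathway?ban02010">abc transporters</a><br>
</ul></ul>
"""


def codes_before_first_comment(markup):
    """Collect pathway codes from <a href> values until the first HTML comment."""
    soup = BeautifulSoup(markup, "html.parser")
    codes = []
    # .descendants yields every node (tags, text, comments) in document order.
    for node in soup.descendants:
        if isinstance(node, Comment):
            break  # stop as soon as the comment is reached
        if isinstance(node, Tag) and node.name == "a" and node.get("href"):
            # Assumption: the code is the trailing digits of the href.
            match = re.search(r"\d+$", node["href"])
            if match:
                codes.append(match.group())
    return codes


print(codes_before_first_comment(html))  # ['03420', '03430', '03440']
```

If the real page has earlier comments, breaking on the first one would stop too soon; in that case it may be safer to stop at the <b> tag whose text is "environmental information processing" instead.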

Abbruzzese
