regex - Replace ignores non-capture group


Why does Regex.Replace ignore the non-capturing group? I'm removing bracketed numbers that occur at the end of a filename, whether they are followed by 0, 1 or 2 extensions. For example,

whatever(54).xml

will become

whatever.xml

This doesn't work:

    Private Function FixFilename(ByVal fn As String) As String
        Static rgx As New Regex("(\(\d+\))(?:(\.\w{2,3}){0,2})$")
        Return rgx.Replace(fn, "", 1)
    End Function

It removes the extensions after the numbers, even though I'm not capturing them. This works:

    Private Function FixFilename(ByVal fn As String) As String
        Static rgx As New Regex("(\(\d+\))((\.\w{2,3}){0,2})$")
        Return rgx.Replace(fn, "$2", 1)
    End Function

It does so by capturing, and then reinserting, the extensions (if any).

Some test code:

    Option Strict On
    Option Explicit On

    Imports System.Text.RegularExpressions

    Public Class Form1

        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            RichTextBox1.WordWrap = False
        End Sub

        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
                Handles Button1.Click
            Dim filenames() As String = {"wibble(a).xml", "blah (blah( blah)).xml", "blah(54)",
                                         "blahblah(433).xml", "blah(2)blah(500)", "blah(23)blah(500).xml",
                                         "blah(23)blah(500).xml.doh"}

            ' Show each input next to its cleaned-up name
            For Each filename As String In filenames
                RichTextBox1.AppendText(filename & " --> " & FixFilename(filename) & vbNewLine)
            Next
        End Sub

        ' FixFilename (one of the versions above) goes here

    End Class

This image might be useful:

[image: regex101 screenshot]

I'd like to know whether this is by design or whether there is something wrong with my regex, and also whether an attempt at a positive lookahead assertion might work.

Regardless of whether you create a capture or not, .Replace() will replace the whole match. From MSDN:

Regex.Replace Method (String, String)
In a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string.


This is by design, and it is the expected behaviour across regex flavours. You are correct: you need to use a group for the extensions, and a backreference to it in the substitution.
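To make the "whole match" point concrete, here is a small console sketch (the module name WholeMatchDemo is only for illustration) that compares Match.Value with the captured group for the question's first pattern. It assumes a plain console app rather than the WinForms test form above.

```vbnet
Imports System.Text.RegularExpressions

Module WholeMatchDemo
    ' The question's first pattern: group 1 captures "(54)"; the extensions sit
    ' inside a non-capturing group (?:...), but they are still part of the match.
    Private ReadOnly rgx As New Regex("(\(\d+\))(?:(\.\w{2,3}){0,2})$")

    Sub Main()
        Dim m As Match = rgx.Match("whatever(54).xml")

        ' Replace() substitutes the entire match (m.Value), regardless of grouping.
        Console.WriteLine(m.Value)            ' (54).xml
        Console.WriteLine(m.Groups(1).Value)  ' (54)

        ' So replacing with "" removes the extension along with the number.
        Console.WriteLine(rgx.Replace("whatever(54).xml", "", 1))  ' whatever
    End Sub
End Module
```

A non-capturing group only controls what gets stored in Match.Groups; it has no effect on which characters the match consumes.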

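An additional comment: there's no need to use a group in (\(\d+\)), since you don't need to capture it (or at least not in the example you provided); \(\d+\) will work fine. If you drop that group, the extensions become group 1, so the replacement string changes from "$2" to "$1".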
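A minimal sketch of that simplification, assuming a console app (NoOuterGroupDemo is a hypothetical name): the bracketed number is matched without a group, and the extensions, now group 1, are put back with "$1".

```vbnet
Imports System.Text.RegularExpressions

Module NoOuterGroupDemo
    ' \(\d+\) is no longer wrapped in a group, so the extensions are group 1
    ' and the replacement string is "$1" instead of "$2".
    Private ReadOnly rgx As New Regex("\(\d+\)((\.\w{2,3}){0,2})$")

    Function FixFilename(ByVal fn As String) As String
        Return rgx.Replace(fn, "$1", 1)
    End Function

    Sub Main()
        Console.WriteLine(FixFilename("whatever(54).xml"))  ' whatever.xml
        Console.WriteLine(FixFilename("blah(54)"))          ' blah
    End Sub
End Module
```

When there are no extensions, group 1 is empty, so "$1" inserts nothing and only the bracketed number is removed.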

And again, you're right: you can use a lookahead, which asserts what follows without consuming those characters as part of the match.

Regex:

    \(\d+\)(?=(?:\.\w{2,3}){0,2}$)

(I originally posted this regex as a spoiler, since it wasn't clear from the question whether you wanted the solution or only wanted to know whether a lookahead would work.)

Output:

    wibble(a).xml --> wibble(a).xml
    blah (blah( blah)).xml --> blah (blah( blah)).xml
    blah(54) --> blah
    blahblah(433).xml --> blahblah.xml
    blah(2)blah(500) --> blah(2)blah
    blah(23)blah(500).xml --> blah(23)blah.xml
    blah(23)blah(500).xml.doh --> blah(23)blah.xml.doh
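To try the lookahead version outside the WinForms form, a console sketch along these lines (LookaheadDemo is a hypothetical module name) runs the same test filenames; it should print the same lines as the output above.

```vbnet
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Only "(digits)" is consumed; the optional extensions are checked by the
    ' lookahead (?=...) but are not part of the match, so replacing the match
    ' with "" leaves them in place.
    Private ReadOnly rgx As New Regex("\(\d+\)(?=(?:\.\w{2,3}){0,2}$)")

    Function FixFilename(ByVal fn As String) As String
        Return rgx.Replace(fn, "", 1)
    End Function

    Sub Main()
        Dim filenames() As String = {"wibble(a).xml", "blah (blah( blah)).xml", "blah(54)",
                                     "blahblah(433).xml", "blah(2)blah(500)", "blah(23)blah(500).xml",
                                     "blah(23)blah(500).xml.doh"}
        For Each filename As String In filenames
            Console.WriteLine(filename & " --> " & FixFilename(filename))
        Next
    End Sub
End Module
```

Because the lookahead is anchored with $, at most one bracketed number per filename can satisfy it, so the count argument of 1 is kept only to mirror the question's code.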

