regex - Using a regular expression to extract a substring


I'm trying to extract a substring in R using stringr. Some time ago I wrote a script that did the job, but it does not work anymore. This may be due to an update; I don't know.

My string looks like this: mystr <- " layout = (3,3); //lala". The string contains the layout keyword, an equal sign, and two parentheses (open ... close). However, the number of arguments in between can vary: (1,23,455,22) is possible. The part after the ) can vary as well.

I want to obtain the substring starting from ( and ending with ). The example must give (3,3). Others may give e.g. (1,23,455,22).

Up to now I used this:

library(stringr)
str_extract("    layout = (3,3); //lala", "*\\(.*\\)")

However, this does not work anymore. It gives me this error:

Error in stri_extract_first_regex(string, pattern, opts_regex = attr(pattern,  :
  Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

It used to work in the past. What is wrong with the regular expression?

Edit: If the string contains two pairs of parentheses, the substring should select the left pair (the other one is commented out by //):

str <- "layout = (1,2,3,4) //lala(huhu)"
gsub(".*([(])(.*)([)]).*", "\\1\\2\\3", str)
# gives "(huhu)", which is not good; it should be (1,2,3,4)

Your regex "*\\(.*\\)" is not correct because it starts with *, a quantifier, so there is nothing before it to repeat. This causes the regex syntax error: you cannot have multiple string start positions (it is a logical error that the regex engine checks when parsing the expression).
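As a minimal sketch of the fix, dropping the leading * already yields a valid pattern. The example below uses base R (regexpr and regmatches) so that it runs without extra packages. The variable names and the [^()]* part (used here instead of .* so that the match stops at the first closing parenthesis) are assumptions for illustration, not part of the original script.

```r
# Input string taken from the question.
mystr <- "    layout = (3,3); //lala"

# No leading "*": the pattern starts directly with an escaped "(".
# "[^()]*" matches any run of characters that are not parentheses.
pattern <- "\\([^()]*\\)"

# regexpr() locates the first match; regmatches() extracts its text.
first_paren <- regmatches(mystr, regexpr(pattern, mystr))
print(first_paren)
# [1] "(3,3)"
```

With stringr loaded, str_extract(mystr, pattern) should return the same first match.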

Regarding "the substring should select the left pair":

Use lazy matching in the left part, that is, .*? instead of .*:

mystr <- "layout = (1,2,3,4) //lala(huhu)"
gsub(".*?(\\([^()]*\\)).*", "\\1", mystr)
##    ^^^

Result: [1] "(1,2,3,4)"

Lazy matching ensures that as few characters as possible are matched before the first occurrence of the subsequent pattern.
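To see the difference side by side, here is a small sketch that runs the same substitution with a greedy and a lazy prefix. It uses sub() with perl = TRUE, which is my choice here to get well-defined backtracking behavior; the variable names are illustrative only.

```r
mystr <- "layout = (1,2,3,4) //lala(huhu)"

# Greedy prefix: ".*" consumes as much as it can, so the capture group
# ends up on the LAST parenthesized group in the string.
greedy <- sub(".*(\\([^()]*\\)).*", "\\1", mystr, perl = TRUE)

# Lazy prefix: ".*?" consumes as little as it can, so the capture group
# ends up on the FIRST parenthesized group in the string.
lazy <- sub(".*?(\\([^()]*\\)).*", "\\1", mystr, perl = TRUE)

print(greedy)
# [1] "(huhu)"
print(lazy)
# [1] "(1,2,3,4)"
```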

Note that if you want to extract values of the form (number,number,...), you need to use

library(stringr)
str_extract(str, "\\(\\d+(\\s*,\\d+)*\\)")
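If a string may hold more than one numeric group, the same pattern can be applied to collect every match rather than only the first. The sketch below does this in base R with gregexpr and regmatches. The input string (with an extra "(5,6)" appended) and the variable names are made up for illustration.

```r
# Hypothetical input with two numeric groups and one non-numeric group.
input <- "layout = (1,2,3,4) //lala(huhu) (5,6)"

# Same pattern as above: "(", digits, then zero or more ",digits" parts, ")".
pattern <- "\\(\\d+(\\s*,\\d+)*\\)"

# gregexpr() finds all matches; regmatches() returns them as a list with
# one element per input string, so [[1]] selects the matches for `input`.
all_groups <- regmatches(input, gregexpr(pattern, input, perl = TRUE))[[1]]
print(all_groups)
# [1] "(1,2,3,4)" "(5,6)"
```

Because the pattern requires digits, "(huhu)" is skipped. With stringr, str_extract_all(input, pattern) should give the same matches.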


