I am facing a performance issue while searching the content of files. I am using the FileStream class to read the files (~10 files are involved in each search, each being ~70 MB in size). However, these files are simultaneously being accessed and updated by another process during the search, so I cannot simply buffer the whole file for reading. Even with a buffer size set on the StreamReader, the search takes 3 minutes, and I am already using Regex.
Has anyone come across a similar situation and can offer pointers on improving the performance of the file search?
Code snippet:
private static int BufferSize = 32768;

using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (TextReader txtReader = new StreamReader(fs, Encoding.UTF8, true, BufferSize))
    {
        // Records begin with a "yyyy-MM-dd HH:mm:ss" timestamp; split on that boundary.
        Regex patternMatching = new Regex(@"(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})(.*?)(?=\n\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})", RegexOptions.IgnoreCase);
        Regex dateStringMatch = new Regex(@"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}");
        char[] temp = new char[1048576];

        while (txtReader.ReadBlock(temp, 0, 1048576) > 0)
        {
            StringBuilder parseString = new StringBuilder();
            parseString.Append(temp);

            // If the block ended mid-record, keep reading lines until the next
            // line starts with a digit (i.e. the next timestamped record).
            if (temp[1023].ToString() != Environment.NewLine)
            {
                parseString.Append(txtReader.ReadLine());
                while (txtReader.Peek() > 0 && !(txtReader.Peek() >= 48 && txtReader.Peek() <= 57))
                {
                    parseString.Append(txtReader.ReadLine());
                }
            }

            if (parseString.Length > 0)
            {
                // Split the buffered text into individual timestamped records.
                string[] allRecords = patternMatching.Split(parseString.ToString());
                foreach (var item in allRecords)
                {
                    var contentString = item.Trim();
                    if (!string.IsNullOrWhiteSpace(contentString))
                    {
                        var matches = dateStringMatch.Matches(contentString);
                        if (matches.Count > 0)
                        {
                            var rowDateTime = DateTime.MinValue;
                            if (DateTime.TryParse(matches[0].Value, out rowDateTime))
                            {
                                // Keep only records inside the requested date range
                                // that contain the search text.
                                if (rowDateTime >= startDate && rowDateTime < endDate)
                                {
                                    if (contentString.ToLowerInvariant().Contains(searchText))
                                    {
                                        var result = new SearchResult
                                        {
                                            LogFileType = logFileType,
                                            Message = string.Format(messageTemplateNew, item),
                                            Timestamp = rowDateTime,
                                            ComponentName = componentName,
                                            FileName = filePath,
                                            ServerName = serverName
                                        };
                                        searchResults.Add(result);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
return searchResults;
Some time ago I had to analyse many FileZilla server logfiles, each >120 MB. I used a simple list of the lines of each logfile and had great performance when searching for specific lines:
List<string> fileContent = File.ReadAllLines(pathToFile).ToList();
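Once the lines are in memory, each search is a cheap in-memory scan instead of repeated file I/O. A minimal sketch (assuming `pathToFile` and `searchText` as in your code; note that File.ReadAllLines opens the file without write sharing, so it can throw an IOException on a file that another process currently holds open for writing, which is presumably why you use FileShare.ReadWrite):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Read the whole file into memory once...
List<string> fileContent = File.ReadAllLines(pathToFile).ToList();

// ...then filter the in-memory lines with a case-insensitive containment check.
List<string> hits = fileContent
    .Where(line => line.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
    .ToList();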
But in your case I think the main reason for the bad performance isn't reading the file. Try putting a Stopwatch around parts of the loop to check where the time is spent. Regex and TryParse can be time-consuming if used many times in a loop like yours.
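A minimal sketch of that kind of instrumentation, using two Stopwatch instances that accumulate time across loop iterations (the bracketed statements are lifted from the loop in your question):

using System;
using System.Diagnostics;

var splitTimer = new Stopwatch();
var parseTimer = new Stopwatch();

// Inside the read loop, bracket the suspected hot spots:
splitTimer.Start();
string[] allRecords = patternMatching.Split(parseString.ToString());
splitTimer.Stop();

// ...and later, around the date parsing:
parseTimer.Start();
bool parsed = DateTime.TryParse(matches[0].Value, out rowDateTime);
parseTimer.Stop();

// After all files are processed, compare the accumulated times:
Console.WriteLine("Regex.Split total:       " + splitTimer.ElapsedMilliseconds + " ms");
Console.WriteLine("DateTime.TryParse total: " + parseTimer.ElapsedMilliseconds + " ms");

Whichever total dominates tells you where to optimize first.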