syntax - Finding lines that start with a digit in Scala using filter() method


I am a Python programmer, and since the Python API is too slow for my Spark application, I decided to port my code to the Spark Scala API to compare the computation time.

I am trying to filter out the lines that start with numeric characters from a huge file using the Scala API in Spark. In my file, some lines have numbers and some have words, and I want only the lines that start with a number.

So, in my Python application, I have these lines.

l = sc.textFile("my_file_path")
l_filtered = l.filter(lambda s: s[0].isdigit())

This works exactly as I want.

This is what I have tried so far.

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(x => x.forall(_.isDigit))

This throws an error saying that Char does not have a forall() function.

I also tried taking the first character of each line using s.take(1) and applying the isDigit() function to it in the following way.

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(x => x.take(1).isDigit)

And this too...

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(x => x.take(1).character.isDigit)

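This also throws an error.

This may be a small error, but since I am not accustomed to Scala syntax, I am having a hard time figuring it out. Any help would be appreciated.

Edit: As answered in this question, I tried writing a function, but I am unable to use it in the filter() function in my application. I want to apply the function to all lines in the file.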

In Scala, indexing syntax uses parens () instead of brackets []. The exact translation of your Python code would be this:

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(_(0).isDigit)

A more idiomatic way to extract the first symbol is the head method:

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(_.head.isDigit)

Both of these methods will fail with an exception if your file contains empty lines, because an empty string has no first character.

If that's the case, you probably want this:

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(_.headOption.map(_.isDigit).getOrElse(false))

UPD.

As curious noted, map(predicate).getOrElse(false) on an Option can be shortened to exists(predicate):

val l = sc.textFile("my_file_path")
val l_filtered = l.filter(_.headOption.exists(_.isDigit))
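
Regarding the edit in the question about using a separately written function inside filter(): here is a minimal sketch of how that could look. The names DigitLines and startsWithDigit are made up for illustration, and a plain Seq stands in for the RDD so the snippet runs without Spark. Assuming the usual RDD API, the same function should be usable as sc.textFile("my_file_path").filter(DigitLines.startsWithDigit).

```scala
object DigitLines {
  // Hypothetical helper: true when the line is non-empty and its first character is a digit.
  // headOption returns None for an empty string, so empty lines are rejected instead of throwing.
  def startsWithDigit(line: String): Boolean =
    line.headOption.exists(_.isDigit)

  def main(args: Array[String]): Unit = {
    // A plain Scala collection standing in for the lines of the file.
    val sample = Seq("123 abc", "abc 123", "", "9 lives")

    // A method can be passed to filter by name; Scala converts it to a function value.
    val kept = sample.filter(startsWithDigit)

    kept.foreach(println)
  }
}
```

Defining the predicate in a top-level object rather than inside a class is a deliberate choice here: in Spark, that usually avoids dragging an enclosing instance into the closure that gets shipped to the executors.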
