StringScanner is a Ruby extension for fast string scanning.
Since Ruby's Regexp class cannot match against part of a string in place, you must create a new String to scan it. For example:

    p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )

This code displays "nil". Another way to match looks like this:

    str = " word word word"
    while str.size > 0 do
      if /\A[ \t]+/ === str then
        str = $'
      elsif /\A\w+/ === str then
        str = $'
      end
    end

But this method has a serious speed problem: $' creates a new string EVERY time. In this example, all of these strings are created:

    " word word word"
    "word word word"
    " word word"
    "word word"
    " word"
    "word"
    ""

This puts a heavy load on the interpreter. If 'str' is 50KB long, the remainder strings add up to nearly 50KB ** 2 / 5 = 500MB of string data in total.
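The quadratic growth can be checked with a short sketch. The helper name 'bytes_copied' is hypothetical (it is not part of strscan); it repeats the $' loop and adds up the size of every remainder string. Current Ruby implementations may share the underlying buffer between a string and its tail, so the real allocation can be smaller than this count.

```ruby
# Hypothetical helper: total size of all remainder strings ($') created
# while scanning str with the regexp loop.
def bytes_copied(str)
  total = 0
  while str.size > 0
    if /\A[ \t]+/ === str || /\A\w+/ === str
      str = $'            # a new String object holding the rest of the input
      total += str.size
    else
      break               # unknown character: stop instead of looping forever
    end
  end
  total
end

input = " word" * 10_000          # 50,000 bytes
puts input.size                   # 50000
puts bytes_copied(input)          # 499990000, close to 50_000 ** 2 / 5
```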
StringScanner solves this problem.
StringScanner holds a C string and a pointer into it. When scanning, StringScanner
only advances the pointer and does not create a new string. As a result,
scanning gets faster and the application uses less memory.
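A minimal sketch of that behaviour, assuming the strscan library bundled with Ruby: the scanned string stays as it is, and only the position reported by 'pos' changes. The variable name 'positions' is just for illustration.

```ruby
require 'strscan'

s = StringScanner.new(" word word word")
positions = [s.pos]                 # scan pointer starts at index 0

while s.rest?
  # Each successful scan only moves the pointer forward.
  break unless s.scan(/[ \t]+/) || s.scan(/\w+/)
  positions << s.pos
end

p positions                         # [0, 1, 5, 6, 10, 11, 15]
p s.string                          # " word word word" (unchanged)
```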
Here are two short examples of a scanning routine.
The first is easy to write but slow. The second is just as easy to write,
but FAST, because it uses the StringScanner class.
First example:

    ATOM  = /\A\w+/
    SPACE = /\A[ \t]+/

    # body of a method that returns the next token of str
    while str.size > 0 do
      if ATOM === str then
        str = $'         # creates a new string for the rest
        return $&
      elsif SPACE === str then
        str = $'
        return $&
      end
    end

Second example:

    require 'strscan'

    ATOM  = /\A\w+/
    SPACE = /\A[ \t]+/

    # body of a method that returns the next token of str
    s = StringScanner.new( str )
    while s.rest? do
      if tmp = s.scan( ATOM ) then
        return tmp
      elsif tmp = s.scan( SPACE ) then
        return tmp
      end
    end

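The two fragments can be turned into complete methods and compared. This is only a sketch: the method names 'tokens_slow' and 'tokens_fast' are hypothetical, the methods collect all tokens into an array instead of returning the first one, and they raise on unscannable input, which the fragments do not do.

```ruby
require 'strscan'

ATOM  = /\A\w+/
SPACE = /\A[ \t]+/

# Slow version: every match replaces str with a new remainder string.
def tokens_slow(str)
  tokens = []
  while str.size > 0
    if ATOM === str || SPACE === str
      tokens << $&
      str = $'
    else
      raise ArgumentError, "cannot scan: #{str.inspect}"
    end
  end
  tokens
end

# Fast version: StringScanner only advances its scan pointer.
def tokens_fast(str)
  s = StringScanner.new(str)
  tokens = []
  while s.rest?
    tok = s.scan(ATOM) || s.scan(SPACE)
    raise ArgumentError, "cannot scan at index #{s.pos}" unless tok
    tokens << tok
  end
  tokens
end

p tokens_slow(" word word")   # [" ", "word", " ", "word"]
p tokens_fast(" word word")   # [" ", "word", " ", "word"]
```

Both methods produce the same tokens; only the second avoids building a remainder string after each match.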
Using StringScanner is simple.
First, create a StringScanner object; next, call the 'scan' method. It returns the matched
string and at the same time advances its internally maintained "scan pointer".
The scan pointer is implemented simply as a pointer to char (char*).
The 'skip' method is similar to 'scan', but it returns the length of the matched string.

    s = StringScanner.new( "abcdefg" )   # scan pointer is on 'a', index 0
    puts s.scan( /a/ )                   # return 'a'. scan pointer is on 'b', index 1
    puts s.skip( /bc/ )                  # return 2. scan pointer is on 'd', index 3

At that point the previous "scan pointer" is preserved in the StringScanner object. So str[ prev pointer...current pointer ] is the string matched by the last 'scan' or 'skip', the "matched string". We can get it with the 'matched' method.

    puts s.matched       # return 'bc'. scan pointer doesn't move
    puts s.scan( /a/ )   # return nil. scan pointer doesn't move, either
    puts s.matched       # return 'bc'.

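The relation between the two pointers and the matched string can be checked directly. This sketch assumes the strscan library bundled with Ruby; the variables 'prev', 'cur' and 'len' are illustrative names. It reads 'matched' right after a successful call, because in the strscan shipped with current Ruby versions a failed scan clears the match and 'matched' then returns nil.

```ruby
require 'strscan'

str = "abcdefg"
s = StringScanner.new(str)

s.scan(/a/)                 # pointer moves from index 0 to 1
prev = s.pos                # pointer before the next call
len  = s.skip(/bc/)         # pointer moves from index 1 to 3
cur  = s.pos                # pointer after the call

p len                       # 2
p str[prev...cur]           # "bc"
p s.matched                 # "bc"
```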
Moving the scan pointer back is also permitted; the 'unscan' method does that. But 'unscan' can be done only once per 'scan', because a StringScanner object cannot preserve more than one previous pointer.

    puts s.scan( /de/ )    # return 'de'. scan pointer is on 'f', index 5
    s.unscan               # scan pointer is on 'd', index 3
    puts s.scan( /def/ )   # return 'def'. scan pointer is on 'g', index 6

For more details, see the reference manual. And of course, the source code is the most important documentation, I think :-)
Copyright (c) 1999-2001 Minero Aoki <aamine@dp.u-netsurf.ne.jp>