Scanner
The base class for all Scanners.
It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
require 'coderay'
c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
for text, kind in c_scanner
  puts text if kind == :operator
end
# prints: (*==)++;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
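As a rough illustration of why mixing Enumerable into a StringScanner subclass gives you map, any? and find, here is a minimal sketch that uses only Ruby's standard library. ToyScanner and its three token kinds are hypothetical and are not part of CodeRay; its tokenization is deliberately cruder than the real C scanner, so the individual operator tokens differ.

```ruby
require 'strscan'

# ToyScanner is a hypothetical stand-in, not a CodeRay class.
class ToyScanner < StringScanner
  include Enumerable

  # Yields [text, kind] pairs, similar in shape to what a scanner yields.
  def each
    return enum_for(:each) unless block_given?
    reset # start from the beginning on every traversal
    until eos?
      if (text = scan(/\s+/))
        yield text, :space
      elsif (text = scan(/\w+/))
        yield text, :ident
      else
        yield getch, :operator # anything else, one character at a time
      end
    end
  end
end

toy = ToyScanner.new("if (*p == '{') nest++;")

# Every Enumerable method is built on top of #each.
operators   = toy.select { |_, kind| kind == :operator }.map { |text, _| text }
has_ident   = toy.any?   { |_, kind| kind == :ident }
first_ident = toy.find   { |_, kind| kind == :ident }
```

Because each resets the scan pointer before traversing, the three calls above can run one after another on the same object.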
- binary_string
- column
- each
- encoding
- file_extension
- file_extension
- lang
- lang
- line
- new
- normalize
- reset
- scan_tokens
- setup
- string=
- tokenize
- tokens
- Enumerable
| ScanError | = | Class.new StandardError | Raised if a Scanner fails while scanning. |
| DEFAULT_OPTIONS | = | { } | The default options for all scanner classes. Define @default_options for subclasses. |
| KINDS_NOT_LOC | = | [:comment, :doctype, :docstring] | |
| [RW] | state |
The encoding used internally by this scanner.
# File lib/coderay/scanner.rb, line 89
def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
The typical filename suffix for this scanner’s language.
# File lib/coderay/scanner.rb, line 84
def file_extension extension = lang
  @file_extension ||= extension.to_s
end

# File lib/coderay/scanner.rb, line 94
def lang
  @plugin_id
end
Create a new Scanner.
- code is the input String and is handled by the superclass StringScanner.
- options is a Hash with Symbols as keys. It is merged with the default options of the class, so you can override default options here.
If options[:tokens] is given, it is used to collect the scanned tokens. Otherwise, a new Tokens object is used.
# File lib/coderay/scanner.rb, line 143
def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end

  @options = self.class::DEFAULT_OPTIONS.merge options

  super self.class.normalize(code)

  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  setup
end
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.
# File lib/coderay/scanner.rb, line 69
def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?

  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
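The helpers encode_with_encoding and to_unix are not shown on this page. To give a feel for what this kind of normalization involves, here is a minimal sketch built on Ruby's standard String#scrub and String#gsub. The method name normalize_sketch and the '?' placeholder are assumptions made for illustration, not CodeRay's actual behavior.

```ruby
# Hypothetical helper: approximates the described normalization with stdlib calls only.
def normalize_sketch(code)
  code = code.to_s unless code.is_a?(String)
  return code if code.empty?

  code
    .scrub('?')            # replace invalid byte sequences with a placeholder
    .gsub(/\r\n?/, "\n")   # convert Windows (\r\n) and old Mac (\r) newlines to UNIX
end

normalized = normalize_sketch("a\r\nb\rc\xFF")
# normalized == "a\nb\nc?"
```

Scrubbing happens first because running a regular expression over a string with invalid bytes raises an ArgumentError.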
The string in binary encoding.
To be used with pos, which is the index of the byte the scanner will scan next.
# File lib/coderay/scanner.rb, line 243
def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
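The reason a binary copy is needed is that StringScanner#pos counts bytes, not characters. The following sketch uses a plain StringScanner from the standard library, with String#b standing in for binary_string, and shows how the two kinds of index diverge once the input contains a multibyte character.

```ruby
require 'strscan'

scanner = StringScanner.new("\u00e9!x") # "é!x": 3 characters, 4 bytes in UTF-8
scanner.scan(/\u00e9/)                  # consume the two-byte character "é"

byte_pos = scanner.pos                  # 2, a byte offset
by_char  = scanner.string[byte_pos]     # "x": character index 2, one too far
by_byte  = scanner.string.b[byte_pos]   # "!": the byte that will be scanned next
```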
The current column position of the scanner, starting with 1. See also: line.
# File lib/coderay/scanner.rb, line 234
def column
  pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
Traverse the tokens.
# File lib/coderay/scanner.rb, line 217
def each &block
  tokens.each(&block)
end
The default file extension for this scanner.
# File lib/coderay/scanner.rb, line 178
def file_extension
  self.class.file_extension
end
The plugin ID for this scanner.
# File lib/coderay/scanner.rb, line 173
def lang
  self.class.lang
end
The current line position of the scanner, starting with 1. See also: column.
Beware, this is implemented inefficiently. It should be used for debugging only.
# File lib/coderay/scanner.rb, line 227
def line
  pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
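The arithmetic behind line and column can be tried out with a plain StringScanner. This sketch copies the two formulas into free-standing helpers; line_at and column_at are hypothetical names used only here.

```ruby
require 'strscan'

# Hypothetical helpers that mirror the formulas used by #line and #column.
def line_at(binary, pos)
  return 1 if pos <= 0
  binary[0...pos].count("\n") + 1 # newlines before pos, plus one
end

def column_at(binary, pos)
  return 1 if pos <= 0
  # Distance from the last newline before pos; -1 means "no newline yet".
  pos - (binary.rindex("\n", pos - 1) || -1)
end

scanner = StringScanner.new("ab\ncde")
scanner.scan(/ab\nc/)          # the next character to scan is "d"

binary = scanner.string.b      # byte-indexed copy, like binary_string
line   = line_at(binary, scanner.pos)   # 2
column = column_at(binary, scanner.pos) # 2
```

Both helpers walk the string from the start or search backwards on every call, which is why the text above recommends line for debugging only.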
Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.
# File lib/coderay/scanner.rb, line 160
def reset
  super
  reset_instance
end
Set a new string to be scanned.
# File lib/coderay/scanner.rb, line 166
def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
Scans the code and returns all tokens in a Tokens object.
# File lib/coderay/scanner.rb, line 183
def tokenize source = nil, options = {}
  options = @options.merge(options)

  @tokens = options[:tokens] || @tokens || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  case source
  when Array
    self.string = self.class.normalize(source.join)
  when nil
    reset
  else
    self.string = self.class.normalize(source)
  end

  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end

  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
Cache the result of tokenize.
# File lib/coderay/scanner.rb, line 212
def tokens
  @cached_tokens ||= tokenize
end
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!
# File lib/coderay/scanner.rb, line 269
def scan_tokens tokens, options  # :doc:
  raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
end
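The contract described here, where the base class drives the scan and subclasses only fill in scan_tokens, can be sketched without CodeRay. MiniScanner and WordScanner below are hypothetical classes that only imitate the shape of that contract: a plain Array stands in for Tokens, and the options argument is left out.

```ruby
require 'strscan'

# Hypothetical base class: owns the workflow, leaves the scanning to subclasses.
class MiniScanner < StringScanner
  def initialize(code = '')
    super(code)
    setup # once-per-instance hook
  end

  def tokenize
    reset
    scan_tokens([])
  end

  protected

  def setup; end

  def scan_tokens(tokens)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

# Hypothetical subclass: implements only the hooks.
class WordScanner < MiniScanner
  protected

  def setup
    @word = /\w+/
  end

  def scan_tokens(tokens)
    until eos?
      if (text = scan(@word))
        tokens << [text, :word]   # store tokens only through <<
      else
        tokens << [getch, :other]
      end
    end
    tokens # the method must return the tokens
  end
end

word_tokens = WordScanner.new('a+b').tokenize
# word_tokens == [["a", :word], ["+", :other], ["b", :word]]
```

Calling tokenize on MiniScanner itself raises NotImplementedError, which mirrors how the base Scanner refuses to scan on its own.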
Can be implemented by subclasses to do some initialization that has to be done once per instance.
Use reset for initialization that has to be done once per scan.
# File lib/coderay/scanner.rb, line 261
def setup  # :doc:
end
