The base class for all Scanners.
It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
    require 'coderay'

    c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

    for text, kind in c_scanner
      puts text if kind == :operator
    end

    # prints: (*==)++;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
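To make the Enumerable behaviour concrete without requiring the coderay gem, here is a minimal sketch. MiniScanner and its token kinds are hypothetical stand-ins, not CodeRay classes; the sketch only assumes that iteration yields text/kind pairs, as in the example above.

```ruby
require 'strscan'

# Hypothetical stand-in for a CodeRay scanner (NOT part of CodeRay).
# It subclasses StringScanner and mixes in Enumerable, like Scanner does.
class MiniScanner < StringScanner
  include Enumerable

  # Yields [text, kind] pairs; the token kinds are assumptions for this sketch.
  def each
    return enum_for(:each) unless block_given?
    reset
    until eos?
      if (text = scan(/\s+/))
        yield text, :space
      elsif (text = scan(/[A-Za-z_]\w*/))
        yield text, :ident
      elsif (text = scan(/'(?:\\.|[^'\\])'/))
        yield text, :char
      elsif (text = scan(/==|\+\+|[-+*\/=<>!(){};]/))
        yield text, :operator
      else
        yield getch, :error
      end
    end
    self
  end
end

scanner = MiniScanner.new("if (*p == '{') nest++;")

# select, any?, find and sort_by all come from Enumerable.
operators  = scanner.select { |_, kind| kind == :operator }.map(&:first)
has_char   = scanner.any?   { |_, kind| kind == :char }
first_name = scanner.find   { |_, kind| kind == :ident }
by_length  = scanner.sort_by { |text, _| text.size }

puts operators.join  # (*==)++;
```

Because each restarts the scan via reset, every Enumerable call walks the whole input again. That is simple but not cached, unlike the tokens method documented further down.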
The default options for all scanner classes.
Define @default_options for subclasses.
Raised if a Scanner fails while scanning.
The encoding used internally by this scanner.
    # File lib/coderay/scanner.rb, line 88
    def encoding name = 'UTF-8'
      @encoding ||= defined?(Encoding.find) && Encoding.find(name)
    end
The typical filename suffix for this scanner’s language.
    # File lib/coderay/scanner.rb, line 83
    def file_extension extension = lang
      @file_extension ||= extension.to_s
    end
Create a new Scanner.
code is the input String and is handled by the superclass StringScanner.
options is a Hash with Symbols as keys. It is merged with the default options of the class, so you can override default options here.
If options[:tokens] is given, scanned tokens are stored in that object; otherwise, a new Tokens object is used.
    # File lib/coderay/scanner.rb, line 142
    def initialize code = '', options = {}
      if self.class == Scanner
        raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
      end

      @options = self.class::DEFAULT_OPTIONS.merge options

      super self.class.normalize(code)

      @tokens = options[:tokens] || Tokens.new
      @tokens.scanner = self if @tokens.respond_to? :scanner=

      setup
    end
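The option handling in the constructor boils down to Hash#merge on a class-level constant. The following sketch isolates that pattern. OptionSketch, SubSketch and the :tab_width key are hypothetical names chosen for illustration; only :keep_tokens appears in the Scanner source.

```ruby
# Hypothetical classes illustrating the DEFAULT_OPTIONS.merge pattern
# (NOT part of CodeRay).
class OptionSketch
  # :tab_width is an invented option for this sketch.
  DEFAULT_OPTIONS = { :keep_tokens => false, :tab_width => 8 }.freeze

  attr_reader :options

  def initialize(options = {})
    # self.class:: resolves the constant on the actual (sub)class,
    # so subclasses can supply their own defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
  end
end

class SubSketch < OptionSketch
  # A subclass overrides one default and inherits the rest.
  DEFAULT_OPTIONS = OptionSketch::DEFAULT_OPTIONS.merge(:tab_width => 2).freeze
end

p OptionSketch.new.options                 # class defaults
p SubSketch.new.options                    # subclass defaults
p SubSketch.new(:tab_width => 4).options   # per-instance override wins
```

Since merge returns a new Hash, the frozen defaults are never mutated by per-instance options.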
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.
    # File lib/coderay/scanner.rb, line 68
    def normalize code
      # original = code
      code = code.to_s unless code.is_a? ::String
      return code if code.empty?

      if code.respond_to? :encoding
        code = encode_with_encoding code, self.encoding
      else
        code = to_unix code
      end
      # code = code.dup if code.eql? original
      code
    end
    # File lib/coderay/scanner.rb, line 99
    def encode_with_encoding code, target_encoding
      if code.encoding == target_encoding
        if code.valid_encoding?
          return to_unix(code)
        else
          source_encoding = guess_encoding code
        end
      else
        source_encoding = code.encoding
      end
      # print "encode_with_encoding from #{source_encoding} to #{target_encoding}"
      code.encode target_encoding, source_encoding, :universal_newline => true, :undef => :replace, :invalid => :replace
    end
    # File lib/coderay/scanner.rb, line 117
    def guess_encoding s
      #:nocov:
      IO.popen("file -b --mime -", "w+") do |file|
        file.write s[0, 1024]
        file.close_write
        begin
          Encoding.find file.gets[/charset=([-\w]+)/, 1]
        rescue ArgumentError
          Encoding::BINARY
        end
      end
      #:nocov:
    end
    # File lib/coderay/scanner.rb, line 113
    def to_unix code
      code.index(?\r) ? code.gsub(/\r\n?/, "\n") : code
    end
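The two normalization steps can be tried in isolation with plain Ruby. This is a sketch rather than CodeRay's code path: to_unix_sketch and to_utf8_sketch are hypothetical helper names, and the source encoding is passed in explicitly instead of being guessed.

```ruby
# Hypothetical helpers mirroring the normalization steps (NOT CodeRay API).

# Converts "\r\n" and lone "\r" to "\n". Returns the input object itself
# when it contains no "\r", so no copy is made in the common case.
def to_unix_sketch(code)
  code.include?("\r") ? code.gsub(/\r\n?/, "\n") : code
end

# Interprets +bytes+ as +source_encoding+ and transcodes them to UTF-8.
# Newlines are unified; invalid or undefined characters are replaced
# instead of raising an error.
def to_utf8_sketch(bytes, source_encoding)
  bytes.encode('UTF-8', source_encoding,
               :universal_newline => true,
               :undef             => :replace,
               :invalid           => :replace)
end

p to_unix_sketch("a\r\nb\rc\n")
p to_utf8_sketch("caf\xE9\r\nok".b, 'ISO-8859-1')
```

Note that the first helper may hand back the very same String object, which is worth keeping in mind when reading the "always returns a new object" remark above.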
The string in binary encoding.
To be used with pos, which is the index of the byte the scanner will scan next.
    # File lib/coderay/scanner.rb, line 242
    def binary_string
      @binary_string ||=
        if string.respond_to?(:bytesize) && string.bytesize != string.size
          #:nocov:
          string.dup.force_encoding('binary')
          #:nocov:
        else
          string
        end
    end
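Why a binary view is needed becomes visible once the input contains multibyte characters: StringScanner#pos counts bytes, not characters. The sketch below uses only the standard strscan library; positions_after and byte_at are hypothetical helper names.

```ruby
require 'strscan'

# Returns [byte position, character position] after scanning +pattern+ once.
def positions_after(text, pattern)
  scanner = StringScanner.new(text)
  scanner.scan(pattern)
  [scanner.pos, scanner.charpos]
end

# Looks up one byte in a binary copy of +text+, which is how a byte
# position from StringScanner#pos has to be interpreted.
def byte_at(text, pos)
  text.b[pos, 1]
end

text = "h\u00E9llo w\u00F6rld"      # "héllo wörld"
p positions_after(text, /\S+/)      # byte and character positions differ
p byte_at(text, 6)                  # the byte right after "héllo"
```

Indexing the original UTF-8 string with a byte position would land on the wrong character as soon as a multibyte character precedes it.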
The current column position of the scanner, starting with 1. See also: line.
    # File lib/coderay/scanner.rb, line 233
    def column pos = self.pos
      return 1 if pos <= 0
      pos - (binary_string.rindex(?\n, pos - 1) || -1)
    end
Traverse the tokens.
    # File lib/coderay/scanner.rb, line 216
    def each &block
      tokens.each(&block)
    end
The default file extension for this scanner.
    # File lib/coderay/scanner.rb, line 177
    def file_extension
      self.class.file_extension
    end
The Plugin ID for this scanner.
    # File lib/coderay/scanner.rb, line 172
    def lang
      self.class.lang
    end
The current line position of the scanner, starting with 1. See also: column.
Beware, this is implemented inefficiently. It should be used for debugging only.
    # File lib/coderay/scanner.rb, line 226
    def line pos = self.pos
      return 1 if pos <= 0
      binary_string[0...pos].count("\n") + 1
    end
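The arithmetic behind line and column is easy to check on a plain String. The following sketch repeats it outside the scanner; line_at and column_at are hypothetical helper names, and pos is a byte offset as returned by StringScanner#pos.

```ruby
# Hypothetical helpers repeating the line/column arithmetic (NOT CodeRay API).

# 1-based line number: one more than the number of newlines before +pos+.
# This copies and scans the whole prefix, which is why it is slow.
def line_at(source, pos)
  return 1 if pos <= 0
  source.b[0...pos].count("\n") + 1
end

# 1-based column: distance from the last newline before +pos+.
# With no newline, rindex returns nil, -1 is used, and the column is pos + 1.
def column_at(source, pos)
  return 1 if pos <= 0
  pos - (source.b.rindex("\n", pos - 1) || -1)
end

source = "ab\ncd"
(0...source.bytesize).each do |pos|
  puts "pos #{pos}: line #{line_at(source, pos)}, column #{column_at(source, pos)}"
end
```

Both helpers work on bytes, so a column is a byte column: a two-byte character advances it by two.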
Resets the scanner. Subclasses should redefine the #reset_instance method instead of this one.
    # File lib/coderay/scanner.rb, line 159
    def reset
      super
      reset_instance
    end
Set a new string to be scanned.
    # File lib/coderay/scanner.rb, line 165
    def string= code
      code = self.class.normalize(code)
      super code
      reset_instance
    end
Scans the code and returns all tokens in a Tokens object.
    # File lib/coderay/scanner.rb, line 182
    def tokenize source = nil, options = {}
      options = @options.merge(options)
      @tokens = options[:tokens] || @tokens || Tokens.new
      @tokens.scanner = self if @tokens.respond_to? :scanner=
      case source
      when Array
        self.string = self.class.normalize(source.join)
      when nil
        reset
      else
        self.string = self.class.normalize(source)
      end

      begin
        scan_tokens @tokens, options
      rescue => e
        message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
        raise_inspect e.message, @tokens, message, 30, e.backtrace
      end

      @cached_tokens = @tokens
      if source.is_a? Array
        @tokens.split_into_parts(*source.map { |part| part.size })
      else
        @tokens
      end
    end
Caches the result of tokenize.
    # File lib/coderay/scanner.rb, line 211
    def tokens
      @cached_tokens ||= tokenize
    end
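The caching relies on Ruby's ||= memoization idiom together with an explicit invalidation when the input changes. Here is a minimal sketch of that interplay; CachedTokenizer is a hypothetical class, and splitting on whitespace merely stands in for real scanning.

```ruby
# Hypothetical class showing memoization plus invalidation (NOT CodeRay API).
class CachedTokenizer
  attr_reader :runs

  def initialize(text)
    @text = text
    @runs = 0          # counts how often the expensive step actually ran
  end

  # The expensive step; whitespace splitting is a placeholder for scanning.
  def tokenize
    @runs += 1
    @text.split
  end

  # Runs tokenize only when no cached result exists.
  def tokens
    @cached_tokens ||= tokenize
  end

  # Assigning new input drops the cache, like reset_instance does.
  def text=(new_text)
    @text = new_text
    @cached_tokens = nil
  end
end

tokenizer = CachedTokenizer.new("a b")
tokenizer.tokens
tokenizer.tokens
puts tokenizer.runs      # tokenize ran once for two reads
tokenizer.text = "x y z"
p tokenizer.tokens       # recomputed after invalidation
```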
Raises a ScanError with additional status information.
    # File lib/coderay/scanner.rb, line 280
    def raise_inspect msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller
      raise ScanError, " ***ERROR in %s: %s (after %d tokens) tokens: %s current line: %d column: %d pos: %d matched: %p state: %p bol? = %p, eos? = %p surrounding code: %p ~~ %p ***ERROR*** " % [
        File.basename(caller[0]),
        msg,
        tokens.respond_to?(:size) ? tokens.size : 0,
        tokens.respond_to?(:last) ? tokens.last(10).map { |t| t.inspect }.join("\n") : '',
        line, column, pos,
        matched, state, bol?, eos?,
        binary_string[pos - ambit, ambit],
        binary_string[pos, ambit],
      ], backtrace
    end
Resets the scanner.
    # File lib/coderay/scanner.rb, line 273
    def reset_instance
      @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens]
      @cached_tokens = nil
      @binary_string = nil if defined? @binary_string
    end
Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.
    # File lib/coderay/scanner.rb, line 313
    def scan_rest
      rest = self.rest
      terminate
      rest
    end
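The equivalence with scan_until(/\z/) can be checked on a plain StringScanner. In this sketch, scan_rest_sketch is a hypothetical free function that takes the scanner as an argument instead of being a method on it.

```ruby
require 'strscan'

# Returns everything from the current position to the end of the input
# and moves the scan pointer to the end (hypothetical helper).
def scan_rest_sketch(scanner)
  rest = scanner.rest
  scanner.terminate
  rest
end

a = StringScanner.new("foo bar")
b = StringScanner.new("foo bar")
a.scan(/foo/)
b.scan(/foo/)

p scan_rest_sketch(a)      # remaining input, without using a regexp
p b.scan_until(/\z/)       # the same text via the regexp form
p [a.eos?, b.eos?]         # both scanners end up at the end of input
```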
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!
    # File lib/coderay/scanner.rb, line 268
    def scan_tokens tokens, options  # :doc:
      raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
    end
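The contract is a classic template method: the base class drives the scan and the subclass supplies scan_tokens. The sketch below shows the shape of such a subclass without depending on the coderay gem. SketchScanner and WordScanner are hypothetical classes, and a plain Array of [text, kind] pairs stands in for a Tokens object.

```ruby
require 'strscan'

# Hypothetical base class imitating the Scanner contract (NOT CodeRay's class).
class SketchScanner < StringScanner
  def initialize(code = '')
    super(code)
    setup
  end

  # Drives the scan; subclasses only provide scan_tokens.
  def tokenize
    reset
    scan_tokens([])   # assumption: an Array stands in for Tokens
  end

  protected

  # Optional one-time initialization hook.
  def setup
  end

  def scan_tokens(tokens)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

# A subclass implements nothing but scan_tokens.
class WordScanner < SketchScanner
  protected

  def scan_tokens(tokens)
    until eos?
      if (text = scan(/\s+/))
        tokens << [text, :space]
      elsif (text = scan(/\w+/))
        tokens << [text, :word]
      else
        tokens << [getch, :error]   # always consume something to avoid looping
      end
    end
    tokens   # the method must return the tokens it was given
  end
end

p WordScanner.new("hi there!").tokenize
```

The else branch matters in practice: a scan loop that fails to advance on unexpected input never reaches eos?.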
Can be implemented by subclasses to do some initialization that has to be done once per instance.
Use reset for initialization that has to be done once per scan.
    # File lib/coderay/scanner.rb, line 260
    def setup  # :doc:
    end