class CodeRay::Scanners::Scanner

Scanner

The base class for all Scanners.

It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.

It is also Enumerable, so you can use it like an Array of Tokens:

require 'coderay'

c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

for text, kind in c_scanner
  puts text if kind == :operator
end

# prints: (*==)++;

OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
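A class becomes Enumerable by defining `each` and including Ruby's Enumerable module. The following is a minimal, self-contained sketch of that pattern on top of StringScanner. `TinyScanner` and its token kinds are hypothetical and are not CodeRay's API. Each token is yielded as one `[text, kind]` array, so block parameters like `|text, kind|` destructure it.

```ruby
require 'strscan'

# Hypothetical toy scanner (not CodeRay's implementation): a StringScanner
# subclass that is Enumerable over [text, kind] pairs.
class TinyScanner < StringScanner
  include Enumerable

  # Yields each token as a single [text, kind] array.
  def each
    return enum_for(:each) unless block_given?
    reset # always scan from the beginning of the string
    until eos?
      if (text = scan(/\s+/))                 then yield([text, :space])
      elsif (text = scan(/[A-Za-z_]\w*/))     then yield([text, :ident])
      elsif (text = scan(/\d+/))              then yield([text, :integer])
      elsif (text = scan(/==|\+\+|[^\w\s]/))  then yield([text, :operator])
      end
    end
  end
end

scanner = TinyScanner.new("if (*p == 1) n++;")
p scanner.select { |_text, kind| kind == :operator }.map(&:first)
# => ["(", "*", "==", ")", "++", ";"]
p scanner.any? { |_text, kind| kind == :integer } # => true
p scanner.find { |_text, kind| kind == :ident }   # => ["if", :ident]
```

Because every character falls into one of the four patterns, the loop always advances and cannot stall.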

Constants

DEFAULT_OPTIONS

The default options for all scanner classes.

Subclasses can override it by defining their own DEFAULT_OPTIONS constant, since #initialize reads self.class::DEFAULT_OPTIONS.

KINDS_NOT_LOC
ScanError

Raised if a Scanner fails while scanning.

Attributes

state[RW]

Public Class Methods

encoding(name = 'UTF-8')

The encoding used internally by this scanner.

# File lib/coderay/scanner.rb, line 88
def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
file_extension(extension = lang)

The typical filename suffix for this scanner’s language.

# File lib/coderay/scanner.rb, line 83
def file_extension extension = lang
  @file_extension ||= extension.to_s
end
lang()

The lang of this Scanner class, which is equal to its Plugin ID.

# File lib/coderay/scanner.rb, line 93
def lang
  @plugin_id
end
new(code = '', options = {})

Create a new Scanner.

  • code is the input String and is handled by the superclass StringScanner.

  • options is a Hash with Symbols as keys. It is merged with the default options of the class (so you can override default options here).

If options[:tokens] is given, scanned tokens are stored in that object; otherwise, a new Tokens object is used.

# File lib/coderay/scanner.rb, line 142
def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end
  
  @options = self.class::DEFAULT_OPTIONS.merge options
  
  super self.class.normalize(code)
  
  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  
  setup
end
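The constructor above does two things worth isolating: it refuses to instantiate the abstract base class, and it merges per-instance options over the class defaults found via self.class::DEFAULT_OPTIONS. This is a minimal sketch of just that logic under hypothetical class names (`BaseScanner`, `RubyishScanner`) and hypothetical option keys; it omits the Tokens and setup handling.

```ruby
require 'strscan'

# Hypothetical base class mirroring the constructor logic above.
class BaseScanner < StringScanner
  DEFAULT_OPTIONS = { stop_on_error: false }.freeze

  attr_reader :options

  def initialize(code = '', options = {})
    # Refuse to instantiate the abstract base class directly.
    if self.class == BaseScanner
      raise NotImplementedError, "BaseScanner is abstract; use a subclass"
    end
    # Per-instance options win over the class defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
    super(code.to_s)
  end
end

class RubyishScanner < BaseScanner
  # self.class::DEFAULT_OPTIONS resolves to this constant for this subclass.
  DEFAULT_OPTIONS = BaseScanner::DEFAULT_OPTIONS.merge(keywords: true).freeze
end

s = RubyishScanner.new("x = 1", keywords: false)
p s.options[:keywords]      # => false (overridden per instance)
p s.options[:stop_on_error] # => false (inherited default)
```

Looking the constant up through self.class, rather than naming it directly, is what lets each subclass supply its own defaults.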
normalize(code)

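Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Note that the code below may return the input object itself, for example when it is empty, or when it is already validly encoded in the internal encoding and contains no carriage returns.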

# File lib/coderay/scanner.rb, line 68
def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?
  
  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
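The following sketch shows the same normalization steps with plain Ruby, assuming UTF-8 as the target encoding. It swaps in String#scrub (Ruby 2.1+) for the same-encoding case, because String#encode between identical encodings does not repair invalid bytes. `normalize_sketch` is a hypothetical name, not part of CodeRay.

```ruby
# Hypothetical helper: a simplified version of the normalization above.
def normalize_sketch(code, target = Encoding::UTF_8)
  code = code.to_s # accept non-String input
  if code.encoding != target
    # Transcode, replacing invalid and undefined characters.
    code = code.encode(target, invalid: :replace, undef: :replace)
  end
  code = code.scrub        # same-encoding input: replace invalid bytes
  code.gsub(/\r\n?/, "\n") # CRLF and lone CR both become LF
end

p normalize_sketch("a\r\nb\rc") # => "a\nb\nc"
```

Scrubbing before the gsub matters: running a regexp over a string with invalid bytes raises an ArgumentError.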

Protected Class Methods

encode_with_encoding(code, target_encoding)
# File lib/coderay/scanner.rb, line 99
def encode_with_encoding code, target_encoding
  if code.encoding == target_encoding
    if code.valid_encoding?
      return to_unix(code)
    else
      source_encoding = guess_encoding code
    end
  else
    source_encoding = code.encoding
  end
  # print "encode_with_encoding from #{source_encoding} to #{target_encoding}"
  code.encode target_encoding, source_encoding, :universal_newline => true, :undef => :replace, :invalid => :replace
end
guess_encoding(s)
# File lib/coderay/scanner.rb, line 117
def guess_encoding s
  #:nocov:
  IO.popen("file -b --mime -", "w+") do |file|
    file.write s[0, 1024]
    file.close_write
    begin
      Encoding.find file.gets[/charset=([-\w]+)/, 1]
    rescue ArgumentError
      Encoding::BINARY
    end
  end
  #:nocov:
end
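guess_encoding depends on the external `file` command, but the parsing step can be shown on its own. This sketch takes one line of `file -b --mime` style output (for example "text/plain; charset=iso-8859-1") and maps it to an Encoding, falling back to BINARY as the method above does. `encoding_from_mime_line` is hypothetical, and unlike the original it also handles a missing charset.

```ruby
# Hypothetical helper: extract the charset from a MIME description line.
def encoding_from_mime_line(line)
  charset = line.to_s[/charset=([-\w]+)/, 1]
  return Encoding::BINARY unless charset # no charset at all
  Encoding.find(charset)                 # name lookup is case-insensitive
rescue ArgumentError                     # unknown charset name
  Encoding::BINARY
end

p encoding_from_mime_line("text/plain; charset=iso-8859-1") # => #<Encoding:ISO-8859-1>
```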
to_unix(code)
# File lib/coderay/scanner.rb, line 113
def to_unix code
  code.index(?\r) ? code.gsub(/\r\n?/, "\n") : code
end

Public Instance Methods

binary_string()

The string in binary encoding.

To be used with pos, which is the index of the byte the scanner will scan next.

# File lib/coderay/scanner.rb, line 242
def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
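The reason binary_string exists is that StringScanner#pos counts bytes, not characters, so indexing the original multibyte string with pos can point at the wrong place. This short sketch, using only the standard strscan library (variable names are illustrative), shows the mismatch and how a binary copy resolves it.

```ruby
require 'strscan'

s = StringScanner.new("é = 1")
s.scan(/é/)
p s.pos # => 2 ("é" is two bytes in UTF-8)

# A binary (ASCII-8BIT) copy makes String#[] index by bytes, matching pos.
binary = s.string.dup.force_encoding(Encoding::BINARY)
p binary[s.pos]   # => " "
p s.string[s.pos] # => "=" (character index 2, not byte index 2)
```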
column(pos = self.pos)

The current column position of the scanner, starting with 1. See also: line.

# File lib/coderay/scanner.rb, line 233
def column pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
each(&block)

Traverse the tokens.

# File lib/coderay/scanner.rb, line 216
def each &block
  tokens.each(&block)
end
file_extension()

The default file extension for this scanner.

# File lib/coderay/scanner.rb, line 177
def file_extension
  self.class.file_extension
end
lang()

The Plugin ID for this scanner.

# File lib/coderay/scanner.rb, line 172
def lang
  self.class.lang
end
line(pos = self.pos)

The current line position of the scanner, starting with 1. See also: column.

Beware: this is implemented inefficiently, since it counts all newlines from the start of the string on every call. It should be used for debugging only.

# File lib/coderay/scanner.rb, line 226
def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
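Both #line and #column work on a byte position, the index of the next byte to scan, and count from 1. The following sketch restates them as standalone functions over a binary string, so their arithmetic can be checked directly. `line_at` and `column_at` are hypothetical names.

```ruby
# Hypothetical standalone versions of #line and #column.
def line_at(binary, pos)
  return 1 if pos <= 0
  binary[0...pos].count("\n") + 1 # newlines before pos, plus one
end

def column_at(binary, pos)
  return 1 if pos <= 0
  # Distance from the last newline before pos (or from the string start).
  pos - (binary.rindex("\n", pos - 1) || -1)
end

text = "ab\ncd".b # binary copy, so indices are byte offsets
p [line_at(text, 4), column_at(text, 4)] # => [2, 2] (the "d")
```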
reset()

Resets the scanner. Subclasses should redefine #reset_instance instead of this method.

# File lib/coderay/scanner.rb, line 159
def reset
  super
  reset_instance
end
string=(code)

Set a new string to be scanned.

# File lib/coderay/scanner.rb, line 165
def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
tokenize(source = nil, options = {})

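Scans the code and returns all tokens in a Tokens object. If source is nil, the scanner is reset and its current string is scanned again. If source is an Array of strings, they are joined and scanned as one text, and the result is split into one part per element.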

# File lib/coderay/scanner.rb, line 182
def tokenize source = nil, options = {}
  options = @options.merge(options)
  @tokens = options[:tokens] || @tokens || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  case source
  when Array
    self.string = self.class.normalize(source.join)
  when nil
    reset
  else
    self.string = self.class.normalize(source)
  end
  
  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end
  
  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
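For Array input, tokenize scans the joined text once and then splits the result using each part's size. This is a simplified, hypothetical sketch of such a split, not CodeRay's Tokens#split_into_parts. It assumes a flat list of `[text, kind]` pairs and ignores any grouping information a real token stream may carry. A token that crosses a part boundary is cut in two, and sizes are measured with String#size, as in `part.size` above.

```ruby
# Hypothetical helper: cut [text, kind] pairs into one list per part size.
def split_into_parts_sketch(pairs, sizes)
  return [] if sizes.empty?
  parts = [[]]
  remaining = sizes.dup
  pairs.each do |text, kind|
    until text.empty?
      # Move on to the next part once the current one is full.
      while remaining.size > 1 && remaining.first.zero?
        remaining.shift
        parts << []
      end
      # The last part takes whatever text is left.
      take = remaining.size > 1 ? [text.size, remaining.first].min : text.size
      parts.last << [text[0, take], kind]
      text = text[take..-1]
      remaining[0] -= take
    end
  end
  # Parts after the last token (e.g. trailing empty strings) stay empty.
  parts.concat(Array.new(remaining.size - 1) { [] })
end

source = ["if ", "x"]
pairs = [["if", :keyword], [" ", :space], ["x", :ident]]
p split_into_parts_sketch(pairs, source.map(&:size))
# => [[["if", :keyword], [" ", :space]], [["x", :ident]]]
```

Scanning the joined text first means constructs that span two parts are still lexed as one piece; only the output is divided afterwards.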
tokens()

Returns the result of tokenize, caching it so that later calls do not scan again.

# File lib/coderay/scanner.rb, line 211
def tokens
  @cached_tokens ||= tokenize
end

Protected Instance Methods

raise_inspect(msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller)

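Raises a ScanError with additional status information: the last tokens, the current line, column and position, and the code surrounding the current position.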

# File lib/coderay/scanner.rb, line 280
def raise_inspect msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller
  raise ScanError, "

***ERROR in %s: %s (after %d tokens)

tokens:
%s

current line: %d  column: %d  pos: %d
matched: %p  state: %p
bol? = %p,  eos? = %p

surrounding code:
%p  ~~  %p


***ERROR***

" % [
    File.basename(caller[0]),
    msg,
    tokens.respond_to?(:size) ? tokens.size : 0,
    tokens.respond_to?(:last) ? tokens.last(10).map { |t| t.inspect }.join("\n") : '',
    line, column, pos,
    matched, state, bol?, eos?,
    binary_string[pos - ambit, ambit],
    binary_string[pos, ambit],
  ], backtrace
end
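The "surrounding code" part of the report shows up to ambit bytes on each side of pos. This sketch builds that snippet and raises an error carrying it. `surrounding_code`, `raise_with_context` and `ScanErrorSketch` are hypothetical names. The sketch clamps the start index at 0, because a negative start makes String#[] count from the end of the string.

```ruby
# Hypothetical error class standing in for ScanError.
class ScanErrorSketch < StandardError; end

# Up to `ambit` characters before and after `pos`.
def surrounding_code(string, pos, ambit = 30)
  before_start = [pos - ambit, 0].max
  [string[before_start, pos - before_start], string[pos, ambit]]
end

def raise_with_context(msg, string, pos)
  before, after = surrounding_code(string, pos)
  raise ScanErrorSketch, format("%s\nsurrounding code:\n%p  ~~  %p", msg, before, after)
end

p surrounding_code("hello world", 5, 3) # => ["llo", " wo"]
```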
reset_instance()

Resets the scanner.

# File lib/coderay/scanner.rb, line 273
def reset_instance
  @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens]
  @cached_tokens = nil
  @binary_string = nil if defined? @binary_string
end
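The caching in #tokens and the invalidation in #reset_instance work together: #reset and #string= both call #reset_instance, which drops the cached result. The following is a minimal sketch of that pattern with a hypothetical `CachingScanner`, whose stand-in "tokenizer" just splits on whitespace and counts how often it runs.

```ruby
require 'strscan'

# Hypothetical scanner showing memoized tokens and reset-based invalidation.
class CachingScanner < StringScanner
  attr_reader :tokenize_calls

  def initialize(code)
    super(code)
    @tokenize_calls = 0
  end

  # Stand-in for real scanning: whitespace-separated words.
  def tokenize
    @tokenize_calls += 1
    string.split
  end

  # Memoize the result of #tokenize.
  def tokens
    @cached_tokens ||= tokenize
  end

  def reset
    super # StringScanner#reset: back to position 0
    reset_instance
  end

  def string=(code)
    super(code) # StringScanner#string=: new input
    reset_instance
  end

  protected

  # Per-scan state lives here, so subclasses override this instead of #reset.
  def reset_instance
    @cached_tokens = nil
  end
end

s = CachingScanner.new("a b")
s.tokens
s.tokens
p s.tokenize_calls # => 1 (second call hit the cache)
s.string = "c d e"
p s.tokens         # => ["c", "d", "e"]
p s.tokenize_calls # => 2
```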
scan_rest()

Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.

# File lib/coderay/scanner.rb, line 313
def scan_rest
  rest = self.rest
  terminate
  rest
end
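The equivalence can be checked on a plain StringScanner. This sketch applies the same rest-then-terminate steps through a hypothetical helper, `scan_rest_sketch`, and compares the result with the regexp-based form.

```ruby
require 'strscan'

# Same steps as scan_rest above: return the unscanned remainder, then
# move the scan pointer to the end.
def scan_rest_sketch(scanner)
  rest = scanner.rest
  scanner.terminate
  rest
end

a = StringScanner.new("abc def")
a.scan(/abc/)
p scan_rest_sketch(a) # => " def"
p a.eos?              # => true

b = StringScanner.new("abc def")
b.scan(/abc/)
p b.scan_until(/\z/)  # => " def"
```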
scan_tokens(tokens, options)

This is the central method, and commonly the only one a subclass implements.

Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!

# File lib/coderay/scanner.rb, line 268
def scan_tokens tokens, options  # :doc:
  raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
end
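The contract above can be illustrated with a tiny, hypothetical framework: the base class raises NotImplementedError, and a subclass implements scan_tokens by looping until eos?, appending with << only, and returning the tokens. `AbstractScanner` and `WordScanner` are illustrative names. The plain Array used for tokens is an assumption standing in for a Tokens object.

```ruby
require 'strscan'

# Hypothetical base class: scan_tokens must be provided by subclasses.
class AbstractScanner < StringScanner
  def tokenize(options = {})
    reset
    scan_tokens([], options)
  end

  protected

  def scan_tokens(tokens, options)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

class WordScanner < AbstractScanner
  protected

  def scan_tokens(tokens, options)
    until eos?
      if (match = scan(/\w+/))
        tokens << [match, :word]
      elsif (match = scan(/\s+/))
        tokens << [match, :space]
      else
        # Always consume something, so the loop cannot stall.
        tokens << [getch, :error]
      end
    end
    tokens
  end
end

p WordScanner.new("hi there!").tokenize
# => [["hi", :word], [" ", :space], ["there", :word], ["!", :error]]
```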
setup()

Can be implemented by subclasses to do initialization that has to be done only once per instance.

Use #reset_instance (called by #reset and #string=) for initialization that has to be done once per scan.

# File lib/coderay/scanner.rb, line 260
def setup  # :doc:
end
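The split between per-instance and per-scan initialization can be sketched with a hypothetical `KeywordScanner`. Its constructor calls setup once to build an input-independent keyword list. Its string= override calls reset_instance for each new input. The counters exist only to make the call pattern visible.

```ruby
require 'strscan'

# Hypothetical scanner: #setup runs once per instance, #reset_instance
# once per new input string.
class KeywordScanner < StringScanner
  attr_reader :setup_calls, :reset_calls, :keywords

  def initialize(code = '')
    super(code)
    @setup_calls = 0
    @reset_calls = 0
    setup
  end

  def string=(code)
    super(code)
    reset_instance
  end

  protected

  # Input-independent work: done once.
  def setup
    @setup_calls += 1
    @keywords = %w[if else end].freeze
  end

  # Per-scan state: done for every new input.
  def reset_instance
    @reset_calls += 1
  end
end

s = KeywordScanner.new("if x")
s.string = "else y"
s.string = "end"
p [s.setup_calls, s.reset_calls] # => [1, 2]
```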