Scanner

The base class for all Scanners.

It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.

It is also Enumerable, so you can use it like an Array of Tokens:

  require 'coderay'

  c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

  for text, kind in c_scanner
    print text if kind == :operator
  end

  # prints: (*==)++;

OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
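To see why being Enumerable is convenient, here is a minimal sketch that needs only Ruby's standard library. ToyScanner and its token kinds (:space, :ident, :operator) are hypothetical and not part of CodeRay; the class simply yields [text, kind] pairs in the shape the example above consumes.

```ruby
require 'strscan'

# Hypothetical toy scanner (not part of CodeRay): it subclasses StringScanner
# and yields [text, kind] pairs, so every Enumerable method works on it.
class ToyScanner < StringScanner
  include Enumerable

  def each
    return enum_for(:each) unless block_given?
    reset # always start from the beginning of the string
    until eos?
      if (text = scan(/\s+/))
        yield [text, :space]
      elsif (text = scan(/\w+/))
        yield [text, :ident]
      else
        yield [getch, :operator] # any other single character
      end
    end
  end
end

toy = ToyScanner.new('a + b * c')

operators = toy.select { |_text, kind| kind == :operator }.map(&:first)
has_ident = toy.any? { |_text, kind| kind == :ident }
first_op  = toy.find { |_text, kind| kind == :operator }
```

Because each restarts the scan on every call, the Enumerable methods can be called repeatedly on the same object.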

Constants
ScanError = Class.new StandardError
  Raised if a Scanner fails while scanning
DEFAULT_OPTIONS = { }
  The default options for all scanner classes.

Define @default_options for subclasses.

KINDS_NOT_LOC = [:comment, :doctype, :docstring]
Attributes
[RW] state
Public Class methods
encoding(name = 'UTF-8')

The encoding used internally by this scanner.

# File lib/coderay/scanner.rb, line 89
        def encoding name = 'UTF-8'
          @encoding ||= defined?(Encoding.find) && Encoding.find(name)
        end
file_extension(extension = lang)

The typical filename suffix for this scanner’s language.

# File lib/coderay/scanner.rb, line 84
        def file_extension extension = lang
          @file_extension ||= extension.to_s
        end
lang()

The lang of this Scanner class, which is equal to its Plugin ID.

# File lib/coderay/scanner.rb, line 94
        def lang
          @plugin_id
        end
new(code = '', options = {})

Create a new Scanner.

  • code is the input String and is handled by the superclass StringScanner.
  • options is a Hash with Symbols as keys. It is merged with the default options of the class, so you can override default options here.

If options[:tokens] is given, the scanned tokens are stored in that object. Otherwise, a new Tokens object is used.

# File lib/coderay/scanner.rb, line 143
      def initialize code = '', options = {}
        if self.class == Scanner
          raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
        end
        
        @options = self.class::DEFAULT_OPTIONS.merge options
        
        super self.class.normalize(code)
        
        @tokens = options[:tokens] || Tokens.new
        @tokens.scanner = self if @tokens.respond_to? :scanner=
        
        setup
      end
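The option handling in the constructor can be sketched in isolation. This is an illustration under stated assumptions, not CodeRay's code: SketchScanner, WideTabs, and the :tab_width and :keep_state keys are hypothetical, and a plain Array stands in for a Tokens object.

```ruby
# Hypothetical stand-in for a scanner base class (not CodeRay itself).
class SketchScanner
  DEFAULT_OPTIONS = { tab_width: 8 }.freeze

  attr_reader :options, :tokens

  def initialize(code = '', options = {})
    @code = code
    # self.class:: picks up the constant of the subclass being instantiated.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
    # Use the caller's token sink if one was passed, else a fresh one.
    @tokens = options[:tokens] || []
  end
end

# A subclass overrides the defaults by defining its own constant.
class WideTabs < SketchScanner
  DEFAULT_OPTIONS = { tab_width: 4, keep_state: false }.freeze
end

scanner = WideTabs.new('x', keep_state: true)

sink  = []
other = WideTabs.new('x', tokens: sink)
```

Here scanner.options combines the subclass defaults with the per-instance override, and other.tokens is the very object that was passed in.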
normalize(code)

Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.

# File lib/coderay/scanner.rb, line 69
        def normalize code
          # original = code
          code = code.to_s unless code.is_a? ::String
          return code if code.empty?
          
          if code.respond_to? :encoding
            code = encode_with_encoding code, self.encoding
          else
            code = to_unix code
          end
          # code = code.dup if code.eql? original
          code
        end
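The helpers called above (encode_with_encoding, to_unix) are not shown on this page. As a rough idea of what such a normalization involves, here is a sketch built only on core String methods. normalize_sketch is a hypothetical name, the '?' placeholder is an arbitrary choice, and the input is assumed to be tagged as UTF-8 already.

```ruby
# Hypothetical helper, not CodeRay's implementation.
# Assumes the input is already tagged as UTF-8.
def normalize_sketch(code)
  text = code.to_s
  # Replace byte sequences that are invalid in the string's encoding.
  text = text.scrub('?')
  # Convert Windows (\r\n) and old Mac (\r) line endings to UNIX newlines.
  text.gsub(/\r\n?/, "\n")
end

unix    = normalize_sketch("a\r\nb\rc\n")
cleaned = normalize_sketch("ab\xFFcd")
```

Both scrub and gsub return new strings, so the caller's object is never modified.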
Public Instance methods
binary_string()

The string in binary encoding.

To be used with pos, which is the index of the byte the scanner will scan next.

# File lib/coderay/scanner.rb, line 243
      def binary_string
        @binary_string ||=
          if string.respond_to?(:bytesize) && string.bytesize != string.size
            #:nocov:
            string.dup.force_encoding('binary')
            #:nocov:
          else
            string
          end
      end
column(pos = self.pos)

The current column position of the scanner, starting with 1. See also: line.

# File lib/coderay/scanner.rb, line 234
      def column pos = self.pos
        return 1 if pos <= 0
        pos - (binary_string.rindex(?\n, pos - 1) || -1)
      end
each(&block)

Traverse the tokens.

# File lib/coderay/scanner.rb, line 217
      def each &block
        tokens.each(&block)
      end
file_extension()

The default file extension for this scanner.

# File lib/coderay/scanner.rb, line 178
      def file_extension
        self.class.file_extension
      end
lang()

The Plugin ID for this scanner.

# File lib/coderay/scanner.rb, line 173
      def lang
        self.class.lang
      end
line(pos = self.pos)

The current line position of the scanner, starting with 1. See also: column.

Beware, this is implemented inefficiently. It should be used for debugging only.

# File lib/coderay/scanner.rb, line 227
      def line pos = self.pos
        return 1 if pos <= 0
        binary_string[0...pos].count("\n") + 1
      end
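The arithmetic behind line and column is easier to follow on a small input. The sketch below mirrors the two methods as free-standing helpers; line_at and column_at are hypothetical names, and, like the originals, they work on byte positions, so columns count bytes rather than characters.

```ruby
# Hypothetical free-standing versions of the line/column arithmetic.
# pos is a byte offset, as returned by StringScanner#pos.

def line_at(text, pos)
  return 1 if pos <= 0
  # Count the newlines in the bytes before pos.
  text.b[0...pos].count("\n") + 1
end

def column_at(text, pos)
  return 1 if pos <= 0
  # Find the last newline before pos; -1 means "before the first byte".
  last_newline = text.b.rindex("\n", pos - 1) || -1
  pos - last_newline
end

text = "ab\ncd\nef"

line_of_d   = line_at(text, 4)   # byte 4 is "d"
column_of_d = column_at(text, 4)
```

line_at slices and counts from the start of the string on every call, which is why the original warns that line is inefficient.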
reset()

Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.

# File lib/coderay/scanner.rb, line 160
      def reset
        super
        reset_instance
      end
string=(code)

Set a new string to be scanned.

# File lib/coderay/scanner.rb, line 166
      def string= code
        code = self.class.normalize(code)
        super code
        reset_instance
      end
tokenize(source = nil, options = {})

Scans the code and returns all tokens in a Tokens object.

# File lib/coderay/scanner.rb, line 183
      def tokenize source = nil, options = {}
        options = @options.merge(options)
        @tokens = options[:tokens] || @tokens || Tokens.new
        @tokens.scanner = self if @tokens.respond_to? :scanner=
        case source
        when Array
          self.string = self.class.normalize(source.join)
        when nil
          reset
        else
          self.string = self.class.normalize(source)
        end
        
        begin
          scan_tokens @tokens, options
        rescue => e
          message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
          raise_inspect e.message, @tokens, message, 30, e.backtrace
        end
        
        @cached_tokens = @tokens
        if source.is_a? Array
          @tokens.split_into_parts(*source.map { |part| part.size })
        else
          @tokens
        end
      end
tokens()

Returns the cached result of tokenize, scanning the code first if it has not been tokenized yet.

# File lib/coderay/scanner.rb, line 212
      def tokens
        @cached_tokens ||= tokenize
      end
Protected Instance methods
scan_tokens(tokens, options)

This is the central method, and commonly the only one a subclass implements.

Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!

# File lib/coderay/scanner.rb, line 269
      def scan_tokens tokens, options  # :doc:
        raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
      end
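The contract between the base class and a subclass can be sketched without CodeRay. In the following illustration MiniScanner and NumberScanner are hypothetical, a plain Array stands in for a Tokens object, and the token kinds are made up. The subclass only implements scan_tokens, stores tokens with << and returns the collection.

```ruby
require 'strscan'

# Hypothetical stand-in for the base class (not CodeRay itself).
class MiniScanner < StringScanner
  def tokenize
    reset
    scan_tokens([], {})
  end

  # Cache the result so repeated calls do not rescan.
  def tokens
    @cached_tokens ||= tokenize
  end

  protected

  def scan_tokens(tokens, options)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

# A subclass implements only the central method.
class NumberScanner < MiniScanner
  protected

  def scan_tokens(tokens, options)
    until eos?
      if (match = scan(/\d+/))
        tokens << [match, :integer]
      elsif (match = scan(/\s+/))
        tokens << [match, :space]
      else
        tokens << [getch, :error] # anything this toy does not understand
      end
    end
    tokens
  end
end

scanner = NumberScanner.new('12 x 7')
result  = scanner.tokens
```

Calling tokens a second time returns the same cached object, while calling it on a bare MiniScanner raises NotImplementedError.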
setup()

Can be implemented by subclasses to do some initialization that has to be done once per instance.

Initialization that has to be done once per scan belongs in reset_instance, which reset calls.

# File lib/coderay/scanner.rb, line 261
      def setup  # :doc:
      end