#497 open
project search not working for .erb files

Reported by kostakimu | April 4th, 2011 @ 10:58 AM

Project search works fine in my Rails project for JavaScript, .rb, and .yml files; however, it doesn't find anything in .html.erb files.

Version: 0.11
Ruby Version: 1.8.7
JRuby version: 1.5.3
Redcar.environment: user

Running on Mac OS X.

Comments and changes to this ticket

  • kostakimu April 4th, 2011 @ 08:07 PM

    After removing the .redcar dir and then starting up, indexing fails with an "OutOfMemory" error, but the next indexing run seems to pass:

    Completed Description Duration
    21:03:14 /Users/johannes/dev/mp/mpcp: refresh index 0.008
    21:03:14 /Users/johannes/dev/mp/mpcp: reparse files for declarations 0.238
    21:03:13 /Users/johannes/dev/mp/mpcp: refresh index 4.675
    Java::JavaLang::OutOfMemoryError Java heap space

    org.jruby.util.ByteList.<init>(ByteList.java:91)
    org.jruby.util.io.ChannelStream.readall(ChannelStream.java:365)
    org.jruby.RubyIO.readAll(RubyIO.java:2825)
    org.jruby.RubyIO.read(RubyIO.java:2641)
    org.jruby.RubyIO.read(RubyIO.java:3327)
    org.jruby.RubyIO$s_method_multi$RUBYINVOKER$read.call(org/jruby/RubyIO$s_method_multi$RUBYINVOKER$read.gen:65535)
    org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:319)
    org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:146)
    org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
    org.jruby.ast.DAsgnNode.interpret(DAsgnNode.java:110)
    org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
    org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    org.jruby.ast.IfNode.interpret(IfNode.java:119)
    org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
    org.jruby.runtime.InterpretedBlock.evalBlockBody(InterpretedBlock.java:373)
    org.jruby.runtime.InterpretedBlock.yieldSpecific(InterpretedBlock.java:259)
    org.jruby.runtime.Block.yieldSpecific(Block.java:117)
    org.jruby.RubyHash$11.visit(RubyHash.java:1132)
    org.jruby.RubyHash.visitAll(RubyHash.java:579)
    org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1119)
    org.jruby.RubyHash.each(RubyHash.java:1130)
    org.jruby.RubyHash.each19(RubyHash.java:1150)
    org.jruby.RubyHash$i_method_0_0$RUBYFRAMEDINVOKER$each19.call(org/jruby/RubyHash$i_method_0_0$RUBYFRAMEDINVOKER$each19.gen:65535)
    org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:299)
    org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:117)
    org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:122)
    org.jruby.ast.CallNoArgBlockNode.interpret(CallNoArgBlockNode.java:64)
    org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
    org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    org.jruby.ast.RescueNode.executeBody(RescueNode.java:199)
    org.jruby.ast.RescueNode.interpretWithJavaExceptions(RescueNode.java:118)
    org.jruby.ast.RescueNode.interpret(RescueNode.java:110)

  • kostakimu April 4th, 2011 @ 08:12 PM

    Yep, the out-of-memory error is clearly the problem. I checked another (smaller) project that produced no out-of-memory error, and search successfully found text in .html.erb files.

  • kostakimu April 4th, 2011 @ 08:41 PM

    Raising the Xmx value from 320m to 1024m in runner.rb doesn't help either :(

  • Daniel Lucraft April 8th, 2011 @ 03:59 PM

    Do you have anything like cyclic symlinks in your project?

  • kostakimu April 8th, 2011 @ 04:24 PM

    Nope. But I spent some time tracking down the problem: the binary detection doesn't work properly, which is why memory usage got so high. I managed to get it working with these changes in lucene_index.rb:

    def update
      changed_files = @project.file_list.changed_since(last_updated)
      @last_updated = Time.now
      changed_files.reject! do |fn, ts|
        fn.index(@project.config_dir) or Redcar::Project::FileList.hide_file_path?(fn)
      end
      files_array = changed_files.to_a
      batch_size = 100
      start = 0
      begin
        while start < files_array.size
          # Array#slice takes (offset, length), so step by batch_size
          files = files_array.slice(start, batch_size)
          Lucene::Transaction.run do
            @lucene_index ||= Lucene::Index.new(lucene_index_dir)
            @lucene_index.field_infos[:contents][:store] = true
            @lucene_index.field_infos[:contents][:tokenized] = true
            files.each do |fn, ts|
              unless File.basename(fn)[0..0] == "." or fn.include?(".git")
                next if File.size(fn) > (500 * 1024) # omit files larger than 500kb
                # only sample the first 200 bytes for binary detection;
                # File.read(fn, 200) also closes the handle, unlike File.new(fn).read(200)
                unless BinaryDataDetector.binary?(File.read(fn, 200))
                  contents = File.read(fn)
                  adjusted_contents = contents.gsub(/\.([^\s])/, '. \1')
                  @lucene_index << { :id => fn, :contents => adjusted_contents }
                end
              end
            end
            @lucene_index.commit
          end
          start += batch_size
        end
        @has_content = true
        dump
      rescue => e
        puts e.message
        puts e.backtrace
      end
    end

    I think the while loop may not be needed. The key changes are that I skip files larger than 500 KB, and I don't send the whole file into the binary detection, only the first 200 bytes (chars?).
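Sampling only a fixed-size prefix can be sketched like this (a minimal sketch; the file name and PREFIX_BYTES constant are illustrative assumptions, and `File.read` with a length argument also closes the handle, unlike `File.new(fn).read(200)`):

```ruby
# Sketch: sample only the first bytes of a file for binary detection,
# instead of reading the whole file into memory.
PREFIX_BYTES = 200  # illustrative choice, matching the 200 bytes above

def file_prefix(path, limit = PREFIX_BYTES)
  # File.read with a length argument reads at most `limit` bytes
  # and closes the file handle automatically.
  # It returns nil when the file is empty, hence the fallback.
  File.read(path, limit) || ""
end

# Usage with a hypothetical file:
File.open("sample.txt", "w") { |f| f.write("a" * 1000) }
puts file_prefix("sample.txt").length  # prints 200
```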

    Sorry, I don't know how contributing works. If you can point me to a howto, I'll take the time to provide a proper patch.

    cheers,
    johannes.

  • Daniel Lucraft April 12th, 2011 @ 09:41 PM

    • State changed from “new” to “open”

    Is the problem here that it's reading an entire file (and then taking the first 200 chars) to check whether it's binary, or that it's indexing some very large files?

  • kostakimu April 14th, 2011 @ 10:29 AM

    I've seen that you already made some changes in the source. Cool ;)

    Hm, checking only the first 200 chars should be better, but I think the real problem is that the binary detection doesn't work properly: I've seen that the indexer also tries to index .flv files, and I get search results in binary files as well (see screenshot).

  • Daniel Lucraft April 14th, 2011 @ 10:43 AM

    I see. Without having the file it's hard to tell, but the BinaryDataDetector is very simple. Can you see why it is classifying those files as plain text?
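For context, a common minimal heuristic for this kind of check (a sketch, not Redcar's actual BinaryDataDetector implementation) treats a sample as binary if it contains a NUL byte or a high proportion of control bytes. A detector like this can pass binary data as plain text whenever the sampled region happens to contain mostly printable bytes:

```ruby
# Sketch of a simple binary-vs-text heuristic (illustrative only;
# the 0.3 threshold is an arbitrary assumption).
def probably_binary?(sample)
  return false if sample.nil? || sample.empty?
  return true if sample.include?("\0")  # a NUL byte almost always means binary
  # Count control bytes other than tab (9), newline (10), carriage return (13).
  nonprintable = sample.each_byte.count { |b| b < 32 && ![9, 10, 13].include?(b) }
  nonprintable.to_f / sample.length > 0.3
end

puts probably_binary?("plain old text\n")    # prints false
puts probably_binary?("\x00\x01\x02binary")  # prints true
```

A heuristic this simple is fooled by container formats whose sampled prefix is mostly printable, which would explain binary files slipping into the index.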
