#497 open
project search not working for .erb files

Reported by kostakimu | April 4th, 2011 @ 10:58 AM

Project search works fine in my Rails project for JavaScript, .rb, and .yml files; however, it doesn't find anything in .html.erb files.

Version: 0.11
Ruby Version: 1.8.7
JRuby version: 1.5.3
Redcar.environment: user

Running on Mac OS X.

Comments and changes to this ticket

  • kostakimu April 4th, 2011 @ 08:07 PM

    After removing the .redcar dir and then starting up, indexing fails with an "OutOfMemory" error, but the next indexing run seems to pass:

    Completed Description Duration
    21:03:14 /Users/johannes/dev/mp/mpcp: refresh index 0.008
    21:03:14 /Users/johannes/dev/mp/mpcp: reparse files for declarations 0.238
    21:03:13 /Users/johannes/dev/mp/mpcp: refresh index 4.675
    Java::JavaLang::OutOfMemoryError Java heap space

    org.jruby.util.ByteList.<init>(ByteList.java:91)
    org.jruby.util.io.ChannelStream.readall(ChannelStream.java:365)
    org.jruby.RubyIO.readAll(RubyIO.java:2825)
    org.jruby.RubyIO.read(RubyIO.java:2641)
    org.jruby.RubyIO.read(RubyIO.java:3327)
    org.jruby.RubyIO$s_method_multi$RUBYINVOKER$read.call(org/jruby/RubyIO$s_method_multi$RUBYINVOKER$read.gen:65535)
    org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:319)
    org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:146)
    org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
    org.jruby.ast.DAsgnNode.interpret(DAsgnNode.java:110)
    org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
    org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    org.jruby.ast.IfNode.interpret(IfNode.java:119)
    org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
    org.jruby.runtime.InterpretedBlock.evalBlockBody(InterpretedBlock.java:373)
    org.jruby.runtime.InterpretedBlock.yieldSpecific(InterpretedBlock.java:259)
    org.jruby.runtime.Block.yieldSpecific(Block.java:117)
    org.jruby.RubyHash$11.visit(RubyHash.java:1132)
    org.jruby.RubyHash.visitAll(RubyHash.java:579)
    org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1119)
    org.jruby.RubyHash.each(RubyHash.java:1130)
    org.jruby.RubyHash.each19(RubyHash.java:1150)
    org.jruby.RubyHash$i_method_0_0$RUBYFRAMEDINVOKER$each19.call(org/jruby/RubyHash$i_method_0_0$RUBYFRAMEDINVOKER$each19.gen:65535)
    org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:299)
    org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:117)
    org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:122)
    org.jruby.ast.CallNoArgBlockNode.interpret(CallNoArgBlockNode.java:64)
    org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
    org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
    org.jruby.ast.RescueNode.executeBody(RescueNode.java:199)
    org.jruby.ast.RescueNode.interpretWithJavaExceptions(RescueNode.java:118)
    org.jruby.ast.RescueNode.interpret(RescueNode.java:110)

  • kostakimu April 4th, 2011 @ 08:12 PM

    Yep, the out-of-memory error is clearly the problem. I checked another (smaller) project that produced no out-of-memory error, and search successfully found text in .html.erb files.

  • kostakimu April 4th, 2011 @ 08:41 PM

    Raising the Xmx value from 320m to 1024m in runner.rb doesn't help either :(

  • Daniel Lucraft April 8th, 2011 @ 03:59 PM

    Do you have anything like cyclic symlinks in your project?

  • kostakimu April 8th, 2011 @ 04:24 PM

    Nope. But I spent some time tracking down the problem: the binary detection doesn't work properly, which is why memory usage got so high. I managed to get it working with these changes in lucene_index.rb:

    def update
      changed_files = @project.file_list.changed_since(last_updated)
      @last_updated = Time.now
      changed_files.reject! do |fn, ts|
        fn.index(@project.config_dir) or Redcar::Project::FileList.hide_file_path?(fn)
      end
      files_array = changed_files.to_a
      batch_size = 100
      start = 0
      begin
        while start < files_array.size
          # Array#slice takes (offset, length), so step by batch_size
          files = files_array.slice(start, batch_size)
          Lucene::Transaction.run do
            @lucene_index ||= Lucene::Index.new(lucene_index_dir)
            @lucene_index.field_infos[:contents][:store] = true
            @lucene_index.field_infos[:contents][:tokenized] = true
            files.each do |fn, ts|
              unless File.basename(fn)[0..0] == "." or fn.include?(".git")
                next if File.size(fn) > (500 * 1024) # omit files larger than 500kb
                # only sample the first 200 bytes for binary detection;
                # File.read(fn, 200) also closes the handle, unlike File.new(fn).read(200)
                unless BinaryDataDetector.binary?(File.read(fn, 200))
                  contents = File.read(fn)
                  adjusted_contents = contents.gsub(/\.([^\s])/, '. \1')
                  @lucene_index << { :id => fn, :contents => adjusted_contents }
                end
              end
            end
            @lucene_index.commit
          end
          start += batch_size
        end
        @has_content = true
        dump
      rescue => e
        puts e.message
        puts e.backtrace
      end
    end

    I think the while loop may not be needed. The key changes are that I skip files larger than 500 KB, and I don't send the whole file into the binary detection, only the first 200 bytes (chars?).
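Sampling only a fixed-size prefix can be sketched like this (a minimal sketch; the file name and PREFIX_BYTES constant are illustrative assumptions, and `File.read` with a length argument also closes the handle, unlike `File.new(fn).read(200)`):

```ruby
# Sketch: sample only the first bytes of a file for binary detection,
# instead of reading the whole file into memory.
PREFIX_BYTES = 200  # illustrative choice, matching the 200 bytes above

def file_prefix(path, limit = PREFIX_BYTES)
  # File.read with a length argument reads at most `limit` bytes
  # and closes the file handle automatically.
  # It returns nil when the file is empty, hence the fallback.
  File.read(path, limit) || ""
end

# Usage with a hypothetical file:
File.open("sample.txt", "w") { |f| f.write("a" * 1000) }
puts file_prefix("sample.txt").length  # prints 200
```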

    Sorry, I don't know how contributing works. If you can point me to a howto, I'll take the time to provide a proper patch.

    cheers,
    johannes.

  • Daniel Lucraft April 12th, 2011 @ 09:41 PM

    • State changed from “new” to “open”

    Is the problem here that it's reading an entire file (and then taking the first 200 chars) to check whether it's binary, or that it's indexing some very large files?

  • kostakimu April 14th, 2011 @ 10:29 AM

    I've seen that you already made some changes in the source. Cool ;)

    Hm, checking only the first 200 chars should be better, but I think the real problem is that the binary detection doesn't work properly: I've seen that the indexer also tries to index .flv files, and I get search results in binary files as well (see screenshot).

  • Daniel Lucraft April 14th, 2011 @ 10:43 AM

    I see. Without having the file it's hard to tell, but the BinaryDataDetector is very simple. Can you see why it is classifying those files as plain text?
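For context, a common minimal heuristic for this kind of check (a sketch, not Redcar's actual BinaryDataDetector implementation) treats a sample as binary if it contains a NUL byte or a high proportion of control bytes. A detector like this can pass binary data as plain text whenever the sampled region happens to contain mostly printable bytes:

```ruby
# Sketch of a simple binary-vs-text heuristic (illustrative only;
# the 0.3 threshold is an arbitrary assumption).
def probably_binary?(sample)
  return false if sample.nil? || sample.empty?
  return true if sample.include?("\0")  # a NUL byte almost always means binary
  # Count control bytes other than tab (9), newline (10), carriage return (13).
  nonprintable = sample.each_byte.count { |b| b < 32 && ![9, 10, 13].include?(b) }
  nonprintable.to_f / sample.length > 0.3
end

puts probably_binary?("plain old text\n")    # prints false
puts probably_binary?("\x00\x01\x02binary")  # prints true
```

A heuristic this simple is fooled by container formats whose sampled prefix is mostly printable, which would explain binary files slipping into the index.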
