project search not working for .erb files
Reported by kostakimu | April 4th, 2011 @ 10:58 AM
The project search works fine in my Rails project for JavaScript, .rb, and .yml files; however, it doesn't find anything in .html.erb files.
Version: 0.11
Ruby Version: 1.8.7
JRuby Version: 1.5.3
Redcar.environment: user
Running on Mac OS X
Comments and changes to this ticket
-
kostakimu April 4th, 2011 @ 08:07 PM
After removing the .redcar dir and then starting up, indexing seems to fail due to an "OutOfMemory" error, but the next index run seems to pass:
Completed   Description                                                    Duration
21:03:14    /Users/johannes/dev/mp/mpcp: refresh index                     0.008
21:03:14    /Users/johannes/dev/mp/mpcp: reparse files for declarations    0.238
21:03:13    /Users/johannes/dev/mp/mpcp: refresh index                     4.675
Java::JavaLang::OutOfMemoryError: Java heap space
org.jruby.util.ByteList.<init>(ByteList.java:91)
org.jruby.util.io.ChannelStream.readall(ChannelStream.java:365)
org.jruby.RubyIO.readAll(RubyIO.java:2825)
org.jruby.RubyIO.read(RubyIO.java:2641)
org.jruby.RubyIO.read(RubyIO.java:3327)
org.jruby.RubyIO$s_method_multi$RUBYINVOKER$read.call(org/jruby/RubyIO$s_method_multi$RUBYINVOKER$read.gen:65535)
org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:319)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:146)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.DAsgnNode.interpret(DAsgnNode.java:110)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.IfNode.interpret(IfNode.java:119)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
org.jruby.runtime.InterpretedBlock.evalBlockBody(InterpretedBlock.java:373)
org.jruby.runtime.InterpretedBlock.yieldSpecific(InterpretedBlock.java:259)
org.jruby.runtime.Block.yieldSpecific(Block.java:117)
org.jruby.RubyHash$11.visit(RubyHash.java:1132)
org.jruby.RubyHash.visitAll(RubyHash.java:579)
org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1119)
org.jruby.RubyHash.each(RubyHash.java:1130)
org.jruby.RubyHash.each19(RubyHash.java:1150)
org.jruby.RubyHash$i_method_0_0$RUBYFRAMEDINVOKER$each19.call(org/jruby/RubyHash$i_method_0_0$RUBYFRAMEDINVOKER$each19.gen:65535)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:299)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:117)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:122)
org.jruby.ast.CallNoArgBlockNode.interpret(CallNoArgBlockNode.java:64)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.RescueNode.executeBody(RescueNode.java:199)
org.jruby.ast.RescueNode.interpretWithJavaExceptions(RescueNode.java:118)
org.jruby.ast.RescueNode.interpret(RescueNode.java:110)
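The frames above (ChannelStream.readall via RubyIO.read) show the crash happening while an entire file is slurped into memory during indexing. As a rough, hypothetical simplification of that step (the file path is only an example, not taken from the ticket):

  # Hypothetical simplification of the read shown in the trace above: File.read
  # pulls the whole file into one JRuby ByteList, so a single large asset
  # (e.g. a video) can exhaust a ~320 MB heap on its own.
  fn = "public/videos/intro.flv"   # illustrative path only
  contents = File.read(fn)
-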
kostakimu April 4th, 2011 @ 08:12 PM
Yep, the out-of-memory error is obviously the problem. I checked another (smaller) project with no out-of-memory error, and there the search successfully found text in .html.erb files.
-
kostakimu April 4th, 2011 @ 08:41 PM
Setting the Xmx value from 320 to 1024m in runner.rb doesn't help either :(
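For reference, a minimal sketch of the kind of change described, assuming runner.rb assembles the JRuby command line (the names below are illustrative; the actual structure of Redcar's runner.rb may differ):

  # Pass a larger heap to the JVM through JRuby's -J prefix.
  # -J-Xmx320m was the previous value; -J-Xmx1024m is the attempted increase.
  jvm_args = ["-J-Xmx1024m"]
  command  = (["jruby"] + jvm_args + ["bin/redcar"]).join(" ")
  system(command)

As the later comments show, raising the heap only works around the underlying issue of which files get read and indexed.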
-
Daniel Lucraft April 8th, 2011 @ 03:59 PM
Do you have anything like cyclic symlinks in your project?
-
kostakimu April 8th, 2011 @ 04:24 PM
Nope. But I spent some time tracking down the problem: the binary detection obviously doesn't work properly, which is why memory usage got so high. I managed to get it working with these changes in lucene_index.rb:
def update
  changed_files = @project.file_list.changed_since(last_updated)
  @last_updated = Time.now
  changed_files.reject! do |fn, ts|
    fn.index(@project.config_dir) or Redcar::Project::FileList.hide_file_path?(fn)
  end
  files_array = changed_files.to_a
  start = 0
  stop = 99
  begin
    while start < files_array.size do
      files = files_array.slice(start, stop)
      Lucene::Transaction.run do
        @lucene_index ||= Lucene::Index.new(lucene_index_dir)
        @lucene_index.field_infos[:contents][:store] = true
        @lucene_index.field_infos[:contents][:tokenized] = true
        files.each do |fn, ts|
          unless File.basename(fn)[0..0] == "." or fn.include?(".git")
            unless BinaryDataDetector.binary?(File.new(fn).read(200))
              next if File.size(fn) > (500 * 1024) # omit files larger than 500kb
              contents = File.read(fn)
              adjusted_contents = contents.gsub(/\.([^\s])/, '. \1')
              @lucene_index << { :id => fn, :contents => adjusted_contents }
            end
          end
        end
        @lucene_index.commit
      end
      start = stop + 1
      stop += 100
    end
    @has_content = true
    dump
  rescue => e
    puts e.message
    puts e.backtrace
  end
end
I think the while loop is not needed, but I'm skipping files larger than 500kb and I don't send the whole file into the binary detection - only the first 200 bytes (chars?).
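In isolation, those two guards amount to something like the following sketch (BinaryDataDetector is the existing Redcar helper; the method and constant names here are illustrative, not part of the patch above):

  MAX_INDEXABLE_SIZE = 500 * 1024   # skip anything larger than 500kb

  def indexable?(fn)
    return false if File.size(fn) > MAX_INDEXABLE_SIZE
    # Classify the file from a short prefix instead of reading it all.
    prefix = File.open(fn, "rb") { |f| f.read(200) } || ""
    !BinaryDataDetector.binary?(prefix)
  end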
Sorry, I don't know how contributing works. If you can point me to a how-to, I'll take the time to provide a proper patch.
Cheers,
Johannes.
-
Daniel Lucraft April 12th, 2011 @ 09:41 PM
- State changed from new to open
Is the problem here that it's reading an entire file (and then taking the first 200 chars) to check whether it's binary, or that it's indexing some very large files?
-
kostakimu April 14th, 2011 @ 10:29 AM
I've seen that you already made some changes in the source. Cool ;)
Hmm, checking against only the first 200 chars should be better, but I think the real problem is that the binary detection doesn't work properly. I've seen that the indexer also tries to index .flv files, and I get search results in binary files as well (see screenshot).
-
Daniel Lucraft April 14th, 2011 @ 10:43 AM
I see. Without having the file it's hard to tell, but the BinaryDataDetector is very simple. Can you see why it is classifying those files as plain text?
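For context, detectors like this usually just scan a sample of the file for null bytes or a high proportion of non-printable characters; a minimal sketch of that common heuristic (not necessarily what Redcar's BinaryDataDetector actually does):

  # Common binary-detection heuristic, shown for context only.
  def looks_binary?(sample)
    return false if sample.nil? || sample.empty?
    return true if sample.include?("\0")              # null bytes strongly suggest binary
    non_printable = sample.count("^\x20-\x7e\n\r\t")  # bytes outside printable ASCII
    non_printable.to_f / sample.size > 0.3
  end

If .flv files are slipping through, one thing worth checking is whether the sample handed to the detector actually contains any of the binary portion of the file.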