<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<feed xmlns="http://www.w3.org/2005/Atom">

	<title>Planet Ruby</title>
	<link rel="self" href="http://planetruby.0x42.net/atom.xml"/>
	<link href="http://planetruby.0x42.net/"/>
	<id>http://planetruby.0x42.net/atom.xml</id>
	<updated>2013-05-23T19:00:40+00:00</updated>
	<generator uri="http://www.planetplanet.org/">Planet/2.0 +http://www.planetplanet.org</generator>

	<entry>
		<title type="html">On Languages, VMs, Optimization, and the Way of the World</title>
		<link href="http://blog.headius.com/2013/05/on-languages-vms-optimization-and-way.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-681101033932402497</id>
		<updated>2013-05-11T04:10:32+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;&lt;div&gt;I shouldn't be up this late, but I've been doing lots of thinking and exploring tonight.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;In studying various VMs over the past few years, I've come up with a list of do's and don'ts that make things optimize right. These apply to languages, the structures that back them, and the VMs that optimize those languages, and from what I've seen there are a lot of immutable truths here given current optimization technology.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's dive in.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;#1: Types don't have to be static&lt;/h3&gt;&lt;div&gt;&lt;br /&gt;JVM and other dynamic-optimizing runtimes have proven this out. At runtime, it's possible to gather the same information static types would provide you at compile time, leading to optimizations at least as good as fully statically-typed, statically-optimized code. In some cases, it may be possible to do a better job, since runtime profiling is based on real execution, real branch percentages, real behavior, rather than a guess at what a program might do. You could probably make the claim that static optimization is a halting problem, and dynamic optimization eventually can beat it by definition since it can optimize what the program is actually doing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, this requires one key thing to really work well.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;#2: Types need to be predictable&lt;/h3&gt;&lt;div&gt;&lt;br /&gt;In order for runtime optimization to happen, objects need to have predictable types and those types need to have a predictable structure. This isn't to say that types must be statically declared...they just need to look the same on repeat visits. 
If objects can change type (Smalltalk's become, Perl's and C's weak typing) you're forced to include more guards against those changes, or you're forced to invalidate more code whenever something changes (or in the case of C, you just completely shit the bed when things aren't as expected). If change is possible and exposed at a language level, there may be nothing you can do to cope with all those different type shapes, and optimization can only go so far.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This applies both to the shape of a type's method table (methods remaining consistent once encountered) and the shape of the type's instances (predictable object layout). Many dynamically-typed languages impose dynamic type shape and object shape on VMs that run them, preventing those VMs from making useful predictions about how to optimize code. Optimistic predictions (generating synthetic types for known type shapes or preemptively allocating objects based on previously-seen shapes) still have to include fallback logic to maintain the mutable behavior, should it ever be needed. Again, optimization potential is limited, because the shape of the world can change on a whim and the VM has to be vigilant.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The alternative summation of #1 and #2 is that types don't have to be statically declared, but they need to be statically defined. Most popular dynamic languages do neither, but all they really need to do is the latter.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;#3: You can't cheat the CPU&lt;/h3&gt;&lt;div&gt;&lt;br /&gt;Regardless of how clever you'd like to be in your code or language or VM or JIT, the limiting factor is how modern CPUs actually run your code. There's a long list of expectations you must meet to squeeze every last drop of speed out of a system, and diverging from those guidelines will always impose a penalty. This is the end...the bottom turtle...the unifying theory. 
It is, at the end of the day, the CPU you must appease to get the best performance. All other considerations fall out of that, and anywhere performance does not live up to expectations you are guaranteed to discover that someone tried to cheat the CPU.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Traditionally, static typing was the best way to guarantee we produced good CPU instructions. It gave us a clear picture of the world we could ponder and meditate over, eventually boiling out the secrets of the universe and producing the fastest possible code. But that always assumed a narrow vision of a world with unlimited resources. It assumed we could make all the right decisions for a program ahead of time and that no limitations outside our target instruction set would ever affect us. In the real world, however, CPUs have limited cache sizes, multiple threads, bottlenecked memory pipelines, and basic physics to contend with (you can only push so many electrons through a given piece of matter without blowing it up). Language and VM authors ignore the expectations of their target systems only at great peril.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's look at a few languages and where they fit.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Language Scorecard&lt;/h3&gt;&lt;div&gt;&lt;br /&gt;Java is statically typed and types are of a fixed shape. This is the ideal situation mostly because of the type structure being predictable. Once encountered, a rose is just a rose. Given appropriate dynamic optimizations, there's no reason Java code can't compete with or surpass statically-typed and statically-compiled C/++, and in theory there's nothing preventing Java code from becoming optimal CPU instructions.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Dart is dynamically typed (or at least, types are optional and the VM doesn't care about them), but types are of a fixed shape. 
If programmers can tolerate fixed-shape types, Dart provides a very nice dynamic language that still can achieve the same optimizations as statically-typed Java or statically-compiled C/++.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Groovy is dynamically typed with some inference and optimization if you specify static types, but most (all?) types defined in Groovy are not guaranteed to be a fixed shape. As a result, even when specifying static types, guards must be inserted to check that those types' shapes have not changed. Groovy does, however, guarantee object shape is consistent over time, which avoids overhead from being able to reshape objects at runtime.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby and JavaScript are dynamically typed and types and objects can change shape at runtime. This is a confluence of all the hardest-to-optimize language characteristics. In both cases, the best we can do is to attempt to predict common type and object shapes and insert guards for when we're wrong, but it's not possible to achieve the performance of a system with fully-predictable type and object shapes. Prove me wrong.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now of course when I say it's not possible, I mean it's not possible for the general case. Specific cases of a known closed-world application can indeed be optimized as though the types and objects involved had static shapes. I do something along these lines in my RubyFlux compiler, which statically analyzes incoming Ruby code and assumes the methods it sees defined and the fields it sees accessed will be the only methods and fields it ever needs to worry about. But that requires omitting features that can mutate type and object structure, or else you have to have a way to know which types and objects those features will affect. 
Sufficiently smart compiler indeed.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Python has similar structural complexities to Ruby and adds in the additional complexity of an introspectable call stack. Under those circumstances, even on-stack execution state is not safe; a VM can't even make guarantees about the values it has in hand or the shape of a given call's activation. PyPy does an admirable job of attacking this problem by rewriting currently-running code and lifting on-stack state to the heap when it is accessed, but this approach prevents dropping unused local state (since you can't predict who might want to see it) and also fails to work under parallel execution (since you can't rewrite code another thread might be executing). Again, the dynamicity of a &quot;cool&quot; feature brings with it intrinsic penalties that are reducible but not removable.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Get to the Damn Point, Already&lt;/h3&gt;&lt;div&gt;&lt;br /&gt;So what am I trying to say in all this? I started the evening by exploring a benchmark post comparing Dart's VM with JVM on the same benchmark. The numbers were not actually very exciting...with a line-by-line port from Dart to Java, Java came out slightly behind Dart. With a few modifications to the Java code, Java pulled slightly ahead. With additional modifications to the Dart code, it might leapfrog Java again. But this isn't interesting because Dart and Java can both rely on type and object shapes remaining consistent, and as a result the optimizations they perform can basically accomplish the same thing. Where it matters, they're similar enough that VMs don't care about the differences.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Where does this put languages I love, like Ruby? 
It's probably fair to concede that Ruby can't ever achieve the raw, straight-line performance of type-static (not statically-typed) languages like Dart or Java, regardless of the VM technologies involved. We'll be able to get close; JRuby can, with the help of invokedynamic, make method calls *nearly* as fast as Java calls, and by generating type shapes we can make object state *nearly* as predictable as Java types, but we can't go all the way. Regardless of how great the underlying VM is, if you can't hold to its immutable truths, you're walking against the wind. Ruby on Dart would probably not be any faster than Ruby on JVM, because you'd still have to implement mutable types and growable objects in pretty much the same way. Ruby on PyPy might be able to go farther, since the VM is designed for mutable types and growable objects, but you might have to sacrifice parallelism or accept that straight-line object-manipulating performance won't go all the way to a Java or Dart. Conversely, languages that make those type-static guarantees might be able to beat dynamic languages when running on dynamic language VMs (e.g. dart2js) for exactly the same reasons that they excel on their own VMs: they provide a more consistent view of the world, and offer no surprises to the VM that would hinder optimization. You trade dynamicity at the language level for predictability at the VM level.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;The Actual Lesson&lt;/h3&gt;&lt;div&gt;&lt;br /&gt;I guess the bottom line for me is realizing that there's always going to be a conflict between what programmers want out of programming languages and what's actually possible to give them. There's no magical fairy world where every language can be as fast as every other language, because there's no way to predict how every program is going to execute (or in truth, how a given program is going to execute given a general strategy). 
And that's ok; most of these languages can still get very close to each other in performance, and over time the dynamic type/object-shaped languages may offer ways to ratchet down some of that dynamism...or they might not care and just accept what limitations result. The important thing is for language users to recognize that nothing is free, and to understand the implications of language features and design decisions they make in their own programs.&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Small Unix utilities written in Ruby - part 3</title>
		<link href="http://t-a-w.blogspot.com/2013/05/small-unix-utilities-written-in-ruby.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-4752325041435187955</id>
		<updated>2013-05-06T02:10:56+00:00</updated>
		<content type="html">&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://3.bp.blogspot.com/-XAJdR7a6Ngg/UYblYejabNI/AAAAAAAABw4/U7WF1F8w1vc/s1600/naughty_cat_by_kevin_dooley_from_flickr_cc-by.jpg&quot; title=&quot;Naughty cat by kevin dooley from flickr (CC-BY)&quot;&gt;&lt;img alt=&quot;Naughty cat by kevin dooley from flickr (CC-BY)&quot; border=&quot;0&quot; height=&quot;480&quot; src=&quot;http://3.bp.blogspot.com/-XAJdR7a6Ngg/UYblYejabNI/AAAAAAAABw4/U7WF1F8w1vc/s640/naughty_cat_by_kevin_dooley_from_flickr_cc-by.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Here's the third instalment in my ongoing series (&lt;a href=&quot;http://t-a-w.blogspot.com/2012/07/collection-of-small-unix-utilities.html&quot;&gt;Part 1&lt;/a&gt;. &lt;a href=&quot;http://t-a-w.blogspot.com/2013/04/more-small-unix-utilities-written-in.html&quot;&gt;Part 2&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;https://github.com/taw/unix-utilities&quot;&gt;All utilities mentioned are available on github&lt;/a&gt;.&lt;br /&gt;&lt;h3&gt;&lt;tt&gt;flickr_find&lt;/tt&gt;&lt;/h3&gt;Find Creative Commons licenced photos on flickr.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Usage example: &lt;tt&gt;flickr_find cute kittens&lt;/tt&gt;.&lt;/div&gt;&lt;h3&gt;&lt;tt&gt;flickr_get&lt;/tt&gt;&lt;/h3&gt;Download best quality version of a photo from flickr and annotate it with proper file name.&lt;br /&gt;&lt;br /&gt;Usage example:&lt;br /&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp; flickr_get http://www.flickr.com/photos/pagedooley/386303100/&lt;br /&gt;&lt;br /&gt;which will be saved as &lt;tt&gt;~/Downloads/naughty_cat_by_kevin_dooley_from_flickr_cc-by.jpg&lt;/tt&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've been using &lt;tt&gt;flickr_find&lt;/tt&gt; and &lt;tt&gt;flickr_get&lt;/tt&gt; for years on this blog.&lt;br /&gt;&lt;br /&gt;It requires &lt;tt&gt;objectiveflickr&lt;/tt&gt; 
gem.&lt;/div&gt;&lt;h3&gt;&lt;tt&gt;osx_suspend&lt;/tt&gt;&lt;/h3&gt;OSX surprisingly lacks an easy way to suspend your current session. This utility does just that.&lt;br /&gt;&lt;h3&gt;&lt;tt&gt;rand_passwd&lt;/tt&gt;&lt;/h3&gt;Have you ever needed to quickly generate a new password? This utility generates an easy-to-type (lower case letters only) 12-character password, with 56.4 bits of entropy, so you never need to reuse the same password across multiple sites.&lt;br /&gt;&lt;h3&gt;&lt;tt&gt;webman&lt;/tt&gt;&lt;/h3&gt;The most annoying thing about man pages (and even more about this silly GNU info idea) is that they display in terminal, where they're really painful to search. What man really needs is in-browser display.&lt;br /&gt;&lt;br /&gt;Here comes webman. It is fully intended to be used with &lt;tt&gt;alias man=webman&lt;/tt&gt; in your &lt;tt&gt;~/.bashrc&lt;/tt&gt;. It finds the proper man page, groffs it to HTML, caches that in &lt;tt&gt;~/.man_cache&lt;/tt&gt; since groff is slow as hell for some reason, and opens your favourite browser.&lt;br /&gt;&lt;br /&gt;The code looks fairly overengineered, since the actual script I use also checks EC2 (so I can man page for both OSX and Linux programs with the same command), and this stripped down and somewhat refactored version is still a bit on the complex side.&lt;br /&gt;&lt;br /&gt;If it fails to find a man page for any reason, it opens a relevant Google search instead.&lt;br /&gt;&lt;br /&gt;If you need to open it in a terminal simply pass the &lt;tt&gt;-T&lt;/tt&gt; flag.&lt;br /&gt;&lt;br /&gt;It wasn't tested outside OSX environment yet, so pull requests welcome. Further cleanup pull requests also very much welcome.</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en-US">
		<title type="html">Computer Science Programming Basics in Ruby</title>
		<link href="http://feeds.oreilly.com/~r/oreilly/ruby/~3/DQM1JCRcrzg/"/>
		<id>http://oreilly.com/catalog/9781449355975/</id>
		<updated>2013-04-24T22:00:26+00:00</updated>
		<content type="html">&lt;a href=&quot;http://oreilly.com/catalog/9781449355975/&quot;&gt;&lt;img src=&quot;http://covers.oreilly.com/images/9781449355975/bkt.gif&quot; /&gt;&lt;/a&gt;&lt;p&gt;If you know basic high-school math, you can quickly learn and apply the core concepts of computer science with this concise, hands-on book. Led by a team of experts, you&amp;#8217;ll quickly understand the difference between computer science and computer programming, and you&amp;#8217;ll learn how algorithms help you solve computing problems.&lt;/p&gt;
	&lt;img src=&quot;http://feeds.feedburner.com/~r/oreilly/ruby/~4/DQM1JCRcrzg&quot; height=&quot;1&quot; width=&quot;1&quot; /&gt;</content>
		<author>
			<name>O'Reilly Media, Inc.</name>
			<uri>http://oreilly.com/ruby</uri>
		</author>
		<source>
			<title type="html">O'Reilly Media: Ruby and Rails</title>
			<subtitle type="html">A compilation of O'Reilly Media's information about the Ruby programming language from news, books, conferences, courses, community, and reports.</subtitle>
			<link rel="self" href="http://feeds.oreilly.com/oreilly/ruby"/>
			<id>http://oreilly.com/ruby</id>
			<updated>2013-04-24T22:00:26+00:00</updated>
			<rights type="html">Copyright O'Reilly Media, Inc.</rights>
		</source>
	</entry>

	<entry>
		<title type="html">magic/xml gem published</title>
		<link href="http://t-a-w.blogspot.com/2013/04/magicxml-gem-published.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-6123505419099230711</id>
		<updated>2013-04-14T22:54:19+00:00</updated>
		<content type="html">&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-A19omBc9jeA/UWsJOEAvpiI/AAAAAAAABwU/6j7SQoUCrVM/s1600/cat_scratch_fever__ottawa_2002_by_mikey_g_ottawa_from_flickr_cc-nc-nd.jpg&quot; title=&quot;&quot;&gt;&lt;img alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Once upon a time I built gems for my libraries, but then rubygems site migrated like three times, and I really didn't feel like keeping track of all that, so I stopped doing anything.&lt;br /&gt;&lt;br /&gt;Now I pushed &lt;tt&gt;&lt;a href=&quot;https://github.com/taw/magic-xml&quot;&gt;magic-xml&lt;/a&gt;&lt;/tt&gt; gem to relevant gem repositories (it has a dash like github repository name).&lt;br /&gt;&lt;br /&gt;Apparently bkkbrad made magic_xml (with underscore) gem based on earlier version as well. Which brings me to:&lt;br /&gt;&lt;h3&gt;Public service announcement&lt;/h3&gt;Everyone, it's time to talk serious business. Ruby community must decide if it wants dashes or underscores in gem names, and it must decide it now.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;15578 gems have underscores&lt;/li&gt;&lt;li&gt;17127 gems have dashes&lt;/li&gt;&lt;li&gt;1465 mix both in their name!!!&lt;/li&gt;&lt;/ul&gt;This is insanity. 
It's also a pretty safe bet github has something to do with it - I love you guys, but it's really time to get your act together.&lt;br /&gt;&lt;br /&gt;I'll be using dashes, since that's what github seems to be promoting, and I tend to put my software on github these days.&lt;br /&gt;&lt;h3&gt;Other goodies&lt;/h3&gt;A lot of my small utilities depend on magic/xml, so this will allow me to publish them without having to bundle magic/xml library (even if it's just one file, very old school).&lt;br /&gt;&lt;br /&gt;For now I just pushed lastfm_status program - which does precisely what its name implies - to&amp;nbsp;&lt;a href=&quot;https://github.com/taw/unix-utilities&quot;&gt;unix-utilities repository&lt;/a&gt;, but I'm sure there will be more, especially once I figure out which of my programs break half of Internet's Terms of Service enough to get banned, and which only a little ;-p</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">More small Unix utilities written in Ruby</title>
		<link href="http://t-a-w.blogspot.com/2013/04/more-small-unix-utilities-written-in.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-6065321896797375866</id>
		<updated>2013-04-11T23:53:25+00:00</updated>
		<content type="html">&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-fekV9J4XVZY/UWcipMue2WI/AAAAAAAABwA/Zw5nk6NlI6I/s1600/eating_grass_2_by_lotusgreen_from_flickr_cc-nc-sa.jpg&quot; title=&quot;eating grass 2 by lotusgreen from flickr (CC-NC-SA)&quot;&gt;&lt;img alt=&quot;eating grass 2 by lotusgreen from flickr (CC-NC-SA)&quot; border=&quot;0&quot; height=&quot;482&quot; src=&quot;http://2.bp.blogspot.com/-fekV9J4XVZY/UWcipMue2WI/AAAAAAAABwA/Zw5nk6NlI6I/s640/eating_grass_2_by_lotusgreen_from_flickr_cc-nc-sa.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Here's the sequel to my &quot;&lt;a href=&quot;http://t-a-w.blogspot.com/2012/07/collection-of-small-unix-utilities.html&quot;&gt;collection of small Unix utilities written in Ruby&lt;/a&gt;&quot; post and &lt;a href=&quot;https://github.com/taw/unix-utilities&quot;&gt;github repository&lt;/a&gt;.&lt;br /&gt;&lt;h3&gt;Useful technique - &lt;tt&gt;Pathname&lt;/tt&gt;&lt;/h3&gt;One thing I forgot to mention the last time - the &lt;tt&gt;Pathname&lt;/tt&gt; library.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;Pathname&lt;/tt&gt; is an object-oriented way to look at paths in a file system. A Pathname object is not the same as a &lt;tt&gt;File&lt;/tt&gt; or &lt;tt&gt;Dir&lt;/tt&gt; object since it's not opened - and might not even exist yet. It's also not like &lt;tt&gt;String&lt;/tt&gt; since it has all the filesystem awareness.&lt;br /&gt;&lt;br /&gt;For very simple scripts it's fine to use just plain &lt;tt&gt;String&lt;/tt&gt;s to represent filesystem paths, but once it gets a bit more complicated your script will get a lot more readable with &lt;tt&gt;Pathname&lt;/tt&gt; - and it costs you nothing.&lt;br /&gt;&lt;br /&gt;Let's just look at the &lt;tt&gt;fix_permissions&lt;/tt&gt; utility. 
Here's the core part:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;class Pathname&lt;br /&gt;&amp;nbsp; def script?&lt;br /&gt;&amp;nbsp; &amp;nbsp; read(2) == &quot;#!&quot;&lt;br /&gt;&amp;nbsp; end&lt;br /&gt;&lt;br /&gt;&amp;nbsp; def file_type&lt;br /&gt;&amp;nbsp; &amp;nbsp; `file -b #{self.to_s.shellescape}`.chomp&lt;br /&gt;&amp;nbsp; end&lt;br /&gt;&lt;br /&gt;&amp;nbsp; def should_be_executable?&lt;br /&gt;&amp;nbsp; &amp;nbsp; script? or file_type =~ /\b(Mach-O|executable)\b/&lt;br /&gt;&amp;nbsp; end&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;def fix_permissions(path)&lt;br /&gt;&amp;nbsp; Pathname(path).find do |fn|&lt;br /&gt;&amp;nbsp; &amp;nbsp; next if fn.directory?&lt;br /&gt;&amp;nbsp; &amp;nbsp; next if fn.symlink?&lt;br /&gt;&amp;nbsp; &amp;nbsp; next unless fn.executable?&lt;br /&gt;&amp;nbsp; &amp;nbsp; fn.chmod(0644) unless fn.should_be_executable?&lt;br /&gt;&amp;nbsp; end&lt;br /&gt;end&lt;/pre&gt;&lt;br /&gt;Since &lt;tt&gt;Pathname&lt;/tt&gt; overloads &lt;tt&gt;#to_str&lt;/tt&gt; method it can be transparently used in most contexts where &lt;tt&gt;String&lt;/tt&gt; is expected - including printing it, file operations, system/exec commands and so on. You'll rarely need to use &lt;tt&gt;#to_s&lt;/tt&gt; - mostly when you want to regexp it.&lt;br /&gt;&lt;br /&gt;I feel &lt;tt&gt;Pathname#shellescape&lt;/tt&gt; should exist, but since it doesn't that's one place where you need to use &lt;tt&gt;.to_s.shellescape&lt;/tt&gt; for now.&lt;br /&gt;&lt;br /&gt;So what does this script do? First we add a few methods to &lt;tt&gt;Pathname&lt;/tt&gt; class. It already knows if something is a &lt;tt&gt;directory?&lt;/tt&gt;, &lt;tt&gt;symlink?&lt;/tt&gt;, and &lt;tt&gt;executable?&lt;/tt&gt; (that is - has +x flag).&lt;br /&gt;&lt;br /&gt;We want to know if it is a script. And that's easy - just &lt;tt&gt;read(2)&lt;/tt&gt; as if it was a &lt;tt&gt;File&lt;/tt&gt; to read first two bytes. 
It looks much more elegant than the &lt;tt&gt;File.read(path, 2) == &quot;#!&quot;&lt;/tt&gt; we'd need if we used &lt;tt&gt;String&lt;/tt&gt;s - not to mention how &lt;tt&gt;String&lt;/tt&gt; class is really no place for &lt;tt&gt;#script?&lt;/tt&gt; method so we'd probably use a standalone procedure.&lt;br /&gt;&lt;br /&gt;Next let's make &lt;tt&gt;file_type&lt;/tt&gt; method - and use &lt;tt&gt;#shellescape&lt;/tt&gt; to do it safely. Unfortunately that one is only defined on &lt;tt&gt;String&lt;/tt&gt;s.&lt;br /&gt;&lt;br /&gt;After that it's just one regexp away from &lt;tt&gt;should_be_executable?&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Once we've defined that, notice how easy it is to dig into directory trees with &lt;tt&gt;Pathname#find&lt;/tt&gt;, and then just use a few &lt;tt&gt;#query?&lt;/tt&gt; methods to ask the path what it is about, then &lt;tt&gt;#chmod&lt;/tt&gt; to set up proper flags.&lt;br /&gt;&lt;br /&gt;Other very useful methods not present in the script are &lt;tt&gt;+&lt;/tt&gt; for adding relative paths, &lt;tt&gt;#basename&lt;/tt&gt;/&lt;tt&gt;#dirname&lt;/tt&gt; for splitting it into components, and &lt;tt&gt;#relative_path_from&lt;/tt&gt; for creating relative paths.&lt;br /&gt;&lt;br /&gt;While I'm at it, use &lt;tt&gt;URI&lt;/tt&gt; objects for URIs you want to do something complicated with rather than regexping them - usually your code will look better too.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Individual commands&lt;/h3&gt;&lt;h4&gt;colcut&lt;/h4&gt;Cuts long lines to a specific number of characters for easy previewing.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt; &amp;nbsp; &amp;nbsp;colcut 80 &amp;lt; file.xml&lt;/pre&gt;&lt;div&gt;&lt;h4&gt;fix_permissions&lt;/h4&gt;&lt;/div&gt;&lt;div&gt;Removes executable flag from files which shouldn't have it. 
Useful for archives that went through a Windows system, zip archive, or other system not aware of Unix executable flag.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It never turns the +x flag on; it only removes it if a file neither starts with &lt;tt&gt;#!&lt;/tt&gt;, nor is an executable according to &lt;tt&gt;file&lt;/tt&gt; utility.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Usage example:&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp; &lt;br /&gt;&lt;pre&gt;    fix_permissions ~/Downloads&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;If no parameters are passed, it fixes permissions in current directory.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h4&gt;progress&lt;/h4&gt;Display progress for piped file.&lt;br /&gt;&lt;br /&gt;Usage examples:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; cat /dev/urandom | progress | gzip &amp;nbsp;&amp;gt;/dev/null&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;progress -l &amp;lt;file.txt | upload&lt;/pre&gt;&lt;br /&gt;By default it's in bytes mode. Use &lt;tt&gt;-l&lt;/tt&gt; to specify line mode.&lt;br /&gt;&lt;br /&gt;If progress is piped a file and it's in byte mode, it checks its size and uses that to display relative progress (like &lt;tt&gt;18628608/104857600 [17%]&lt;/tt&gt;). 
Otherwise it will only display number of bytes/lines piped through.&lt;br /&gt;&lt;br /&gt;You can also specify what counts as 100% explicitly:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt; &amp;nbsp; &amp;nbsp; progress 123456&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;progress 128m&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;progress -l 42042&lt;/pre&gt;&lt;br /&gt;It will happily go over 100% on display.&lt;br /&gt;&lt;h4&gt;since_soup&lt;/h4&gt;Link to soup posts starting from the post before one specified.&lt;br /&gt;&lt;br /&gt;Usage example:&lt;br /&gt;&amp;nbsp; &amp;nbsp; &lt;br /&gt;&lt;pre&gt;    since_soup http://taw.soup.io/post/307955954/Image&lt;/pre&gt;&lt;h4&gt;sortby&lt;/h4&gt;Sort input by an arbitrary Ruby expression. A lot more flexible than Unix &lt;tt&gt;sort&lt;/tt&gt; utility.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Usage example:&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp; &lt;br /&gt;&lt;pre&gt;    sortby '$_.length' &amp;lt;file.txt&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Various old projects migrated to github</title>
		<link href="http://t-a-w.blogspot.com/2013/04/various-old-projects-migrated-to-githtub.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-6278060086086369914</id>
		<updated>2013-04-06T15:50:35+00:00</updated>
		<content type="html">&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://3.bp.blogspot.com/-om9eMoYuxoE/UWAZJknrBMI/AAAAAAAABvg/ZbwYcJIXPuc/s1600/fluffy_buff_tom_and_bike_frame_by_chriss_pagani_from_flickr_cc-nc-nd.jpg&quot; title=&quot;Fluffy Buff Tom and Bike Frame by Chriss Pagani from flickr (CC-NC-ND)&quot;&gt;&lt;img alt=&quot;Fluffy Buff Tom and Bike Frame by Chriss Pagani from flickr (CC-NC-ND)&quot; border=&quot;0&quot; height=&quot;426&quot; src=&quot;http://3.bp.blogspot.com/-om9eMoYuxoE/UWAZJknrBMI/AAAAAAAABvg/ZbwYcJIXPuc/s640/fluffy_buff_tom_and_bike_frame_by_chriss_pagani_from_flickr_cc-nc-nd.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Once upon a time I did &lt;a href=&quot;http://t-a-w.blogspot.com/2012/03/software-triage.html&quot;&gt;&quot;software triage&quot;&lt;/a&gt;&amp;nbsp; to decide which of my software projects are viable, and which are dead.&lt;br /&gt;&lt;br /&gt;Today I took another look at it, and decided to move most of my old projects - even ones that are pretty much dead - from a variety of places like Sourceforge, GNU Savannah, Google Code, and tarball dumps on an ftp server to github.&lt;br /&gt;&lt;br /&gt;I don't really expect any of them to see much use, but there's always an off chance, and if I didn't move them to github, I might as well simply delete them from the Internet completely.&lt;br /&gt;&lt;br /&gt;Here's the list of migrated projects:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/rpu&quot;&gt;RPU&lt;/a&gt; - my MSc thesis. 
If you want to see how to write compilers in OCaml, it might be somewhat useful.&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/ipod-lastfm-bridge&quot;&gt;iPod-last.fm bridge&lt;/a&gt; - I'm a Sansa Clip user now, and there's no way in hell I'm going back to iPods, but if you need this script updated (I have no idea if it still works or not), ask me, and I could probably figure out how to update it&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/tawbot&quot;&gt;tawbot&lt;/a&gt; - Wikipedia admin bot. I know once upon a time it had quite a few users, but I haven't heard from them in a while. If you need help with it, ask away.&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/xss-shield&quot;&gt;XSS Shield for Rails 1.2.x&lt;/a&gt; - very similar system is included in recent Rails, so I doubt anybody needs this today.&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/freetable&quot;&gt;freetable&lt;/a&gt; - HTML table generator. I know it had users once upon a time, no idea if they're still active.&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/jsme&quot;&gt;jsme&lt;/a&gt; - Driver to use joystick as mouse on Linux. I made it ages ago because I accidentally my whole mouse port. I really doubt anybody would need that today, or that it would even work.&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/taw/gtkidp&quot;&gt;gtkidp&lt;/a&gt; - Interface for Internet Dictionary Project files. I like command line dictionaries, and I contributed to dictd stuff because Wikipedia made this kind of stuff cool, but these days it's probably not going to see much use. I don't even know if it works with recent varieties of Gtk.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;h3&gt;Still TODO&lt;/h3&gt;&lt;br /&gt;I'm still not sure what to do with &lt;a href=&quot;http://taw.chaosforge.org/&quot;&gt;the ftp server I used to put my stuff on&lt;/a&gt;. Using less eye-violating styling would be a good start. 
I'm not entirely sure why the hell I picked that color scheme in the first place, and if it was meant as some kind of a joke or not.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And there's still my local ~/everything git repository. I put &lt;a href=&quot;https://github.com/taw/unix-utilities&quot;&gt;a few of its utilities on github&lt;/a&gt;, but there's orders of magnitude more code there - some even doesn't violate any website's ToS. There are 218 top level directories there, I'm sure at least 10% of them could be made public without any major problems.&lt;br /&gt;&lt;br /&gt;And a lot of the software migrated to github still needs some serious work, like turning them into proper gems, compatibility with modern versions of everything and so on. If you have any special requests, just contact me.&lt;/div&gt;</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en-US">
		<title type="html">Jump Start Sinatra</title>
		<link href="http://feeds.oreilly.com/~r/oreilly/ruby/~3/E1JzkG800Jo/"/>
		<id>http://oreilly.com/catalog/9780987332141/</id>
		<updated>2013-03-27T22:37:31+00:00</updated>
		<content type="html">&lt;a href=&quot;http://oreilly.com/catalog/9780987332141/&quot;&gt;&lt;img src=&quot;http://covers.oreilly.com/images/9780987332141/bkt.gif&quot; /&gt;&lt;/a&gt;&lt;p&gt;This short SitePoint book provides readers with a fun and yet practical introduction to Sinatra, a framework that makes web development with Ruby extremely simple. It's not intended to be a completely comprehensive guide to the framework or an in-depth Ruby tutorial, but will quickly get you up to speed with Sinatra and give you the confidence to start experimenting on your own.&lt;br /&gt;&lt;br /&gt; The book is built around a real-life example project: a content management system. It's a fun and easily understandable project that is used to demonstrate the concepts outlined in the book in a practical way.&lt;br /&gt;&lt;br /&gt; This is a clear, approachable and very easy-to-follow book that will get you up to speed with Sinatra in no time.&lt;/p&gt;
	</content>
		<author>
			<name>Darren Jones</name>
			<uri>http://oreilly.com/ruby</uri>
		</author>
		<source>
			<title type="html">O'Reilly Media: Ruby and Rails</title>
			<subtitle type="html">A compilation of O'Reilly Media's information about the Ruby programming language from news, books, conferences, courses, community, and reports.</subtitle>
			<link rel="self" href="http://feeds.oreilly.com/oreilly/ruby"/>
			<id>http://oreilly.com/ruby</id>
			<updated>2013-04-24T22:00:26+00:00</updated>
			<rights type="html">Copyright O'Reilly Media, Inc.</rights>
		</source>
	</entry>

	<entry xml:lang="en-US">
		<title type="html">Cucumber Recipes</title>
		<link href="http://feeds.oreilly.com/~r/oreilly/ruby/~3/Vwi-AOxOjPw/"/>
		<id>http://oreilly.com/catalog/9781937785017/</id>
		<updated>2013-02-12T21:35:42+00:00</updated>
		<content type="html">&lt;a href=&quot;http://oreilly.com/catalog/9781937785017/&quot;&gt;&lt;img src=&quot;http://covers.oreilly.com/images/9781937785017/bkt.gif&quot; /&gt;&lt;/a&gt;&lt;p&gt;You can test just about anything with Cucumber. We certainly have, and in &lt;i&gt;Cucumber Recipes&lt;/i&gt; we'll show you how to apply our hard-won field experience to your own projects. Once you've mastered the basics, this book will show you how to get the most out of Cucumber--from specific situations to advanced test-writing advice. With over forty practical recipes, you'll test desktop, web, mobile, and server applications across a variety of platforms. This book gives you tools that you can use today to automate any system that you encounter, and do it well.&lt;/p&gt;
	</content>
		<author>
			<name>O'Reilly Media, Inc.</name>
			<uri>http://oreilly.com/ruby</uri>
		</author>
		<source>
			<title type="html">O'Reilly Media: Ruby and Rails</title>
			<subtitle type="html">A compilation of O'Reilly Media's information about the Ruby programming language from news, books, conferences, courses, community, and reports.</subtitle>
			<link rel="self" href="http://feeds.oreilly.com/oreilly/ruby"/>
			<id>http://oreilly.com/ruby</id>
			<updated>2013-04-24T22:00:26+00:00</updated>
			<rights type="html">Copyright O'Reilly Media, Inc.</rights>
		</source>
	</entry>

	<entry xml:lang="en-US">
		<title type="html">Four short links: 1 February 2013</title>
		<link href="http://feeds.oreilly.com/~r/oreilly/ruby/~3/IOWfkOqH9S0/four-short-links-1-february-2013.html"/>
		<id>http://radar.oreilly.com/2013/02/four-short-links-1-february-2013.html</id>
		<updated>2013-02-01T12:41:13+00:00</updated>
		<content type="html">Icon Fonts are Awesome &amp;#8212; yes, yes they are. (via Fog Creek) What the Rails Security Issue Means for Your Startup &amp;#8212; excellent, clear, emphatic advice on how and why security matters and what it looks like when you take &amp;#8230;
	</content>
		<author>
			<name>Nat Torkington</name>
			<uri>http://oreilly.com/ruby</uri>
		</author>
		<source>
			<title type="html">O'Reilly Media: Ruby and Rails</title>
			<subtitle type="html">A compilation of O'Reilly Media's information about the Ruby programming language from news, books, conferences, courses, community, and reports.</subtitle>
			<link rel="self" href="http://feeds.oreilly.com/oreilly/ruby"/>
			<id>http://oreilly.com/ruby</id>
			<updated>2013-04-24T22:00:26+00:00</updated>
			<rights type="html">Copyright O'Reilly Media, Inc.</rights>
		</source>
	</entry>

	<entry>
		<title type="html">Constant and Global Optimization in JRuby 1.7.1 and 1.7.2</title>
		<link href="http://blog.headius.com/2013/01/constant-and-global-optimization-in.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-5392245422382118146</id>
		<updated>2013-01-05T08:47:32+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;With every JRuby release, there's always at least a handful of optimizations. They range from tiny improvements in the compiler to perf-aware rewrites of core class methods, but they're almost always driven by real-world cases.&lt;br /&gt;&lt;br /&gt;In JRuby 1.7.1 and 1.7.2, I made several improvements to the performance of Ruby constants and global variables that might be of some interest to you, dear reader.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Constants&lt;/h2&gt;&lt;div&gt;In Ruby, a constant is a lexically and hierarchically accessed variable that starts with a capital letter. Class and module names like Object, Kernel, and String are all constants defined under the Object class. When I say constants are both lexically and hierarchically accessed, what I mean is that at access time we first search outward through lexically-enclosing scopes, and failing that we search through the class hierarchy of the innermost scope. For example:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here, the first two constant accesses inside class B are successful; the first (IN_FOO) is located lexically in Foo, because it encloses the body of class B. The second (IN_A) is located hierarchically by searching B's ancestors. The third access fails, because the IN_BAR constant is only available within the Bar module's scope, so B can't see it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Constants also...aren't. It is possible to redefine a constant, or define new constants deeper in a lexical or hierarchical structure that mask earlier ones. However, in most code (i.e. &quot;good&quot; code) constants eventually stabilize. 
This makes it possible to perform a variety of optimizations against them, even though they're not necessarily static.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Constants are used heavily throughout Ruby, both for constant values like Float::MAX and for classes like Array or Hash. It is therefore especially important that they be as fast as possible.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Global Variables&lt;/h2&gt;&lt;div&gt;Globals in Ruby are about like you'd expect...name/value pairs in a global namespace. They start with the $ character. Several global variables are &quot;special&quot; and exist in a more localized source, like $~ (last regular expression match in this call frame), $! (last exception raised in this thread), and so on. Use of these &quot;local globals&quot; mostly just amounts to special variable names that are always available; they're not really true global variables.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Everyone knows global variables should be discouraged, but that's largely referring to global variable use in normal program flow. Using global state across your application – potentially across threads – is a pretty nasty thing to do to yourself and your coworkers. But there are some valid uses of globals, like for logging state and levels, debugging flags, and truly global constructs like standard IO.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here, we're using the global $DEBUG to specify whether logging should occur in MyApp#log. Those log messages are written to the stderr stream accessed via $stderr. 
Note also that $DEBUG can be set to true by passing -d at the JRuby command line.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Optimizing Constant Access (pre-1.7.1)&lt;/h2&gt;&lt;div&gt;I've posted in the past about how JRuby optimizes constant access, so I'll just quickly review that here.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At a given access point, constant values are looked up from the current lexical scope and cached. Because constants can be modified, or new constants can be introduced that mask earlier ones, the JRuby runtime (org.jruby.Ruby) holds a global constant invalidator checked on each access to ensure the previous value is still valid.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On non-invokedynamic JVMs, verifying the cache involves an object identity comparison every time, which means a non-final value must be accessed via a couple levels of indirection. This adds a certain amount of overhead to constant access, and also makes it impossible for the JVM to fold multiple constant accesses away, or make static decisions based on a constant's value.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On an invokedynamic JVM, the cache verification is in the form of a SwitchPoint. SwitchPoint is a type of on/off guard used at invokedynamic call sites to represent a hard failure. Because it can only be switched off, the JVM is able to optimize the SwitchPoint logic down to what's called a &quot;safe point&quot;, a very inexpensive ping back into the VM. As a result, constant accesses under invokedynamic can be folded away, and repeat access or unused accesses are not made at all.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, there's a problem. In JRuby 1.7.0 and earlier, the only way we could access the current lexical scope (in a StaticScope object) was via the current call frame's DynamicScope, a heap-based object created on each activation of a given body of code. 
In order to reduce the performance hit to methods containing constants, we introduced a one-time DynamicScope called the &quot;dummy scope&quot;, attached to the lexical scope and only created once. This avoided the huge hit of constructing a DynamicScope for every call, but caused constant-containing methods to be considerably slower than those without constants.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Lifting Lexical Scope Into Code&lt;/h2&gt;&lt;div&gt;In JRuby 1.7.1, I decided to finally bite the bullet and make the lexical scope available to all method bodies, without requiring a DynamicScope intermediate. This was a&amp;nbsp;&lt;a href=&quot;https://github.com/jruby/jruby/compare/fb65c539a9b4f52d1d063dbe36de69217ab6a896...ad5d07291d09f57849f873d405607fbb6fed1544&quot;&gt;nontrivial piece of work&lt;/a&gt;&amp;nbsp;that took several days to get right, so although most of the work occurred before JRuby 1.7.0 was released, we opted to let it bake a bit before release.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The changes made it possible for all class, module, method, and block bodies to access their lexical scope essentially for free. It also helped us finally deliver on the promise of truly free constant access when running under invokedynamic.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, does it work?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Assuming constant access is free, the three loops here should perform identically. The non-expression calls to foo and bar should disappear, since they both return a constant value that's never used. 
The calls for decrementing the 'a' variable should produce a constant value '1' and perform the same as the literal decrement in the control loop.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's Ruby (MRI) 2.0.0 performance on this benchmark.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The method call itself adds a significant amount of overhead here, and the constant access adds another 50% of that overhead. Ruby 2.0.0 has done a lot of work on performance, but the cost of invoking Ruby methods and accessing constants remains high, and constant accesses do not fold away as you would like.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's JRuby 1.7.2 performance on the same benchmark.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We obviously run all cases significantly faster than Ruby 2.0.0, but the important detail is that the method call adds only about 11% overhead to the control case, and constant access adds almost nothing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For comparison, here's JRuby 1.7.0, which did not have free access to lexical scopes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So by avoiding the intermediate DynamicScope, methods containing constant accesses are somewhere around 7x faster than before. Not bad.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Optimizing Global Variables&lt;/h2&gt;&lt;div&gt;Because global variables have a much simpler structure than constants, they're pretty easy to optimize. I had not done so up to JRuby 1.7.1 mostly because I didn't see a compelling use case and didn't want to encourage their use. 
However, after Tony Arcieri pointed out that invokedynamic-optimized global variables could be used to add logging and profiling to an application with zero impact when disabled, I was convinced. Let's look at the example from above again.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this example, we would ideally like there to be no overhead at all when $DEBUG is untrue, so we're free to add optional logging throughout the application with no penalty. In order to support this, two improvements were needed.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, I modified our invokedynamic logic to cache global variables using a per-variable SwitchPoint. This makes access to mostly-static global variables as free as constant access, with the same performance improvements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Second, I added some smarts into the compiler for conditional forms like &quot;if $DEBUG&quot; that would avoid re-checking the $DEBUG value at all if it were false the first time (and start checking it again if it were modified).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's worth noting I also made this second optimization for constants; code like &quot;if DEBUG_ENABLED&quot; will also have the same performance characteristics.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's see how it performs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this case, we should again expect that all three forms have identical performance. 
Both the constant and the global resolve to an untrue value, so they should ideally not introduce any overhead compared to the bare method.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's Ruby (MRI) 2.0.0:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Both the global and the constant add overhead here in the neighborhood of 25% over an empty method. This means you can't freely add globally-conditional logic to your application without accepting a performance hit.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;JRuby 1.7.2:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Again we see JRuby +&amp;nbsp;invokedynamic optimizing method calls considerably better than MRI, but additionally we see that the untrue global conditions add no overhead compared to the empty method. You can freely use globals as conditions for logging, profiling, and other code you'd like to have disabled most of the time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And finally, JRuby 1.7.1, which optimized constants, did not optimize globals, and did not have specialized conditional logic for either:&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;h2&gt;Where Do We Go From Here?&lt;/h2&gt;&lt;div&gt;Hopefully I've helped show that we're really just seeing the tip of the iceberg as far as optimizing JRuby using invokedynamic. More than anything we want you to report real-world use cases that could benefit from additional optimization, so we can target our work effectively. And as always, please try out your apps on JRuby, enable JRuby testing in Travis CI, and let us know what we can do to make your JRuby experience better!&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Easy Windows registry editing with JRuby</title>
		<link href="http://t-a-w.blogspot.com/2012/11/easy-windows-registry-editing-with-jruby.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-6621673722381386064</id>
		<updated>2012-11-29T04:37:05+00:00</updated>
		<content type="html">&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-FlC4OUCMw2g/ULbXbVj7XwI/AAAAAAAABmg/zLImZ0HVh9o/s1600/dscf0434_by_rabbit57i_from_flickr_cc-nc-nd.jpg&quot; title=&quot;DSCF0434 by rabbit57i from flickr (CC-NC-ND)&quot;&gt;&lt;img alt=&quot;DSCF0434 by rabbit57i from flickr (CC-NC-ND)&quot; border=&quot;0&quot; height=&quot;480&quot; src=&quot;http://4.bp.blogspot.com/-FlC4OUCMw2g/ULbXbVj7XwI/AAAAAAAABmg/zLImZ0HVh9o/s640/dscf0434_by_rabbit57i_from_flickr_cc-nc-nd.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;Like all people who came out of Unix tradition I approached Windows registry as something not to be touched even with a long stick, but it turned out not to be that bad.&lt;/div&gt;&lt;br /&gt;The first thing you need to know about Windows registry is that it has multiple roots. All our viewing and editing will apply to particular key only (usually HKEY_LOCAL_MACHINE).&lt;br /&gt;&lt;br /&gt;It looks like a pretty stupid decision, but then it comes from people who use C: D: etc. 
instead of single directory tree.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;nbsp; require &quot;win32/registry&quot;&lt;br /&gt;&amp;nbsp; def hklm&lt;br /&gt;&amp;nbsp; &amp;nbsp; Win32::Registry::HKEY_LOCAL_MACHINE&lt;br /&gt;&amp;nbsp; end&lt;/pre&gt;&lt;div&gt;Reading information from registry, like installation paths of various programs, is very easy:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&amp;nbsp; hklm.open('SOFTWARE\Wow6432Node\SEGA\Medieval II Total War')[&quot;AppPath&quot;] rescue nil&lt;br /&gt;&amp;nbsp; hklm.open('SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Steam App 4700')[&quot;InstallLocation&quot;] rescue nil&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;If you want &lt;tt&gt;nil&lt;/tt&gt; instead of exception just &lt;tt&gt;rescue nil&lt;/tt&gt; the entire thing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Writing to registry is very easy as well, here's actual example:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&amp;nbsp; &amp;nbsp; cv = hklm.create('SOFTWARE\Wow6432Node\SEGA\Medieval II Total War\Mods\Unofficial\Concentrated Vanilla')&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;Author&quot;]=&quot;Tomasz Wegrzanowski&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;ConfigFile&quot;]=&quot;concentrated_vanilla.cfg&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;DisplayName&quot;]=&quot;Concentrated Vanilla&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;FullName&quot;]=&quot;Concentrated Vanilla&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;Language&quot;]=&quot;english&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;Path&quot;]=&quot;mods/concentrated_vanilla&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;Version&quot;]=&quot;0.60&quot;&lt;br /&gt;&amp;nbsp; &amp;nbsp; cv[&quot;GameExe&quot;]=&quot;medieval2.exe&quot;&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;And deleting things to uninstall:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&amp;nbsp; 
&amp;nbsp; hklm.delete_key('SOFTWARE\Wow6432Node\SEGA\Medieval II Total War\Mods\Unofficial\Concentrated Vanilla', true)&lt;/pre&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And that's about it. If you want to explore the registry either start &lt;tt&gt;regedit&lt;/tt&gt; program, or start &lt;tt&gt;jirb&lt;/tt&gt; and play with it from JRuby REPL.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">JRuby Swing GUIs with cheri gem</title>
		<link href="http://t-a-w.blogspot.com/2012/11/jruby-swing-guis-with-cheri-gem.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-1838364168379485601</id>
		<updated>2012-11-27T05:04:17+00:00</updated>
		<content type="html">&lt;br /&gt;&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-4iNiaLNCTNs/ULQ7ixU06fI/AAAAAAAABlo/oN169oAh4u8/s1600/pocketmew__by_sin_amigos_from_flickr_cc-by.jpg&quot; title=&quot;PocketMew by Sin Amigos from flickr (CC-BY)&quot;&gt;&lt;img alt=&quot;PocketMew by Sin Amigos from flickr (CC-BY)&quot; border=&quot;0&quot; height=&quot;640&quot; src=&quot;http://4.bp.blogspot.com/-4iNiaLNCTNs/ULQ7ixU06fI/AAAAAAAABlo/oN169oAh4u8/s640/pocketmew__by_sin_amigos_from_flickr_cc-by.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I am not in any way a fan of desktop GUI toolkits - HTML5 and jQuery totally spoiled me, so I resisted for a very long time making GUIs for my Total War tools - and happily enough, other people would sometimes make them for me.&lt;br /&gt;&lt;br /&gt;But this time I decided to make a desktop GUI, in JRuby, and that means one of the awful non-HTML toolkits.&lt;br /&gt;&lt;br /&gt;So my first idea was of course making a big window with a menu calling some functions, and a big embedded HTML form with all stuff in HTML. I was even getting somewhere since Java Swing has an HTML widget, but then it turned out it's HTML 3.2 only, no Javascript whatsoever, and a serious pain to get data into and out of it.&lt;br /&gt;&lt;br /&gt;I also tried SWT, and hoped &lt;a href=&quot;https://github.com/taw/jruby-swt-cookbook&quot;&gt;danlucraft's cookbook&lt;/a&gt; would help me get somewhere with it, but I couldn't figure out most of the things I wanted to try, so I kept looking.&lt;br /&gt;&lt;br /&gt;Finally I found &lt;a href=&quot;http://cheri.rubyforge.org/&quot;&gt;this lovely cheri gem&lt;/a&gt;, which didn't seem to suck too hard. 
I've heard mostly horrible things about Swing API, but it was only as bad as the rumor says, at least for my simple use case.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I'll put all that code for public view eventually, but it's pretty massive, so here are just some tips for working with Swing and cheri.&lt;br /&gt;&lt;h3&gt;Basic window creation&lt;/h3&gt;Start a class and include &lt;tt&gt;Cheri::Swing&lt;/tt&gt; module. What you probably want to match HTML-ish behaviour is actually not a single layout manager but GridBagLayout (for actual layout) within ScrollPane (so you get scrollbars when content).&lt;br /&gt;&lt;br /&gt;That's the code:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;class ConcentratedVanillaBuilder&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; include Cheri::Swing&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;b&gt;&amp;nbsp; def initialize&lt;/b&gt;&lt;br /&gt;&lt;i&gt;&amp;nbsp; &amp;nbsp; @controls = {}&lt;/i&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp; &lt;b&gt;@frame = swing.frame('Concentrated Vanilla builder'){ |frm|&lt;/b&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp; &lt;b&gt;&amp;nbsp; size 800, 800&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; default_close_operation :EXIT_ON_CLOSE&lt;/b&gt;&lt;br /&gt;&lt;i&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; build_menu!&lt;/i&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; scroll_pane {&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; panel {&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; grid_bag_layout&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; grid_table {&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; background :WHITE&lt;/b&gt;&lt;br /&gt;&lt;i&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; build_form!&lt;/i&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; }&lt;/b&gt;&lt;br /&gt;&lt;i&gt;&amp;nbsp; &amp;nbsp; load_settings! load_settings_file(&quot;settings/default.txt&quot;)&lt;/i&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; &amp;nbsp; @frame.visible = true&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp; end&lt;/b&gt;&lt;br /&gt;&lt;b&gt;end&lt;/b&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This initialization is pretty generic (other than trivial matters of default window size and title), other than four italicized lines.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Separate form buildings from settings&lt;/h3&gt;&lt;br /&gt;That's advice for GUIs that simply configure some settings and then run some script. You want to keep your settings in a nice Hash, and don't mix GUI code with settings defaults.&lt;br /&gt;&lt;br /&gt;So what you want are helper methods like these:&lt;br /&gt;&lt;pre&gt; &amp;nbsp;def checkbox(name, description)&lt;/pre&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp; grid_row{&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; @controls[&quot;checkbox-#{name}&quot;] = swing.check_box description, :a =&amp;gt; :w, :gridwidth =&amp;gt; 3&lt;br /&gt;&amp;nbsp; &amp;nbsp; }&lt;br /&gt;&amp;nbsp; end&lt;/div&gt;&lt;br /&gt;And then use methods on &lt;tt&gt;@controls[something]&lt;/tt&gt; to both get and set various fields. 
That's far easier than ton of on_change callbacks or whatever is their Swing equivalent.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Use &lt;tt&gt;text_area&lt;/tt&gt; not &lt;tt&gt;label&lt;/tt&gt; for labels&lt;/h3&gt;&lt;br /&gt;Label widgets are pretty dumb, and non-editable text areas can handle things like multiline text and formatting a lot better.&lt;br /&gt;&lt;br /&gt;Just add some helper methods and pretend you're coding HTML:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt; &amp;nbsp;def div_helpmsg(msg)&lt;/pre&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp; grid_row{&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; text_area(:a =&amp;gt; :w, :gridwidth =&amp;gt; 3){&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; editable false&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; text msg.gsub(/^\s+/, &quot;&quot;)&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;br /&gt;&amp;nbsp; &amp;nbsp; }&lt;br /&gt;&amp;nbsp; end&lt;br /&gt;&lt;br /&gt;&amp;nbsp; def h1(msg)&lt;br /&gt;&amp;nbsp; &amp;nbsp; font = java.awt.Font.new('Dialog', java.awt.Font::BOLD, 24)&lt;br /&gt;&amp;nbsp; &amp;nbsp; grid_row{&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; text_area(:a =&amp;gt; :w, :gridwidth =&amp;gt; 3){&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set_font font&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; editable false&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; text msg&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;br /&gt;&amp;nbsp; &amp;nbsp; }&lt;br /&gt;&amp;nbsp; end&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;If things get too complicated, you can always go for full HTML widgets.&lt;br /&gt;&lt;h3&gt;Result&lt;/h3&gt;Half-finished result looks something like this. 
Not amazing, but it will do the trick.&lt;br /&gt;&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-sNfMW6qEkGI/ULQ58F-x5aI/AAAAAAAABlg/Ao487bxNEvE/s1600/Picture+1.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;640&quot; src=&quot;http://1.bp.blogspot.com/-sNfMW6qEkGI/ULQ58F-x5aI/AAAAAAAABlg/Ao487bxNEvE/s640/Picture+1.png&quot; width=&quot;368&quot; /&gt;&lt;/a&gt;&lt;/div&gt;It will get released sometime soon, and then you'll be able to play the &lt;a href=&quot;http://t-a-w.blogspot.com/2012/11/random-campaign-scenarios-for-medieval.html&quot;&gt;random scenarios&lt;/a&gt; everybody's waiting for.&lt;br /&gt;&lt;br /&gt;By the way, if any Java / JRuby experts have better ideas, go ahead. Googling was unusually unhelpful to me here, and IRC and StackOverflow were as useless as they always are.</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Refining Ruby</title>
		<link href="http://blog.headius.com/2012/11/refining-ruby.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-5309576998658669333</id>
		<updated>2012-11-19T10:36:04+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;What does the following code do?&lt;br /&gt;&lt;br /&gt; If you answered &quot;it upcases two strings and adds them together, returning the result&quot; you might be wrong because of a new Ruby feature called &quot;refinements&quot;.&lt;br /&gt;&lt;br /&gt;Let's start with the problem refinements are supposed to solve: monkey-patching.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Monkey-patching&lt;/h2&gt;&lt;div&gt;In Ruby, all classes are mutable. Indeed, when you define a new class, you're really just creating an empty class and filling it with methods. The ability to mutate classes at runtime has been used (or abused) by many libraries and frameworks to decorate Ruby's core classes with additional (or replacement) behavior. For example, you might add a &quot;camelize&quot; method to String that knows how to convert under_score_names to camelCaseNames. This is lovingly called &quot;monkey-patching&quot; by the Ruby community.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Monkey-patching can be very useful, and many patterns in Ruby are built around the ability to modify classes. It can also cause problems if a library patches code in a way the user does not expect (or want), or if two libraries try to apply conflicting patches. Sometimes, you simply don't want patches to apply globally, and this is where refinements come in.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Localizing Monkeypatches&lt;/h2&gt;&lt;div&gt;Refinements have been discussed as a feature for several years, sometimes under the name &quot;selector namespaces&quot;. In essence, refinements are intended to allow monkey-patching only within certain limited scopes, like within a library that wants to use altered or enhanced versions of core Ruby types without affecting code outside the library. 
This is the case within the ActiveSupport library that forms part of the core of Rails.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ActiveSupport provides a number of extensions (patches) to the core Ruby classes like String#pluralize, Range#overlaps?, and Array#second. Some of these extensions are intended for use by Ruby developers, as conveniences that improve the readability or conciseness of code. Others exist mostly to support Rails itself. In both cases, it would be nice if we could prevent those extensions from leaking out of ActiveSupport into code that does not want or need them.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Refinements&lt;/h2&gt;&lt;div&gt;In short, refinements provide a way to make class modifications that are only seen from within certain scopes. In the following example, I add a &quot;camelize&quot; method to the String class that's only seen from code within the Foo class.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With the Foo class refined, we can see that the &quot;camelize&quot; method is indeed available within the &quot;camelize_string&quot; method but not outside of the Foo class.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On the surface, this seems like exactly what we want. Unfortunately, there's a lot more complexity here than meets the eye.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Ruby Method Dispatch&lt;/h2&gt;&lt;div&gt;In order to do a method call in Ruby, a runtime simply looks at the target object's class hierarchy, searches for the method from bottom to top, and upon finding it performs the call. 
A smart runtime will cache the method to avoid performing this search every time, but in general the mechanics of looking up a method body are rather simple.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In an implementation like JRuby, we might cache the method at what's called the &quot;call site&quot;—the point in Ruby code where a method call is actually performed. In order to know that the method is valid for future calls, we perform two checks at the call site: that the incoming object is of the same type as for previous calls; and that the type's hierarchy has not been mutated since the method was cached.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Up to now, method dispatch in Ruby has depended solely on the type of the target object. The calling context has not been important to the method lookup process, other than to confirm that visibility restrictions are enforced (primarily for protected methods, since private methods are rejected for non–self calls). That simplicity has allowed Ruby implementations to optimize method calls and Ruby programmers to understand code by simply determining the target object and methods available on it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Refinements change everything.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Refinements Basics&lt;/h2&gt;&lt;div&gt;Let's revisit the camelize example again.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The visible manifestation of refinements comes via the &quot;refine&quot; and &quot;using&quot; methods.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;refine&quot; method takes a class or module (the String class, in this case) and a block. Within the block, methods defined (camelize) are added to what might be called a patch set (a la monkey-patching) that can be applied to specific scopes in the future. 
The methods are not actually added to the refined class (String) except in a &quot;virtual&quot; sense when a body of code activates the refinement via the &quot;using&quot; method.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;using&quot; method takes a refinement-containing module and applies it to the current scope. Methods within that scope should see the refined version of the class, while methods outside that scope do not.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Where things get a little weird is in defining exactly what that scope should be and in implementing refined method lookup in such a way that does not negatively impact the performance of unrefined method lookup. In the current implementation of refinements, a &quot;using&quot; call affects all of the following scopes related to where it is called:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;The direct scope, such as the top-level of a script, the body of a class, or the body of a method or block&lt;/li&gt;&lt;li&gt;Classes down-hierarchy from a refined class or module body&lt;/li&gt;&lt;li&gt;Bodies of code run via eval forms that change the &quot;self&quot; of the code, such as module_eval&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;It's worth emphasizing at this point that refinements can affect code far away from the original &quot;using&quot; call site. It goes without saying that refined method calls must now be aware of both the target type and the calling scope, but what of unrefined calls?&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2&gt;Dynamic Scoping of Method Lookup&lt;/h2&gt;&lt;div&gt;Refinements (in their current form) basically cause method lookup to be dynamically scoped. In order to properly do a refined call, we need to know what refinements are active for the context in which the call is occurring and the type of the object we're calling against. 
The latter is simple, obviously, but determining the former turns out to be rather tricky.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Locally-applied refinements&lt;/h3&gt;&lt;div&gt;In the simple case, where a &quot;using&quot; call appears alongside the methods we want to affect, the immediate calling scope contains everything we need. Calls in that scope (or in child scopes like method bodies) would perform method lookup based on the target class, a method name, and the hierarchy of scopes that surrounds them. The key for method lookup expands from a simple name to a name plus a call context.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Hierarchically-applied refinements&lt;/h3&gt;&lt;div&gt;Refinements applied to a class must also affect subclasses, so even when we don't have a &quot;using&quot; call present we still may need to do refined dispatch. The following example illustrates this with a subclass of Foo (building off the previous example).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here, the camelize method is used within a &quot;map&quot; call, showing that refinements used by the Foo class apply to Bar, its method definitions, and any subscopes like blocks within those methods. It should be apparent now why my first example might not do what you expect. Here's my first example again, this time with the Quux class visible.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Quux class uses refinements from the BadRefinement module, effectively changing String#upcase to actually do String#reverse. By looking at the Baz class alone you can't tell what's supposed to happen, even if you are certain that str1 and str2 are always going to be String. 
Refinements have effectively localized the changes applied by the BadRefinement module, but they've also made the code more difficult to understand; the programmer (or the reader of the code) must know everything about the calling hierarchy to reason about method calls and expected results.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Dynamically-applied refinements&lt;/h3&gt;&lt;div&gt;One of the key features of refinements is to allow block-based DSLs (domain-specific languages) to decorate various types of objects without affecting code outside the DSL. For example, an RSpec spec.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are several calls here that we'd like to refine.&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;The &quot;describe&quot; method is called at the top of the script against the &quot;toplevel&quot; object (essentially a singleton Object instance). We'd like to apply a refinement at this level so &quot;describe&quot; does not have to be defined on Object itself.&lt;/li&gt;&lt;li&gt;The &quot;it&quot; method is called within the block passed to &quot;describe&quot;. We'd like whatever self object is live inside that block to have an &quot;it&quot; method without modifying self's type directly.&lt;/li&gt;&lt;li&gt;The &quot;should&quot; method is called against an instance of MyClass, presumably a user-created class that does not define such a method. 
We would like to refine MyClass to have the &quot;should&quot; method only within the context of the block we pass to &quot;it&quot;.&lt;/li&gt;&lt;li&gt;Finally, the &quot;be_awesome&quot; method—which RSpec translates into a call to MyClass#awesome?—should be available on the self object active in the &quot;it&quot; block without actually adding be_awesome to self's type.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;In order to do this without having a &quot;using&quot; present in the spec file itself, we need to be able to dynamically apply refinements to code that might otherwise not be refined. The current implementation does this via Module#module_eval (or its argument-receiving brother, Module#module_exec).&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A block of code passed to &quot;module_eval&quot; or &quot;instance_eval&quot; will see its self object changed from that of the original surrounding scope (the self at block creation time) to the target class or module. This is frequently used in Ruby to run a block of code as if it were within the body of the target class, so that method definitions affect the &quot;module_eval&quot; target rather than the code surrounding the block.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We can leverage this behavior to apply refinements to any block of code in the system. Because refined calls must look at the hierarchy of classes in the surrounding scope, every call in every block in every piece of code can potentially become refined in the future, if the block is passed via module_eval to a refined hierarchy. The following simple case might not do what you expect, even if the String class has not been modified directly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Because the &quot;+&quot; method is called within a block, all bets are off. 
The str_ary passed in might not be a simple Array; it could be any user class that implements the &quot;inject&quot; method. If that implementation chooses, it can force the incoming block of code to be refined. Here's a longer version with such an implementation visible.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Suddenly, what looks like a simple addition of two strings produces a distinctly different result.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now that you know how refinements work, let's discuss the problems they create.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;Implementation Challenges&lt;/h2&gt;&lt;div&gt;Because I know that most users don't care if a new, useful feature makes my life as a Ruby implementer harder, I'm not going to spend a great deal of time here.&amp;nbsp;My concerns revolve around the complexities of knowing when to do a refined call and how to discover those refinements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Current Ruby implementations are all built around method dispatch depending solely on the target object's type, and much of the caching and optimization we do depends on that. With refinements in play, we must also search and guard against types in the caller's context, which makes lookup much more complicated. Ideally we'd be able to limit this complexity to only refined calls, but because &quot;using&quot; can affect code far away from where it is called, we often have no way to know whether a given call might be refined in the future. 
This is especially pronounced in the &quot;module_eval&quot; case, where code that isn't even in the same class hierarchy as a refinement must still observe it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are numerous ways to address the implementation challenges.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Eliminate the &quot;module_eval&quot; Feature&lt;/h3&gt;&lt;div&gt;At present, nobody knows of an easy way to implement the &quot;module_eval&quot; aspect of refinements. The current implementation in MRI does it in a brute-force way, flushing the global method cache on every execution and generating a new, refined, anonymous module for every call. Obviously this is not a feasible direction to go; block dispatch will happen very frequently at runtime, and we can't allow refined blocks to destroy performance for code elsewhere in the system.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The basic problem here is that in order for &quot;module_eval&quot; to work, every block in the system must be treated as a refined body of code all the time. That means that calls inside blocks throughout the system need to search and guard against the calling context even if no refinements are ever applied to them. The end result is that those calls suffer complexity and performance hits across the board.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At the moment, I do not see (nor does anyone else see) an efficient way to handle the &quot;module_eval&quot; case. It should be removed.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Localize the &quot;using&quot; Call&lt;/h3&gt;&lt;div&gt;No new Ruby feature should cause across-the-board performance hits; one solution is for refinements to be recognized at parse time. 
This makes it easy to keep existing calls the way they are and only impose refinement complexity upon method calls that are actually refined.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The simplest way to do this is also the most limiting and the most cumbersome: force &quot;using&quot; to only apply to the immediate scope. This would require every body of code to &quot;using&quot; a refinement if method calls in that body should be refined. Here's a couple of our previous examples with this modification.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is obviously pretty ugly, but it makes implementation much simpler. In every scope where we see a &quot;using&quot; call, we simply force all future calls to honor refinements. Calls appearing outside &quot;using&quot; scopes do not get refined and perform calls as normal.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We can improve this by making &quot;using&quot; apply to child scopes as well. This still provides the same parse-time &quot;pseudo-keyword&quot; benefit without the repetition.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Even better would be to officially make &quot;using&quot; a keyword and have it open a refined scope; that results in a clear delineation between refined and unrefined code. I show two forms of this below; the first opens a scope like &quot;class&quot; or &quot;module&quot;, and the second uses a &quot;do...end&quot; block form.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It would be fair to say that requiring more explicit scoping of &quot;using&quot; would address my concern about knowing when to do a refined call. 
It does not, however, address the issues of locating active refinements at call time.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Locating Refinements&lt;/h3&gt;&lt;div&gt;In each of the above examples, we still must pass some state from the calling context through to the method dispatch logic. Ideally we'd only need to pass in the calling object, which is already passed through for visibility checking. This works for refined class hierarchies, but it does not work for the RSpec case, since the calling object in some cases is just the top-level Object instance (and remember we don't want to decorate Object).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It turns out that there's already a feature in Ruby that follows lexical scoping: constant lookup. When Ruby code accesses a constant, the runtime must first search all enclosing scopes for a definition of that constant. Failing that, the runtime will walk the self object's class hierarchy. This is similar to what we want for the simplified version of refinements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If we assume we've localized refinements to only calls within &quot;using&quot; scopes, then at parse time we can emit something like a RefinedCall for every method call in the code. A RefinedCall would be special in that it uses both the containing scope and the target class to look up a target method. 
The lookup process would proceed as follows:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Search the call's context for refinements, walking lexical scopes only&lt;/li&gt;&lt;li&gt;If refinements are found, search for the target method&lt;/li&gt;&lt;li&gt;If a refined method is found, use it for the call&lt;/li&gt;&lt;li&gt;Otherwise, proceed with normal lookup against the target object's class&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Because the parser has already isolated refinement logic to specific calls, the only change needed is to pass the caller's context through to method dispatch.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2&gt;Usability Concerns&lt;/h2&gt;&lt;div&gt;There are indeed flavors of refinements that can be implemented reasonably efficiently, or at least implemented in such a way that unrefined code will not pay a price. I believe this is a requirement of any new feature: do no harm. But harm can come in a different form if a new feature makes Ruby code harder to reason about. I have some concerns here.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's go back to our &quot;module_eval&quot; case.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Because there's no &quot;using&quot; anywhere in the code, and we're not extending some other class, most folks will assume we're simply concatenating strings here. After all, why would I expect my &quot;+&quot; call to do something else? Why &lt;b&gt;should&lt;/b&gt;&amp;nbsp;my &quot;+&quot; call ever do something else here?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby has many features that might be considered a little &quot;magical&quot;. In most cases, they're only magic because the programmer doesn't have a good understanding of how they work. 
Constant lookup, for example, is actually rather simple...but if you don't know it searches both lexical and hierarchical contexts, you may be confused where values are coming from.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;module_eval&quot; behavior of refinements simply goes too far. It forces every Ruby programmer to second-guess every block of code they pass into someone else's library or someone else's method call. The guarantees of standard method dispatch no longer apply; you need to know if the method you're calling will change what calls your code makes. You need to understand the internal details of the target method. That's a terrible, terrible thing to do to Rubyists.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The same goes for refinements that are active down a class hierarchy. You can no longer extend a class and know that methods you call actually do what you expect. Instead, you have to know whether your parent classes or their ancestors refine some call you intend to make. I would argue this is considerably &lt;b&gt;worse&lt;/b&gt;&amp;nbsp;than directly monkey-patching some class, since at least in that case every piece of code has a uniform view.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problems are compounded over time, too. As libraries you use change, you need to again review them to see if refinements are in play. You need to understand all those refinements just to be able to reason about your own code. And you need to hope and pray two libraries you're using don't define different refinements, causing one half of your application to behave one way and the other half of your application to behave another way.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I believe the current implementation of refinements introduces more complexity than it solves, mostly due to the lack of a strict lexical &quot;using&quot;. 
Rubyists should be able to look at a piece of code and know what it does based solely on the types of objects it calls. Refinements make that impossible.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Update:&lt;/i&gt;&amp;nbsp;Josh Ballanco points out another usability problem: &quot;using&quot; only affects method bodies defined temporally after it is called. For example, the following code only refines the &quot;bar&quot; method, not the &quot;foo&quot; method.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This may simply be an artifact of the current implementation, or it may be specified behavior; it's hard to tell since there's no specification of any kind other than the implementation and a handful of tests. In any case, it's yet another confusing aspect, since it means the order in which code is loaded can actually change which refinements are active.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2&gt;tl;dr&lt;/h2&gt;&lt;div&gt;My point here is not to beat down refinements. I agree there are cases where they'd be very useful, especially given the sort of monkey-patching I've seen in the wild. But the current implementation overreaches; it provides several features of questionable value, while simultaneously making both performance and understandability harder to achieve. Hopefully we'll be able to work with Matz and ruby-core to come up with a more reasonable, limited version of refinements...or else convince them not to include refinements in Ruby 2.0.&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Synchronized compressed logging the Unix way</title>
		<link href="http://t-a-w.blogspot.com/2010/07/synchronized-compressed-logging-unix.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-5317250054178740771</id>
		<updated>2012-10-18T06:16:17+00:00</updated>
		<content type="html">&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://3.bp.blogspot.com/_IYGc_MWwkfw/TD48YwTnFDI/AAAAAAAAA-U/j6kkizBxLCY/s1600/tiny_tiny_kitten_4_weeks_old_by_georgeh23_from_flickr_cc-nc-nd.jpg&quot; title=&quot;tiny tiny kitten 4 weeks old by GeorgeH23 from flickr (CC-NC-ND)&quot;&gt;&lt;img alt=&quot;tiny tiny kitten 4 weeks old by GeorgeH23 from flickr (CC-NC-ND)&quot; border=&quot;0&quot; height=&quot;360&quot; src=&quot;http://3.bp.blogspot.com/_IYGc_MWwkfw/TD48YwTnFDI/AAAAAAAAA-U/j6kkizBxLCY/s640/tiny_tiny_kitten_4_weeks_old_by_georgeh23_from_flickr_cc-nc-nd.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;In good Unix tradition if a program generates some data, in general it should write it to STDOUT, and you'll redirect it to the right file yourself.&lt;br /&gt;&lt;br /&gt;There are two problems with that, both easily solvable in separation:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;If it's a lot of data, you want to store it compressed. It would be bad Unix to put compression directly in the program - the right way is to pipe its output through gzip with &lt;tt&gt;program | gzip &amp;gt;logfile.gz&lt;/tt&gt;. gzip is really fast, and usually adequate.&lt;/li&gt;&lt;li&gt;You want to be able to see what were the last lines written out by the program at any time. Especially if it appears frozen. Sounds trivial, but thanks to a horrible misdesign of libc, and everything else based on it, data you write gets buffered before being actually written - a totally reasonable thing - and there are no limits whatsoever how long it can stay in buffers! Fortunately it is possible to turn this misfeature off with a single line of &lt;tt&gt;STDOUT.sync=true&lt;/tt&gt; or equivalent in other languages.&lt;/li&gt;&lt;/ul&gt;Unfortunately while both fixes involve a single line of obvious code - there's no easy way to solve them together. Even if you flushed all data from the program to gzip, gzip can hold onto it indefinitely. 
Now, unlike libc, which is simply broken, gzip has a good reason - compression doesn't work on one byte at a time - it takes a big chunk, compresses it, and only then writes it all out.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Still, even if it has good reasons not to flush data as soon as possible, it can and very much should flush it every now and then - with flushing every few seconds, the reduction in compression ratio will be insignificant, and it will be possible to find out why the program froze almost right away. The underlying zlib library totally has this feature - unfortunately the command line gzip utility doesn't expose it.&lt;br /&gt;&lt;br /&gt;So I wrote this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#!/usr/bin/env ruby&lt;br /&gt;&lt;br /&gt;require 'thread'&lt;br /&gt;require 'zlib'&lt;br /&gt;&lt;br /&gt;def gzip_stream(io_in, io_out, flush_freq)&lt;br /&gt;&amp;nbsp; fh = Zlib::GzipWriter.wrap(io_out)&lt;br /&gt;&amp;nbsp; lock = Mutex.new&lt;br /&gt;&amp;nbsp; # Background thread: flush the compressed stream every flush_freq seconds&lt;br /&gt;&amp;nbsp; Thread.new{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while true&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.synchronize{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # return cannot leave a thread block, so end the thread explicitly&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Thread.exit if fh.closed?&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fh.flush if fh.pos &amp;gt; 0&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sleep flush_freq&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; io_in.each{|line|&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; lock.synchronize{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fh.print(line)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; # Close under the lock so the flusher never touches a half-closed stream&lt;br /&gt;&amp;nbsp; lock.synchronize{ fh.close }&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;gzip_stream(STDIN, STDOUT, 5)&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;It reads lines on stdin, writes them to stdout, and flushes every 5 
seconds (or whatever you configure) in a separate Ruby thread. Ruby green threads are little more than a wrapper over &lt;tt&gt;select()&lt;/tt&gt; in case you're wondering. The check that &lt;tt&gt;fh.pos&lt;/tt&gt; is non-zero is required as flushing before you write something seems to result in invalid output.&lt;br /&gt;&lt;br /&gt;Now you can &lt;tt&gt;program | gzip_stream &amp;gt;logfile.gz&lt;/tt&gt; without worrying about data getting stuck on the way (if you flush in your program that is).</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">So You Want To Optimize Ruby</title>
		<link href="http://blog.headius.com/2012/10/so-you-want-to-optimize-ruby.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-6804148115098747648</id>
		<updated>2012-10-15T12:40:18+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;I was recently asked for a list of &quot;hard problems&quot; a Ruby implementation really needs to solve before reporting benchmark numbers. You know...the sort of problems that might invalidate early perf numbers because they impact how you optimize Ruby. This post is a rework of my response...I hope you find it informative!&lt;br /&gt;&lt;h4&gt;Fixnum to Bignum promotion&lt;/h4&gt;In Ruby, Fixnum math can promote to Bignum when the result is out of Fixnum's range. On implementations that use tagged pointers to represent Fixnum (MRI, Rubinius, MacRuby), the Fixnum range is somewhat less than the base CPU bits (32/64). On JRuby, Fixnum is always a straight 64-bit signed value.&lt;br /&gt;&lt;br /&gt;This promotion is a performance concern for a couple of reasons:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Every math operation that returns a new Fixnum must be range-checked. This slows all Fixnum operations.&lt;/li&gt;&lt;li&gt;It is difficult (if not impossible) to predict whether a Fixnum math operation will return a Fixnum or a Bignum. Since Bignum is always represented as a full object (not a primitive or a tagged pointer) this impacts optimizing Fixnum math call sites.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Floating-point performance&lt;/h4&gt;A similar concern is the performance of floating point values. Most of the native implementations have tagged values for Fixnum but only one I know of (MacRuby) uses tagged values for Float. This can skew expectations because an implementation may perform very well on integer math and considerably worse on floating-point math due to the objects created (and collected). 
JRuby uses objects for both Fixnum and Float, so performance is roughly equivalent (and slower than I'd like).&lt;br /&gt;&lt;h4&gt;Closures&lt;/h4&gt;&lt;div&gt;Any language that supports closures (&quot;blocks&quot; in Ruby) has to deal with efficiently accessing frame-local data from calls down-stack. In Java, both anonymous inner classes and the upcoming lambda feature treat frame-local values (local variables, basically) as immutable...so their values can simply be copied into the closure object or carried along in some other way. In Ruby, local variables are always mutable, so an eventual activation of a closure body needs to be able to write into its containing frame. If a runtime does not support arbitrary frame access (as is the case on the JVM) it may have to allocate a separate data structure to represent those frame locals...and that impacts performance.&lt;/div&gt;&lt;h4&gt;Bindings and eval&lt;/h4&gt;The eval methods in Ruby can usually accept an optional binding under which to run. This means any call to binding must return a fully-functional execution environment, and in JRuby this means both eval and binding force a full deoptimization of the surrounding method body.&lt;br /&gt;&lt;br /&gt;There's an even more unpleasant aspect to this, however: every block can be used as a binding too.&lt;br /&gt;&lt;br /&gt;All blocks can be&amp;nbsp;turned into Proc and used as bindings, which means every block in the&amp;nbsp;system has to have full access to values in the containing call frame. Most implementers hate this feature, since it means that optimizing call frames in the presence of blocks is much more difficult. 
Because they can be used as a binding, that of course means literally all frame data must be accessible: local variables; frame-local $ variables like $~; constants lookup environment; method visibility; and so on.&lt;br /&gt;&lt;h4&gt;callcc and Continuation&lt;/h4&gt;JRuby doesn't implement callcc since the JVM doesn't support continuations, but any implementation hoping to optimize Ruby will have to take a stance here. Continuations obviously make optimization more difficult since you can branch into and out of execution contexts in rather unusual ways.&lt;br /&gt;&lt;h4&gt;Fiber implementation&lt;/h4&gt;In JRuby, each Fiber runs on its own thread (though we pool the native threads to reduce Fiber spin-up costs). Other than that they operate pretty much like closures.&lt;br /&gt;&lt;br /&gt;A Ruby implementer needs to decide whether it will use C-style native stack juggling (which makes optimizations like frame elimination trickier to implement) or give Fibers their own stacks in which to execute independently.&lt;br /&gt;&lt;h4&gt;Thread/frame/etc local $globals&lt;/h4&gt;Thread globals are easy, obviously. All(?) host systems already have some representation of thread-local values. The tricky ones are explicit frame globals like $~ and $_ and implicit frame-local values like visibility, etc.&lt;br /&gt;&lt;br /&gt;In the case of $~ and $_, the challenge is not in representing accesses of them directly but in handling implicit reads and writes of them that cross call boundaries. For example, calling [] on a String and passing a Regexp will cause the caller's frame-local $~ (and related values) to be updated to the MatchData for the pattern match that happens inside []. There are a number of core Ruby methods like this that can reach back into the caller's frame and read or write these values. 
This obviously makes reducing or eliminating call frames very tricky.&lt;br /&gt;&lt;br /&gt;In JRuby, we track all core methods that read or write these values, and if we see those methods called in a body of code (the names, mind you...this is a static inspection), we will stand up a call frame for that body. This is not ideal. We would like to move these values into a separate stack that's lazily allocated only when actually needed, since methods that cross frames like String#[] force other methods like Array#[] to deoptimize too.&lt;br /&gt;&lt;h4&gt;C extension support&lt;/h4&gt;If a given Ruby implementation is likely to fit into the &quot;native&quot; side of Ruby&amp;nbsp;implementations (as opposed to implementations like JRuby or IronRuby that target an existing managed runtime), it will need to have a C extension story.&lt;br /&gt;&lt;br /&gt;Ruby's C&amp;nbsp;extension API is easier to support than some languages' native APIs (e.g. no reference-counting as in Python)&amp;nbsp;but it still very much impacts how a runtime optimizes. Because the API needs to return forever-valid object references, implementations that don't give out pointers will have to maintain a handle table. The API includes a number of macros that provide access to object internals; they'll need to be simulated or explicitly unsupported. And the API makes no guarantees about concurrency and provides few primitives for controlling concurrent execution, so most implementations will need to lock around native downcalls.&lt;br /&gt;&lt;br /&gt;An alternative for a new Ruby implementation is to expect extensions to be written in the host runtime's native language (Java or other JVM languages for JRuby; C# or other .NET languages for IronRuby, etc). 
However, this imposes a burden on folks implementing language extensions, since they'll have to support yet another language to cover all Ruby implementations.&lt;br /&gt;&lt;br /&gt;Ultimately, though, the unfortunate fact for most &quot;native&quot; impls is that regardless of how fast you can run Ruby code, the choke point is often going to be the C API emulation, since it will require a lot of handle-juggling and indirection compared to MRI. So without supporting the C API, there's a very large part of the story missing...a part of the story that accesses frame locals, closure bodies, bindings, and so on.&lt;br /&gt;&lt;br /&gt;Of course if you can run Ruby code as fast as C, maybe it won't matter. :) Users can just implement their extensions in Ruby. JRuby is starting to approach that kind of performance for non-numeric, non-closure cases, but that sort of perf is not yet widespread enough to bank on.&lt;br /&gt;&lt;h4&gt;Ruby 1.9 encoding support&lt;/h4&gt;Any benchmark that touches anything relating to binary text data must have encoding support, or you're really fudging the numbers. Encoding touches damn near everything, and can add a significant amount of overhead to String-manipulating benchmarks.&lt;br /&gt;&lt;h4&gt;Garbage collection and object allocation&lt;/h4&gt;It's easy for a new impl to show good performance on benchmarks that do no allocation (or little allocation) and require no GC, like raw numerics (fib, tak, etc). MacRuby and Rubinius, for example, really shine here. 
But many impls have drastically different performance when an algorithm starts allocating objects. Very few applications are doing pure integer numeric algorithms, so object allocation and GC performance are an absolutely critical part of the performance story.&lt;br /&gt;&lt;h4&gt;Concurrency / Parallelism&lt;/h4&gt;If you intend to be an impl that supports parallel thread execution, you're going to have to deal with various issues before publishing numbers. For example, threads can #kill or #raise each other, which in a truly parallel runtime requires periodic safepoints/pings to know whether a cross-thread event has fired. If you're not handling those safepoints, you're not telling the whole story, since they impact execution.&lt;br /&gt;&lt;br /&gt;There's also the thread-safety of runtime structures to be considered. As an example, Rubinius until recently had a hard lock around a data structure responsible for invalidating call sites, which meant that its simple inline cache could see a severe performance degradation at polymorphic call sites (they've since added polymorphic caching to ameliorate this case). The thread-safety of a Ruby implementation's core runtime structures can drastically impact even straight-line, non-concurrent performance.&lt;br /&gt;&lt;br /&gt;Of course, for an impl that doesn't support parallel execution (which would put it in the somewhat more limited realm of MRI), you can get away with GIL scheduling tricks. You just won't have a very good in-process scaling story.&lt;br /&gt;&lt;h4&gt;Tracing/debugging&lt;/h4&gt;All current impls support tracing or debugging APIs, though some (like JRuby) require you to enable support for them via command-line or compile-time flags. 
A Ruby implementation needs to have an answer for&amp;nbsp;this, since the runtime-level hooks required will have an impact...and may&amp;nbsp;require users to opt-in.&lt;br /&gt;&lt;h4&gt;ObjectSpace&lt;/h4&gt;ObjectSpace#each_object needs to be addressed before talking about&amp;nbsp;performance. In JRuby, supporting each_object over arbitrary types was&amp;nbsp;a major performance issue, since we had to track all objects in a&amp;nbsp;separate data structure in case they were needed. We ultimately&amp;nbsp;decided each_object would only work with Class and Module, since those&amp;nbsp;were the major practical use cases (and tracking Class/Module hierarchies is far easier than tracking all objects in the system).&lt;br /&gt;&lt;br /&gt;Depending on how a Ruby implementation tracks in-memory objects (and depending on the level of accuracy expected from ObjectSpace#each_object) this can impact how allocation logic and GC are optimized.&lt;br /&gt;&lt;h4&gt;Method invalidation&lt;/h4&gt;Several implementations can see severe global effects due to methods like Object#extend&amp;nbsp;blowing all global caches (or at least several caches), so you need to be&amp;nbsp;able to support #extend in a reasonable way before talking about&amp;nbsp;performance. Singleton objects also have a similar effect, since they&amp;nbsp;alter the character of method caches by introducing new anonymous types at&amp;nbsp;any time (and sometimes, in rapid succession).&lt;br /&gt;&lt;br /&gt;In JRuby, singleton and #extend effects are limited to the call sites that see them. I also have an experimental branch that's smarter about type identity, so simple anonymous types (that have only had modules included or extended into them) will not damage caches at all. 
Hopefully we'll land that in a future release.&lt;br /&gt;&lt;h4&gt;Constant lookup and invalidation&lt;/h4&gt;I believe all implementations have implemented constant cache invalidation as a global invalidation, though there are other more complicated ways to do it. The main challenge is the fact that constant lookup is tied to both lexical scope and class hierarchy, so invalidating individual constant lookup sites is usually infeasible. Constant lookup is also rather tricky and must be implemented correctly before talking about the performance of any benchmark that references constants.&lt;br /&gt;&lt;h4&gt;Rails&lt;/h4&gt;&lt;div&gt;Finally, regardless of how awesome a new Ruby implementation claims to be, most users will simply ask &quot;but does it run Rails?&quot; You can substitute your favorite framework or library, if you like...the bottom line is that an awesome Ruby implementation that doesn't run any Ruby applications is basically useless. Beware of crowing about your victory over Ruby performance before you can run code people actually care about.&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Explanation of Warnings From MRI's Test Suite</title>
		<link href="http://blog.headius.com/2012/09/explanation-of-warnings-from-mris-test.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-5284790001981835990</id>
		<updated>2012-09-26T15:06:40+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;JRuby has, for some time now, run the same &lt;a href=&quot;https://github.com/jruby/jruby/tree/master/test/externals/ruby1.9&quot;&gt;test suite as MRI&lt;/a&gt; (C Ruby, Matz's Ruby). Because not all tests pass, we use &lt;a href=&quot;https://github.com/seattlerb/minitest-excludes&quot;&gt;minitest-excludes&lt;/a&gt; to mask out the failures, and over time we unmask stuff as we fix it.&lt;br /&gt;&lt;br /&gt;However, there's a number of warnings we get from the suite that are nonfatal and unmaskable. I thought I'd show them to you and tell their stories.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;JRuby 1.9 mode only supports the `psych` YAML engine; ignoring `syck`&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;div&gt;When we started implementing support for the new &quot;psych&quot; YAML engine that Aaron Patterson created (atop libyaml) for Ruby 1.9, we decided that we would not support the broken &quot;syck&quot; engine anymore. The libyaml version is strictly YAML spec compliant, and this is our contribution to ridding the world of &quot;syck&quot;'s broken YAML forever.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;GC.stress= does nothing on JRuby&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;JRuby does not have direct control over the JVM's GC, and so we can't implement things like GC.stress=, which MRI uses to put the GC into &quot;stress&quot; mode (GCing much more frequently to better test GC stability and behavior). 
There are flags for the JVM to do this sort of testing, but since we don't really need to test the JVM's GC for correctness and stability, we have not exposed those flags directly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This flag is used in a number of MRI tests to force GC to happen more often and/or to actually test GC behaviors.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;SAFE levels are not supported in JRuby&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;JRuby does not support standard Ruby's security model, &quot;safe levels&quot;, because we believe safe levels are a flawed, too-coarse mechanism. On JRuby, you can use standard Java security policies.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We have debated mapping the various Ruby safe levels to equivalent sets of Java security permissions, but have never gotten around to it.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;GC.enable does nothing on JRuby / GC.disable does nothing on JRuby&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;There's no standard API on the JVM to disable the garbage collector completely, so GC.enable and GC.disable do nothing in JRuby.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's also interesting to note that while you &lt;b&gt;can&lt;/b&gt;&amp;nbsp;request a GC run from the JVM by calling System.gc, JRuby also stubs out Ruby's GC.start. We opted to do this because GC.start is used in some Ruby libraries as a band-aid around Ruby's sometimes-slow GC, but the same call on JRuby is both unnecessary (because GC overhead is rarely a problem) and a major performance hit (because it triggers a full GC over the entire heap).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">An experiment in static compilation of Ruby: FASTRUBY!</title>
		<link href="http://blog.headius.com/2012/09/an-experiment-in-static-compilation-of.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-4242550722639887199</id>
		<updated>2012-09-16T22:46:42+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;While at GoGaRuCo this weekend, I finally made good on an experiment I had been thinking about for a while: &lt;a href=&quot;https://github.com/headius/fastruby&quot;&gt;a static compiler for Ruby&lt;/a&gt;. I thought I'd share it with you good people today.&lt;br /&gt;&lt;br /&gt;First we have a simple Ruby script with a class in it:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;We compile it with fastruby, and it produces two .java source files: Hello.java and RObject.java.&lt;br /&gt;&lt;br /&gt;Hello.java implements the methods the Ruby class does in the script, and calls the same methods (with some mangling for invalid Java method names like _plus_ and _lt_).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;RObject.java implements stubs for &lt;u&gt;all&lt;/u&gt; method names seen in the script. As a result, all dynamic calls can just be virtual invocations against RObject. Classes that implement one of the methods will just work and the call is direct. Classes that don't implement the called method will raise an error.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;RKernel comes with fastruby, and provides Kernel-level methods like &quot;puts&quot;, plus methods for coercing to Java types like toBoolean and toString. It also caches some built-in singleton values like nil.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;And there's a few other classes for this script to work. 
It should be easy to see how we could fill them out to do everything the equivalent Ruby classes do.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I don't have any support for a &quot;main&quot; method yet, so I wrote a little runner script to test it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And away we go!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;This is about 30% faster than JRuby with invokedynamic. It is not doing any bounds checking (for rolling over to Bignum) but it is also not caching 1...256 Fixnum objects like JRuby does, nor caching them in any calls along the way (note that it creates three new RFixnums for every recursion that JRuby would not recreate). I call that pretty good.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Obviously because this is designed to compile the whole system at once, we could also emit optimized versions of methods that look like they're doing math. That is yet to come, if I continue this little experiment at all.&lt;br /&gt;&lt;br /&gt;There's also some fun possibilities here. By specifying Java types, the compiler could add normal Java methods. Implementing interfaces could be done directly. And Android applications built with this tool would be entirely statically optimizable, only shipping the small amount of code they actually call and having a very minimal runtime.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Pretty neat?&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry>
		<title type="html">Avoiding Hash Lookups in a Ruby Implementation</title>
		<link href="http://blog.headius.com/2012/09/avoiding-hash-lookups-in-ruby.html"/>
		<id>tag:blogger.com,1999:blog-4704664917418794835.post-7858745721921352272</id>
		<updated>2012-09-04T02:02:34+00:00</updated>
		<content type="html">&lt;div dir=&quot;ltr&quot;&gt;I had an interesting realization tonight: I'm terrified of hash tables. Specifically, my work on JRuby (and even more directly, my work optimizing JRuby) has made me terrified to ever consider using a hash table in the hot path of any program or piece of code if there's any possibility of eliminating it. And what I've learned over the years is that the vast majority of execution-related (as opposed to data-related, purely dynamic-sourced lookup tables) hash tables are totally unnecessary.&lt;br /&gt;&lt;br /&gt;Some background might be interesting here.&lt;br /&gt;&lt;h2&gt;Hashes are a Language Designer's First Tool&lt;/h2&gt;&lt;div&gt;Anyone who's ever designed a simple language knows that pretty much everything you do is trivial to implement as a hash table. Dynamically-expanding tables of functions or methods? Hash table! Variables? Hash table! Globals? Hash table!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In fact, some languages never graduate beyond this phase and remain essentially gobs and gobs of hash tables even in fairly recent implementations. I won't name your favorite language here, but I will name one of mine: Ruby.&lt;/div&gt;&lt;h2&gt;Ruby: A Study in Hashes All Over the Freaking Place&lt;/h2&gt;&lt;div&gt;As with many dynamic languages, early (for some definition of &quot;early&quot;) implementations of Ruby used hash tables all over the place. 
Let's just take a brief tour through the many places hash tables are used in Ruby 1.8.7&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Author's note: 1.8.7 is now, by most measures, the &quot;old&quot; Ruby implementation, having been largely supplanted by the 1.9 series which boasts a &quot;real&quot; VM and optimizations to avoid most hot-path hash lookup.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In Ruby (1.8.7), all of the following are (usually) implemented using hash lookups (and of these, many are hash lookups nearly every time, without any caching constructs):&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Method Lookup: Ruby's class hierarchy is essentially a tree of hash tables that contain, among other things, methods. Searching for a method involves searching the target object's class. If that fails, you must search the parent class, and so on. In the absence of any sort of caching, this can mean you search all the way up to the root of the hierarchy (Object or Kernel, depending what you consider root) to find the method you need to invoke. This is also known as &quot;slow&quot;.&lt;/li&gt;&lt;li&gt;Instance Variables: In Ruby, you do not declare ahead of time what variables a given class's object instances will contain. Instead, instance variables are allocated as they're assigned, like a hash table. And in fact, most Ruby implementations still use a hash table for variables under some circumstances, even though most of these variables can be statically determined ahead of time or dynamically determined (to static ends) at runtime.&lt;/li&gt;&lt;li&gt;Constants: Ruby's constants are actually &quot;mostly&quot; constant. They're a bit more like &quot;const&quot; in C, assignable once and never assignable again. Except that they &lt;b&gt;are&lt;/b&gt;&amp;nbsp;assignable again through various mechanisms. 
In any case, constants are also not declared ahead of time and are not purely a hierarchically-structured construct (they are both lexically and hierarchically scoped), and as a result the simplest implementation is a hash table (or chains of hash tables), once again.&lt;/li&gt;&lt;li&gt;Global Variables: Globals are frequently implemented as a top-level hash table even in modern, optimized languages. They're also evil and you shouldn't use them, so most implementations don't even bother making them anything other than a hash table.&lt;/li&gt;&lt;li&gt;Local Variables: Oh yes, Ruby has not been immune to the greatest evil of all: purely hash table-based local variables. A &quot;pure&quot; version of Python would have to do the same, although in practice no implementations really support that (and yes, you can manipulate the execution frame to gain &quot;hash like&quot; behavior for Python locals, but you must surrender your Good Programmer's Card if you do). In Ruby's defense, however, hash tables were only ever used for closure scopes (blocks, etc), and no modern implementations of Ruby use hash tables for locals in any way.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;There are other cases (like class variables) that are less interesting than these, but this list serves to show how easy it is for a language implementer to fall into the &quot;everything's a hash, dude!&quot; hole, only to find they have an incredibly flexible and totally useless language. Ruby is not such a language, and almost all of these cases can be optimized into largely static, predictable code paths with nary a hash calculation or lookup to be found.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;How? 
I'm glad you asked.&lt;/div&gt;&lt;h2&gt;JRuby: The Quest For Fewer Hashes&lt;/h2&gt;&lt;div&gt;If I were to sum up the past 6 years I've spent optimizing JRuby (and learning how to optimize dynamic languages) it would be with the following phrase: Get Rid Of Hash Lookups.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When I tweeted about this realization yesterday, I got a few replies back about better hashing algorithms (e.g. &quot;perfect&quot; hashes) and a few replies from puzzled folks (&quot;what's wrong with hashes?&quot;), which made me realize that it's not always apparent how unnecessary most (execution-related) hash lookups really are (and from now on, when I talk about unnecessary or optimizable hash lookups, I'm talking about execution-related hash lookups; you data folks can get off my back right now).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So perhaps we should talk a little about why hashes are bad in the first place.&lt;/div&gt;&lt;h2&gt;What's Wrong With a Little Hash, Bro?&lt;/h2&gt;&lt;div&gt;The most obvious problem with using hash tables is the mind-crunching frustration of finding THE PERFECT HASH ALGORITHM. Every year there's a new way to calculate String hashes, for example, that's [ better | faster | securer | awesomer ] than all precedents. JRuby, along with many other languages, actually released a security fix last year to patch the great hash collision DoS exploit so many folks made a big deal about (while us language implementers just sighed and said &quot;maybe you don't actually want a hash table here, kids&quot;). Now, the implementation we put in place has again been &quot;exploited&quot; and we're told we need to move to cryptographic hashing. Srsly? 
How about we just give you a crypto-awesome-mersenne-randomized hash impl you can use for all your outward-facing hash tables and you can leave us the hell alone?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But I digress.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Obviously the cost of calculating hash codes is the first sin of a hash table. The second sin is deciding how, based on that hash code, you will distribute buckets. Too many buckets and you're wasting space. Too few and you're more likely to have a collision. Ahh, the intricate dance of space and time plagues us forever.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ok, so let's say we've got some absolutely smashing hash algorithm and foresight enough to balance our buckets so well we make Lady Justice shed a tear. We're still screwed, my friends, because we've almost certainly defeated the prediction and optimization capabilities of our VM or our M, and we've permanently signed over performance in exchange for ease of implementation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is conceivable that a really good machine can learn our hash algorithm really well, but in the case of string hashing we still have to walk &lt;b&gt;some&lt;/b&gt;&amp;nbsp;memory to give us reasonable assurance of unique hash codes. So there's performance sin #1 violated: never read from memory.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Even if we ignore the cost of calculating a hash code, which at worst requires reading some object data from memory and at best requires reading a cached hash code from elsewhere in memory, we have to contend with how the buckets are implemented. 
Most hash tables implement the buckets as either of the typical list forms: an array (contiguous memory locations in a big chunk, so each element must be dereferenced...O(1) complexity) or a linked list (one entry chaining to the next through some sort of memory dereference, leading to O(N) complexity for searching collided entries).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Assuming we're using simple arrays, we're &lt;b&gt;still&lt;/b&gt;&amp;nbsp;making life hard for the machine since it has to see through at least one and possibly several mostly-opaque memory references. By the time we've got the data we're after, we've done a bunch of memory-driven calculations to find a chain of memory dereferences. And you wanted this to be fast?&lt;/div&gt;&lt;h2&gt;Get Rid Of The Hash&lt;/h2&gt;&lt;div&gt;Early attempts (of mine and others) to optimize JRuby centered around making hashing as cheap as possible. We made sure our tables only accepted interned strings, so we could guarantee they'd already calculated and cached their hash values. We used the &quot;programmer's hash&quot;, switch statements, to localize hash lookups closer to the code performing them, rather than trying to balance buckets. We explored complicated implementations of hierarchical hash tables that &quot;saw through&quot; to parents, so we could represent hierarchical method table relationships in (close to) O(1) complexity.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But we were missing the point. The problem was in our representing any of these language features as hash tables to begin with. And so we started working toward the implementation that has made JRuby actually become the fastest Ruby implementation: eliminate all hash lookups from hot execution paths.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;How? Oh right, that's what we were talking about. 
I'll tell you.&lt;/div&gt;&lt;h2&gt;Method Tables&lt;/h2&gt;&lt;div&gt;I mentioned earlier that in Ruby, each class contains a method table (a hash table from method name to a piece of code that it binds) and method lookup proceeds up the class hierarchy. What I didn't tell you is that both the method tables and the hierarchy are mutable at runtime.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hear that sound? It's the static-language fanatics' heads exploding. Or maybe the &quot;everything must be mutable always forever or you are a very bad monkey&quot; fanatics. Whatever.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby is what it is, and the ability to mix in new method tables and patch existing method tables at runtime is part of what makes it attractive. Indeed, it's a huge part of what made frameworks like Rails possible, and also a huge reason why other more static (or more reasonable, depending on how you look at it) languages have had such difficulty replicating Rails' success.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Mine is not to reason why. Mine is but to do and die. I have to make it fast.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Proceeding from the naive implementation, there are certain truths we can hold at various times during execution:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Most method table and hierarchy manipulation will happen early in execution. This was true when I started working on JRuby and it's largely true now, in no small part due to the fact that optimizing method tables and hierarchies that are wildly different all the time is really, really hard (so no implementer does it, so no user should do it). 
Before you say it: even prototype-based languages like JavaScript that appear to have no fixed structure do indeed settle into a finite set of predictable, optimizable &quot;shapes&quot; which VMs like V8 can take advantage of.&lt;/li&gt;&lt;li&gt;When changes do happen, they only affect a limited set of observers. Specifically, only call sites (the places where you actually make calls in code) need to know about the changes, and even they only need to know about them if they've already made some decision based on the old structure.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;So we can assume method hierarchy structure is mostly static, and when it isn't there's only a limited set of cases where we care. How can we exploit that?&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, we implement what's called an &quot;inline cache&quot; at the call sites. In other words, every place where Ruby code makes a method call, we keep a slot in memory for the most recent method we looked up. In another quirk of fate, it turns out most calls are &quot;monomorphic&quot; (&quot;one shape&quot;) so caching more than one is &lt;b&gt;usually&lt;/b&gt;&amp;nbsp;not beneficial.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When we revisit the cache, we need to know we've still got the right method. Obviously it would be stupid to do a full search of the target object's class hierarchy all over again, so what we want is to simply be able to examine the type of the object and know we're ok to use the same method. 
In JRuby, this is (usually) done by assigning a unique serial number to every class in the system, and caching that serial number along with the method at the call site.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Oh, but wait...how do we know if the class or its ancestors have been modified?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A simple implementation would be to keep a single global serial number that gets spun every time any method table or class hierarchy anywhere in the system is modified. If we assume that those changes eventually stop, this is good enough; the system stabilizes, the global serial number never changes, and all our cached methods are safely tucked away for the machine to branch-predict and optimize to death. This is how Ruby 1.9.3 optimizes inline caches (and I believe Ruby 2.0 works the same way).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unfortunately, our perfect world isn't quite so perfect. Methods do get defined at runtime, especially in Ruby where people often create one-off &quot;singleton methods&quot; that only redefine a couple methods for very localized use. We don't want such changes to blow all inline caches everywhere, do we?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's split up the serial number by method name. That way, if you are only redefining the &quot;foobar&quot; method on your singletons, only inline caches for &quot;foobar&quot; calls will be impacted. Much better! This is how Rubinius implements cache invalidation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unfortunately again, it turns out that the methods people override on singletons are very often common methods like &quot;hash&quot; or &quot;to_s&quot; or &quot;inspect&quot;, which means that a purely name-based invalidator still causes a large number of call sites to fail. 
Bummer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In JRuby, we went through the above mechanisms and several others, finally settling on one that allows us to only ever invalidate the call sites that &lt;b&gt;actually&lt;/b&gt;&amp;nbsp;called a given method against a given type. And it's actually pretty simple: we spin the serial numbers on the individual classes, rather than in any global location.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Every Ruby class has one parent and zero or more children. The parent connection is obviously a hard link, since at various points during execution we need to be able to walk up the class hierarchy. In JRuby, we also add a &lt;b&gt;weak&lt;/b&gt;&amp;nbsp;link from parents to children, updated whenever the hierarchy changes. This allows changes anywhere in a class hierarchy to cascade down to all children, localizing changes to just that subhierarchy rather than inflicting its damage upon more global scopes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Essentially, by actively invalidating down-hierarchy classes' serial numbers, we automatically know that matching serial numbers at call sites mean the cached method is 100% ok to use. We have reduced O(N) hierarchically-oriented hash table lookups to a single identity check. Victory!&lt;/div&gt;&lt;h2&gt;Instance Variables&lt;/h2&gt;&lt;div&gt;Optimizing method lookups actually turned out to be the easiest trick we had to pull. Instance variables defied optimization for a good while. Oddly enough, most Ruby implementations stumbled on a reasonably simple mechanism at the same time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby instance variables can be thought of as C++ or Java fields that only come into existence at runtime, when code actually starts using them. 
And where C++ and Java fields can be optimized right into the object's structure, Ruby instance variables have typically been implemented as a hash table that can grow and adapt to a running program as it runs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Using a hash table for instance variables has some obvious issues:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;The aforementioned performance costs of using hashes&lt;/li&gt;&lt;li&gt;Space concerns; a collection of buckets already consumes space for some sort of table, and too many buckets means you are using &lt;b&gt;way&lt;/b&gt;&amp;nbsp;more space per object than you want&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;At first you might think this problem can be tackled exactly the same way as method lookup, but you'd be wrong. What do we cache at the call site? It's not code we need to keep close to the point of use, it's the steps necessary to reach a point in a given object where a value is stored (ok, that could be considered code...just bear with me for a minute).&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are, however, truths we can exploit in this case as well.&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;A given class of objects will generally reference a small, finite number of variable names during the lifetime of a given program.&lt;/li&gt;&lt;li&gt;If a variable is accessed once, it is very likely to be accessed again.&lt;/li&gt;&lt;li&gt;The set of variables used by a particular class of objects is largely unique to that class of objects.&lt;/li&gt;&lt;li&gt;The majority of the variables ever to be accessed can be determined by inspecting the code contained in that class and its superclasses.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;This gives us a lot to work with. Since we can localize the set of variables to a given class, that means we can store something at the class level. 
How about the actual layout of the values in object instances of that class?&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is how most current implementations of Ruby actually work.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In JRuby, as instance variables are first assigned, we bump a counter on the class that indicates an offset into an instance variable table associated with instances of that class. Eventually, all variables have been encountered and that table and that counter stop changing. Future instances of those objects, then, know exactly how large the table needs to be and which variables are located where.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Invalidation of a given instance variable &quot;call site&quot; is then once again a simple class identity check. If we have the same class in hand, we know the offset into the object is guaranteed to be the same, and therefore we can go straight in without doing any hash lookup whatsoever.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Rubinius does things a little differently here. Instead of tracking the offsets at runtime, the Rubinius VM will examine all code associated with a class and use that to make a guess about how many variables will be needed. It sets up a table on the class ahead of time for those statically-determined names, and allocates exactly as much space for the object's header + those variables in memory (as opposed to JRuby, where the object and its table are two separate objects). This allows Rubinius to pack those known variables into a tighter space without hopping through the extra dereference JRuby has, and in many cases, this can translate to faster access.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, both cases have their failures. 
In JRuby's version, we pay the cost of a second object (an array of values) and a pointer dereference to reach it, even if we can cache the offset 100% successfully at the call site. This translates to larger memory footprints and somewhat slower access times. In Rubinius, variables that are dynamically allocated fall back on a simple hash table, so dynamically-generated (or dynamically-mutated) classes may end up accessing some values in a much slower way than others.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The quest for perfect Ruby instance variable tables continues, but at least we have the tools to almost completely eliminate hashes right now.&lt;/div&gt;&lt;h2&gt;Constants&lt;/h2&gt;&lt;div&gt;The last case I'm going to cover in depth is that of &quot;constant&quot; values in Ruby.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Constants are, as I mentioned earlier, stored on classes in another hash table. If that were their only means of access, they would be uninteresting; we could use exactly the same mechanism for caching them as we do for methods, since they'd follow the same structure and behavior (other than being somewhat more static than method tables). Unfortunately, that's not the case; constants are located based on both lexical and hierarchical searches.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In Ruby, if you define a class or module, all constants lexically contained in that type's enclosing scopes are also visible within the type. This makes it possible to define new lexically-scoped aliases for values that might otherwise be difficult to retrieve without walking a class hierarchy or requiring a parent/child relationship to make those aliases visible. It also defeats nearly all reasonable mechanisms for eliminating hash lookups.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When you access a constant in Ruby, the implementation must first search all lexically-enclosing scopes. 
Each scope has a type (class or module) associated, and we check that type (and not its parents) for the constant name in question. Failing that, we fall back on the current type's class hierarchy, searching all the way up to the root type. Obviously, this could be far more searching than even method lookup, and we want to eliminate it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If we had all the space in the world and no need to worry about dangling references, using our down-hierarchy method table invalidation would actually work very well here. We'd simply add another hierarchy for invalidation: lexical scopes. In practice, however, this is not feasible (or at least I have not found a way to make it feasible) since there are &lt;b&gt;many times&lt;/b&gt;&amp;nbsp;more lexical scopes in a given system than there are types, and a large number of those scopes are transient; we'd be tracking thousands or tens of thousands of parent/child relationships weakly all over the codebase. Even worse, invalidation due to constant updates or hierarchy changes would have to proceed both down the class hierarchy and throughout all lexically-enclosing scopes in the entire system. Ouch!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The current state of the art for Ruby implementations is basically our good old global serial number. Change a constant anywhere in Ruby 1.9.3, Rubinius, or JRuby, and you have just caused all constant access sites to invalidate (or they'll invalidate next time they're encountered). Now this sounds bad, perhaps because I told you it was bad above for method caching. But remember that the majority of Ruby programmers advise and practice the art of keeping constants...constant. Most of the big-name Ruby folks would call it a bug if your code is continually assigning or reassigning constants at runtime; there are other structures you could be using that are better suited to mutation, they might say. 
And in general, most modern Ruby libraries and frameworks do keep constants constant.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'll admit we could do better here, especially if the world changed such that mutating constants was considered proper and advisable. But until that happens, we have again managed to eliminate hash lookups by caching values based on a (hopefully rarely modified) global serial number.&lt;/div&gt;&lt;h2&gt;The Others&lt;/h2&gt;&lt;div&gt;I did not go into the others because the solutions are either simple or not particularly interesting.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Local variables in any sane language (flame on!) are statically determinable at parse/compile time (rather than being dynamically scoped or determined at runtime). In JRuby, Ruby 1.9.3, and Rubinius, local variables are in all cases a simple tuple of offset into an execution frame and some depth at which to find the appropriate frame in the case of closures.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Global variables are largely discouraged, and usually only accessed at boot time to prepare more locally-defined values (e.g. configuration or environment variable access). In JRuby, we have experimented with mechanisms to cache global variable accessor logic in a way similar to instance variable accessors, but it turned out to be so rarely useful that we never shipped it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby also has another type of variable called a &quot;class variable&quot;, which follows lookup rules almost identical to methods. We don't currently optimize these in JRuby, but it's on my to-do list.&lt;/div&gt;&lt;h2&gt;Final Words&lt;/h2&gt;&lt;div&gt;There are of course many other ways to avoid hash lookups, with probably the most robust and ambitious being code generation. 
Ruby developers, JIT compiler writers, and library authors have all used code generation to take what is a mostly-static lookup table and turn it into actually-static code. But you must be careful here to not fall into the trap of simply stuffing your hash logic into a switch table; you're still doing a calculation and some kind of indirection (memory dereference or code jump) to get to your target. Analyze the situation and figure out what immutable truths there are you can exploit, and you too can avoid the evils of hashes.&lt;/div&gt;&lt;/div&gt;</content>
		<author>
			<name>Charles Nutter</name>
			<email>noreply@blogger.com</email>
			<uri>http://blog.headius.com/</uri>
		</author>
		<source>
			<title type="html">Headius</title>
			<subtitle type="html">Helping the JVM Into the 21st Century</subtitle>
			<link rel="self" href="http://blog.headius.com/feeds/posts/default"/>
			<id>tag:blogger.com,1999:blog-4704664917418794835</id>
			<updated>2013-05-21T18:00:07+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en-US">
		<title type="html">Regular Expressions Cookbook</title>
		<link href="http://feeds.oreilly.com/~r/oreilly/ruby/~3/OPSAAJ84_5E/"/>
		<id>http://oreilly.com/catalog/9781449319434/</id>
		<updated>2012-08-27T20:38:38+00:00</updated>
		<content type="html">&lt;a href=&quot;http://oreilly.com/catalog/9781449319434/&quot;&gt;&lt;img src=&quot;http://covers.oreilly.com/images/9781449319434/bkt.gif&quot; /&gt;&lt;/a&gt;&lt;p&gt;Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.&lt;/p&gt;
	</content>
		<author>
			<name>O'Reilly Media, Inc.</name>
			<uri>http://oreilly.com/ruby</uri>
		</author>
		<source>
			<title type="html">O'Reilly Media: Ruby and Rails</title>
			<subtitle type="html">A compilation of O'Reilly Media's information about the Ruby programming language from news, books, conferences, courses, community, and reports.</subtitle>
			<link rel="self" href="http://feeds.oreilly.com/oreilly/ruby"/>
			<id>http://oreilly.com/ruby</id>
			<updated>2013-04-24T22:00:26+00:00</updated>
			<rights type="html">Copyright O'Reilly Media, Inc.</rights>
		</source>
	</entry>

	<entry>
		<title type="html">Script to convert Google+ takeout into a single easy to use document</title>
		<link href="http://t-a-w.blogspot.com/2012/08/script-to-convert-google-takeout-into.html"/>
		<id>tag:blogger.com,1999:blog-27488238.post-4592098926813700629</id>
		<updated>2012-08-27T03:06:47+00:00</updated>
		<content type="html">&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-JAivAqkmyIY/UDq41qm7vcI/AAAAAAAABUE/M1wy416imuw/s1600/search_cat_by_zenera_from_flickr_cc-sa.jpg&quot; title=&quot;Search cat by zenera from flickr (CC-SA)&quot;&gt;&lt;img alt=&quot;Search cat by zenera from flickr (CC-SA)&quot; border=&quot;0&quot; height=&quot;529&quot; src=&quot;http://4.bp.blogspot.com/-JAivAqkmyIY/UDq41qm7vcI/AAAAAAAABUE/M1wy416imuw/s640/search_cat_by_zenera_from_flickr_cc-sa.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;a href=&quot;https://www.google.com/takeout/&quot;&gt;Google+&lt;/a&gt; did many things wrong like their retarded and discriminatory real name policy, but one surprising thing they did right that almost everybody else gets wrong was making it easy to export all your data using &lt;a href=&quot;https://www.google.com/takeout/&quot;&gt;Google Takeout&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Unfortunately Google+ posts from Takeout (and pretty much everything else from Takeout) are pretty hard to use directly, but we're all hackers, so it's not a big deal to reformat them, and at least this one time it doesn't involve breaking any Terms of Service or working around any rate limiters, captchas, and other such nonsense just to get your own data.&lt;br /&gt;&lt;br /&gt;I wrote a script to process Takeout archive into a single easy to search HTML document. 
Since it's pretty short, I put it in &lt;a href=&quot;https://github.com/taw/unix-utilities&quot;&gt;unix-utilities repository on github&lt;/a&gt;&amp;nbsp;(the one &lt;a href=&quot;http://t-a-w.blogspot.com/2012/07/collection-of-small-unix-utilities.html&quot;&gt;I wrote about earlier&lt;/a&gt;) instead of making a new repository for it.&lt;br /&gt;&lt;br /&gt;It's very easy to use (Stream/ directory is how it's packed in Takeout .zip):&lt;br /&gt;&lt;pre&gt;&lt;code&gt;process_gplus_takeout Stream/ output.html&lt;/code&gt;&lt;/pre&gt;It removes everything except actual content and attachments, and sorts entries by date. If you want to include different things or filter them, it should be pretty easy to &lt;a href=&quot;https://github.com/taw/unix-utilities/blob/master/bin/process_gplus_takeout&quot;&gt;modify the script&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It's even a reasonable example of how to use Hpricot to mass-process a lot of HTML documents if that's a new thing to you.&lt;br /&gt;&lt;br /&gt;About the only hard part is arranging computations in a way that doesn't load the DOM of every single HTML file into memory simultaneously, but instead extracts them one by one and frees each DOM in between. It probably doesn't even matter in this case, since it's just a few MBs of HTML, so even all DOMs will fit in memory together, but it's a good practice in general.</content>
		<author>
			<name>taw</name>
			<email>noreply@blogger.com</email>
			<uri>http://t-a-w.blogspot.com/search/label/ruby</uri>
		</author>
		<source>
			<title type="html">taw's blog</title>
			<subtitle type="html">The best kittens, technology, and video games blog in the world.</subtitle>
			<link rel="self" href="http://t-a-w.blogspot.com/feeds/posts/default/-/ruby?orderby=published"/>
			<id>tag:blogger.com,1999:blog-27488238</id>
			<updated>2013-05-21T08:00:35+00:00</updated>
		</source>
	</entry>

</feed>
