January 25

Why I think you should try dvorak

Posted by mtoledo. Filed under productivity

There has, once again, been a lot of discussion about the whole qwerty vs dvorak debate. Whenever I read these discussions on Hacker News and other sites, I feel they are missing the point.

So I want to explain why I moved to dvorak, and give my overall assessment of the dvorak vs qwerty debate.

So, especially if you are a qwerty ‘typer’, read on. I really hope this helps you make the right call between sticking with qwerty and trying dvorak.

A bit of history about me

I used to get ~80wpm scores on TypeRacer with qwerty, with some 100wpm ones. I’d attribute some of that slowness to my international keyboard (and to the fact that all the text on TypeRacer is in English), so overall I was quite a fast qwerty typer.

I decided to learn dvorak around a year ago. When I had a 3-week vacation, I thought it would be the perfect opportunity to learn it, since I could immerse myself in it during that time instead of having to switch back to qwerty at work.

I started training dvorak in the 2nd week of my vacation. After those 2 weeks I still couldn’t type as fast as I could on qwerty, but I was already pretty sure I’d never go back. About a month after I started, I could type well enough that it didn’t really get in the way, although I was still slower than on qwerty.

I’ve been on dvorak for 6 months now. I type better on dvorak than I ever did on qwerty, and I don’t plan to go back.

What do the ‘ambi-typers’ type in?

That’s a pretty interesting question. Of all the people who type both qwerty AND dvorak, which layout do they use? Looking through those discussions, it seems that everyone who can type both layouts chooses dvorak when given the option. Most of the people who choose qwerty can only type qwerty.

Therefore, I suppose people are not really asking ‘is dvorak really better than qwerty?’, because the only people who can answer that honestly are the ones who can type both layouts, and they seem to say ‘yes, dvorak is better’.

People who only type qwerty are really asking ‘if I already type qwerty, and I type pretty well, is it worth learning dvorak?’. I think dvorak typers have not done a very good job helping qwerty typers answer this question, and mixing it with the ‘which is better’ question only makes matters worse.

Programming’s dirtiest little secret: “typing”

Steve Yegge has posted an article called “Programming’s dirtiest little secret” in which he elaborates in multiple ways on the very important role typing plays for the programmer. In fact, he says “programmers who don’t touch type fit a profile”, meaning they “have to make sacrifices in order to sustain their productivity”. If you haven’t read that article yet, go ahead and read it. Unless you really understand the role typing plays in our profession, nothing I say here will be of any use.

Now, I’m pretty sure Steve Yegge types on qwerty, so you could even say that quoting him in defense of learning dvorak is useless. But I have a broader point to make: typing plays a very broad and critical role in programming.

Teach yourself programming in 10 years. And a keyboard layout?

Given its critical role in a programmer’s life, it really baffles me, just as it does Steve Yegge, how little time and emphasis programmers put into improving their typing skills.

Peter Norvig has written an article called “Teach yourself programming in 10 years”, in which he argues that it takes around 10 years to master a field at a world-class level, and criticizes the “Teach yourself C in 3 days” kind of book. We all know learning to program is a lifelong process, and that learning a programming language can take years.

Not so for a keyboard layout. Like I said, after only 2 weeks on dvorak I was already sure it was worth it and that I’d never go back to qwerty. 2 weeks! Compared to the time it takes to learn other skills in our profession, that’s a really short time. I’d go as far as saying that it wouldn’t take any qwerty typer who tries dvorak more than 3 months to make a very objective assessment of whether dvorak or qwerty is better for him.

But no one does it. Even though most of the people who stick with dvorak until they are as good on it as on qwerty end up choosing dvorak as their layout, even though typing plays a very important role in a programmer’s work, and even though the time investment is relatively low compared to learning other programming skills, no one does it.

To me, this feels exactly like another situation. Before telling you which, I know you’ll probably disagree with me. Still, your disagreement will carry a lot more weight if you can type with both layouts.

So here it is: to me, programmers who only type qwerty are like programmers who only program in one language. We can’t judge how much better or worse another programming language is without knowing it first. But every now and then you’ll see someone saying “I already program in *language*. It’s Turing complete, so there’s nothing you can do in any other language that I can’t do with mine. So, there’s no need to learn a new language.” Except people who say this about programming languages are usually frowned upon, since most good programmers *do* know multiple programming languages, and know that’s false.

Yet we’re stuck in a world where most programmers only know one keyboard layout.

Why I think dvorak is better than qwerty

So, I now type as well on dvorak as I ever typed on qwerty. Even though going back to qwerty is within very close reach, I’m not going to use qwerty again. Why is that? Why, having the option of using either layout, am I sticking with dvorak?

  • It’s more comfortable: I want you to read it exactly as I said it: it’s *more* comfortable, not just comfortable. People who only type qwerty can’t really notice how awkward it is until they can type on a more comfortable layout. It’s like driving an uncomfortable car in college: you don’t really notice how uncomfortable it is. But when you start driving a more comfortable car, it becomes immediately evident. The problem is that it takes a few months before you can “drive” dvorak, so you need to go through that to realize how uncomfortable qwerty is.
  • It’s easier to touch type: many qwerty typers never had formal training in typing and learned by themselves, so most of them have different bad habits that are very hard to get rid of. Learning dvorak when you already know another layout makes it much easier to learn it without those habits, compared to when you learned qwerty, before you knew how to type at all.
  • It’s easier to program on: on dvorak, many frequently used symbols are within close reach. All of the following symbols, used often in many programming languages, sit where the letters q, w, e and z are on qwerty: ' , . " < > ; : There’s also a layout called “Programmer Dvorak” which is optimized for programmers. Even though I eventually ditched it for standard dvorak, it’s evident from the moment you start using it that it makes a lot of the things you do daily much easier.
  • You could type faster: I won’t claim that ‘dvorak is faster than qwerty’ with the same certainty as I claim it’s more comfortable. But chances are that a keyboard layout which puts the keys you use most often close to the home row, and which didn’t choose its top row based on the letters that make up the word “typewriter”, will make it easier for you to type faster. After only 6 months of practice I type faster on it than I did on qwerty, and the world’s fastest typist also uses dvorak.

The negatives of using dvorak

Now, dvorak is no silver bullet. Also, there are other keyboard layouts around besides qwerty and dvorak, which are just the two most famous.

In any case, I suppose most of the drawbacks of using dvorak come from its lack of mainstream adoption, so they would apply to any other layout, even the theoretical “best layout that could ever be made particularly for you”.

  • You need to switch back when using other people’s computers: It really sucks to have to change someone else’s machine settings just so you can type dvorak on it. It can also be an issue if you’re pairing on the same keyboard, although in that case it’s much more justifiable to set up the OS with a simple way to switch keyboard layouts; most modern OSes support that. There’s even a USB-to-dvorak converter you can carry around! But that’s all pretty awkward, and it can make things much harder when someone calls you over for help and you have to use their machine as is.
  • Nearly no availability of keyboards with a dvorak layout: Buying a dvorak keyboard is like buying a game for the Mac. There are some places that sell them, but sometimes not the one you want, and you have to jump through extra hoops. It’s kind of bad when your MacBook Pro’s keyboard backlight shines through a bunch of keys that don’t make sense to you. It’s not really a big issue, though, because when you learn dvorak you’ll learn to touch type, and won’t be looking at the keyboard anyway.
  • Having to relearn all your hotkeys: This sounds much harder than it really is, but maybe it was just easier for me than it would be for someone else. In any case, if you want to learn dvorak, I recommend relearning all your hotkeys rather than trying to hack your way into using the new layout with the old ones. Also, some hotkeys (like vi’s “hjkl” navigation keys) depend heavily on the keyboard layout, so those might need adjustments. It really didn’t affect me much on emacs: none of my keybinding changes were caused by the layout, but your mileage may vary.

Will I forget qwerty if I learn dvorak?

Finally, this question seems to worry a lot of prospective switchers: do you forget qwerty when you learn dvorak?

Speaking for myself: I’ve certainly forgotten qwerty. Even though it’s still somewhat in my muscle memory, and I think I’d probably get back to my previous level in a few days, I definitely can’t touch type qwerty right now.

Then again, some people claim they haven’t forgotten qwerty. In his post, Steve Yegge even mentions a friend who typed 120wpm on qwerty and claimed to type even faster on dvorak. So your mileage may vary here as well. I suppose it depends on how often you use each layout. Since I’ve hardly used qwerty since I switched, I believe not using qwerty will make you forget it, while using it sporadically will keep it alive.

So, should I try it?

I hope that with these ramblings I have made a case for you to try it. Explore it like you would an unknown programming language, just for the potential benefit of the exploration. Think about all the things you didn’t know were bad until you tried something better.

Think about the importance of typing in your profession, and how this is a healthy way to find some room for improvement in it.

Finally, ask people who have switched about their experiences and why they think the switch was worth it. I’m pretty sure many of them have even more insight than I gained in my short experience, and that could inspire you to try it for yourself.

The cost of learning it is not that high, and people who have paid it do seem to think it’s worth it. Remember, if you don’t like it, you can always stop using it. The worst that could happen is that the effort you put into learning it helps you correct some of your qwerty bad habits.

August 5

Beware of Rails Optimistic Locking and MySQL

Posted by mtoledo. Filed under rails, ruby

There’s a caveat when using Rails optimistic locking inside callbacks in a fault-tolerant way with MySQL’s default settings. That’s a lot of conditions and sounds like a very specific scenario, but it’s not as rare as it sounds. Let me break it down:

Rails optimistic locking

Rails automatically applies optimistic locking to a model whose table has a column named ‘lock_version’ that defaults to 0.

It does that by incrementing the ‘lock_version’ column on every update and adding the version it originally read to the UPDATE’s WHERE clause. If no row matches, because another process has already updated the record and bumped its version, it raises an ‘ActiveRecord::StaleObjectError’.

This is best exemplified by the following snippet:


>> user1 = User.first
=> #<User id: 1, user_id: 1, coins: 5247248, lock_version: 0>
>> user2 = User.first
=> #<User id: 1, user_id: 1, coins: 5247248, lock_version: 0>
>> user1.coins = 10
=> 10
>> user1.save
=> true
>> user2.coins = 20
=> 20
>> user2.save
ActiveRecord::StaleObjectError: Attempted to update a stale object
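To make the mechanism concrete without a database, here is a plain-Ruby sketch that mimics the check described above. It is not ActiveRecord’s code: FakeTable, StaleObjectError and the Hash-based rows are hypothetical stand-ins, and `update` plays the role of an `UPDATE ... WHERE id = ? AND lock_version = ?` that matches no row when the version has moved on.

```ruby
# Plain-Ruby model of optimistic locking; not ActiveRecord's implementation.
class StaleObjectError < StandardError; end

# Hypothetical in-memory "table": id => row Hash.
class FakeTable
  def initialize
    @rows = {}
  end

  def insert(id, attrs)
    @rows[id] = attrs.merge(lock_version: 0)
  end

  # Returns a detached copy, like loading a record into memory.
  def find(id)
    @rows.fetch(id).dup
  end

  # Mimics UPDATE ... WHERE id = ? AND lock_version = ?: if another writer
  # has bumped the version since `copy` was read, no row matches.
  def update(id, copy)
    if @rows.fetch(id)[:lock_version] != copy[:lock_version]
      raise StaleObjectError, 'Attempted to update a stale object'
    end

    copy[:lock_version] += 1
    @rows[id] = copy.dup
    true
  end
end

users = FakeTable.new
users.insert(1, coins: 5247248)

user1 = users.find(1)
user2 = users.find(1)
user1[:coins] = 10
users.update(1, user1) # succeeds; stored lock_version becomes 1

user2[:coins] = 20
begin
  users.update(1, user2) # user2 still carries lock_version 0
rescue StaleObjectError => e
  puts "#{e.class}: #{e.message}" # prints "StaleObjectError: Attempted to update a stale object"
end
```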

Rails callbacks and transactions

So, now that we know how optimistic locking works, let’s take a quick look at callbacks and transactions.

In Rails, there’s a multitude of callbacks (16 to be precise) that you can use to hook into your object’s lifecycle. You can check them out for yourself in the console:


>> ActiveRecord::Callbacks::CALLBACKS
=> ["after_find", "after_initialize", "before_save", "after_save", "before_create", "after_create", "before_update", "after_update", "before_validation", "after_validation", "before_validation_on_create", "after_validation_on_create", "before_validation_on_update", "after_validation_on_update", "before_destroy", "after_destroy"]
>> ActiveRecord::Callbacks::CALLBACKS.size
=> 16

The interesting part for us is that the whole callback chain (that is, the method being hooked into (create, destroy, update) and all of its callbacks (before_create, after_create, etc.)) runs in the same database transaction. This means Rails opens a transaction before executing the first callback, and commits it only after all callbacks have run without the transaction being rolled back.

Creating a fault-tolerant callback for your Rails model

There are many ways to make sure a user’s request won’t fail when a ‘StaleObjectError’ is raised on an optimistically locked object that’s been updated in parallel. One approach for fields that are incremented or decremented, like the coins of our user above, is simply to try again.


# class User < ActiveRecord::Base

def add_coins(coins)
  begin
    self.coins += coins
    self.save!
  rescue ActiveRecord::StaleObjectError
    self.reload
    retry
  end
end

In the case above, if coins happen to be added to the user simultaneously by two different processes, the second will clash and raise a ‘StaleObjectError’. However, it will then reload the user, try to increment the coins again and finish successfully (assuming it doesn’t clash with another process again, in which case it will just retry until it’s the sole process updating the row. Remember this is *optimistic locking* after all, so this shouldn’t happen very often).

So in the above case, multiple processes can update the user record simultaneously without returning an error to the user whose request read the stale object.

The infinite loop

All is not well, though. If the code above is used inside a callback on a default MySQL installation, an infinite loop will arise:


#class Monster < ActiveRecord::Base

before_destroy :add_coins_to_user

def add_coins_to_user
  user = find_user_which_dealt_last_blow
  user.add_coins(self.loot)
end

In the example above, when a ‘Monster’ is destroyed, its loot is added to the coins of the user who dealt the last blow. ‘add_coins_to_user’ is called from the ‘before_destroy’ callback. Although it seems this should work perfectly, an infinite loop will ensue if the same user kills two monsters in parallel. To understand why, we need to revisit transaction isolation levels.

Transaction isolation levels in MySQL

There are four standard transaction isolation levels:

  • READ UNCOMMITTED – A transaction can read data from other transactions that have not been committed yet (dirty reads).
  • READ COMMITTED – A transaction can read data from other transactions after those transactions have been committed.
  • REPEATABLE READ – The data the transaction reads stays the same as it was when the transaction first read it, even if other transactions change and commit it in the meantime.
  • SERIALIZABLE – Transactions behave as if they ran one after another; MySQL’s InnoDB does this by placing a shared lock on every row a transaction reads, so other transactions cannot modify those rows until it finishes.

Although a detailed description of when to use each isolation level is material for another post, the important fact here is:

“Most other databases use the READ COMMITTED transaction level, but mysql uses the REPEATABLE READ transaction level by default.”

What this means is that if two transactions read the same row with the same version number (say, version 0), then even after the first transaction has saved and committed a new version of the row (version 1), the second transaction will keep reading version 0 until it is committed or rolled back. Its UPDATE, on the other hand, is checked against the latest committed row (version 1), so it never matches. Combined with the retry in the rescue clause, the code gets stuck in a save, stale error, reload version 0, retry loop.

This means the fault-tolerance approach above only works inside a transaction at the READ COMMITTED level. At that level, once transaction 1 commits the new version number, transaction 2 reads version 1 on reload, and all proceeds as expected.
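The difference between the two levels can be sketched with a toy model in plain Ruby. It assumes what was described above for InnoDB: plain reads inside a REPEATABLE READ transaction keep returning the first snapshot, while the UPDATE’s lock_version check always sees the latest committed row. Committed, Transaction and this add_coins are hypothetical stand-ins, and the loop is capped at max_attempts so the demo ends instead of spinning like the real one.

```ruby
# Toy model of the reload/retry loop; not MySQL's implementation.
class StaleObjectError < StandardError; end

# The latest committed version of the user's row.
Committed = Struct.new(:coins, :lock_version)

class Transaction
  def initialize(db, isolation)
    @db = db
    @isolation = isolation
    @snapshot = nil
  end

  # Plain SELECT: REPEATABLE READ keeps returning its first snapshot,
  # READ COMMITTED returns whatever is committed right now.
  def read
    if @isolation == :repeatable_read
      @snapshot ||= @db.dup
      @snapshot.dup
    else
      @db.dup
    end
  end

  # UPDATE ... WHERE lock_version = ?: always checked against the committed row.
  def update(copy)
    raise StaleObjectError if @db.lock_version != copy.lock_version

    @db.coins = copy.coins
    @db.lock_version += 1
  end
end

# Returns the number of attempts needed, or nil if still stale after
# max_attempts (the real loop has no limit and would spin forever).
def add_coins(db, isolation, amount, max_attempts: 5)
  tx = Transaction.new(db, isolation)
  user = tx.read # our transaction reads version 0

  db.coins += 1  # meanwhile, another process commits version 1
  db.lock_version += 1

  max_attempts.times do |attempt|
    begin
      user.coins += amount
      tx.update(user)
      return attempt + 1
    rescue StaleObjectError
      user = tx.read # the "reload" in the rescue clause
    end
  end
  nil
end

p add_coins(Committed.new(0, 0), :repeatable_read, 10) # prints nil
p add_coins(Committed.new(0, 0), :read_committed, 10)  # prints 2
```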

Changing MySQL’s default transaction level

Now, changing the default transaction level is not the only way to fix this particular issue, and might not even be the best way in your situation. You could re-engineer your code so the retry happens outside of the callback (and therefore outside of the transaction), or you could return false from the before callback (rolling back the transaction) and retry outside of it, where a new transaction will read the new value, etc.
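For the alternatives that retry outside the callback, a small helper can re-run the whole unit of work. This is a sketch under assumptions: with_retries is a hypothetical name, StaleObjectError stands in for ActiveRecord::StaleObjectError, and the attempt limit is an added safeguard so a persistent conflict is re-raised instead of looping forever. In a Rails app the helper would be called from outside any transaction, so each attempt would start a fresh one and see newly committed data.

```ruby
# Stand-in for ActiveRecord::StaleObjectError.
class StaleObjectError < StandardError; end

# Hypothetical helper: runs the block, re-running it when error_class is
# raised, and re-raises once max_attempts attempts have failed.
def with_retries(error_class, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue error_class
    retry if attempts < max_attempts
    raise
  end
end

# Usage: a unit of work that conflicts twice, then succeeds.
result = with_retries(StaleObjectError) do |attempt|
  raise StaleObjectError if attempt < 3
  "saved on attempt #{attempt}"
end
puts result # prints "saved on attempt 3"
```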

That said, I’ll show you how to change MySQL’s transaction level so that it behaves like most other databases do by default. Needless to say, if you do this in a codebase that expects the REPEATABLE READ transaction level, that code may break.

You change MySQL’s transaction level by editing its config file, either your per-user file or the global one. Here is where mine are located:


~ $ locate my.cnf
/etc/mysql/my.cnf
/home/mtoledo/.my.cnf

It is possible that your version of MySQL has a bug where the per-user file is expected to be named “my.cnf” instead of “.my.cnf”. If that’s the case, work around it accordingly (or, better yet, update your database).

The change goes in the [mysqld] section of the file. In the per-user file you only need to add the difference; in the global config, add it inside the existing [mysqld] section. I’ll show you both so you can pick whichever serves you best.


~ $ cat .my.cnf
[mysqld]
transaction-isolation = READ-COMMITTED

~ $ grep -B 10 -A 4 transaction-isolation /etc/mysql/my.cnf
[mysqld]
#
# * Basic Settings
#

#
# * IMPORTANT
#   If you make changes to these settings and your system uses apparmor, you may
#   also need to also adjust /etc/apparmor.d/usr.sbin.mysqld.
#
transaction-isolation = READ-COMMITTED
user		= mysql
pid-file	= /var/run/mysqld/mysqld.pid
socket		= /var/run/mysqld/mysqld.sock
port		= 3306

With either of those in place, and the MySQL server restarted so it picks up the new setting, the code behaves as expected and no infinite loop occurs.

Finally, if you want to make sure your application runs with the transaction isolation level it needs, you can add a file to config/initializers with the following script:


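# Check both the global and the session isolation level when the app boots.
# (MySQL 8.0 removed tx_isolation; newer servers call it transaction_isolation.)
ActiveRecord::Base.connection.execute("select @@global.tx_isolation, @@tx_isolation").all_hashes.first.each do |var, val|
  if val != 'READ-COMMITTED'
    # Explain how to fix the setting, then stop the app from booting.
    puts "#{var} != 'READ-COMMITTED'\n\tPlease add 'transaction-isolation = READ-COMMITTED' to your my.cnf"
    exit
  end
end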

If either the global or the session isolation level is not READ-COMMITTED, this prints a message explaining how to fix it and exits, so the app won’t run with the wrong setting.

Thanks a lot to coderrr for the help figuring this out, and for the warning script above.

July 1

Watch out for using ActiveRecord’s update_attributes on dirty objects

Posted by mtoledo. Filed under rails, ruby

I’ve recently found a very odd particularity in how ActiveRecord behaves when you set relationship properties through the update_attributes method of ActiveRecord::Base. In fact, because update_attributes has such a simple implementation, the behavior applies to any save of a dirty record whose association and foreign key disagree.


# in rails ActiveRecord::Base (base.rb)

# Updates all the attributes from the passed-in Hash and saves the record. If the object is invalid, the saving will
# fail and false will be returned.
def update_attributes(attributes)
  self.attributes = attributes
  save
end

As we can see, update_attributes just assigns the attributes to the model and calls save. The unintuitive behavior happens when dealing with an association’s foreign key attribute and its auto-generated association method.

To illustrate, I’ll show you this behavior using the ‘user_id’ foreign key and the ‘user’ association on the ‘Task’ model, in a ‘belongs_to’ association:


>> t = Task.new
=> #<Task id: nil, name: nil, category: nil, created_at: nil, updated_at: nil, user_id: nil>
>> t.user = User.find(1)
=> #<User id: 1, login: "mtoledo", ... >
# the association sets the user_id to 1
>> t.user_id
=> 1
# update_attributes the user_id to 3
>> t.update_attributes :user_id => 3, :name => 'Test'
=> true
# we might expect user_id to be 3 now!!
>> t.user_id
=> 1
>> t.save
=> true
>> t.user.id
=> 1

As the example above shows, even though I called ‘update_attributes’ with a user_id of 3, a call to user_id yields 1, the previous value, which you would expect to have been overwritten. Saving the object associates it with the user with id 1, not 3.

Note that this behavior is not exclusive to update_attributes: it can be reproduced with direct calls to save whenever user and user_id conflict. However, you’ll only notice it after your object has been saved.


>> t = Task.new :name => 'Test'
=> #<Task id: nil, name: "Test", category: nil, created_at: nil, updated_at: nil, user_id: nil>
# set user to that of id 1
>> t.user = User.find(1)
=> #<User id: 1, login: "mtoledo", ..
# set user_id to 3
>> t.user_id = 3
=> 3
# queries to the user_id yield 3 before saving the object
>> t.user_id
=> 3
>> t.save
=> true
# after saving it, though, they yield 1
>> t.user_id
=> 1

Trying to figure out where Rails set the user_id from 3 back to 1, I first found out that it was orthogonal to the save method itself. When the record is new, save eventually calls the private method ‘create’, which simply quotes your model’s attributes, issues an INSERT INTO on your table, and returns the new row’s id.


# Creates a record with values matching those of the instance attributes
# and returns its id.
def create
  if self.id.nil? && connection.prefetch_primary_key?(self.class.table_name)
    self.id = connection.next_sequence_value(self.class.sequence_name)
  end

  quoted_attributes = attributes_with_quotes

  statement = if quoted_attributes.empty?
    connection.empty_insert_statement(self.class.table_name)
  else
    "INSERT INTO #{self.class.quoted_table_name} " +
    "(#{quoted_column_names.join(', ')}) " +
    "VALUES(#{quoted_attributes.values.join(', ')})"
  end

  self.id = connection.insert(statement, "#{self.class.name} Create",
    self.class.primary_key, self.id, self.class.sequence_name)

  @new_record = false
  id
end

No sign of any value being replaced. Calling the ‘attributes_with_quotes’ method that create uses for its query, on a new object set up the same way, shows the following:


>> t.name = 'Test'
=> "Test"
>> t.user = User.find(1)
=> #<User id: 1, login: "mtoledo", ...
>> t.user_id = 3
=> 3
>> t.send(:attributes_with_quotes)
=> {"name"=>"'Test'", "category"=>"NULL", "updated_at"=>"NULL", "user_id"=>"3", "created_at"=>"NULL"}

Notice user_id is 3. Since create uses this value in the INSERT INTO statement, it’s odd that it contrasts with the user_id of 1 in my Rails log:


-- Task Create (3.2ms)
INSERT INTO `tasks` (`name`, `category`, `updated_at`, `user_id`, `created_at`)
VALUES('Test', NULL, '2009-07-01 21:52:52', 1, '2009-07-01 21:52:52')

Notice, though, that ‘user_id’ is not the only difference. The ‘created_at’ and ‘updated_at’ attributes are also set in the log, but nowhere to be seen in create. This suggests some hooks are at play here.

Digging into ActiveRecord’s declaration of the ‘belongs_to’ method (the one used to declare the user association above) in associations.rb, we find out what happens to the association. Before saving, if the associated object is a new record, it saves it first; then, if the association was assigned, it copies the associated object’s id into the foreign key:

# rails: associations.rb
def belongs_to(association_id, options = {})
  # ... omitted: some STI stuff
  else
    association_accessor_methods(reflection, BelongsToAssociation)
    association_constructor_method(:build,  reflection, BelongsToAssociation)
    association_constructor_method(:create, reflection, BelongsToAssociation)

    method_name = "belongs_to_before_save_for_#{reflection.name}".to_sym
    define_method(method_name) do
      association = instance_variable_get(ivar) if instance_variable_defined?(ivar)

      if !association.nil?
        if association.new_record?
          association.save(true)
        end

        if association.updated?
          self[reflection.primary_key_name] = association.id # <== there you go
        end
      end
    end
    before_save method_name
end

By adding this dynamically defined method as a before_save hook, Rails guarantees that whatever was assigned to the belongs_to association will override the foreign key on save.
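To see the interplay without Rails, here is a plain-Ruby imitation of the hook above. FakeTask and FakeUser are hypothetical stand-ins, and the association’s `updated?` flag is simplified to “an object was assigned through `user=`”; it reproduces the surprise from the console sessions, not ActiveRecord’s actual implementation.

```ruby
# Hypothetical stand-ins; not ActiveRecord classes.
FakeUser = Struct.new(:id)

class FakeTask
  attr_accessor :user_id, :name

  # Assigning the association remembers the object and copies its id,
  # like t.user = User.find(1) does.
  def user=(user)
    @user = user
    @user_updated = true
    self.user_id = user.id
  end

  def attributes=(attrs)
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  # Same shape as update_attributes: assign, then save.
  def update_attributes(attrs)
    self.attributes = attrs
    save
  end

  def save
    belongs_to_before_save_for_user
    true # a real model would INSERT/UPDATE here
  end

  private

  # Mirrors: if association.updated? then self[foreign_key] = association.id
  def belongs_to_before_save_for_user
    self.user_id = @user.id if @user_updated
  end
end

t = FakeTask.new
t.user = FakeUser.new(1)
t.update_attributes(user_id: 3, name: 'Test')
p t.user_id # prints 1, not 3
```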

We may prefer to set the foreign key (user_id) directly rather than the association (user) to avoid extra database queries. But we might not be the only code involved in the object’s life cycle, and Rails won’t raise an error when the two disagree, so it’s important to keep this behavior in mind.

May 26

Quick Tip – Using dual monitor with compiz on Ubuntu with Intel 965GM

Posted by mtoledo. Filed under linux

One of the most frustrating things about using Ubuntu with my two big monitors was that I couldn’t run compiz with them. My card is an Intel 965GM, and due to a Mesa driver bug it won’t let you have a desktop bigger than 2048 x 2048 (summing up all monitors).

I had struggled with this issue multiple times in the past, but eventually decided to just leave it and use metacity instead.

However, Ludovico Cavedon has posted a fix for that bug in his third-party repository. All you need to do is add those repositories to your /etc/apt/sources.list and update. Compiz should then work with texture sizes bigger than 2048 x 2048 on the Intel X3100 (965GM) card :-)

May 21

Using scriptaculous’ Sortable and InPlaceEditor at the same time

Posted by mtoledo. Filed under javascript, ruby

For a couple of days I’ve been trying to implement a scriptaculous InPlaceEditor on a scriptaculous Sortable list in one of my pet projects.

At first I had no luck with the InPlaceEditor Rails helpers you can easily find on Google: the traditional in_place_editing plugin required controller changes and didn’t support REST too well. Nakajima’s “better_edit_in_place” was written in JavaScript rather than Rails, so it would take some hacking to get it working with the authenticity token. I was starting to think about implementing my own.

Luckily, I found a forum post suggesting simplificator’s ‘inplace’ plugin (which I’ve forked here). It worked wonders for me, and aside from the slightly verbose helper declaration, it worked pretty well out of the box.

The issue, though, was that now that I had a sortable list and an in-place editor, every time I dragged an item to reorder it, it entered edit mode on mouse release. Of course that was undesirable behavior, but how to fix it?

On scriptaculous’ GitHub page there’s a fix that is ~20 LOC long. I didn’t like that approach. I thought a simple “Event.stop()” added in the right place would make it work, but that right place was nowhere to be found.

Finally, I decided to change editing from a single click to a double click, so that it wouldn’t be triggered by sorting! That proved to be quite easy:


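// Listen for double clicks with the handler InPlaceEditor normally uses for
// single clicks, then drop the single-click listener.
Ajax.InPlaceEditor.Listeners.dblclick = Ajax.InPlaceEditor.Listeners.click
delete Ajax.InPlaceEditor.Listeners.click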

Of course, that monkey patch needs to run before you declare your InPlaceEditor. If you are using the ‘inplace’ plugin, you might want to show users how they can interact with your list items on mouseover:


<%= editable_content_tag :span, item, 'name', true, nil, {}, {:clickToEditText => 'drag to reorder and doubleclick to edit'} %>

That’s pretty much all there is to it, and as solutions to this issue go, it’s as simple as it gets. If you know of a better solution, especially one that keeps editing on a single click, tell us in the comments.

May 19

Using find and xargs on directories with spaces

Posted by mtoledo. Filed under linux

Quick tip:

I recently moved my music library from the Windows partition to the Linux one.

One of the things I had to do was remove those pesky ‘desktop.ini’ files Windows creates for the CD thumbnails.

It turns out that using xargs the way I first tried didn’t work correctly, because of the spaces in the directory names:


find . -name desktop.ini  | xargs  rm

This won’t work because xargs splits its input on whitespace, so a path containing spaces becomes several arguments. The correct way of doing it is:


find . -name desktop.ini -print0 | xargs -0 rm

where “-print0” and “-0” tell find and xargs to separate file names with the ASCII NUL character rather than whitespace.
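For comparison, the same cleanup can be sketched in Ruby without any shell word splitting: Find.find yields each path as one whole String, so spaces never break a name into separate arguments. The method name remove_desktop_ini is hypothetical, and, like `-name desktop.ini`, it matches the file name case-sensitively.

```ruby
require 'find'

# Hypothetical helper: deletes every regular file named desktop.ini under
# root and returns the removed paths. Each path arrives as a single String,
# so directory names with spaces are handled as-is.
def remove_desktop_ini(root)
  removed = []
  Find.find(root) do |path|
    next unless File.basename(path) == 'desktop.ini' && File.file?(path)

    File.delete(path)
    removed << path
  end
  removed
end
```

Calling `remove_desktop_ini('.')` from the library’s root would mirror the find/xargs command above.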

Thanks to Not So Frequently Asked Questions, where I found the solution :)

May 11

“git add .” mistakenly deletes a directory which wasn’t touched. Bug?

Posted by mtoledo. Filed under linux

I was having some weird issues with git on my pet project:

Basically, git tried to delete my ‘restful_authentication’ plugin every time I did a “git add .”, even when my working tree was clean.

For instance, after a git pull (fully synchronized with origin):

~/projects/bushi2do master $ git pull
Already up-to-date.
~/projects/bushi2do master $ git status
# On branch master
nothing to commit (working directory clean)
~/projects/bushi2do master $ ls vendor/plugins/
acts_as_list  restful_authentication

Notice my ‘restful_authentication’ is there, untouched.

Now, if I run ‘git add .’:

~/projects/bushi2do master $ git add .
~/projects/bushi2do master $ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   vendor/plugins/restful_authentication
#       deleted:    vendor/plugins/restful_authentication/.gitignore
#       deleted:    vendor/plugins/restful_authentication/CHANGELOG
#       deleted:    vendor/plugins/restful_authentication/README.textile
#       deleted:    vendor/plugins/restful_authentication/Rakefile
#       deleted:    vendor/plugins/restful_authentication/TODO
# ...

Notice how git reports “new file:” and then “deleted:”. There is a “deleted:” entry for every file in my restful_authentication plugin, and they are already staged. Curiously, if I look into the directory, it’s still there, untouched as I expected:

# ...
#       deleted:    vendor/plugins/restful_authentication/rails/init.rb
#       deleted:    vendor/plugins/restful_authentication/restful-authentication.gemspec
#       deleted:    vendor/plugins/restful_authentication/tasks/auth.rake
#
~/projects/bushi2do master $ ls vendor/plugins/
acts_as_list  restful_authentication

Curiously enough, if I go ahead with the commit and push to the remote, my local copy will keep working, because the restful_authentication plugin is still on disk here. But my production server will break, because the plugin’s files will be removed from the remote branch.

Oddly enough, even if I run “git reset --hard HEAD” and then “git add .”, the files are still staged as deleted:

~/projects/bushi2do master $ git reset --hard HEAD
HEAD is now at d2d3364 Removing redundant route
~/projects/bushi2do master $ git status
# On branch master
nothing to commit (working directory clean)
~/projects/bushi2do master $ git add .
~/projects/bushi2do master $ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   vendor/plugins/restful_authentication
#       deleted:    vendor/plugins/restful_authentication/.gitignore
#       deleted:    vendor/plugins/restful_authentication/CHANGELOG
#       deleted:    vendor/plugins/restful_authentication/README.textile
#       deleted:    vendor/plugins/restful_authentication/Rakefile
#       deleted:    vendor/plugins/restful_authentication/TODO
# ...

Very weird indeed, and I don’t know how to explain this behavior.

The only fix I found was to remove version control from the “restful_authentication” directory, that is, to delete the plugin’s own .git directory:

~/projects/bushi2do master $ rm -rf vendor/plugins/restful_authentication/.git
~/projects/bushi2do master $ git add .
~/projects/bushi2do master $ git status
# On branch master
nothing to commit (working directory clean)

The downside to this fix is that you can no longer run ‘git pull’ in the plugin’s directory to grab updates.
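For what it's worth, the staged “new file: vendor/plugins/restful_authentication” entry matches how git treats a directory that contains its own .git: it records the directory as a single gitlink entry (the kind of entry submodules use) instead of tracking the files inside, so every file shows up as deleted. Since removing the nested .git is what fixed it here, one way to check whether other plugins are in the same state is to look for nested .git directories. A minimal sketch, assuming the Rails vendor/plugins layout; `nested_git_repos` is a hypothetical helper name:

```ruby
# Hypothetical helper: list plugin directories that carry their own
# .git directory (a nested repository) under the given plugins path.
def nested_git_repos(plugins_dir = 'vendor/plugins')
  Dir.glob(File.join(plugins_dir, '*', '.git')).
    select { |git_dir| File.directory?(git_dir) }.
    map { |git_dir| File.dirname(git_dir) }.
    sort
end

# Example: nested_git_repos.each { |dir| puts "nested repo: #{dir}" }
```

If you still want to pull plugin updates, git's submodule feature is the supported way to keep a nested repository inside another one.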

Has anyone else had the same problem using git with restful_authentication or some other plugin? Any other known workarounds?

Anyway, I hope this helps people to save some time if they stumble on the same issue.

April 14

Why I don’t object to Array#sum

Posted by mtoledo . Filed under ruby | 7 Comments

Raganwald has written a very elaborate piece about his “objection to Array#sum”. Since I particularly like Array#sum, I thought I’d weigh in on my own blog.

First, I think his piece contains two different objections:

1) Array shouldn’t return true for “respond_to?(:sum)” if it has contents that can’t be summed (like [1, 2, 'three'])

2) Some other object should be responsible for providing the “sum” service, as well as for inspecting the contents of the array to make sure they can be summed

Even though his article doesn’t make this distinction, I believe they are different issues.

So, my opinion on the first objection: “Should ‘respond_to?(:sum)’ always return true?”

Of course the answer is “yes”, and it would be even if ‘sum’ were declared somewhere other than Array. If ‘sum’ lives outside the array object, the array becomes one of its arguments, and ‘respond_to?’ has no way to inspect arguments: the only parameter it receives is the method name. You can’t pass it the argument values to get a conditional answer. The object either always responds to that method, or it doesn’t.
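A quick illustration in plain Ruby, assuming Ruby 2.4 or later, where Array#sum is built in (when this was written it came from ActiveSupport). respond_to? gives the same answer for both arrays, and the problem only surfaces when sum is actually called:

```ruby
numbers = [1, 2, 3]
mixed   = [1, 2, 'three']

# respond_to? only receives a method name, so it can't look at the contents.
puts numbers.respond_to?(:sum) # prints true
puts mixed.respond_to?(:sum)   # prints true

puts numbers.sum # prints 6

begin
  mixed.sum
rescue TypeError => e
  # 1 and 2 add fine; adding 'three' to the running total raises.
  puts "sum failed: #{e.class}"
end
```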

The same is true when you write interfaces in a statically typed language like Java. An interface only defines four things: the method name, the types of the parameters it receives, the type of the object it returns, and which exceptions it throws. There’s no way to declare that the method is supported only for some particular set of parameters. The only thing you can know is what kind of errors the method throws if something goes wrong.

That’s exactly how a Java interface would look if Array implemented a ‘sum’ method. Setting generics aside, there’s no way to conditionally implement that method on the Array class based on its contents, and no way to conditionally implement that interface in any other object either. There isn’t even a way to support ‘sum()’ only on ‘Array<Integer>’.

My point is, there’s no way you can inspect parameters or state to conditionally implement an interface. The interface is always tied to the class, not to the parameters of the method or to its state, and it will throw the declared errors if the situation arises.

The second question is: “Should Array#sum then be moved to some other object, or to a standalone function that actually guarantees the semantics of sum?”

Well, I do think there’s a valid alternative, but I don’t think it’s either a standalone ‘sum()’ function or some ‘ArrayUtils#sum()’.

The reason I don’t like either of those alternatives is that neither brings any semantic gain over the old Array#sum. If you can call Array#sum in invalid situations, where it shouldn’t be supported, then ArrayUtils#sum doesn’t help you with that at all. If calling Array#sum doesn’t tell you whether the operation is semantically valid, calling sum() in any other fashion won’t tell you either.

Besides, those alternatives create a pattern in code I don’t like: pre-chaining, or nesting, of calls. So, instead of:


['1', '2', '3', nil, '5'].compact.map(&:to_i).map {|i| i * 10}.sum

You now have:


sum(map(map(compact(['1', '2', '3', nil, '5']), lambda {|s| s.to_i}), lambda {|i| i * 10}))

Or something like this (I didn’t really check this code for errors). Now you have to read the code from right to left, or rather from the inside out. And if you argue that some methods, like map, belong on Array while others, like compact, don’t, then you’ll have to read both from left to right and from right to left.

Having to read code from right to left is one of the things I don’t like about Lisp, and being able to chain lots of calls and read code from left to right is one of the reasons I like Ruby so much.
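To make the comparison concrete, here is a runnable sketch. `ArrayUtils` and its `compact`, `map`, and `sum` functions are hypothetical, defined only to put the nested, prefix style next to the chained one; the chained line assumes Ruby 2.4+ for the built-in Array#sum.

```ruby
# Hypothetical standalone helpers (not part of Ruby or any library),
# written only to compare call styles.
module ArrayUtils
  module_function

  def compact(array)
    array.reject { |e| e.nil? }
  end

  def map(array, fn)
    array.map { |e| fn.call(e) }
  end

  def sum(array)
    array.inject(0) { |total, e| total + e }
  end
end

data = ['1', '2', '3', nil, '5']

# Chained: read left to right.
chained = data.compact.map(&:to_i).map { |i| i * 10 }.sum

# Nested: read from the inside out.
nested = ArrayUtils.sum(
  ArrayUtils.map(
    ArrayUtils.map(ArrayUtils.compact(data), lambda { |s| s.to_i }),
    lambda { |i| i * 10 }
  )
)

puts chained # prints 110
puts nested  # prints 110
```

Both compute the same value; the difference is purely in which direction you have to read.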

So, given that moving ‘sum’ to another class still doesn’t tell me whether I can use it on an array with a correct semantic meaning any better than Array#sum does, I think it only brings the drawbacks above, without any of the benefits.

Now, I do think there could be a valid alternative: a NumberArray class. Then I could check the types of the arguments in every method that changes the array’s state, so that whenever #sum was called it would have a correct semantic meaning, since I’d be enforcing its semantics both in the class name and in all its methods. I could then remove the #sum method from the Array class and move it to NumberArray.
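Here is a minimal sketch of that idea. NumberArray is hypothetical, and only push and << are shown; a real version would have to guard every method that can add elements (insert, []=, concat, and so on).

```ruby
# Hypothetical NumberArray: every method that adds elements checks them,
# so #sum is always semantically valid. Only push/<< are shown here.
class NumberArray
  include Enumerable

  def initialize(*numbers)
    @items = []
    numbers.each { |n| push(n) }
  end

  # Reject anything that isn't Numeric at the point of insertion.
  def push(number)
    unless number.is_a?(Numeric)
      raise TypeError, "#{number.inspect} is not a number"
    end
    @items.push(number)
    self
  end
  alias_method :<<, :push

  def each(&block)
    return to_enum(:each) unless block_given?
    @items.each(&block)
    self
  end

  # Safe by construction: every element was type-checked on the way in.
  def sum
    @items.inject(0) { |total, n| total + n }
  end
end

nums = NumberArray.new(1, 2, 3)
nums << 4
puts nums.sum # prints 10

begin
  nums << 'five'
rescue TypeError => e
  puts "rejected: #{e.message}"
end
```

The error moves from the moment you sum to the moment bad data tries to get in, which is the trade-off discussed next.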

But that’s a design decision like any other, and it comes up whenever you face this question: “Should I create a new type to provide this service, or should I raise an error when it’s called incorrectly?” The former gives you more semantically correct classes and services. The latter avoids multiplying the number of types, and makes it easy to handle cases where both types are needed.

For instance, if you have Users that can delete other users, you could create an Admin specialization that implements the ‘delete_user’ service, so you don’t have to raise errors when normal users try to delete users, which really doesn’t make sense. But if a user may delete another user whenever they share the same email address, you’ll run into trouble, since you can’t inspect state to implement the Admin interface conditionally.
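A minimal sketch of the state-based alternative for that example. `User`, `delete_user`, and `PermissionError` are hypothetical names; the point is that every user responds to `delete_user`, and whether the call is allowed depends on data rather than on the class.

```ruby
# Hypothetical names throughout: PermissionError, User, delete_user.
class PermissionError < StandardError; end

class User
  attr_reader :email

  def initialize(email, admin = false)
    @email = email
    @admin = admin
    @deleted = false
  end

  def admin?
    @admin
  end

  def deleted?
    @deleted
  end

  # Allowed for admins, or when both users share an email address.
  # The check is on state, so it raises instead of relying on an Admin class.
  def delete_user(other)
    unless admin? || other.email == email
      raise PermissionError, "#{email} may not delete #{other.email}"
    end
    other.mark_deleted
  end

  protected

  def mark_deleted
    @deleted = true
  end
end

root  = User.new('root@example.com', true)
alice = User.new('alice@example.com')
bob   = User.new('bob@example.com')

root.delete_user(bob) # allowed: root is an admin

begin
  alice.delete_user(User.new('carol@example.com'))
rescue PermissionError => e
  puts e.message
end
```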

Being able to raise errors conditionally based on state, rather than relying on object types and interfaces, can yield code that’s easier to read than moving functions into other objects, without being any less semantic. It also allows for code that’s more flexible and extensible than moving methods into specialized classes that carry more semantics. That’s really why I like Array#sum the way it is.

April 9

On caching expensive case conditions

Posted by mtoledo . Filed under ruby | No Comments

Ruby has a pretty expressive and flexible case statement. It differs from most other languages’ ‘switch’ statements in that it breaks automatically after each matching branch instead of falling through.

// C

switch (x) {
case 1:
    doSomething();
    break;
case 2:
    doSomethingElse();
    break;
}

# ruby

case x
when 1 then do_something
when 2 then do_something_else
end

I do think not having to break was the right decision, but not everyone agrees, and it does have some shortcomings, like the one posted here: . Often, Ruby’s shortcomings show up in operators you can’t override to bend to your will.

One such case is when you want to cache expensive operations in a case statement. Suppose you have a case statement similar to the one above, but with very expensive operations.

case x
when 1 then do_something_that_takes_a_long_time
when 2 then do_something_else_that_takes_a_long_time
end

Now, assume that this case statement is going to be called very frequently, and that the expensive methods will keep returning the same results over time. There are more ways to cache them than it seems at first glance.

The first option is to add a class variable holding an array with the results of the expensive operations, and look them up in your case statement.

# assumed to run once, e.g. in the class body
@@lookup_array = [do_something_that_takes_a_long_time, do_something_else_that_takes_a_long_time]

case x
when 1 then @@lookup_array[0]
when 2 then @@lookup_array[1]
end

The drawback to this approach is that it separates the condition from the behavior. The solution would be to initialize those values lazily, so that each one appears right beside its condition.

@@lookup_hash ||= {} # create the cache only once, so it survives between calls
case x
when 1 then @@lookup_hash[1] ||= do_something_that_takes_a_long_time
when 2 then @@lookup_hash[2] ||= do_something_else_that_takes_a_long_time
end

Of course, one drawback here is that sometimes you don’t want to lazily initialize an expensive operation, since the first caller to hit each condition has to wait an intolerable amount of time. Another drawback is that if your conditions aren’t simple values (ranges, for example), you can’t use x as a hash key, and the lookups become convoluted. To fix both, you’d have to do something like this:

# assumed to run once, e.g. in the class body
@@lookup_array = [do_something_that_takes_a_long_time, nil]
@@conditions = [1..3, 4..6]

case x
when @@conditions[0] then @@lookup_array[0]
when @@conditions[1] then @@lookup_array[1] ||= do_something_else_that_takes_a_long_time
end

Here I used both options: the second value is initialized lazily, so it reads well next to its condition, while the first is computed up front, but with no easy way to get from the condition to the operation cached for it. Neither option is really satisfying, and since the conditions are also stored somewhere else, this approach is little more than a curiosity; it really can’t be used in a real situation.

So a good solution here is to drop the case statement altogether, use a hash instead, and do the lookup through it.

@@lookup_hash = {
1 => do_something_that_takes_a_long_time,
2 => do_something_else_that_takes_a_long_time
}

@@lookup_hash[x]

The code above completely replaces the case statement from the first example, keeping each condition readable and close to its operation while caching the expensive operations up front instead of lazily, just like we wanted.
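A runnable sketch of the hash approach inside a class. The slow methods are hypothetical stand-ins that count how often they run, so you can see the expensive work happens once, when the class body is evaluated, and not on each lookup.

```ruby
class Classifier
  @@calls = 0

  # Hypothetical stand-ins for the expensive operations.
  def self.do_something_that_takes_a_long_time
    @@calls += 1
    'result one'
  end

  def self.do_something_else_that_takes_a_long_time
    @@calls += 1
    'result two'
  end

  # Built once, while the class body runs; no lazy initialization.
  @@lookup_hash = {
    1 => do_something_that_takes_a_long_time,
    2 => do_something_else_that_takes_a_long_time
  }

  def self.classify(x)
    @@lookup_hash[x]
  end

  def self.calls
    @@calls
  end
end

Classifier.classify(1)
Classifier.classify(1)
puts Classifier.calls # prints 2: one call per operation, however many lookups
```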

But what about the situations where the lookup can’t be done through a hash key, like if it was a range rather than a number?

@@lookup_hash = {
1..3 => do_something_that_takes_a_long_time,
4..6 => do_something_else_that_takes_a_long_time
}

@@lookup_hash.detect {|k,v| break v  if k === x } # thx coderrr ;)

With the snippet above, which you can wrap in a separate function, a monkey patch, or whatever you prefer, you can basically simulate the case statement’s ‘when’ clause, with the flexibility not only to cache your expensive operations, but also to do all sorts of manipulations you can do on a hash (and can’t do on a case statement), like adding conditions at runtime or recalculating them.
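As a sketch of wrapping it in a function: `case_lookup` is a hypothetical name, and the strings stand in for results of expensive operations computed once when the table is built. It returns the matching pair's value instead of using `break`, which behaves the same when there is a match and returns nil when there isn't.

```ruby
# Hypothetical helper wrapping the detect-based lookup.
def case_lookup(table, value)
  pair = table.detect { |condition, _result| condition === value }
  pair && pair[1]
end

table = {
  1..3 => 'low',  # stand-in for do_something_that_takes_a_long_time
  4..6 => 'high'  # stand-in for do_something_else_that_takes_a_long_time
}

# Unlike a case statement, conditions can be added at runtime.
table[7..9] = 'very high'

puts case_lookup(table, 2) # prints low
puts case_lookup(table, 8) # prints very high
p case_lookup(table, 42)   # prints nil
```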

Of course, this has the drawback that you can’t use overlapping conditions, since hashes in Ruby 1.8 don’t preserve insertion order, so you can’t control which condition is tried first. Ruby 1.9 fixes this, but if you really must do it in Ruby 1.8, you can use a two-dimensional array.

@@lookup_array = [
[1..3, do_something_that_takes_a_long_time],
[4..6, do_something_else_that_takes_a_long_time]
]

@@lookup_array.detect {|k,v| break v if k === x }

This approach also lets you easily change how you walk the array so that it falls through like a C-style switch statement, which is impossible with a vanilla Ruby case statement.
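A minimal sketch of that fallthrough, assuming "fallthrough" means what a C switch without any `break` does: start at the first matching condition and keep every entry after it. `fallthrough_lookup` is a hypothetical name.

```ruby
# Hypothetical helper: C-style fallthrough over a two-dimensional lookup array.
def fallthrough_lookup(table, value)
  start = table.index { |condition, _result| condition === value }
  return [] if start.nil?
  # Like a switch with no break: the matching entry and everything after it.
  table[start..-1].map { |_condition, result| result }
end

lookup_table = [
  [1..3, 'first'],
  [4..6, 'second'],
  [7..9, 'third']
]

p fallthrough_lookup(lookup_table, 5) # prints ["second", "third"]
```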

Doing this sort of metaprogramming in Ruby is quite easy in some situations and quite a bit harder in others. Luckily, Ruby is powerful enough that you don’t have to resort to these tricks most of the time, and flexible enough that you can when you need to.

Update: In the last example, you can use |k, v| block parameters even though the block is yielded an array, and Ruby will automatically assign each element of the array to the corresponding parameter.

April 8

Using ‘map’ effectively on ruby Hashes

Posted by mtoledo . Filed under ruby | 2 Comments

Ruby is a very powerful language, and the methods available to manipulate its two main data structures, Array and Hash, are really good. However, some of them are quite obscure, and for some other manipulations you are on your own. This happens especially with the Hash class.

I suspect this is because, although both Array and Hash are Enumerable, Enumerable’s design seems to have been made to fit Array, and for Hash manipulation some of its methods feel like afterthoughts that drive you back to Array.

One such method is ‘map’ (or ‘collect’). If you want to turn an array of strings into numbers, you can write:


%w{1 2 3 4 5 6}.map {|string| string.to_i}

# => [1, 2, 3, 4, 5, 6]

Since you are just calling one method with no arguments, you can even use the shorthand version:


%w{1 2 3 4 5 6}.map &:to_i

# => [1, 2, 3, 4, 5, 6]

That’s pretty concise, and can be applied to a variety of situations. But things start to get ugly when you want to do similar things to a Hash’s values.


{:a => '1', :b => '2', :c => '3'}.map {|key, value| value.to_i}

# => [2, 3, 1]

Well, yeah, that won’t work. I would need to somehow return the key and value pairs, not just the values.


{:a => '1', :b => '2', :c => '3'}.map {|key, value| [key,value.to_i]}

# => [[:b, 2], [:c, 3], [:a, 1]]

This structure is far from what I wanted, since I can’t do key lookups on a two-dimensional array the way I can with a hash.

To fix that, you can use Hash’s [] class method together with ‘flatten’. But it’s already much less concise.


Hash[*{:a => '1', :b => '2', :c => '3'}.map {|key, value|
[key,value.to_i]}.flatten]

# => {:b=>2, :c=>3, :a=>1}

This approach has another drawback: it won’t work if your values are arrays, since ‘flatten’ will flatten them as well.


Hash[*{:a => ['1', '2'], :b => ['3'], :c => ['4']}.map {|key, value| [key,value.map(&:to_i)]}.flatten]

# ArgumentError: odd number of arguments for Hash

By now all the conciseness is gone, of course, but sometimes you still need a solution that works.

One solution is to use Hash#merge. Most of the time we use ‘merge’ to extend a Hash, but remember that it overrides the existing value when you merge in the same key.


h = {:b=>["3"], :c=>["4"], :a=>["1", "2"]}

# => {:b=>["3"], :c=>["4"], :a=>["1", "2"]}

h.each {|k, v| h.merge!({k => v.map(&:to_i)})}

# => {:b=>[3], :c=>[4], :a=>[1, 2]}

Of course, one drawback to this approach is that you have to put your hash in a separate variable so you can reference it inside the block. That’s probably not a big deal, since by now it’s gotten so convoluted it isn’t even fun anymore.

Anyway, if you want to, you can do it with an inline hash with the help of inject.


{:a=>["1", "2"], :b=>["3"], :c=>["4"]}.map {|k, v| [k, v.map(&:to_i)]}.
  inject({}) {|hash, array| hash[array[0]] = array[1]; hash}

# => {:b=>[3], :c=>[4], :a=>[1, 2]}

Of course you’ll probably spend some time trying to match all those brackets. :)

So yeah, your best hope is never to need a simple transformation like map on your hash values, or at least to hope they aren’t arrays.
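For readers on newer Rubies, there are shorter ways to do the same thing. These are real core methods, but note the version each one needs: flatten with a depth argument (Ruby 1.8.7+), Array#to_h (Ruby 2.1+), and Hash#transform_values (Ruby 2.4+).

```ruby
h = {:a => ['1', '2'], :b => ['3'], :c => ['4']}

# flatten(1) only removes one level of nesting, so array values survive.
v1 = Hash[*h.map { |k, v| [k, v.map(&:to_i)] }.flatten(1)]

# Hash[] also accepts an array of [key, value] pairs, no splat needed.
v2 = Hash[h.map { |k, v| [k, v.map(&:to_i)] }]

# Array#to_h turns the pairs back into a hash.
v3 = h.map { |k, v| [k, v.map(&:to_i)] }.to_h

# Hash#transform_values maps only the values and keeps the keys.
v4 = h.transform_values { |v| v.map(&:to_i) }
```

All four produce the same hash; transform_values is the one that says exactly what you mean.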

Thanks to coderrr for working through some of this with me. Check out his blog for some great Ruby stuff.

Let me know in the comments if there’s an easier way to do what I propose here. :)