April 8
Using ‘map’ effectively on Ruby Hashes
Ruby is a very powerful language, and the methods available to manipulate its 2 main data structures, Array and Hash, are really good. Some of them are rather obscure, though, and for some other manipulations you are on your own. This happens especially with the Hash class.
To me, this is probably because although both Array and Hash are Enumerables, Enumerable’s design seems to be made to fit Array, and for Hash manipulation some things on Enumerable look like afterthoughts, driving you back to Array.
One such method is the ‘map’ (or ‘collect’) method. If you want to turn an array of strings into numbers you can:
%w{1 2 3 4 5 6}.map {|string| string.to_i}
# => [1, 2, 3, 4, 5, 6]
Since you are just calling one method, with no parameters, you can even use the shorthand version:
%w{1 2 3 4 5 6}.map &:to_i
# => [1, 2, 3, 4, 5, 6]
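As a rough sketch of what the shorthand is doing: the & asks Ruby to turn the symbol into a proc through Symbol#to_proc, and that proc behaves like a block that calls the named method on each element. The variable names below are only there for illustration.

```ruby
strings = %w{1 2 3 4 5 6}

# The explicit block form.
with_block = strings.map { |string| string.to_i }

# The shorthand: & calls Symbol#to_proc on :to_i, producing a proc
# that sends to_i to whatever argument it receives.
with_symbol = strings.map(&:to_i)

# Roughly the same thing, spelled out by hand.
to_i_proc = :to_i.to_proc
with_proc = strings.map(&to_i_proc)

puts with_block == with_symbol && with_symbol == with_proc
```

All three forms should give the same array of integers.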
That’s pretty concise, and can be applied to a variety of situations. But things start to get ugly when you want to do similar things to a Hash’s values.
{:a => '1', :b => '2', :c => '3'}.map {|key, value| value.to_i}
# => [2, 3, 1]
Well, yeah, that won’t work. I would need to somehow return the key and value pairs, not just the values.
{:a => '1', :b => '2', :c => '3'}.map {|key, value| [key,value.to_i]}
# => [[:b, 2], [:c, 3], [:a, 1]]
Well, this structure is far from what I wanted, since I can’t do key lookups on a bidimensional array the way I can with hashes.
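To make the lookup problem concrete, here is a small sketch. An array of pairs is indexed by position, so the closest thing to a key lookup is Array#assoc, which scans the pairs one by one. The variable names are made up for the example.

```ruby
pairs = {:a => '1', :b => '2', :c => '3'}.map { |key, value| [key, value.to_i] }

# Array#assoc walks the pairs and returns the first one whose
# first element equals the given key, or nil if none matches.
pair = pairs.assoc(:b)

# We still have to pull the value out of the pair ourselves.
value = pair && pair[1]

puts value
```

It works, but it is a linear search plus some unpacking, which is a long way from the plain hash[:b] I wanted.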
To fix that, you can use Hash’s [] class method with ‘flatten’. But by then it is no longer very concise.
Hash[*{:a => '1', :b => '2', :c => '3'}.map {|key, value|
[key,value.to_i]}.flatten]
# => {:b=>2, :c=>3, :a=>1}
This approach also has another drawback: it won’t work if your values are arrays, since ‘flatten’ flattens them as well.
Hash[*{:a => ['1', '2'], :b => ['3'], :c => ['4']}.map {|key, value| [key,value.map(&:to_i)]}.flatten]
# ArgumentError: odd number of arguments for Hash
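A quick sketch of why the error shows up: a plain ‘flatten’ dissolves the inner value arrays too, so the keys and values no longer line up in pairs. The variable names here are only for illustration.

```ruby
pairs = {:a => ['1', '2'], :b => ['3'], :c => ['4']}.map do |key, value|
  [key, value.map(&:to_i)]
end

# flatten removes every level of nesting, including the value arrays,
# leaving three keys and four integers in one flat list.
flat = pairs.flatten

puts flat.size       # 7 elements in total
puts flat.size.odd?  # an odd count cannot be split into key/value pairs

begin
  Hash[*flat]
rescue ArgumentError => error
  puts error.message
end
```

Even when the count happens to come out even, the keys and values would be paired up wrongly, so the odd-count error is actually the lucky case.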
Of course by now all the conciseness has already gone away, but sometimes you still need a solution which works.
One solution that does work is Hash#merge. Most of the time we use ‘merge’ to extend our Hash, but remember that it overwrites the existing value when the key is already present. The example below uses the in-place variant, ‘merge!’.
h = {:b=>["3"], :c=>["4"], :a=>["1", "2"]}
# => {:b=>["3"], :c=>["4"], :a=>["1", "2"]}
h.each {|k, v| h.merge!({k => v.map(&:to_i)})}
# => {:b=>[3], :c=>[4], :a=>[1, 2]}
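If the overriding behaviour of ‘merge’ is not fresh in your mind, this small sketch shows both cases, plus the difference between ‘merge’ and ‘merge!’. The hash contents are made up for the example.

```ruby
base = {:a => 1, :b => 2}

# A key that is not there yet extends the hash.
extended = base.merge({:c => 3})

# A key that already exists has its value replaced.
overridden = base.merge({:a => 10})

# merge returns a new hash and leaves base alone;
# merge! changes base itself.
base.merge!({:b => 20})

puts extended.inspect
puts overridden.inspect
puts base.inspect
```

The each/merge! trick above relies on the second case: every key is already in the hash, so each merge! only swaps a value and never adds a key.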
Of course one drawback to this approach is that you have to create a separate variable for your hash, so you can reference it inside the merge. This is probably not a problem, since by now it has become so convoluted it’s not even fun anymore.
Anyway, if you want to, you can do it with an inline hash with help from ‘inject’.
{:a=>["1", "2"], :b=>["3"], :c=>["4"]}.map {|k, v| [k, v.map(&:to_i)]}.
inject({}) {|hash, array| hash[array[0]] = array[1]; hash}
# => {:b=>[3], :c=>[4], :a=>[1, 2]}
Of course you’ll probably spend some time trying to match all those brackets. :)
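One way to cut down on the brackets, sketched here with made-up variable names, is to skip the intermediate ‘map’ and let the inject block destructure each pair directly. I am assuming your Ruby accepts the parenthesised block parameter shown below.

```ruby
source = {:a => ["1", "2"], :b => ["3"], :c => ["4"]}

# inject threads an accumulator (starting as {}) through every pair.
# (key, values) unpacks each [key, value] pair in the parameter list.
# The block has to return the accumulator, hence the trailing `result`.
converted = source.inject({}) do |result, (key, values)|
  result[key] = values.map(&:to_i)
  result
end

puts converted.inspect
```

It is the same idea as before, just spread over a few lines so the brackets are easier to follow.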
So yeah, your best hope is to never have to apply a simple modifier call, such as ‘map’, to your hash values, or at least to hope they are not arrays.
Thanks to coderr for going with me through some of this stuff. Check his blog for some great Ruby stuff.
Let me know in the comments if there’s an easier way to do what I propose here. :)
Array#flatten takes an argument.
>> Hash[*{:a => ['1', '2'], :b => ['3'], :c => ['4']}.map {|key, value| [key,value.map(&:to_i)]}.flatten(1)]
=> {:b=>[3], :a=>[1, 2], :c=>[4]}
As for your final example, we can drop the awful, awful inject(collection) pattern, which is almost always a sign that you should, at the very least, be splitting the work into more operations and using destructuring and restructuring methods.
Hash[*{:a=>["1", "2"], :b=>["3"], :c=>["4"]}.map {|k, v| [k, v.map(&:to_i)]}.flatten(1)]
Hey raggi, :-)
Yea, it gets much easier with 1.9, but 1.8 doesn’t allow params on Array#flatten :-)
>> [].flatten 1
ArgumentError: wrong number of arguments (1 for 0)
from (irb):4:in `flatten'
As for the beauty of the implementation, I totally agree that it’s absurdly ugly, and that this shouldn’t be used in any production code at all. My point was exactly that: since there’s no proper way to elegantly map hash keys or values (not even merge unless you do some manual conditions), maybe we should have an alternative, or have Hash#map return a hash instead of a bi-dimensional array, like it probably should.
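Just to illustrate what such an alternative could look like, here is a minimal sketch. The helper name map_values is hypothetical and not part of Ruby’s core library, and I am assuming a Ruby where Hash[] accepts a single array of [key, value] pairs, which avoids ‘flatten’ altogether.

```ruby
# Hypothetical helper, not part of Ruby's core library.
# Builds a new hash with the same keys, passing each value through
# the given block. The original hash is left untouched.
def map_values(hash)
  pairs = hash.map { |key, value| [key, yield(value)] }
  # Assumes Hash[] accepts one array of [key, value] pairs.
  Hash[pairs]
end

original = {:a => ["1", "2"], :b => ["3"], :c => ["4"]}
converted = map_values(original) { |values| values.map(&:to_i) }

puts converted.inspect
```

Because the pairs are never flattened, array values survive intact. Later Ruby releases (2.4 and up) ship Hash#transform_values, which covers this case without any helper.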