<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Knut Hellan Blog &#187; Uncategorized</title>
	<atom:link href="http://knuthellan.com/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://knuthellan.com</link>
	<description>General geekyness and starting a company</description>
	<lastBuildDate>Sun, 20 May 2012 20:20:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='knuthellan.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>The Knut Hellan Blog &#187; Uncategorized</title>
		<link>http://knuthellan.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://knuthellan.com/osd.xml" title="The Knut Hellan Blog" />
	<atom:link rel='hub' href='http://knuthellan.com/?pushpress=hub'/>
		<item>
		<title>Iterating over joins in Pig</title>
		<link>http://knuthellan.com/2012/04/20/iterating-over-joins-in-pig/</link>
		<comments>http://knuthellan.com/2012/04/20/iterating-over-joins-in-pig/#comments</comments>
		<pubDate>Fri, 20 Apr 2012 14:22:45 +0000</pubDate>
		<dc:creator>Knut O. Hellan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://knuthellan.com/?p=372</guid>
		<description><![CDATA[Apache Pig is a fantastic language for processing data. It is sometimes incredibly annoying, but it beats the hell out of writing a ton of MapReduce jobs and chaining them together. One issue that I know I&#8217;m not the only one to have run into is referencing fields after a join [...]]]></description>
			<content:encoded><![CDATA[<p>Apache Pig is a fantastic language for processing data. It is sometimes incredibly annoying, but it beats the hell out of writing a ton of MapReduce jobs and chaining them together. One issue that I know I&#8217;m not the only one to have run into is referencing fields after a join in Pig.</p>
<p>Normally, you access fields using the dereference operators . or #, depending on the data type. The period, ., is used for tuples and bags, e.g. tuple.field0, tuple.field1, bag.field0, bag.field1. Maps are dereferenced with a hash, #, e.g. map#'field0', map#'field1'.</p>
<p>This does not work after a join. The iteration you would expect to write after a JOIN is:</p>
<p><code><br />
joined = JOIN list0 BY key, list1 BY key;<br />
purified = FOREACH joined GENERATE list0.key;<br />
</code></p>
<p>This will fail with the obscure error: &#8220;scalar has more than one row in the output&#8221;. Pig reads list0.key as a scalar reference to the relation list0, which only works when that relation has a single row. The misleading error message is a known problem and there is a <a href="https://issues.apache.org/jira/browse/PIG-2134">ticket for this in Pig's Jira</a>. As can be seen from the ticket, the correct way to iterate over the join is to use the disambiguation operator, ::, instead of the dereference operators, like this:</p>
<p><code><br />
joined = JOIN list0 BY key, list1 BY key;<br />
purified = FOREACH joined GENERATE list0::key;<br />
</code></p>
<p>If you give in to the temptation to skip the relation name and reference the field directly, like this:<br />
<code><br />
joined = JOIN list0 BY key, list1 BY key;<br />
purified = FOREACH joined GENERATE key;<br />
</code></p>
<p>You will get the more informative message: &#8220;Found more than one match: list0::key, list1::key&#8221;.</p>
<p>What you are really doing after a join is addressing columns in relations. For users, addressing columns in a relation with a period would be easier, but using :: might make the underlying code easier to understand.</p>
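<p>One way to picture this is a toy model in Python of how column names behave after a join. It is only a sketch and not how Pig is implemented: the names join_schema, resolve and AmbiguousField, and the value0 and value1 fields, are made up for illustration. Each column gets its relation alias as a prefix, and a short name only resolves when exactly one column matches.</p>

```python
# Toy model (an assumption for illustration, not Pig's implementation)
# of how column names behave after a JOIN.


class AmbiguousField(Exception):
    """Raised when a short field name matches more than one joined column."""


def join_schema(relations):
    """Prefix every field with its relation alias, as in 'list0::key'."""
    return [
        f"{alias}::{field}"
        for alias, fields in relations.items()
        for field in fields
    ]


def resolve(schema, name):
    """Resolve a field reference against the joined schema."""
    if name in schema:
        return name  # already fully qualified
    matches = [col for col in schema if col.split("::")[-1] == name]
    if len(matches) == 1:
        return matches[0]  # unambiguous short name
    if not matches:
        raise KeyError(name)
    raise AmbiguousField("Found more than one match: " + ", ".join(matches))


# Hypothetical relations: both carry a 'key' column.
schema = join_schema({"list0": ["key", "value0"], "list1": ["key", "value1"]})
print(schema)
print(resolve(schema, "list0::key"))
print(resolve(schema, "value1"))
try:
    resolve(schema, "key")
except AmbiguousField as err:
    print(err)
```

<p>In this model the qualified name list0::key and the unique short name value1 both resolve, while the bare key is rejected because it matches a column from each relation.</p>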
]]></content:encoded>
			<wfw:commentRss>http://knuthellan.com/2012/04/20/iterating-over-joins-in-pig/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">knuthellan</media:title>
		</media:content>
	</item>
		<item>
		<title>A fistful of links</title>
		<link>http://knuthellan.com/2009/06/10/a-fistful-of-links/</link>
		<comments>http://knuthellan.com/2009/06/10/a-fistful-of-links/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 13:08:34 +0000</pubDate>
		<dc:creator>Knut O. Hellan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://knuthellan.wordpress.com/?p=122</guid>
		<description><![CDATA[Here&#8217;s a list of links I tweeted over the last week or so (as requested by @dmpetersson). I considered shamelessly removing the RT info, but came to my senses and carried it over. It clearly shows that the majority of my links are retweets, but then again more people might discover the interesting people who originally [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a list of links I tweeted over the last week or so (as requested by @dmpetersson). I considered shamelessly removing the RT info, but came to my senses and carried it over. It clearly shows that the majority of my links are retweets, but then again more people might discover the interesting people who originally tweeted the links and follow them. I will start posting these digests weekly going forward.</p>
<ul>
<li><a href="http://www.engadget.com/2009/06/09/fedora-11-packs-a-next-gen-file-system-faster-boot-times-all-t/">Fedora 11 is here</a> &#8211; looks like a nice upgrade from Fedora 10 and remains a good alternative to Ubuntu.</li>
<li><a href="http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html">Benchmarks: You are Doing it Wrong</a> &#8211; interesting read about benchmarking web applications</li>
<li>RT @Venture_Capital <a href="http://dealbook.blogs.nytimes.com/2009/06/09/do-young-venture-capitalists-have-an-edge/">Do Young Venture Capitalists Have an Edge?</a> &#8211; Experience is good, but so is openness to new ideas</li>
<li>RT @DanekS <a href="http://www.getfreepublicitynow.com/?p=310">How to write a tech press release</a> &#8211; don&#8217;t show off. Journalists and editors may not understand geek</li>
<li>RT @tferriss: Find out which interests Google associates with your cookie &#8211; <a href="http://www.google.com/ads/preferences/">who does Google think you are?</a></li>
<li>RT @ruby_news: <a href="http://www.infoq.com/news/2009/05/rack10">Rack 1.0</a> released</li>
<li>RT @tferriss: <a href="http://www.usabilitypost.com/2008/09/29/a-guide-to-choosing-colors-for-your-brand/">How to choose colors for your brand</a> &#8211; great samples, via @gmc</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://knuthellan.com/2009/06/10/a-fistful-of-links/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">knuthellan</media:title>
		</media:content>
	</item>
		<item>
		<title>CouchDB to the rescue</title>
		<link>http://knuthellan.com/2009/02/17/couchdb-to-the-rescue/</link>
		<comments>http://knuthellan.com/2009/02/17/couchdb-to-the-rescue/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 12:40:03 +0000</pubDate>
		<dc:creator>Knut O. Hellan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://knuthellan.wordpress.com/?p=14</guid>
		<description><![CDATA[Got CouchDB installed on my Fedora box. This thing is sweet. Working with a RESTful JSON/HTTP storage system is so much easier than old-fashioned SQL databases. If I were to store users and a list of stuff per user in a relational db, where the stuff could be shared among several users, I would create [...]]]></description>
			<content:encoded><![CDATA[<p>Got CouchDB installed on my Fedora box. This thing is sweet. Working with a RESTful JSON/HTTP storage system is so much easier than old-fashioned SQL databases. If I were to store users and a list of stuff per user in a relational db, where the stuff could be shared among several users, I would create a table for the users indexed on userid, a table for the stuff indexed on stuffid and a table of userid-to-stuffid relations. Then of course I would need lots of boilerplate code to work with the database.</p>
<p>In CouchDB, I would have a database of users where a user doc is stored under /users/&lt;userid&gt;. The stuff would be stored as documents under /stuff/&lt;stuffid&gt; and the relations would be stored either in the user document or in a separate database under /userstuff/&lt;userid&gt;/. An important difference is that, whether the relation information is stored in the user database or in a separate database, the document stored at that location has to be replaced whenever stuff is added or removed for a user. This makes me prefer putting the relations in a separate database rather than updating the user document over and over.</p>
<p>It was hard to believe that you could get a simpler interface than pure HTTP, but I still had to test <a href="http://github.com/jchris/couchrest/tree/master">CouchRest</a> by Chris Anderson. This made working with CouchDB even easier. With all this stuff in place, development is a breeze since I don&#8217;t have to spend time on the nitty-gritty, repetitive, low-level boilerplate stuff.</p>
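<p>The layout can be sketched in a few lines of Python. This is a minimal in-memory stand-in, not CouchDB itself: the put, get and add_stuff helpers, the document ids and the field names are all assumptions for illustration, and revision handling is left out. The point it shows is that changing a relation means rewriting the whole relation document, while the user document stays untouched.</p>

```python
import json

# Hypothetical in-memory stand-in for the three databases; a real setup
# would send HTTP requests to CouchDB for these paths instead.
store = {}


def put(path, doc):
    """Replace the whole document stored at path."""
    store[path] = json.dumps(doc)


def get(path):
    """Fetch and decode the document stored at path."""
    return json.loads(store[path])


def add_stuff(userid, stuffid):
    """Adding a relation means reading, changing and rewriting the document."""
    path = f"/userstuff/{userid}"
    doc = get(path)
    if stuffid not in doc["stuff"]:
        doc["stuff"].append(stuffid)
    put(path, doc)


# Made-up example documents.
put("/users/knut", {"name": "Knut"})
put("/stuff/s1", {"title": "First thing"})
put("/stuff/s2", {"title": "Second thing"})
put("/userstuff/knut", {"stuff": ["s1"]})

add_stuff("knut", "s2")
print(get("/userstuff/knut"))
print(get("/users/knut"))
```

<p>Keeping the relations in their own document means that add_stuff never has to touch the user document.</p>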
]]></content:encoded>
			<wfw:commentRss>http://knuthellan.com/2009/02/17/couchdb-to-the-rescue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">knuthellan</media:title>
		</media:content>
	</item>
	</channel>
</rss>
