Self-Preserving Artificial Intelligence

I’m sure most people have heard of the dilemma of whether to design self-driving cars to reduce the number of deaths or to protect their driver. For those of you who haven’t, picture this: you are sitting in your car, which is driving along a partially blind curve. The car discovers a crowd of people in […]

Named Entities

Here’s an introduction to named entities, named entity recognition (NER), and named entity disambiguation (entity linking). It also covers how these techniques are useful for Companybook. I originally gave this presentation at a Data Science Meetup in Oslo. It’s aimed at data scientists.

Pig lovers meet TOP

Have you ever needed to get the top n items for a key in Pig? For instance, the three most popular items in each country for an online store? You could always solve this the hard way by calculating a threshold per country and then filtering on that threshold. This is neither easy to write nor to execute. What you […]

Crash course in Erlang

This is a summary of a talk I gave on Monday, May 14, 2012, at an XP Meetup in Trondheim. It is meant as a teaser to get listeners to play with Erlang themselves. First, some basic concepts. Erlang has a form of constant called an atom that is defined on first use. Atoms are typically used as […]

Iterating over joins in Pig

Apache Pig is a fantastic language for processing data. It is sometimes incredibly annoying, but it beats the hell out of writing a ton of MapReduce jobs and chaining them together. When iterating over joins, an issue I know I’m not the only one to have run into is referencing data after a join […]

Hadoop Status Reporting from Ruby

Hadoop Map-Reduce is a great tool for analyzing and processing large amounts of data. There are a few things to keep in mind when working with Hadoop. This is a simple solution to one potentially annoying problem. Hadoop expects reducers to emit something regularly. If a reducer runs for a long time without […]