I’m sure most people have heard of the dilemma of whether to design self-driving cars to reduce the number of deaths or to protect their driver. For those of you who haven’t, picture this: you are sitting in your car, which is driving along a partially blind curve. The car discovers a crowd of people in […]
Here’s an introduction to named entities, named entity recognition (NER), and named entity disambiguation (entity linking). There is also information about how this is useful for Companybook. I originally gave this presentation at a Data Science Meetup in Oslo. It’s aimed at data scientists.
Have you ever needed to get the top n items for a key in Pig? For instance, the three most popular items in each country for an online store? You could always solve this the hard way by calculating a threshold per country and then filtering on that threshold. This is neither easy to write nor to execute. What you […]
This is a summary of a talk I gave on Monday, May 14, 2012, at an XP Meetup in Trondheim. It is meant as a teaser to get listeners to play with Erlang themselves. First, some basic concepts. Erlang has a form of constant called an atom, which is defined on first use. Atoms are typically used as […]
Apache Pig is a fantastic language for processing data. It is sometimes incredibly annoying, but it beats the hell out of writing a ton of MapReduce jobs and chaining them together. When iterating over joins, an issue that I know I’m not the only one to have run into is referencing data after a join […]
Hadoop Map-Reduce is a great tool for analyzing and processing large amounts of data. There are a few things one needs to keep in mind when working with Hadoop. This is the simple solution to one possibly annoying problem. Hadoop expects reducers to emit something regularly. If a reducer runs for a long time without […]
Note that the pitfall is limited to MRI (standard Ruby) version 1.9.2. MRI 1.9.3, JRuby and Rubinius do not have this behavior. I have been using Rubinius 2.0 to run machine learning experiments with libsvm lately. When running in Ruby 1.9.2, I noticed that my classifier always classified all samples as negative. I thought this […]