How Robots are Changing the Way we Sell

I gave a talk about how AI is changing marketing and sales at Inbound 2016. The slides are available online. Since the slides are not self-explanatory, I decided to write this companion post. “No humans should perform slave work. It is not interesting, it is tiring and the payment is low. All work that can […]

dict issues in pyspark

I was running some Spark jobs that showed odd results. The output had complex fields with null values in subfields that should always have a value: { “year”: null, “name”: “John Smith”, “age”: null } This puzzled me. I tried hardcoding all those values and setting them once by setting the field to this […]

Self-Preserving Artificial Intelligence

I’m sure most people have heard of the dilemma of whether to design self-driving cars to reduce the number of deaths or to protect their driver. For those of you who haven’t, picture this: you are sitting in your car, which is driving through a partially blind curve. The car discovers a crowd of people in […]

Named Entities

Here’s an introduction to named entities, named entity recognition (NER), and named entity disambiguation (entity linking). There is also information about how this is useful for Companybook. I originally gave this presentation at a Data Science Meetup in Oslo. It’s aimed at data scientists.

Pig lovers meet TOP

Have you ever needed to get the top n items for a key in Pig? For instance, the three most popular items in each country for an online store? You could always solve this the hard way by calculating a threshold per country and then filtering on that threshold. This is neither easy to write nor to execute. What you […]
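To make the "top n items per key" operation concrete, here is a minimal Python sketch of what it computes. It is not the Pig solution from the post, which is not shown in this excerpt. The function name `top_n_per_key` and the `(country, item, sales)` row layout are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def top_n_per_key(rows, n):
    """Return the n highest-scoring items for each key.

    rows is an iterable of (key, item, score) tuples. This layout is
    hypothetical and chosen only for this illustration.
    """
    grouped = defaultdict(list)
    for key, item, score in rows:
        # Store the score first so tuples are ranked by score.
        grouped[key].append((score, item))

    return {
        key: [item for score, item in heapq.nlargest(n, pairs)]
        for key, pairs in grouped.items()
    }


if __name__ == "__main__":
    sales = [
        ("NO", "skis", 50),
        ("NO", "boots", 30),
        ("NO", "poles", 20),
        ("NO", "wax", 10),
        ("SE", "tent", 5),
    ]
    print(top_n_per_key(sales, 3))
```

Grouping first and then keeping only the n largest entries per group avoids the separate threshold calculation described above.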

Crash course in Erlang

This is a summary of a talk I gave on Monday, May 14, 2012, at an XP Meetup in Trondheim. It is meant as a teaser for listeners to play with Erlang themselves. First, some basic concepts. Erlang has a form of constant called an atom, which is defined on first use. Atoms are typically used as […]

Iterating over joins in Pig

Apache Pig is a fantastic language for processing data. It is sometimes incredibly annoying, but it beats the hell out of writing a ton of MapReduce jobs and chaining them together. When iterating over joins, an issue that I know I’m not the only one to have run into is referencing data after a join […]

Hadoop Status Reporting from Ruby

Hadoop Map-Reduce is a great tool for analyzing and processing large amounts of data. There are a few things one needs to keep in mind when working with Hadoop. This is a simple solution to one possibly annoying problem. Hadoop expects reducers to emit something regularly. If a reducer runs for a long time without […]
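The post's own Ruby solution is not shown in this excerpt. As a rough illustration of the general idea, Hadoop Streaming lets a task report status by writing a line of the form `reporter:status:<message>` to stderr. The Python sketch below assumes a streaming job; the helper names (`status_line`, `emit_to_stderr`, `reduce_with_heartbeat`) and the reporting interval are hypothetical.

```python
import sys
import time


def status_line(message):
    """Format a status update in the form Hadoop Streaming reads from stderr."""
    return "reporter:status:" + message + "\n"


def emit_to_stderr(line):
    """Write a status line to stderr and flush so it is seen promptly."""
    sys.stderr.write(line)
    sys.stderr.flush()


def reduce_with_heartbeat(records, process, interval_seconds=60.0,
                          emit=emit_to_stderr):
    """Yield process(record) for each record, reporting status periodically.

    interval_seconds is an assumed value for illustration; it should be
    well below the task timeout configured on the cluster.
    """
    last_report = time.monotonic()
    for count, record in enumerate(records, start=1):
        yield process(record)
        now = time.monotonic()
        if now - last_report >= interval_seconds:
            emit(status_line("processed %d records" % count))
            last_report = now


if __name__ == "__main__":
    # Echo stdin lines in upper case while sending periodic status updates.
    for result in reduce_with_heartbeat(sys.stdin, str.upper):
        sys.stdout.write(result)
```

Passing the `emit` function as a parameter keeps the reporting logic testable without a running cluster.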

Case statement pitfall when migrating to Ruby 1.9.2

Note that the pitfall is limited to MRI (standard Ruby) version 1.9.2. MRI 1.9.3, JRuby and Rubinius do not have this behavior. I have been using Rubinius 2.0 to run machine learning experiments with libsvm lately. When running in Ruby 1.9.2, I noticed that my classifier always classified all samples as negative. I thought this […]