To Hadoop or not
Hadoop is the hammer of big data. This slide deck covers the history and basics of Hadoop as well as some alternatives.
I gave a talk at Inbound 2016 about how AI is changing marketing and sales. The slides are available at http://content.inbound.com/content/ai-how-robots-are-changing-the-way-we-sell. Since the slides are not self-explanatory, I decided to write this companion post. “No humans should perform slave work. It is not interesting, it is tiring and the payment is low. All work that can […]
I was running some Spark jobs that showed odd results. The output had complex fields that showed up with null values for fields that should always have a value: { "year": null, "name": "John Smith", "age": null }. This puzzled me. I tried hardcoding all those values and setting them once by setting the field to this […]
I’m sure most people have heard of the dilemma of whether to design self-driving cars to reduce the number of deaths or to protect their driver. For those of you who haven’t, picture this: you are sitting in your car, which is driving through a partially blind curve. The car discovers a crowd of people in […]
Here’s an introduction to named entities, named entity recognition (NER), and named entity disambiguation (entity linking). It also covers how this is useful for Companybook. I originally gave this presentation at a Data Science Meetup in Oslo. It’s aimed at data scientists.
Have you ever needed to get the top n items for a key in Pig? For instance, the three most popular items in each country for an online store? You could always solve this the hard way by calculating a threshold per country and then filtering on that threshold. This is neither easy to write nor to execute. What you […]
This is a summary of a talk I gave on Monday, May 14, 2012, at an XP Meetup in Trondheim. It is meant as a teaser to get listeners to play with Erlang themselves. First, some basic concepts. Erlang has a form of constant called an atom, which is defined on first use. Atoms are typically used as […]
Apache Pig is a fantastic language for processing data. It is sometimes incredibly annoying, but it beats the hell out of writing a ton of MapReduce jobs and chaining them together. When iterating over joins, one issue that I know I’m not the only one to have run into is referencing data after a join […]
Hadoop Map-Reduce is a great tool for analyzing and processing large amounts of data. There are a few things one needs to keep in mind when working with Hadoop. This is the simple solution to one possibly annoying problem. Hadoop expects reducers to emit something regularly. If a reducer runs for a long time without […]
Note that the pitfall is limited to MRI (standard Ruby) version 1.9.2. MRI 1.9.3, JRuby and Rubinius do not have this behavior. I have been using Rubinius 2.0 to run machine learning experiments with libsvm lately. When running in Ruby 1.9.2, I noticed that my classifier always classified all samples as negative. I thought this […]