First Week

My first week of full-time work for the startup is over, so what did I accomplish? I have dumped all of English Wikipedia into a CouchDB database and discovered that mapping over this amount of data on a single server is not possible. I find this weird, so I need to investigate further, but that can wait since I don’t really need it right now. Lookups are instant, as they should be, and that’s good enough for me.

I have also gathered the Norwegian Wikipedia and generated a Norwegian web corpus. Wikipedia should be statistically representative of the web, but the first version of the corpus was non-normalized. Since I wanted to do everything in Ruby, I had two options for stemming: wrap the available C Porter stemmer, or write one from scratch according to the rules described on the Norwegian Snowball page. I chose the latter, and since there is an exhaustive test set available with some 25000 or so word-stem pairs, I am confident my Ruby implementation is correct.

Since Ruby 1.8 doesn’t come with UTF-8 support and 1.9.1 does, I had to figure out the simplest way of getting a minimum amount of international character handling. It turned out I only needed a UTF-8 length method, and word.scan(/./u).length takes care of that. It is not efficient, but this is a temporary solution until I upgrade to 1.9.1. I should probably make my Norwegian Ruby stemmer publicly available. For now, if anyone needs it, give me a ping.
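For anyone curious about the UTF-8 length workaround, here is a minimal sketch of how it could be wrapped up. The method name utf8_length is made up for illustration; only the scan expression itself comes from the post. The idea is that the /u flag makes the regex match whole UTF-8 characters, so counting the matches gives the character count rather than the byte count.

```ruby
# encoding: utf-8

# Hypothetical helper (the name is an assumption, not from the stemmer itself).
# Counts characters rather than bytes by matching one UTF-8 character at a time.
# Note: "." does not match newlines, which is fine for single words.
def utf8_length(word)
  word.scan(/./u).length
end

word = "blåbær"
puts utf8_length(word)  # 6 characters
puts word.bytesize      # 8 bytes, since å and æ take two bytes each in UTF-8
```

On Ruby 1.9 and later, word.length already counts characters for a UTF-8 string, so a helper like this is only needed on 1.8.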

What has taken most of my time is administrative work: cleaning up the business plan, gathering market details and making initial contact with potential customers. As a result of the latter, I revised the business plan to reflect that we have established positive contact with two possible pilot customers. I hadn’t revised my resume in a while, so I did that as well. Sadly, my phone call to the local representative for governmental funding grants wasn’t too promising, but I sent him the business plan for review. Maybe he’ll see that we are on to something. I also handed the business plan over to the biz dev company we are working with for review and polishing.

4 thoughts on “First Week”

  1. I will not reveal what we are doing for a while, but we expect to be live with the first customer in Q3. This is of course a very rough estimate and I will provide updates before that time. I assume that what you want to know is what we are doing, and I will reveal something later in Q2. Suffice it to say that we are two founders with a combined 18 years of experience from the search engine business, and while we are not working directly on search, what we do is related to search technology.

  2. Ok. Whenever. I’m just saying. I’d like to write about it. Thanks.

  3. I’ll give you some information when we are ready for a little less stealth. It would be really cool to have you write about what we are doing 🙂
