Demo of my vocabulary builder v1.0

January 25th, 2009

 

I have a working version of my vocabulary builder. I’ve also learned how to do videos, which I can use to fulfill Jacob’s request to show how I did the view of my house in the snow.

 


January 6th, 2009

 

 
 
A couple of posts ago, I said that my next post would be about some of the technologies I’ve had to learn to make progress on my vocabulary builder app. (BTW, I don’t really have a good name for it, so if you’ve got some suggestions, please share.)
 
I’ll go through them in the order that I learned them. Some of it is pretty technical, but I think there’s a bunch in there that others can understand as well.
 
To recap a bit, my idea is to look at the content of a webpage and annotate it with a personalized vocabulary list. As a test case, I’ve been using a focus.de article which has 600 words total, ~300 of which are unique. Some of them are variations of the same word (plurals, conjugations, and so on). More on that later.
 
Before I get overly technical, I’ll list the buzzwords:
1) javascript (the main language of the web.)
2) DOM (the interface to the structure of a webpage.)
3) mysql (to store the master vocabulary list)
4) php (to enable external programs to ask for a definition)
5) firefox XUL (variant of html for things like menu items)
6) firefox infrastructure (read/write files, extension sidebars)
7) more html than I knew before.
 
 
My first inclination was to just have it be another webpage on my site. You go to the page, enter a web address, and hit go. It turns out there are a couple of problems with this. The first is security. If you load a page from one domain, scripts on that page are not allowed to access data from another. This is enforced by all of the main browsers. There is an exception to this, frames, but that doesn’t really help. While frames allow content from different domains to be displayed on the same page, they are compartmentalized. A script in one frame is not allowed to interact with a frame from another domain.
 
A second thing I thought of was to artificially have all of the content come from my site. While the web browser is restricted to one domain per page, the server doesn’t have the same constraint; I can do whatever I want on my server. You could tell my server via a form what content you’re interested in, it would ask the other server and the annotated page is delivered in one piece to your browser.
 
I passed on this option because of bandwidth, performance, and copyright constraints. Some websites may only want their stuff to come from them. RSS aggregators like the google reader get around this by serving only the text in the site-provided feed. The problem there is that this text is often pretty useless. If you want to read the actual article, you’ll have to go to the website. This also doesn’t work with the movie subtitle idea I’m thinking about. Bandwidth and performance are issues because it would mean every annotated web access would be two jumps. You tell my site what you want to see and wait. My site would tell the other site what’s needed and wait.
 
So I’m taking the Firefox extension idea.
 
Most website stuff is done in javascript, a language I did not speak, but I’m glad I’ve learned it. The basic syntax is like C. Functions are first class data, which is something I haven’t had available to me since college. It turns out perl has them as well, but until recently, I only spoke an old dialect. Javascript also has closures. It doesn’t have classes, but the combination of function data and associative objects gives something similar. It’s as much a "real programming language" as it gets.
 
Now that I’ve decided where the program would sit, the next question becomes "where do I get my definitions?". An initial thought I had was to use one of the many translation sites out there. google translate is one possibility and it will probably be a part of my final solution. The problem is that it translates and I want definitions. There are other sites that will give a translation, but those don’t really work either, because the result is not just a definition, but an entire webpage. If I need to ask for 300 words, this won’t work.
 
So I poked around and found that there are a number of dictionaries available for download, some proprietary and some not. I’m currently using one from Babylon.com, but I’ll probably need to find something more freeware. They have good quality though, so that’s where I am now. Because they use a proprietary format, I had to poke around a bit to be able to convert it to something I can load into a database.
 
Now that I have a dictionary, I have to be able to access it and some sort of SQL database seemed appropriate. mysql is one that is commonly available on web servers including the one that I use for my page (1and1.com). It’s also where this blog is stored. I hadn’t used mysql in a while so I needed some refreshing. I still need to find a way to deal with variations on spelling, but it’s working well so far. Hopefully, magazine editors do a good job running spellcheck.
 
While experimenting with these, I’ve found that I need a web server installed on my pc at home. Copying files back and forth between it and 1and1.com was just too slow. This meant installing Apache webserver, the PHP and mysql extensions for it, and mysql itself.
 
Ok, now I have a dictionary as well as the ability to ask my server for a definition. Now I need something to translate. If you go to any page, you’ll find lots of unrelated stuff on it. In addition to the article itself, there are menus, advertisements, links to other articles, all of which may contain vocabulary that the user doesn’t know. Lots of clutter. How do I know what’s part of the main text and what’s not? As it turns out, html components are annotated with ids and classes that are used for formatting purposes. The text can be identified using these. Every site does it a little differently, of course. I don’t have a good generalized solution to this yet. I’m hoping I can look for the blocks with the largest amount of text and find commonalities in their html structure. I’ll probably have to embed some knowledge into the system.
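Here is a rough sketch of the "largest block of text" idea. It is written in Python with the standard library's html.parser, not the javascript the extension itself uses, and the class name, function name, and sample page are all made up for illustration. It assumes reasonably well-formed html and simply credits each piece of text to the nearest enclosing element that has an id or class.

```python
from html.parser import HTMLParser


class BlockTextParser(HTMLParser):
    """Tally how much text sits under each element that carries an id or class."""

    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, label) for every currently open element
        self.text_len = {}   # label -> number of text characters seen

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        attrs = dict(attrs)
        self.stack.append((tag, attrs.get("id") or attrs.get("class")))

    def handle_endtag(self, tag):
        # Assumes tags are properly nested; real pages may need more care.
        if tag not in self.VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        size = len(data.strip())
        if size == 0 or any(tag in self.SKIP_TAGS for tag, _ in self.stack):
            return
        # Credit the text to the nearest enclosing element that has a label.
        for _, label in reversed(self.stack):
            if label:
                self.text_len[label] = self.text_len.get(label, 0) + size
                break


def largest_block(html):
    """Return the id/class of the block holding the most text, or None."""
    parser = BlockTextParser()
    parser.feed(html)
    if not parser.text_len:
        return None
    return max(parser.text_len, key=parser.text_len.get)


# Hypothetical page: a menu, an article, and an ad.
SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/sport">Sport</a></div>
<div class="article"><p>Ein langer Artikel über den Hund.</p>
<p>Noch ein Absatz mit mehr Text.</p></div>
<div id="ads">Kaufen!</div>
</body></html>
"""

print(largest_block(SAMPLE_PAGE))
```

This only picks a single winner. Finding commonalities across sites would still need per-site knowledge on top of it.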
 
Now I have the text. I make a list of all of the words in it. Time to get the definitions. For many of the words, it’s simple. Dog is in there. So is green. "Words" like 20 are easy to translate (i.e., I don’t). What about "dogs"? Including all plurals would make the dictionary larger. Even if we’re ok with that, the dictionary file has what it has; I don’t have control of it. Ok, what about words like "walked", or "walks"? What about the word "Obama"?
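The word-list step can be sketched in a few lines. This is a Python illustration under my own assumptions (the function name and sample sentence are invented), not the extension's actual code: it pulls out runs of letters, ignores things like "20", and counts each word case-insensitively.

```python
import re
from collections import Counter


def word_counts(text):
    """Count words in text, ignoring case and skipping digits and punctuation."""
    # [^\W\d_]+ matches runs of letters only, including umlauts and ß.
    words = re.findall(r"[^\W\d_]+", text)
    return Counter(word.lower() for word in words)


counts = word_counts("Der Hund und der grüne Hund, 20 Hunde.")
print(sum(counts.values()), "words total,", len(counts), "unique")
```

Note that "Hund" and "Hunde" still count as two different words here, which is exactly the plural problem described above.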
 
German, with fewer irregularities, makes all this a bit easier than English, but there’s still work to be done. (work I haven’t done yet). For the 300 words, about 1/3 of them are not in the dictionary as-is. From a gut feeling the "undefined" words break out like this:
1) 20%-30% are plurals and basic conjugations. Freund means friend. Freunden means friends. If I get Freunden and find that it’s not in the dictionary, I can lop off the ‘en’ and try again. There are some simple conjugation rules as well. German is pretty regular. Hebrew more so. I think this means it’ll work for the ones I’m trying to improve.
2) 10% are names and numbers. For these, I think I can ask google translate for its translation. If the translation is identical to the input word, I leave it off the list.
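The two rules above can be sketched like this. It is a Python illustration, not the real extension code: the tiny dictionary and the suffix list are placeholders I made up, and the translation is passed in by the caller so the sketch does not pretend to know how a translation service is called.

```python
# Hypothetical toy dictionary standing in for the real word list.
DICTIONARY = {"freund": "friend", "hund": "dog", "gehen": "to go", "grün": "green"}

# Illustrative suffixes only; a real list would need tuning per language.
SUFFIXES = ("en", "er", "es", "e", "n", "s")


def lookup(word, dictionary=DICTIONARY, suffixes=SUFFIXES):
    """Try the word as-is, then retry with one common ending lopped off."""
    key = word.lower()
    if key in dictionary:
        return key, dictionary[key]
    for suffix in suffixes:
        # Keep at least three letters so short words are not mangled.
        if key.endswith(suffix) and len(key) > len(suffix) + 2:
            stem = key[: -len(suffix)]
            if stem in dictionary:
                return stem, dictionary[stem]
    return None


def same_as_input(word, translation):
    """Rule 2: names and numbers come back unchanged, so leave them off the list."""
    return word.strip().lower() == translation.strip().lower()


print(lookup("Freunden"))
print(lookup("Obama"))
print(same_as_input("Obama", "Obama"))
```

Lopping off endings blindly will sometimes land on the wrong dictionary entry, so this is a first pass and not a substitute for real conjugation rules.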
 
So that’s where I am now. I have something wiggling and I think it’ll be helpful for me. Pretty soon I’ll look for a beta-tester AKA guinea pig. Probably my sister, who’s learning Swedish (I like to call it Svenskish).
 
Couple more weeks of work here and there, but even if it turns out to be a total flop, I’ve learned a lot.
 
 
 
 

Night snow

December 23rd, 2008

 One thing that I noticed this week about the snow is how it affects nighttime lighting. Robie and I were walking to the local pub this past Friday when I noticed it. Due to all of the additional white surfaces, a couple of streetlights really light up the area, even at a time when it should be really dark.

So the other night at about 7:45pm, I took a bunch of pictures in my front yard. They were taken on my tripod at 20 seconds and F7.1. I adjusted the color to offset the yellowness of incandescent lights. The porch lights are compact fluorescents, which is why they’re that funky color.

###### Click and drag with your mouse. The picture should move ######

Portland winter wonderland

December 21st, 2008

This week, we had a ton of snow in Portland. The weather reports had been forecasting a meteorological armageddon all week, and yesterday it finally arrived. It snowed ALL day. I don’t remember looking outside a single time when it wasn’t snowing.

So I took some pictures yesterday during the day and then again this morning.

As always, click on the photo for a larger image.

Notice the snow level around the planter beds. The ones below here were taken this morning.

While I was outside, a guy came racing by on his snowmobile with his son. Notice that there is a second helmet just above the windshield.

So the scary part of the snow will be the ice. In the picture below, notice that there’s about 1/4-3/8″ of ice on this piece.














Vocabulary follow up

December 18th, 2008

In response to my entry this morning I got a couple comments from my friend Grady that I think are worth more than a simple comment back.

Why not just have something that lets me click on a word and have the definition pop up?

There are a couple of reasons, though they may not apply to everyone:

1) I like the idea of uninterrupted reading. I load a page, sit back and read. If I have to move the mouse to each word, click, and wait for the definition to come up, that’ll mess up my flow, da mojo.

2) Having the definition already there keeps me from being lazy. It’s not a lot of effort to click on a word, but if I’m feeling self-conscious about the large number of words I don’t know, I may be tempted to gloss over them. How embarrassing, I should know these.

3) Seeing them twice will help retention. I want to review a list of words before starting and then have them reinforced during the course of the article.

4) I have a couple additional enhancements in mind that require the program to know what I know.

 

The first of these is making flashcards. The extension can keep track of which words I come across frequently and make a list for offline review. This could be made easy by using business card blanks. I format them for the type of cards I have, run them through the printer, flip ’em over and print the other side.

The second enhancement will help me watch movies. Many/most movies have subtitles available somewhere online. I could download and process those, giving me a kind of a script. Break it down in sections. For the next 10 minutes, I’ll need to know these 15 words. Instead of pausing every minute or two, I can enjoy the flow of the movie more.
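To make the subtitle idea concrete, here is a rough Python sketch. It assumes the subtitles come in the common SubRip (.srt) layout, with a timing line like 00:12:30,000 --> 00:12:33,000 above each caption; the function name, the sample subtitles, and the set of known words are all invented for illustration.

```python
import re
from collections import defaultdict

# Start time of a caption, e.g. "00:12:30,000 -->".
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->")


def words_by_section(srt_text, known_words, minutes=10):
    """Group the words I don't know yet by the section of the movie they appear in."""
    sections = defaultdict(set)
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        match = TIME.search(block)
        if not match:
            continue
        hours, mins, secs, _millis = (int(g) for g in match.groups())
        start = hours * 3600 + mins * 60 + secs
        # Everything after the timing line is the caption text.
        rest = block[match.end():].split("\n", 1)
        caption = rest[1] if len(rest) > 1 else ""
        words = {w.lower() for w in re.findall(r"[^\W\d_]+", caption)}
        sections[start // (minutes * 60)] |= words - known_words
    return {index: sorted(words) for index, words in sorted(sections.items())}


SAMPLE_SRT = """1
00:00:05,000 --> 00:00:07,000
Der Hund bellt.

2
00:12:30,000 --> 00:12:33,000
Die Katze schläft.
"""

print(words_by_section(SAMPLE_SRT, known_words={"der", "die"}))
```

Section 0 covers the first ten minutes, section 1 the next ten, and so on. A word that shows up in two sections is listed in both, which may or may not be what I want.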

A last comment I’ll make on the topic (until Grady comes up with something else) is that I could see myself using this for English! I don’t know if I’m getting ahead of myself on this, but how often do we look up words in our own language?

 

Vocabulary

December 17th, 2008

I haven’t written a blog entry in a while and I’ve recently been questioned about this by two of the people that actually subscribe to my feed. This is out of probably 6 readers; I’m not sure that blogging is in my future as a career.

 The main thing I’ve been up to is learning web programming. It’s recently come to my attention that this stuff has graduated to "real" programming. I’ll explain by way of analogy what I mean.

Remember the old versions of yahoo mail? MapQuest? Click and wait. Click and wait. Then Google came along with gmail and their maps.google site. I’m not talking about Google earth, but just the basic website. In both cases, no real functionality was added. Yahoo had all of the functions of an email client. Folders, contact lists, and so on. Gmail didn’t add to that. Similarly, MapQuest could do everything you’d want. Enter an address and get directions. Zoom in and out and pan around too.

 No one would dispute, however, that in both cases Google’s versions are better. A clearly superior product. Today’s yahoo mail interface is much better, but I’m comparing to what they had before Google embarrassed them.

 The technologies behind these are what I’m referring to.

For the longest time, I resisted the internet craze, from the programmer perspective anyway. I love shopping online. I buy lots of stuff from Amazon or EBay. SteepAndCheap and Woot are both on my bookmarks. But technically, I wasn’t interested.

 Well that’s changed.

 About a month ago, I found a personal programming project that I’m using as a way to learn this internet stuff as well as, hopefully, something that will help me improve my German.

 I speak German pretty well. I have relatives and friends that don’t speak English at all. I either speak German to them or we don’t communicate. The thing is, I don’t speak the language as well as I’d like and it’s mostly my vocabulary that’s weak. I took German in high school and college and that helped a lot with grammar; I learned some words too. (I had a small crush on one of my MIT teachers. Betina Brandt, I believe was her name. Beautiful Harvard grad student) I wish I could improve my fluency further.

Every time I go to visit my mom, I vow to read German while I’m in the US, but, kind of like New Year’s resolutions, I never follow through. The closest I’ve gotten was by watching the German TV channel during my time in Israel; they carried it on cable since there aren’t many local channels there. Also lots of German immigrants. This time, I decided to try reading magazines. Stern, Spiegel, and Focus all have online content that I can read. Also, thanks to the wonders of BitTorrent, I can watch German movies. There are actually some good ones that Americans might recognize. Schultze Gets the Blues. Run Lola Run. The Experiment. All pretty good movies.

 But that doesn’t help me improve my vocabulary. To do that I’d have to get out the dictionary whenever I come across something I don’t understand. Who wants to do that?

 Which brings me to my web programming project.

I’m in the process of writing a Firefox extension that will present to me a customized word list. When I bring up a webpage, it will extract all of the words that are used on that page. It’ll then find the definitions online and combine them in a sidebar to assist me in reading the article.

At first glance, this would be really annoying. Webpages typically use many words that I already know. Blue, two, boy, tall, car. Who needs help with those? So to make it useful, I’ll add a capability to mark the words that I already know. My vision is that for the first X articles, I’ll be doing a lot of clicking. After a time, the list will be more useful, yielding a list that’s tailored to where ‘I’ am.
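The "mark the words I already know" idea boils down to filtering the page’s words against a personal set. Here is a minimal sketch in Python, with made-up names and sample data; how the real extension stores the known words is not decided here.

```python
def vocabulary_list(page_words, definitions, known_words):
    """Return (word, definition) pairs for words not yet marked as known."""
    seen = set()
    result = []
    for word in page_words:
        key = word.lower()
        # Skip repeats, words I already know, and words with no definition.
        if key in seen or key in known_words or key not in definitions:
            continue
        seen.add(key)
        result.append((key, definitions[key]))
    return result


# Hypothetical sample data.
definitions = {"blau": "blue", "zwei": "two", "hund": "dog"}
known_words = {"blau", "zwei"}
page_words = ["Blau", "Hund", "zwei", "Hund"]

print(vocabulary_list(page_words, definitions, known_words))

# Clicking a word in the sidebar would amount to adding it to the set.
known_words.add("hund")
print(vocabulary_list(page_words, definitions, known_words))
```

Each click shrinks every future list, which is why the first few articles involve a lot of clicking and later ones don’t.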

As I see it, this takes the "studying" out of language improvement. (Who wants to study?) Instead, it will take an activity I already do (read) and turn it into something more enjoyable. Instead of missing many of the nuances, I’ll be able to fully appreciate them.

 Here’s the flow that I’m thinking.

1) Go to an article

2) Read the vocabulary list. If the list is long, I’ll look at them by paragraph (a feature I’m planning on adding)

3) Read the article text. Hopefully, the definitions will linger in my head long enough to make it through a couple paragraphs. Since I’m reading straight through without stopping to look stuff up, and with a fuller understanding of the text, I’ll get more out of reading it.

 That’s not really much different from how I read today, save for the part about looking stuff up and understanding everything I read. No flash cards, no sitting there repeating long lists of words. Just reading; something that I really like to do.

 Over time, there will be definitions that I’ll see a bunch of times and eventually, I’ll remember some of them. Some words will sink in my brain quickly. Some will take longer.

 Implementing this project has been very interesting. There are a bunch of things I’ve had to learn to make it happen that I’d never dealt with before and some that I haven’t seen in a while.

 I’ll write about those next time.

Beer brewin

October 22nd, 2008

So I like to brew beer. I also have found that it’s just as easy to make 10 or 15 gallons as it is to make the normal 5.

One thing I’ve learned is that it’s important to oxygenate the wort to give the yeast a boost. I do this with an aquarium bubbler.

The key is to not leave it too long.



yummy beer

October 22nd, 2008

I had dinner with my sister this evening and had some yummy beers. The first was a brew that my friend Eric had recommended to me. It’s called “Black Homo-Erectus” and it’s a dark IPA. It was pretty good. Nice hoppiness.

I also had a Nut Brown ale, which was also very good. It was from a smaller brewery. “Dick’s Nut Brown”. Good flavor.

Just to prove it, here’s the receipt. Notice anything funny about it? Click for a larger version:

Done got me a lincoln

October 2nd, 2008

One thing I’ve wanted to do for a long time now is learn how to weld metal. I took a class a couple years ago but I didn’t take it very far. I didn’t have an idea for a project to provide context. Because I want to make a larger Rocket Stove, the idea of welding was recently rekindled.

So I poked around on ebay and found myself a new Lincoln 175HD.

The other day, I welded up my first useful thing. I got a couple pieces of rebar I had lying around from an older project and made a compost aerator/stirrer. I got the idea from this. The idea is that I jam the pointy end into the compost pile and let the barbs move things around. Works pretty well, though it does require some strength.


Either way, no “black man” should be without a lincoln. Well, now, I’ve got one too.

Composting horse manure

October 2nd, 2008

Here’s a picture of our garden, stitched from pictures I took this evening.

It’s been a good season. Better than last year; Robie and I are learning a lot. Most folks I know that have a garden have noticed that the season’s been less productive than we’d like, but we’re pleased with the success we have had and what we’ve learned.

On my drive home, I pass by a horse ranch, Abbey Creek Stables. Although it’s not there anymore, they used to have a sign at the entrance: “Free manure. We load”. Since you can never have enough compost and because I like the word “free”, I went and got some. Two loads actually, for a total of about two cubic yards.

Composting is one of those things that sounds more complicated when you read about it than it actually is. Websites and books talk about carbon/nitrogen ratios and all that. Some refer to it as browns/greens. I’ve generally found that what I’ve got is what I’ve got. If I have too much green, like from cutting the grass, I can’t just conjure up some leaves to balance it out. Whatever I have is what ends up in my compost bins.

Having said that, horse manure actually has an optimal ratio. The manure I got is a nice mix of what looks to be sawdust and manure. Good even texture. It heated up pretty quickly. Since I was thinking about composting at the time, I did a bit of reading. One site I came across talked about a simple way of aerating the pile with perforated pipes. This is actually mentioned in a bunch of places, but the key is that the microbes that process compost need oxygen. Most sources advocate turning the compost pile, but that’s a lot of work.

So I got a couple lengths of 1″ pvc and drilled a bunch of holes. I also had a couple lengths of ABS. I went to Harbor Freight and got some thermometers to track what the pile is doing and now I have two of them in the manure pile. The one in the center quickly climbed outside the working range of the thermometers (159 deg) and it stayed there for about two weeks. The other one is in the corner and hung out at around 140 deg. The active composting temp range that I’ve come across in a couple places is about 110-160 deg. Today, three weeks after I got the second load of manure, the center is at 140 deg and the corner is 120. I’m curious to see how it’ll look when things cool off and I dig around in there.

The manure is under the blue tarp. The other pipes are normal compost. One thing I am a little concerned about is an article I just read in Mother Earth News about some herbicides surviving through horses’ digestive systems.

If it all works out, I’ll probably get a bunch more loads of the stuff.

Oh, and it doesn’t stink at all.