Archive for February, 2006

Importing categories and listings into Mambo mosDirectory

Wednesday, February 15th, 2006

This post is going to serve as a personal reference for when I have to import an existing spreadsheet (in this case an Excel .xls file) into mosDirectory in Mambo. The listings I have are organized by SIC (Standard Industrial Classification) code, and I have a corresponding key that ties SIC codes to category names.

First, we need to look at the table structure for categories. I added the “sic” column so I can keep track of how each category relates to the standardized SIC categories.
The only columns that aren’t set to their default values are sic, parent, and name. So I can take my top-level SIC categories and create a query that contains only the fields I need:

INSERT INTO mos_dir_cat (sic, name, published) VALUES
(1031, 'Lead and Zinc Ores', 1);

Since my list has 3 tiers of categories (we call them Parent Category, Category, and Sub Category), I have to load them in stages: first the level 1 categories, then the level 2 categories (keeping track of their parent categories in a tmp column), and then the level 3 categories, again keeping track of their parents in the tmp column.

  1. Load level 1 category names (no sic code)
  2. Load level 2 category names, sic code, and parent category names
  3. Update level 2 rows, setting the “parent” column to the id of the parent category. To do this we use a SQL query that looks up the id of the parent category named in the “tmp” column:

    UPDATE mos_dir_cat as m1, mos_dir_cat as m2
    SET m1.parent = m2.id WHERE m1.tmp = m2.name

  4. Repeat the process for level 3: first load in the names, sic codes, and parent category names (which go into the tmp column), then run the same UPDATE to turn the parent names in tmp into parent ids in the parent column.
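Here is a minimal sketch of steps 1 through 3, run against an in-memory SQLite database from Python so it can be tried without a Mambo install. The table is a cut-down stand-in for mos_dir_cat (only the columns mentioned above, with assumed types), and because SQLite has no MySQL-style multi-table UPDATE, a correlated subquery stands in for the self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: the real mos_dir_cat has more columns.
conn.execute(
    "CREATE TABLE mos_dir_cat ("
    " id INTEGER PRIMARY KEY, name TEXT, sic TEXT,"
    " parent INTEGER DEFAULT 0, tmp TEXT)"
)

# Step 1: a level 1 category (no sic code, no parent).
conn.execute(
    "INSERT INTO mos_dir_cat (name) VALUES ('Agricultural Products - Crops')"
)

# Step 2: level 2 categories, with the parent's name parked in tmp.
conn.executemany(
    "INSERT INTO mos_dir_cat (tmp, name, sic) VALUES (?, ?, ?)",
    [
        ("Agricultural Products - Crops", "Wheat", "0111"),
        ("Agricultural Products - Crops", "Corn", "0115"),
    ],
)

# Step 3: resolve each tmp name to the id of the row with that name.
conn.execute(
    "UPDATE mos_dir_cat SET parent = ("
    "  SELECT p.id FROM mos_dir_cat AS p WHERE p.name = mos_dir_cat.tmp)"
    " WHERE tmp IS NOT NULL"
)

rows = conn.execute(
    "SELECT name, parent FROM mos_dir_cat ORDER BY id"
).fetchall()
print(rows)
```

One thing the sketch makes visible: matching on name only works if category names are unique across tiers, since a duplicate name would make the lookup ambiguous.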

A note on what I mean by “load”: basically I copy and paste the columns I need from the Excel document into a text file. They paste in with tabs separating the columns. I do a simple search and replace for [\t] (\t stands for a tab) and replace each one with [", "]. Then I search for [\r] (a line break) and replace it with ["),\r("]. This will turn this:

Agricultural Products - Crops Wheat 0111
Agricultural Products - Crops Corn 0115

into

("Agricultural Products - Crops", "Wheat", "0111"),
("Agricultural Products - Crops", "Corn", "0115"),

Although this involves some manual steps, it is a great way to convert a text and tabs document into simple SQL. I then add my insert statement to the top, and close out the query at the very end of the file by replacing the last comma with a semicolon:

INSERT INTO mos_dir_cat (tmp, name, sic) VALUES
("Agricultural Products - Crops", "Wheat", "0111"),
("Agricultural Products - Crops", "Corn", "0115"),

("National Security", "International Affairs", "9721");
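The search-and-replace routine could also be scripted. The following is a rough sketch in Python, not what I actually ran: `rows_to_insert` is a made-up helper name, and it assumes the pasted text really is tab-separated with one record per line. It uses single-quoted SQL strings and doubles any embedded single quote so a name like "Farmer's Market" would not break the statement.

```python
def rows_to_insert(text, table, columns):
    """Turn tab-separated lines into one multi-row INSERT statement."""
    tuples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Double any single quote so it cannot end the SQL string early.
        values = [
            "'" + field.strip().replace("'", "''") + "'"
            for field in line.split("\t")
        ]
        tuples.append("(" + ", ".join(values) + ")")
    return "INSERT INTO {} ({}) VALUES\n{};".format(
        table, ", ".join(columns), ",\n".join(tuples)
    )


pasted = (
    "Agricultural Products - Crops\tWheat\t0111\n"
    "Agricultural Products - Crops\tCorn\t0115\n"
)
sql = rows_to_insert(pasted, "mos_dir_cat", ["tmp", "name", "sic"])
print(sql)
```

This only guards against quote characters. For anything beyond a one-off import, passing the values as query parameters through a database driver is the safer route.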

So now that the categories are in (phew!), how am I going to import the 25,000 businesses into the directory as listings?

  1. Organize the columns I need in my excel document.
  2. Export as a CSV file.
  3. Convert the CSV file to SQL by hand (like I did before when I cut and pasted data from Excel into a text file, except that instead of converting tabs to [", "], I just convert the line breaks so that every line gets a parenthesis at the beginning and end).
  4. Insert these listings into the mos_dir_listings table, leaving the “cid” column blank until the next step.
  5. Run an UPDATE query on the mos_dir_listings table, updating cid with the category id using the sic code:

    UPDATE mos_dir_listings AS listing, mos_dir_cat AS cat
    SET listing.cid = cat.id
    WHERE listing.sic = cat.sic

  6. Man that query took a long time to execute. So long I almost thought it crashed. After making it through all 25,000 rows, we are finished!

We’re done! I now have three tiers of categories and their corresponding sic codes loaded into mos_dir_cat, and 25,000 listings in mos_dir_listings, linked to categories by way of their sic code.
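For next time, here is a hedged sketch of the listings half of the process in Python with SQLite. The schemas are simplified stand-ins, the `title` column and the sample businesses are invented for illustration, and the index on sic is only my guess at why the UPDATE crawled: without one, each of the 25,000 listings may force a scan of the category table.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schemas: the real tables have more columns.
conn.execute(
    "CREATE TABLE mos_dir_cat (id INTEGER PRIMARY KEY, name TEXT, sic TEXT)"
)
conn.execute(
    "CREATE TABLE mos_dir_listings ("
    " id INTEGER PRIMARY KEY, title TEXT, sic TEXT, cid INTEGER)"
)
conn.executemany(
    "INSERT INTO mos_dir_cat (name, sic) VALUES (?, ?)",
    [("Wheat", "0111"), ("Corn", "0115")],
)

# Invented CSV export; the real file would come from the spreadsheet.
csv_text = (
    "title,sic\n"
    "Acme Grain Co,0111\n"
    "Cornucopia Farms,0115\n"
    "No Match Ltd,9999\n"
)
reader = csv.DictReader(io.StringIO(csv_text))
# Load the listings with cid left empty for now.
conn.executemany(
    "INSERT INTO mos_dir_listings (title, sic) VALUES (:title, :sic)",
    list(reader),
)

# Index the lookup column so each listing can find its category quickly.
conn.execute("CREATE INDEX idx_cat_sic ON mos_dir_cat (sic)")

# Fill in cid from the category whose sic code matches.
conn.execute(
    "UPDATE mos_dir_listings SET cid = ("
    "  SELECT cat.id FROM mos_dir_cat AS cat"
    "  WHERE cat.sic = mos_dir_listings.sic)"
)

linked = conn.execute(
    "SELECT title, cid FROM mos_dir_listings ORDER BY id"
).fetchall()
print(linked)
```

Listings whose sic code has no matching category end up with an empty cid here, which makes it easy to query for the leftovers afterwards.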

programming language translation

Friday, February 10th, 2006

is probably a well-established field of inquiry, but I know nothing about it.

I was looking for the equivalent of php’s array_diff() in ruby, and I found this:

http://pleac.sourceforge.net/

It’s an attempt to solve a set of common programming problems with an assortment of different languages.
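(For the record, Ruby's array subtraction operator covers the common case: `[1, 2, 3] - [2]` gives `[1, 3]`.) In the spirit of function-by-function comparison, here is a rough Python sketch of the same idea. The name `array_diff` is just borrowed from PHP for illustration, and the sketch assumes the items are hashable; it does not copy PHP's key preservation or its string-based comparison.

```python
def array_diff(first, *others):
    """Return the items of `first` that appear in none of `others`, in order."""
    exclude = set()
    for other in others:
        exclude.update(other)
    return [item for item in first if item not in exclude]


result = array_diff(["a", "b", "c", "d"], ["b"], ["d", "x"])
print(result)
```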
Each time I learn a new programming language I am struck by the awkward fumbly-ness of going from a language I am adept at to another I am a novice at. (java to perl, perl to php, php to javascript, and now ruby.) You would expect that each successive language would come easier, and that the process would therefore be progressively more rewarding, but in fact the opposite is true: taking on new languages is like taking on a new lover. With each new nuance and idiosyncrasy you discover, you find yourself missing the comfortable familiarity of the old. You unwittingly gather in your mind a set of all pleasant and fulfilling features against which you impossibly compare each new instance, and as you get progressively more bitter and jaded you accept that the only love you will ever know exists only as a conglomeration-memory of all the good memories of all the lovers you ever knew, and by extension you love something that never really existed as an individual in the first place. Dying is when you finally let go of this unattainable dream.
But what would make this transition-period smoother, I realized, would be an online programming language translator (that would translate function-by-function (or method or procedure)).  Surely something like this exists out there?

The fun thing, of course, would be to discover bit by bit the underlying (overarching) babel- (omni- ?)- language among all programming languages, and all the weird constructs that some have that others don’t. (Closures, prototype-based vs. class-based OOP, lambda functions.)

The future never ceases to evade me

Tuesday, February 7th, 2006

Some random thoughts (ADD-parade):

goddess worship

thanks Sarah, kriskrug and ccote. (This is not a photoshop job. She was actually wearing this shirt.)

google identity

Scott, mom tells me that you are now the #1 google-hit for “meves.” Congrats. Having been displaced I feel a little disoriented, but I suppose that is my fault for carelessly leaving my sense of identity with a search engine ;)

body twitch

I said it here, now: One day in the next 5 years there will be a (successful?) chain of facilities that function as a mix between a video arcade and a fitness club. It will start with a gym installing a “Dance Dance Revolution” arcade game (or, less likely, an arcade installing showers) and/or some other similar “physical” videogames (like that boxing one, which seems to tire people out) … and then there will be a bigger market for such games, and game companies will start to offer a wider variety of them.

Research and engineering are of course underway for these games already. And I imagine that we could all come up with some remarkable physical videogames of our own. (I want one that teaches me how to dance to the “My Milkshake” song)

It might be difficult to make this lucrative in a gym setting, because the initial cost and the maintenance for an arcade DDR system are probably higher than for an elliptical machine or treadmill (or are they?), and arcades and fitness centers differ in their respective pricing models. Another challenge will be marketing it to the right demographic. There currently isn’t much crossover between the arcade-game market and the fitness-club set.

progress vertigo

An excerpt from David Heinemeier Hansson in the Rails book (p. 215, hardcopy): when discussing SQL vs. ORM (e.g. DAO, e.g. ActiveRecord), David argues convincingly that there is a time and a place for both. He says that you should “start out using the object-oriented interface for productivity and pleasure, and then dip beneath the surface for a close-to-the-metal experience when you need it to.” He is in fact referring to SQL as “close-to-the-metal”, which gave me yet another little wave of progress-vertigo.

(All this time I felt guilty for not being too strong on hardware / networking (to say nothing of compiled languages or that .. what’s it called? assemblism code?), but now I feel more well-rounded for knowing something “low-level” like SQL. The future never ceases to evade me.)

In contrast, strongbad seems to have a handle on things with his succinct definition of technology.

time management

Went to the dentist today (my actual teeth pictured). More on this later.

EZPDO, Propel, Rails

Monday, February 6th, 2006

mark,

After our discussion about EZPDO, here is another one that looks more established and is used in the Symfony framework:

http://propel.phpdb.org/trac/

I think I may switch to this because it looks like it’s quite a bit more popular.

Also, I ordered the Ruby book. Doh.

Beat me to it again

Wednesday, February 1st, 2006

Damn. I guess I shouldn’t be surprised. Someone beat me to it — this is what my first project with Ruby on Rails was supposed to be:

http://cmsmatrix.org/

It’s a tool that dynamically helps you choose among dozens (hundreds?) of content management systems.

(For those shopping for CMS’s see also opensourcecms.com, which lets you try out dozens and dozens of CMS’s and related things. (unfortunately this is limited to apps written in PHP4.))
This “decision maker” matrix is something that I have been wanting since like 2001, when I was sort of working on a version in Perl (or was it PHP?) to make such a thing for cell phones. (Choosing a cellphone plan can be daunting for the newcomer, if you remember that far back.) But then of course, I realized that such a tool would be useful for all kinds of purchasing decisions. (Like a layer on top of pricewatch that does queries for you.) Surely, hundreds of other developers (entrepreneurs?) have thought of the value of something like this: imagine a “push” ebay that stores your queries for desired items, and stores those items in a massive hyper-normalized (?) database. (A concept net? A knowledge-base?) It then notifies you when it finds items that fuzzy-match your query (within some tolerance).

Oh.. right. Amazon does this (among other online superstores). But does amazon have a frontend that lets you construct queries with arbitrarily deep granularity with respect to the “problem domain”? What the hell am I talking about?
Surely, such a thing should be on its way to the surface by now, no? How come I haven’t heard of it? (It’s a rhetorical question — please don’t answer with an insult unless it’s clever ;))

I imagine the biggest challenges facing such a system would include but not be limited to:

  • keeping data current (spiders? trust metrics?)
  • keeping data accurate (“truthful”)
  • keeping data [properties? aspects? attributes? features?] unbiased and objective
  • choosing aspects that can be compared to each other (aspect association)
  • allowing new aspects to be added dynamically

As for “keeping data current”, you would want spiders, when possible. I have always wanted a system that recognizes a spider as a “user” or “contributor” along with its human counterparts. (To have both humans and bots in the same table makes me shudder with excitement and reminds me of a story.) As for trust metrics, imagine a slashdot where people contribute not articles but well-defined, structured data?  (If you could get this easy enough for end-users to do, and you could set up an online shop, people would probably come knocking.)

But surely there’s something I’m not thinking of.

Well I can’t wait to see it when it happens. Shouldn’t be long now.

Instant Obsolescence

Wednesday, February 1st, 2006

is my business.

(So when do I get used to it?)

[then, on feb. 03:]

Is it possible that software is not like anything else, that it is meant to be discarded: that the whole point is to see it as a soap bubble?

—Alan Perlis

I discovered Alan Perlis in Jim Weirich’s excellent blog, where he is described as:

the first head of the Carnegie Mellon University Computer Science Department and the first recipient of the Turing Award, [who] recorded some of his accumulated knowledge about programming in a series of one sentence statements. You can read more about his Epigrams here

When were you first Turing-tricked?

Wednesday, February 1st, 2006

For me it was in like 1998 when I was in a users’ forum IRC chat room with a bunch of Debian (linux) users. I kept asking neophyte questions like “how do you use apt-get” and they kept deferring my questions to some guy named Frank.
“Frank, tell Mark about apt-get.”

And Frank spit out a bunch of stuff about apt-get.

Everyone in the room kept sending the really easy questions to Frank, and what struck me was the manner in which they spoke to him: they were really rude and direct. I kept thanking Frank, and did not understand why the others weren’t extending him the same respect.

So anyway… blah blah blah I was engaged in a social interaction with this thing for a good minute or so before I realized that I was talking to a bot. Whatever: my belief was not suspended for that long, but long enough to give me a jolt. The feeling is probably akin to standing next to someone in an elevator and turning to them to ask the time and then realizing they are a mannequin. (one of my pet peeves in fact.)

Although it wore off quickly, the initial novelty of talking to a (somewhat useful) bot was exhilarating.
[more on this later]