Importing categories and listings into Mambo mosDirectoy
This post is going to serve as a personal reference for when I have to import an existing spreadsheet (in this case it is an excel .xls file) into mosDirectory in Mambo. The listings I have are organized by SIC (standard industry code), and I have a corresponding key that ties SIC codes to category names.
First, we need to look at the table structure for categories. I added the “sic” column so I can keep track of how each category relates to the standardized SIC categories.
The only column that aren’t set to the default values are sic, parent, and name. So, I can take my top level SIC categories and create a query that contains only these fields:
INSERT INTO mos_dir_cat (sic, name, published) VALUES
(1031,’Lead and Zinc Ores’, 1)
Since my list has 3 tiers of categories (we call them Parent Category, Category, and Sub Category), I have to go through a process where I load in the level 1 categories first, load in the level 2 categories (while keeping track of their parent categories in a tmp row), and then load in the level 3 categories while keeping track of their parent categories in a tmp row as well.
- Load level 1 category names (no sic code)
- Load level 2 category names, sic code, and parent category names
- Update level 2 rows, setting the “parent” column with the id of the parent category. To do this we use a SQL to pull the id off of the parent category listed in the “tmp” row:
UPDATE mos_dir_cat as m1, mos_dir_cat as m2
SET m1.parent = m2.id WHERE m1.tmp = m2.name - Repeat the process for level 3, first loading in the names, sic codes, and parent category names (which go into the tmp row), then filtering out the parent category names from the tmp row and inserting the parent id’s into the parent row.
A note on what I mean by “load”: Basically I cut and past the columns I need from the excel document into a text file. It will paste them in with tabs separating the rows. I do a simple search and replace for [\t] (\t stands for a tab) and replace them with [”, “]. Then I search and replace for [\r] (a line break) and replace it with [”),\r(”]. This will turn this:
Agricultural Products - Crops Wheat 0111
Agricultural Products - Crops Corn 0115
into
(”Agricultural Products - Crops”, “Wheat”, “0111″),
(”Agricultural Products - Crops”, “Corn”, “0115″),
Although this involves some manual steps, it is a great way to convert a text and tabs document into simple SQL. I then add my insert statement to the top, and close out the query at the very end of the file by replacing the last comma with a semicolon:
INSERT INTO mos_dir_cat (tmp, name, sic) VALUES
(”Agricultural Products - Crops”, “Wheat”, “0111″),
(”Agricultural Products - Crops”, “Corn”, “0115″),
…
(”National Security”, “International Affairs”, “9721″);
So now that the categories are in (phew!), how am I going to import the 25,000 business into the directory as listings?
- Organize the columns I need in my excel document.
- Export as a CSV file.
- Convert the CSV file to SQL by hand (like I did before when I cut and pasted data from excel into a text file, except instead of converting tabs to [”, “], I just convert the line breaks so that they include the parenthesis at the beginning and end of everyline.
- Insert these listings into the mos_dir_listings table, leaving the “cid” column blank until the next step.
- Run and UPDATE query on the mos_dir_listings table, updating cid with the category id using the sic code:
UPDATE mos_dir_listings AS listing, mos_dir_cat AS cat
SET listing.cid = cat.id
WHERE listing.sic = cat.sic - Man that query took a long time to execute. So long I almost thought it crashed. After making it through all 25,000 rows, we are finished!
We’re done! I now have three tiers of categories and their corresponding sic codes loaded into mos_dir_cat, and 25,000 listings in mos_dir_listings, linked to categories by way of their sic code.