Import Hundreds of Taxonomy Terms using AWK


Today's challenge: your editors just handed you almost 200 taxonomy terms to add to the site, and you don't have the time or inclination to hit the taxonomy/n/add/term page for the next 2 hours or so... AWK to the rescue!

If you do a simple CSV export of the term_data and term_hierarchy tables, you'll see a pretty simple structure:

term_data
tid,vid,name,description,weight

term_hierarchy
tid,parent

What you'll ultimately generate here is a base file that holds everything you need to import these terms via CSV: the term names, descriptions, and weights, the vocabulary ID, the starting value for the new TIDs, and the TID of each parent.

2048,#the current value of the sequences for term_data
term,34,Blogs,#a helper line
x,1,Drupal,All about Drupal,-5
x,1,Modules,Ways to extend Drupal,-4
x,1,Themes,Making your install pretty,-3
term,35,News,#a second helper line
x,1,International,,0
x,1,Local,,0
x,1,Hyperlocal,,0

In this file, we've got 3 types of data:

  1. The current sequence value for term_data (the new TIDs start right after it)
  2. A helper line marking the parent term for the rows that follow: the word "term", then the parent's TID, then its plain-English name (just to help us stay organized)
  3. The new terms, with an "x" as a placeholder for the new TID, followed by the VID, name, description, and weight
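Because the file is split on commas, a description that itself contains a comma would push the weight into the wrong field. A quick sanity check like the following sketch can catch that before anything is generated. The scratch file name sample_bad_terms.txt and the report file check_report.txt are assumptions for the demo; point the awk command at your real term_data.txt instead.

```shell
#!/bin/sh
# Sketch: sanity-check a base file before generating anything.
# sample_bad_terms.txt and check_report.txt are hypothetical scratch names.

# Sample input; the "Modules" description deliberately contains a comma.
cat > sample_bad_terms.txt <<'EOF'
2048,#the current value of the sequences for term_data
term,34,Blogs,#a helper line
x,1,Drupal,All about Drupal,-5
x,1,Modules,Ways to extend Drupal, and more,-4
x,1,Themes,Making your install pretty,-3
EOF

# Each new-term line must have exactly 5 comma-separated fields:
# x, VID, name, description, weight. Report any that do not,
# and exit non-zero if at least one was found.
status=0
awk 'BEGIN {FS=","}
/^x/ && NF != 5 {
  printf "line %d: expected 5 fields, got %d: %s\n", NR, NF, $0
  bad = 1
}
END {exit bad}' sample_bad_terms.txt > check_report.txt || status=$?

cat check_report.txt    # line 4: expected 5 fields, got 6: ...
echo "exit status: $status"
```

A zero exit status means every x line has five fields; a non-zero status lists the offending lines so you can fix or quote them by hand.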

You could use AWK to help generate this file if you had a list with each term on its own line and didn't care about weights at first. I created mine by hand, because the list of terms was not in alphabetical order and I needed to work out the weights.
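If you do have such a list, a sketch along these lines turns it into x lines. The file names terms_list.txt and new_terms.txt and the vocabulary ID of 1 are assumptions for illustration; descriptions are left empty and every weight starts at 0, to be adjusted by hand afterwards. You would still add the sequence line and the "term" helper lines yourself.

```shell
#!/bin/sh
# Sketch: turn a plain list (one term name per line) into "x" lines.
# Hypothetical names: terms_list.txt (input), new_terms.txt (output).
# Assumes every term goes into vocabulary 1, with no description and weight 0.

printf '%s\n' 'International' 'Local' '' 'Hyperlocal' > terms_list.txt

awk -v vid=1 'BEGIN {OFS=","}
NF == 0 {next}                # skip blank lines
{print "x", vid, $0, "", 0}   # x,VID,name,description,weight
' terms_list.txt > new_terms.txt

cat new_terms.txt
```

The output has the same shape as the x lines in the sample file above, so it can be pasted under the right helper line.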

Next, write the AWK script. Two of its three rules just record a value without printing anything; only the /^x/ rule produces output:

awk 'BEGIN {FS=","; OFS=",";}
NR == 1 {sequences = $1} # only runs if this is the first record
/^term/ {term = $2; next} # only runs on term helper lines
/^x/ {print ++sequences, $2, $3, $4, $5}' term_data.txt > import.txt

AWK has some predefined variables, such as FS and OFS for the input and output field separators and NR for the current record (line) number. NR == 1 is a pattern, so its action runs only on line 1, where we grab the starting sequence value to use when printing.

The other two rules don't apply to line 1. Both check for a regex match at the beginning of the line. The ^term rule ends with the "next" instruction, which stops processing the current line and moves on to the next one, much like "continue" in a loop in other programming languages.
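To see how NR == 1 and next interact, here is a tiny experiment, separate from the import itself, with a hypothetical catch-all rule added at the end. Line 1 matches both the NR == 1 rule and the catch-all, while the term line stops at next and never reaches the catch-all.

```shell
#!/bin/sh
# Demo: how pattern-action rules, NR, and "next" interact in AWK.
# Three input lines: a sequence line, a term helper line, a new-term line.
demo=$(printf '%s\n' '2048' 'term,34,Blogs' 'x,1,Drupal' |
  awk 'NR == 1 {print "first line:", $0}
       /^term/ {print "helper line:", $0; next}
       {print "catch-all:", $0}')
printf '%s\n' "$demo"
# first line: 2048
# catch-all: 2048
# helper line: term,34,Blogs
# catch-all: x,1,Drupal
```

Every rule whose pattern matches gets a turn on each line, in order, unless an earlier rule calls next.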

Your output should look something like this, but about 190 lines longer:

2049,1,Drupal,All about Drupal,-5
2050,1,Modules,Ways to extend Drupal,-4
2051,1,Themes,Making your install pretty,-3
2052,1,International,,0
2053,1,Local,,0
2054,1,Hyperlocal,,0
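For a self-contained run, the sketch below writes the sample base file from this post and runs the same script on it. The names sample_term_data.txt and sample_import.txt are scratch names chosen so the demo won't overwrite a real term_data.txt or import.txt.

```shell
#!/bin/sh
# Sketch: run the term_data script against the sample base file from this post.
# sample_term_data.txt / sample_import.txt are hypothetical scratch names.
cat > sample_term_data.txt <<'EOF'
2048,#the current value of the sequences for term_data
term,34,Blogs,#a helper line
x,1,Drupal,All about Drupal,-5
x,1,Modules,Ways to extend Drupal,-4
x,1,Themes,Making your install pretty,-3
term,35,News,#a second helper line
x,1,International,,0
x,1,Local,,0
x,1,Hyperlocal,,0
EOF

awk 'BEGIN {FS=","; OFS=","}
NR == 1 {sequences = $1}                  # starting value from line 1
/^term/ {term = $2; next}                 # helper lines produce no output
/^x/ {print ++sequences, $2, $3, $4, $5}  # one row per new term
' sample_term_data.txt > sample_import.txt

cat sample_import.txt

# The highest TID assigned; handy if the stored sequence value needs updating.
tail -n 1 sample_import.txt | cut -d, -f1   # 2054 for this sample
```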

A variation on the ^term and ^x rules that assigns weights automatically, ignoring the weight column in the file:

/^term/ {weight = -10; next}
/^x/ {print ++sequences, $2, $3, $4, weight++}

Now every time you reach a new parent term, the weight is reset to -10. If a parent has more than 21 child terms, the weights will go above 10, but Drupal still handles weights outside the -10 to 10 range.
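Here is the variation as a complete script, run on the sample data from this post (again under hypothetical scratch file names). The weight column in the file is simply ignored; each parent's children get -10, -9, -8, and so on.

```shell
#!/bin/sh
# Sketch: the weight-resetting variation, run on the sample base file.
# sample_term_data.txt / sample_weighted.txt are hypothetical scratch names.
cat > sample_term_data.txt <<'EOF'
2048,#the current value of the sequences for term_data
term,34,Blogs,#a helper line
x,1,Drupal,All about Drupal,-5
x,1,Modules,Ways to extend Drupal,-4
x,1,Themes,Making your install pretty,-3
term,35,News,#a second helper line
x,1,International,,0
x,1,Local,,0
x,1,Hyperlocal,,0
EOF

awk 'BEGIN {FS=","; OFS=","}
NR == 1 {sequences = $1}                        # starting sequence value
/^term/ {weight = -10; next}                    # reset for each parent
/^x/ {print ++sequences, $2, $3, $4, weight++}  # ignore the weight in $5
' sample_term_data.txt > sample_weighted.txt

cat sample_weighted.txt
```

Because weight++ is a post-increment, the first child of each parent prints -10 and the counter then moves on to -9.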

The other data you need to generate is the term_hierarchy data. Start with the same base file, because import.txt no longer contains the parent data. Here is nearly the same script, piped into a second AWK command that makes use of the parent data. The one change to the first script is that the ^term rule now prints the helper lines instead of dropping them, so the second script can still see them:

awk 'BEGIN {FS=","; OFS=",";}
NR == 1 {sequences = $1} # only runs if this is the first record
/^term/ {print; next} # pass term helper lines through to the second script
/^x/ {print ++sequences, $2, $3, $4, $5}' term_data.txt | \
awk 'BEGIN {FS=","; OFS=",";}
/^term/ {term = $2; next} # remember the parent TID
!/^term/ {print $1, term}' > import_hierarchy.csv

Notice that the second rule in the second script matches lines that don't start with "term" (!/^term/), because after the first script runs, the new-term lines start with their TID rather than "x". Each output row is the new TID followed by its parent's TID.
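Since the first script already records each parent's TID in the term variable, another option (a different approach from the two-script pipeline, not the author's) is to write both files in a single pass using AWK's output redirection inside the program. A sketch, using hypothetical scratch file names on the sample data:

```shell
#!/bin/sh
# Sketch: generate term_data and term_hierarchy rows in one AWK pass.
# Scratch names are hypothetical; swap in term_data.txt, import.txt, etc.
cat > sample_term_data.txt <<'EOF'
2048,#the current value of the sequences for term_data
term,34,Blogs,#a helper line
x,1,Drupal,All about Drupal,-5
x,1,Modules,Ways to extend Drupal,-4
x,1,Themes,Making your install pretty,-3
term,35,News,#a second helper line
x,1,International,,0
x,1,Local,,0
x,1,Hyperlocal,,0
EOF

awk 'BEGIN {FS=","; OFS=","}
NR == 1 {sequences = $1; next}    # starting sequence value
/^term/ {term = $2; next}         # remember the current parent TID
/^x/ {
  ++sequences                     # next free TID
  print sequences, $2, $3, $4, $5 > "sample_import.txt"
  print sequences, term > "sample_import_hierarchy.csv"
}' sample_term_data.txt

cat sample_import_hierarchy.csv
```

Within a single awk run, the first write to each file truncates it and later writes append, so both files come out complete and the TIDs in them are guaranteed to line up.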

That's all for today's AWK class. There are more examples of using AWK on DrupalEasy for your enjoyment.
