Renumber reference variables in text columns

Question

Background

For a data entry project, a user can enter variables using a short-hand notation:

"Pour i1 into a flask."
"Warm the flask to 25 degrees C."
"Add 1 drop of i2 to the flask."
"Immediately seek cover."

In this case i1 and i2 are reference variables, where the number refers to an ingredient. The text strings are in the INSTRUCTION table, the ingredients the INGREDIENT table.

Each ingredient has a sequence number for sorting purposes.

Problem

Users may rearrange the ingredient order, which adversely changes the instructions. For example, the ingredient order might look as follows, initially:

seq | label
1   | water
2   | sodium

The user adds another ingredient:

seq | label
1   | water
2   | sodium
3   | francium

The user reorders the list:

seq | label
1   | water
2   | francium
3   | sodium

At this point, the following line is now incorrect:

"Add 1 drop of i2 to the flask."

The i2 must be renumbered (because ingredient #2 was moved to position #3) to point to the original reference variable:

"Add 1 drop of i3 to the flask."

Updated Details

This is a simplified version of the problem. The full problem can have lines such as:

"Add 1 drop of i2 to the o3 of i1."

Where o3 is an object (flask), and i1 and i2 are water and sodium, respectively.

Table Structure

The ingredient table is structured as follows:

id | seq | label

The instruction table is structured as follows:

step

Algorithm

The algorithm I have in mind:

Repeat for all steps that match the expression '\mi([0-9]+)':
Break the step into word tokens.
For each token:
1. If the numeric portion of the token matches the old sequence number, replace it with the new sequence number.
2. Recombine the tokens and update the instruction.
3. Update the ingredient number.

Update

The algorithm may be incorrect as written. There could be two reference variables that must change. Consider before:

seq | label
1   | water
2   | sodium
3   | caesium
4   | francium

And after (swapping sodium and caesium):

seq | label
1   | water
2   | caesium
3   | sodium
4   | francium

Every i2 in every step must become i3; similarly i3 must become i2. So

"Add 1 drop of i2 to the flask, but absolutely do not add i3."

Becomes:

"Add 1 drop of i3 to the flask, but absolutely do not add i2."

Code

The code to perform the first two parts of the algorithm resembles:

CREATE OR REPLACE FUNCTION
  renumber_steps(
    p_ingredient_id integer,
    p_old_sequence integer,
    p_new_sequence integer )
  RETURNS void AS
$BODY$
DECLARE
  v_tokens text[];

BEGIN
  FOR v_tokens IN
    SELECT
      t.tokens
    FROM (
      SELECT
        regexp_split_to_array( step, '\W' ) tokens,
        regexp_matches( step, '\mi([0-9]+)' ) matches
      FROM
        instruction
      ) t
  LOOP
    RAISE NOTICE '%', v_tokens;
  END LOOP;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

Question

What is a more efficient way to solve this problem (i.e., how would you eliminate the looping constructs), possibly leveraging PostgreSQL-specific features, without a major revision to the data model?

Thank you!

System Details

PostgreSQL 9.1.2.

Erwin Brandstetter · Accepted Answer

You have to take care that you don't change ingredients and seq numbers back and forth. I introduce a temporary prefix for ingredients and negative numbers for seq for that purpose and exchange them for permanent values when all is done.

Could work like this:

CREATE OR REPLACE FUNCTION renumber_steps(_old int[], _new int[])
  RETURNS void AS
$BODY$
DECLARE
    _prefix     CONSTANT text := ' i';    -- prefix, incl. leading space
    _new_prefix CONSTANT text := ' ###';  -- temp prefix, incl. leading space
    i           int;
    o           text;
    n           text;
BEGIN

IF array_upper(_old,1) <> array_upper(_new,1) THEN
    RAISE EXCEPTION 'Array length mismatch!';
END IF;

FOR i IN 1  .. array_upper(_old,1) LOOP
    IF _old[i] <> _new[i] THEN
        o := _prefix || _old[i] || ' ';        -- leading and trailing blank!
        -- new instruction are temporarily prefixed with new_marker
        n := _new_prefix || _new[i]  || ' ';

        UPDATE instruction
        SET    step = replace(step, o, n)      -- replace all instances
        WHERE  step ~~ ('%' || o || '%');

        UPDATE ingredient
        SET    seq = _new[i] * -1              -- temporarily negative
        WHERE  seq = _old[i];
    END IF;
END LOOP;

-- finally replace temp. prefix
UPDATE instruction
SET    step = replace(step, _new_prefix, _prefix)
WHERE  step ~~ ('%' || _new_prefix || '%');

-- .. and temp. negative seq numbers
UPDATE ingredient
SET    seq = seq * -1
WHERE  seq < 0;

END;
$BODY$
  LANGUAGE plpgsql VOLATILE STRICT;

Call:

SELECT renumber_steps('{2,3,4}'::int[], '{4,3,2}'::int[]);

The algorithm requires ...
... that ingredients in the steps are delimited by spaces.
... that there are no permanent negative seq numbers.

_old and _new are ARRAYs of the old and new instruction.seq of ingredients that change position. The length of both arrays has to match, or an exception will be raised. It can contain seq that don't change. Nothing will happen to those.

Requires PostgreSQL 9.1 or later.

Renumber reference variables in text columns

Tags:

sql

full-text-search

postgresql

plpgsql

Background

Problem

Updated Details

Table Structure

Algorithm

Update

Code

Question

System Details

Dave Jarvis

1 Answers

Erwin Brandstetter

Recent Activity

Donate For Us

Renumber reference variables in text columns

Tags:

sql

full-text-search

postgresql

plpgsql

Background

Problem

Updated Details

Table Structure

Algorithm

Update

Code

Question

System Details

Dave Jarvis

1 Answers

Erwin Brandstetter

Related questions

Recent Activity

Donate For Us