
Value.match() Regex in Google Refine

I am trying to extract a sequence of numbers from a column in Google Refine. Here is my code for doing it:

value.match(/[\d]+/)[0]

The data in my column is in this format:

abcababcabc 1234566 abcabcbacdf

The result is "null", and I have no idea why. It is also null if I try \w instead of \d.

mchangun asked Jul 27 '13 10:07



1 Answer

OpenRefine's match() has to match the entire string: it doesn't add implicit wildcards to either end of the pattern as some systems do (and as one might expect). Try this pattern instead:

value.match(/.*?(\d+).*?/)[0]

match() returns an array of the capture groups, so [0] picks out the digits captured by (\d+). You need the lazy (non-greedy) modifier, i.e. the question mark, on the wildcards so that they don't gobble up some of your digits too. If you just use /.*(\d+).*/ you'll capture only a single digit, because the rest of them will be taken by the first .* pattern.
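To see the difference between the three patterns outside OpenRefine, here is a minimal sketch in Python. It assumes that re.fullmatch is a fair stand-in for match()'s whole-string behaviour; grel_match is a hypothetical helper written for this illustration, not part of OpenRefine, and OpenRefine itself uses Java's regex engine rather than Python's.

```python
import re


def grel_match(value, pattern):
    """Hypothetical stand-in for GREL's value.match(): the pattern must
    match the whole string. Returns the capture groups as a list, or None
    when the string as a whole does not match."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None


value = "abcababcabc 1234566 abcabcbacdf"

# No wildcards: the whole string is not just digits, so there is no match.
print(grel_match(value, r"[\d]+"))        # None

# Greedy leading .* swallows all but the last digit.
print(grel_match(value, r".*(\d+).*"))    # ['6']

# Lazy leading .*? stops at the first digit, so \d+ gets the full run.
print(grel_match(value, r".*?(\d+).*?"))  # ['1234566']
```

Indexing the last result with [0] gives "1234566", which corresponds to the [0] in the GREL expression above.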

Full documentation of the regular expression implementation is in the docs for Java's Pattern class.

Tom Morris answered Oct 11 '22 16:10