I am trying to extract a sequence of numbers from a column in Google Refine. Here is my code for doing it:
value.match(/[\d]+/)[0]
The data in my column is in the format of
abcababcabc 1234566 abcabcbacdf
The result is "null", and I have no idea why. It is also null if I use \w instead of \d.
In GREL, functions can be called in either of two forms: functionName(arg0, arg1, ...) or arg0.functionName(arg1, ...).
OpenRefine provides a find/replace function for you to edit your data. Selecting Edit cells → Replace will bring up a simple window where you can input a string to search and a string to replace it with.
Google Refine Expression Language (GREL) is to OpenRefine what formulas are to Excel or SQL is to a database: a way to accomplish more complex transformations, queries, and arrangements of data. In OpenRefine, GREL can be used in four places, including creating a custom text or numeric facet and adding a column based on another column.
OpenRefine doesn't add implicit wildcards to the ends of the pattern as some systems do (and as one might expect), so the pattern has to match the entire cell value, not just part of it. Try this pattern instead:
value.match(/.*?(\d+).*?/)[0]
You need the lazy/non-greedy quantifier (i.e. the question mark) on the wildcards so that they don't gobble up some of your digits too. If you just use /.*(\d+).*/ you'll only capture a single digit, because the rest of them will be taken by the leading .* pattern.
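If it helps to see the three cases side by side, here is a rough sketch in Python rather than GREL. It assumes that match() behaves like a whole-string match that returns the capture groups, which I am modelling with re.fullmatch. The helper name grel_match is made up for illustration and is not part of OpenRefine.

```python
import re


def grel_match(value, pattern):
    """Hypothetical stand-in for GREL's match(): the pattern must cover the
    whole string; returns the capture groups as a list, or None otherwise."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None


value = "abcababcabc 1234566 abcabcbacdf"

# No wildcards: the digits alone do not cover the whole string.
no_wildcards = grel_match(value, r"[\d]+")

# Lazy wildcards: the leading .*? stops as soon as the digits start.
lazy = grel_match(value, r".*?(\d+).*?")

# Greedy wildcards: the leading .* keeps every digit except the last one.
greedy = grel_match(value, r".*(\d+).*")

print(no_wildcards)  # None
print(lazy)          # ['1234566']
print(greedy)        # ['6']
```

The first case is the one from the question: the pattern is fine as a regex, but it never gets a chance to match because it does not account for the letters around the number.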
Full documentation for the regex implementation can be found in the docs for Java's Pattern class.