 

Is there a better way to extract information from a string?

Let's say I have an array of strings and I need specific info from them. What would be an easy way to extract it?

Suppose the array is this:

let infoArr = [
  "1 Ben Howard 12/16/1988 apple",
  "2 James Smith 1/10/1999 orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/1975 peach",
  "5 Doug Jones 11/10/1975 peach"
];

Let's say I want to extract the dates and save them into another array. I could write a function like this:

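function extractDates(arr)
{
  const dateRegex = /(\d{1,2}\/){2}\d{4}/; // m/d/yyyy
  let dateArr = [];

  for(let i = 0; i<arr.length; i++)
  {
    // exec() returns [fullMatch, lastCapturedGroup], e.g. ["12/16/1988", "16/"]
    let dates = dateRegex.exec(arr[i]);
    if (dates === null) continue; // skip strings without a date
    dates.pop();         // drop the captured group, keep the full match
    dateArr.push(dates); // dateArr becomes an array of one-element arrays
  }

  return dateArr.flat();
}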

Although this works, it is clunky: exec() returns the full match together with the captured group (e.g. ["12/16/1988", "16/"]), so I need pop() to drop the group, and because I push one array per string I also have to call flat() afterwards.
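The extra "16/" comes from the repeated group `(\d{1,2}\/)`: it is a capturing group, and exec() reports the last value it captured. A minimal sketch comparing it with a non-capturing group `(?:...)`, which repeats the same way but adds no extra element to the result:

```javascript
const line = "1 Ben Howard 12/16/1988 apple";

// Capturing group: the match array holds the full match plus the group's
// last captured value, which is why pop() was needed.
const withGroup = /(\d{1,2}\/){2}\d{4}/.exec(line);
console.log(withGroup[0], withGroup[1]); // 12/16/1988 16/

// Non-capturing group: same pattern, but only the full match is returned.
const noGroup = /(?:\d{1,2}\/){2}\d{4}/.exec(line);
console.log(noGroup.length, noGroup[0]); // 1 12/16/1988
```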

Another option is to take a substring of each string, using regex searches to find where the date starts and ends:

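function extractDates2(arr)
{
  let dates = [];

  for(let i = 0; i<arr.length; i++)
  {
    // Start: position of the first m/d/yyyy date
    let begin = regexIndexOf(arr[i], /(\d{1,2}\/){2}\d{4}/g);
    // End: one past the first digit followed by a space, searching from the date onward
    let end = regexIndexOf(arr[i], /[0-9] /g, begin) + 1;
    dates.push(arr[i].substring(begin, end));
  }

  return dates;
}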

It relies on this regexIndexOf() helper:

// Like indexOf(), but takes a regex; returns -1 if there is no match at or after `start`
function regexIndexOf(str, regex, start = 0)
{
  let indexOf = str.substring(start).search(regex);
  indexOf = (indexOf >= 0) ? (indexOf + start) : -1;
  return indexOf;
}

This function also works, but it seems far too convoluted for extracting something so simple. Is there an easier way to extract data into an array?
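One caveat worth checking before relying on the substring approach: the end position assumes that a digit followed by a space appears after the date. A sketch showing what happens when the date is the last thing in the string; `extractDate2` is a hypothetical single-string version of extractDates2, and regexIndexOf is copied from above:

```javascript
// Copy of the question's helper
function regexIndexOf(str, regex, start = 0) {
  const indexOf = str.substring(start).search(regex);
  return indexOf >= 0 ? indexOf + start : -1;
}

// Hypothetical single-string version of extractDates2
function extractDate2(str) {
  const begin = regexIndexOf(str, /(\d{1,2}\/){2}\d{4}/g);
  const end = regexIndexOf(str, /[0-9] /g, begin) + 1;
  return str.substring(begin, end);
}

// Works when a space follows the date
console.log(extractDate2("1 Ben Howard 12/16/1988 apple")); // 12/16/1988

// If the date ends the string, the second search returns -1, so end = 0;
// substring(begin, 0) swaps its arguments and returns the text before the date.
console.log(JSON.stringify(extractDate2("6 Jane Roe 3/4/2001"))); // "6 Jane Roe "
```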

asked Jan 02 '19 03:01 by Travis



1 Answer

One approach is to use map() to apply match() to each element of the array, then call flat() to combine the results into a single array:

let infoArr = [
  "1 Ben Howard 12/16/1988 apple",
  "2 James Smith 1/10/1999 orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/1975 peach",
  "5 Doug Jones 11/10/1975 peach"
];

const result = infoArr.map(o => o.match(/(\d{1,2}\/){2}\d{4}/g)).flat();

console.log(result);
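If you need more than one field from each line, the same map() pattern can return an object per line. A sketch assuming every line has the "id first last date fruit" layout of the sample data; the group names are illustrative, and named capture groups require ES2018 support:

```javascript
const infoArr = [
  "1 Ben Howard 12/16/1988 apple",
  "2 James Smith 1/10/1999 orange"
];

// Assumed layout: "<id> <first> <last> <m/d/yyyy> <fruit>", separated by single spaces
const lineRegex = /^(?<id>\d+) (?<first>\S+) (?<last>\S+) (?<date>(?:\d{1,2}\/){2}\d{4}) (?<fruit>\S+)$/;

const records = infoArr.map(line => {
  const m = line.match(lineRegex);
  // Lines that do not fit the layout become null instead of throwing
  return m ? { ...m.groups, id: Number(m.groups.id) } : null;
});

console.log(records);
```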

Alternatively, you could use flatMap():

let infoArr = [
  "1 Ben Howard 12/16/1988 apple",
  "2 James Smith 1/10/1999 orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/1975 peach",
  "5 Doug Jones 11/10/1975 peach"
];

const result = infoArr.flatMap(o => o.match(/(\d{1,2}\/){2}\d{4}/g));

console.log(result);

If some strings contain no date, match() returns null for them. To remove those null values from the final array, apply filter():

// With map() + flat():
const result = infoArr.map(o => o.match(/(\d{1,2}\/){2}\d{4}/g))
                      .flat()
                      .filter(date => date !== null);

// Or, equivalently, with flatMap():
const result2 = infoArr.flatMap(o => o.match(/(\d{1,2}\/){2}\d{4}/g))
                       .filter(date => date !== null);
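A variation that skips filter() entirely: if the callback returns an empty array instead of null, flatMap() contributes nothing for that string. A minimal sketch using `|| []` as the fallback:

```javascript
const infoArr = [
  "1 Ben Howard 12/16/1988 apple",
  "2 James Smith orange",
  "3 Andy Bloss 10/25/1956 apple"
];

// match() gives null when nothing matches; "|| []" replaces it with an empty
// array, which flatMap() flattens away, so no null ends up in the result.
const result = infoArr.flatMap(o => o.match(/(\d{1,2}\/){2}\d{4}/g) || []);

console.log(result); // ["12/16/1988", "10/25/1956"]
```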

An example with irregular data (a string with two dates, one with no date, and two with malformed dates):

let infoArr = [
  "1 Ben Howard 12/16/1988 apple 10/22/1922",
  "2 James Smith orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/19075 peach",
  "5 Doug Jones 11/10-1975 peach"
];

const result = infoArr.flatMap(o => o.match(/(\d{1,2}\/){2}\d{4}/g))
                      .filter(date => date !== null); /* or filter(date => date) */

console.log(result);
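Note that the fourth string in this example still produces a result: the pattern has no boundaries, so it matches the "8/20/1907" prefix of "8/20/19075". Adding `\b` anchors is one way to reject such partial matches. A sketch; `\b` treats digits as word characters, so this assumes dates are not glued directly to other letters or digits:

```javascript
const infoArr = [
  "1 Ben Howard 12/16/1988 apple 10/22/1922",
  "2 James Smith orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/19075 peach",
  "5 Doug Jones 11/10-1975 peach"
];

const unanchored = /(\d{1,2}\/){2}\d{4}/g;
// \b on both sides: the date must start and end at a word boundary,
// so "8/20/19075" no longer yields "8/20/1907"
const anchored = /\b(\d{1,2}\/){2}\d{4}\b/g;

const extract = re => infoArr.flatMap(o => o.match(re) || []);

console.log(extract(unanchored)); // includes "8/20/1907"
console.log(extract(anchored));   // ["12/16/1988", "10/22/1922", "10/25/1956"]
```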

Alternative without flat():

At the time this answer was written, flat() and flatMap() were still considered "experimental" (they have since been standardized in ES2019), and some browsers or browser versions don't support them. For those environments you can use the following alternative, with the limitation that it only gets the first match in each string:

const infoArr = [
  "1 Ben Howard 12/16/1988 apple 10/22/1922",
  "2 James Smith orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/19075 peach",
  "5 Doug Jones 11/10-1975 peach"
];

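// Returns the first match of `regexp` in each string (null when there is none);
// with filterNulls = true, the null entries are removed.
const getData = (input, regexp, filterNulls) =>
{
    let res = input.map(o =>
    {
        let matches = o.match(regexp);
        return matches && matches[0];
    });

    return filterNulls ? res.filter(Boolean) : res;
}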

console.log(getData(infoArr, /(\d{1,2}\/){2}\d{4}/g, false));
console.log(getData(infoArr, /(\d{1,2}\/){2}\d{4}/g, true));
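If you need every match (not just the first) in an environment without flat(), one option is reduce() with concat(), both of which long predate ES2019. A sketch; `getAllData` is a hypothetical name, and the regex needs the g flag for match() to return all matches:

```javascript
const infoArr = [
  "1 Ben Howard 12/16/1988 apple 10/22/1922",
  "2 James Smith orange",
  "3 Andy Bloss 10/25/1956 apple",
  "4 Carrie Walters 8/20/19075 peach",
  "5 Doug Jones 11/10-1975 peach"
];

// concat() spreads an array argument into the accumulator, and
// "|| []" turns a null result (no match) into nothing.
function getAllData(input, regexp) {
  return input.reduce(function (acc, str) {
    return acc.concat(str.match(regexp) || []);
  }, []);
}

console.log(getAllData(infoArr, /(\d{1,2}\/){2}\d{4}/g));
```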
answered Oct 02 '22 14:10 by Shidersz