How do you split a JavaScript string by spaces and punctuation?

Tags:

javascript

regex

split

I have an arbitrary string, for example: "Hello, my name is john." I want that string split into an array like this: [Hello, ,, , my, name, is, john, .,]. I tried str.split(/[^\w\s]|_/g), but it does not work as I expected. Any ideas?

asked May 28 '11 15:05

chromedude

2 Answers

That solution gave me trouble with spaces (I still needed them), so I tried str.split(/\b/) instead, and it works well. Spaces come out as their own array elements, which are easy to ignore, and the space left after a punctuation mark can be trimmed off.
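
A minimal sketch of the word-boundary approach, using the string from the question. The helper name splitOnBoundaries is made up for illustration, and dropping the whitespace-only pieces is an assumption; skip the filter step if the spaces are needed.

```javascript
// Hypothetical helper: split on word boundaries, trim each piece,
// then drop the pieces that were only whitespace.
function splitOnBoundaries(str) {
  return str
    .split(/\b/)
    .map(function (piece) { return piece.trim(); })
    .filter(function (piece) { return piece.length > 0; });
}

var raw = 'Hello, my name is john.'.split(/\b/);
// raw: ['Hello', ', ', 'my', ' ', 'name', ' ', 'is', ' ', 'john', '.']

var cleaned = splitOnBoundaries('Hello, my name is john.');
// cleaned: ['Hello', ',', 'my', 'name', 'is', 'john', '.']
```

Note that \b also fires around apostrophes, so "Here's" comes back as three pieces (Here, ', s), and a run of adjacent punctuation stays together as a single piece.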

answered Oct 17 '22 22:10

MikeyB

To split a string on any run of non-word characters, i.e. anything other than A-Z, a-z, 0-9, and underscore:

var words=str.split(/\W+/);  // assumes str neither begins nor ends with a non-word character
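
To illustrate that assumption with the string from the question: it ends in a period, so the plain split leaves an empty string at the end. A small sketch (the helper name splitWords is hypothetical) that filters those out:

```javascript
// Hypothetical helper: split on runs of non-word characters and drop
// the empty strings left by leading or trailing punctuation.
function splitWords(str) {
  return str.split(/\W+/).filter(function (w) { return w !== ''; });
}

var withEmpty = 'Hello, my name is john.'.split(/\W+/);
// withEmpty: ['Hello', 'my', 'name', 'is', 'john', '']

var wordsOnly = splitWords('Hello, my name is john.');
// wordsOnly: ['Hello', 'my', 'name', 'is', 'john']
```

This keeps only the words; the punctuation and spaces are discarded, which is why the tokenizer further down is needed when the punctuation matters.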

Or, assuming your target language is English, you can extract all semantically useful values from a string (i.e. "tokenizing" a string) using:

var str='Here\'s a (good, bad, indifferent, ...) '+
        'example sentence to be used in this test '+
        'of English language "token-extraction".',

    punct='\\['+ '\\!'+ '\\"'+ '\\#'+ '\\$'+   // since javascript does not
          '\\%'+ '\\&'+ '\\\''+ '\\('+ '\\)'+  // support POSIX character
          '\\*'+ '\\+'+ '\\,'+ '\\\\'+ '\\-'+  // classes, we'll need our
          '\\.'+ '\\/'+ '\\:'+ '\\;'+ '\\<'+   // own version of [:punct:]
          '\\='+ '\\>'+ '\\?'+ '\\@'+ '\\['+
          '\\]'+ '\\^'+ '\\_'+ '\\`'+ '\\{'+
          '\\|'+ '\\}'+ '\\~'+ '\\]',

    re=new RegExp(     // tokenizer
       '\\s*'+            // discard possible leading whitespace
       '('+               // start capture group
         '\\.{3}'+            // ellipsis (must appear before punct)
       '|'+               // alternation
         '\\w+\\-\\w+'+       // hyphenated words (must appear before punct)
       '|'+               // alternation
         '\\w+\'(?:\\w+)?'+   // contractions and possessives, e.g. Here's (must appear before punct)
       '|'+               // alternation
         '\\w+'+              // other words
       '|'+               // alternation
         '['+punct+']'+        // punct
       ')'                // end capture group
     );

// grep(ary[,filt]) - filters an array
//   note: could use jQuery.grep() instead
// @param {Array}    ary    array of members to filter
// @param {Function} filt   function to test truthiness of member,
//   if omitted, "function(member){ if(member) return member; }" is assumed
// @returns {Array}  all members of ary where result of filter is truthy
function grep(ary,filt) {
  var result=[];
  for(var i=0,len=ary.length;i<len;i++) {   // visit every index from 0 to len-1
    var member=ary[i]||'';
    // typeof yields the lowercase string 'function'
    if((typeof filt === 'function') ? filt(member) : member) {
      result.push(member);
    }
  }
  return result;
}

var tokens=grep( str.split(re) );   // note: filter function omitted 
                                     //       since all we need to test 
                                     //       for is truthiness

which produces:


tokens=[ 
  'Here\'s',
  'a',
  '(',
  'good',
  ',',
  'bad',
  ',',
  'indifferent',
  ',',
  '...',
  ')',
  'example',
  'sentence',
  'to',
  'be',
  'used',
  'in',
  'this',
  'test',
  'of',
  'English',
  'language',
  '"',
  'token-extraction',
  '"',
  '.'
]
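
As an alternative sketch (not the code from this answer), the same alternation order can be written as a regex literal with the global flag, so that String.prototype.match returns the tokens directly and no filtering of empty strings is needed. The names tokenRe and tokenize are hypothetical, and the punctuation class is spelled out by hand on the assumption that only ASCII punctuation matters.

```javascript
// Alternative sketch: ellipsis, hyphenated words, contractions, plain
// words, then single ASCII punctuation characters, in that order.
var tokenRe = /\.{3}|\w+-\w+|\w+'(?:\w+)?|\w+|[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]/g;

// Hypothetical helper: match() returns null when nothing matches,
// so fall back to an empty array.
function tokenize(text) {
  return text.match(tokenRe) || [];
}

var sample = 'Here\'s a (good, bad, ...) "token-extraction" test.';
var sampleTokens = tokenize(sample);
// sampleTokens: ["Here's", 'a', '(', 'good', ',', 'bad', ',', '...', ')',
//                '"', 'token-extraction', '"', 'test', '.']
```

Whitespace never matches any alternative, so match() simply skips over it instead of producing empty entries.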

EDIT

Also available as a GitHub Gist.

answered Oct 17 '22 22:10

Rob Raisch
