
Problems with tokenize

Tags: string, groovy

I have

def testStr = 'a:*b*c*d'

I want to get

tokens[0]=='a'
tokens[1]=='b*c*d'

I tried

def tokens = testStr.tokenize(':*')

but I get

tokens[0]=='a' 
tokens[1]=='b'
tokens[2]=='c'
tokens[3]=='d'

How can I split only on the two-character sequence ':*', so that the remaining '*' characters stay in the second token?

Karen asked Apr 12 '12 08:04


1 Answer

tokenize treats every character of its argument as a separate delimiter, so it's splitting on both : and * (and it drops empty tokens).

You probably want split, which takes a regular expression to split on (and returns a String[]):

def testStr = 'a:*b*c*d'

def tokens = testStr.split( /:\*/ )
assert tokens[ 0 ] == 'a'
assert tokens[ 1 ] == 'b*c*d'
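
To make the difference concrete, here is a small side-by-side sketch of the two methods on the string from the question. The second input, 'a:*b:*c', and the variable names are made up for illustration. The limit argument and Pattern.quote are optional extras, not something the original question requires.

```groovy
import java.util.regex.Pattern

def testStr = 'a:*b*c*d'

// tokenize: each character of ':*' is its own delimiter, and empty tokens are dropped
def byChars = testStr.tokenize(':*')
assert byChars == ['a', 'b', 'c', 'd']

// split: the argument is a regex, so ':' followed by an escaped '*' matches the pair only
def byRegex = testStr.split(/:\*/).toList()
assert byRegex == ['a', 'b*c*d']

// Pattern.quote builds the same literal match without escaping by hand
def byQuoted = testStr.split(Pattern.quote(':*')).toList()
assert byQuoted == ['a', 'b*c*d']

// hypothetical input where the separator repeats: a limit of 2 splits on the first one only
def firstOnly = 'a:*b:*c'.split(/:\*/, 2).toList()
assert firstOnly == ['a', 'b:*c']
```

The toList() calls are only there so the results compare cleanly against list literals; split itself still returns a String[].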
tim_yates answered Sep 29 '22 13:09