
Print Different Output Values Corresponding to Duplicate Input in a Table?

For example, TableA:

     ID1    ID2   
     123    abc
     123    def
     123    ghi
     123    jkl
     123    mno
     456    abc
     456    jkl

I want to do a string search for 123 and return all corresponding values.

    pp = Cases[#, x_List /; 
     MemberQ[x, y_String /; 
       StringMatchQ[y, ToString@p, IgnoreCase -> True]], {1}] &@TableA

    {f4@"ID2", f4@pp[[2]]}

Above, p is the input (here, 123). This returns only one value for ID2. How do I get all values for ID2?

Rose asked Dec 04 '22 07:12


1 Answer

To complement the other solutions, I would like to explore the high-performance corner of this problem: the case where the table is large and many queries have to be performed. Some kind of preprocessing can save a lot of execution time in such a case. I would like to show a rather obscure but, IMO, elegant solution based on a combination of Dispatch and ReplaceList. Here is a small table for illustration (I use strings for all the entries, to keep it close to the original question):

    makeTestTable[nids_, nelems_] :=
      Flatten[Thread[{"ID" <> ToString@#, 
             ToString /@ Range[#, nelems + # - 1]}] & /@ Range[nids], 1]

    In[57]:= (smallTable = makeTestTable[3,5])//InputForm
    Out[57]//InputForm=
    {{"ID1", "1"}, {"ID1", "2"}, {"ID1", "3"}, {"ID1", "4"}, {"ID1", "5"}, 
     {"ID2", "2"}, {"ID2", "3"}, {"ID2", "4"}, {"ID2", "5"}, {"ID2", "6"}, 
     {"ID3", "3"}, {"ID3", "4"}, {"ID3", "5"}, {"ID3", "6"}, {"ID3", "7"}}

The preprocessing step consists of making a Dispatch-ed table of rules from the original table:

    smallRules = Dispatch[Rule @@@ smallTable];

The code to get (say, for "ID2") the values is then:

    In[59]:= ReplaceList["ID2", smallRules]

    Out[59]= {"2", "3", "4", "5", "6"}
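To see why ReplaceList is the right tool here, it may help to compare it with Replace on the table from the question. The sketch below assumes that both columns of TableA are stored as strings (the question does not say); the names tableA and rulesA are introduced only for illustration.

```mathematica
(* TableA from the question, with all entries assumed to be strings *)
tableA = {{"123", "abc"}, {"123", "def"}, {"123", "ghi"}, {"123", "jkl"},
   {"123", "mno"}, {"456", "abc"}, {"456", "jkl"}};

(* One rule per row; several rules share the same left-hand side *)
rulesA = Dispatch[Rule @@@ tableA];

(* Replace stops at the first rule that matches *)
Replace["123", rulesA]
(* "abc" *)

(* ReplaceList collects the result of every matching rule, in rule order *)
ReplaceList["123", rulesA]
(* {"abc", "def", "ghi", "jkl", "mno"} *)

(* A numeric input has to be converted to a string before the lookup *)
ReplaceList[ToString[456], rulesA]
(* {"abc", "jkl"} *)

(* An ID that does not occur in the table gives an empty list *)
ReplaceList["789", rulesA]
(* {} *)
```

Replace returning only the first match is the same symptom as in the question, where a single value came back for ID2.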

This does not look like a big deal, but let us move to larger tables:

    In[60]:= Length[table = makeTestTable[1000,1000]]
    Out[60]= 1000000

The preprocessing step admittedly takes some time:

    In[61]:= (rules = Dispatch[Rule @@@ table]); // Timing

    Out[61]= {3.703, Null}

But we only need it once. Now, all subsequent queries (perhaps except the very first) will be near instantaneous:

    In[75]:= ReplaceList["ID520",rules]//Short//Timing
    Out[75]= {0.,{520,521,522,523,524,525,<<988>>,1514,1515,1516,1517,1518,1519}}

while an approach without the preprocessing takes a sizable fraction of a second for this table size:

    In[76]:= Cases[table,{"ID520",_}][[All,2]]//Short//Timing
    Out[76]= {0.188,{520,521,522,523,524,525,<<988>>,1514,1515,1516,1517,1518,1519}}

I realize that this may be overkill for the original question, but tasks like this are rather common, for example when someone wants to explore a large dataset imported from a database directly in Mathematica.
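For readers on Mathematica 10 or later, the same "preprocess once, query many times" idea can also be sketched with an association built by GroupBy. This is a different technique from the Dispatch/ReplaceList approach above, not a restatement of it, and I have not timed it against that approach. The name smallIndex is hypothetical, and the small table is rebuilt inline so that the snippet runs on its own.

```mathematica
(* Same small table as above, rebuilt here so the snippet is self-contained *)
smallTable = Flatten[
   Thread[{"ID" <> ToString@#, ToString /@ Range[#, # + 4]}] & /@ Range[3], 1];

(* One-time preprocessing: group rows by the first column, keep the second *)
smallIndex = GroupBy[smallTable, First -> Last];

(* Query by key; the third argument of Lookup is the default for unknown IDs *)
Lookup[smallIndex, "ID2", {}]
(* {"2", "3", "4", "5", "6"} *)

Lookup[smallIndex, "ID9", {}]
(* {} *)
```

Passing {} as the default to Lookup mirrors the behaviour of ReplaceList, which also returns an empty list when nothing matches.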

Leonid Shifrin answered Feb 25 '23 22:02