I am working through a gensim tutorial and have come across something I don't understand. texts is a nested list of strings:
In [37]: texts
Out[37]:
[['human', 'machine', 'interface', 'lab', 'abc', 'computer', 'applications'],
['survey', 'user', 'opinion', 'computer', 'system', 'response', 'time'],
['eps', 'user', 'interface', 'management', 'system'],
['system', 'human', 'system', 'engineering', 'testing', 'eps'],
['relation', 'user', 'perceived', 'response', 'time', 'error', 'measurement'],
['generation', 'random', 'binary', 'unordered', 'trees'],
['intersection', 'graph', 'paths', 'trees'],
['graph', 'minors', 'iv', 'widths', 'trees', 'well', 'quasi', 'ordering'],
['graph', 'minors', 'survey']]
and sum(texts, []) gives:
Out[38]:
['human',
'machine',
'interface',
'lab',
'abc',
'computer',
'applications',
'survey',
'user',
'opinion',
'computer',
The list goes on for a few more lines but I omitted the rest to save space. I have two questions:
1) Why does sum(texts, []) produce that outcome (i.e. flatten the nested list)?
2) Why is the output displayed strangely, with one element per line? Is there something special about this output, or is it just my IPython behaving strangely? Please confirm if you see this as well.
It's because adding lists together concatenates them.
sum([a, b, c, d, ..., z], start)
is equivalent to
start + a + b + c + d + ... + z
So
sum([['one', 'two'], ['three', 'four']], [])
is equivalent to
[] + ['one', 'two'] + ['three', 'four']
Which gives you
['one', 'two', 'three', 'four']
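To check the equivalence above yourself, here is a minimal sketch. The variable names (nested, flattened, by_hand, folded) are only illustrative, and the reduce() version is added purely to show that sum() is a left-to-right fold with +:

```python
from functools import reduce
import operator

nested = [['one', 'two'], ['three', 'four']]

# sum() begins with the start value ([] here) and adds each item in turn.
flattened = sum(nested, [])

# The same left-to-right addition written out by hand.
by_hand = [] + nested[0] + nested[1]

# The same fold expressed with reduce(); operator.add is the + operator.
folded = reduce(operator.add, nested, [])

print(flattened)
print(flattened == by_hand == folded)
```

All three expressions build the same flat list, because + on two lists returns a new list containing the elements of both.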
Note that start, by default, is 0, since by default it works with numbers, so if you were to try
sum([['one', 'two'], ['three', 'four']])
then it would try the equivalent of
0 + ['one', 'two'] + ['three', 'four']
and it would fail because you can't add integers to lists.
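A short sketch of that failure, catching the exception so the script keeps running (error_message is just an illustrative name):

```python
nested = [['one', 'two'], ['three', 'four']]

error_message = None
try:
    # start defaults to 0, so the first step is 0 + ['one', 'two'].
    sum(nested)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The message names both operand types, int and list, which points straight at the missing start argument.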
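The one-per-line thing is just how IPython displays your long list of strings: it pretty-prints results, and a list that does not fit on one line gets one element per line. The value itself is an ordinary flat list.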
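IPython has its own pretty printer, so the following is only an analogy, but the standard library's pprint module shows the same kind of behaviour outside IPython. A sketch, assuming the default width of 80 characters (words, long_form and short_form are illustrative names):

```python
from pprint import pformat

words = ['human', 'machine', 'interface', 'lab', 'abc', 'computer',
         'applications', 'survey', 'user', 'opinion', 'computer']

# Too long for one 80-character line, so each element gets its own line.
long_form = pformat(words)

# Short enough to fit, so it stays on a single line.
short_form = pformat(words[:3])

print(long_form)
print(short_form)
```

Either way the object is the same list; only its printed layout changes.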