I have a list of string data, where 10, 20, 30 are the line numbers:
10. string 1
20. string 2
30. string 3
If the user types in "23 string data", 23 is the line number they want to insert at, and the data should become:
10. string 1
20. string 2
23. string data
30. string 3
and if user types in "40 string data". The data should become like that
10. string 1
20. string 2
23. string data
30. string 3
40. string data
I'm relatively new to data structures in C. Which data structure should I use to store this kind of data efficiently? My current direction is to implement a dynamic array or a linked list. However, below are the problems I ran into.
Problem with dynamic array:
Problem with linked list:
Let's assume the following about your requirements:
No hard real-time requirements (i.e., it's not for high-frequency trading or controlling machinery).
It runs on a relatively contemporary PC (RAM measured in GB, CPU frequency in GHz); in particular, it does not run on an embedded system.
The data is no more than a few tens of thousands of lines.
Then you can use almost any data structure you like; it won't matter with respect to memory or run time behavior.
For example, in order to find the point of insertion in a linked list, just iterate over that list. PCs are fast enough to iterate tens of thousands of times before you finish blinking.
Or just allocate an array of 100,000 lines of 80 characters each. No problem whatsoever. Or of a million lines. Still no problem. Or of 10 million lines, still no problem. You see my point? (In an array you'll need a marker to mark unused lines. I would use a struct line { bool used; char text[80]; } or the like. You can also cater to arbitrarily long lines, and save memory, by having just a char *text member and allocating it dynamically, or by defining the text as a linked list of chunks.)
The choice therefore boils down to what's easiest for you to use. Could be the array.
I'll give the two solutions I could come up with, but this question is possibly open-ended.
1. Use a hash table. Keys are line numbers; values are (string, pointer to the next line's value) pairs. This makes both random and linear access fast. Edit: insertion is still O(n) with this; it only helps with access time, which becomes O(1). The second solution has O(1) insertion.
2. Assuming you don't have wildly spaced-out line numbers: use a singly linked list L to store the strings. Also create a separate array P containing a pointer to every k-th node in the list. To access line i, check P[floor(i/k)], jump to the node it points to in L, and jump forward i mod k times to reach your string. Access time is therefore O(k), and insertion time is O(1). Space usage for n strings is O(n + max{i}/k).
The one thing that makes this relevant to C... is that there's no built-in hash table, of course! So #2 may be easier to implement.
I know you're looking for a specialized data structure, but how about instead using a simple data structure and sorting it lazily? You could append new lines to a dynamic array and then sort the array (with qsort) when you need to print them.
I think that this would be better because printing all lines is probably done much less frequently than adding/inserting lines. Therefore you should make adding lines cheap (in this case, O(1) amortized), and printing can be more expensive (in this case, O(n log n)). This also keeps your data structures simple and lets the C standard library handle the complicated parts.
You could make this a bit better still by keeping a flag that tracks whether all of the data is already known to be sorted; that way repeatedly printing (or, presuming you're trying to write a BASIC interpreter, repeatedly running) will be cheap too. Such a flag also might be helpful if you expect that lines are usually entered in order; then as each line is added:
alreadySorted = alreadySorted && (new_line_number > last_line_number)
I'll note that you have not specified what happens if a line is added that reuses an existing line number. If you wish to replace the old line, then you could tweak this approach by using a stable sort and afterward iterating over the lines to remove lines with duplicate numbers, keeping only the last one.
(If you want to make qsort stable for this case, instead of storing just a string for each line, you could store some extra metadata with it (any monotonically increasing counter would do, such as the current time, or just the total number of lines at the time the line was added). Then the comparison function you give to qsort would just need to use that extra data to resolve ties from duplicate line numbers.)
One disadvantage to this approach is that removing lines either won't be fast or won't reclaim memory immediately. However, you haven't specified whether line removal is a requirement; even if it is, it is likely to be a rare operation (so being a bit more time-inefficient or a bit more space-inefficient might be acceptable).