Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which data structure in C allow me to store lines and append lines easily?

I got a list of string data. 10,20,30 are the line numbers

10. string 1
20. string 2
30. string 3

and if user types in "23 string data". 23 is the line number user wants to insert into. The data should become like that

10. string 1
20. string 2
23. string data
30. string 3

and if user types in "40 string data". The data should become like that

10. string 1
20. string 2
23. string data
30. string 3
40. string data

I'm relatively new in C's data structure. Which data structure should I use to store this kind of data efficiently? My current direction is to implement dynamic array or linked list. However, below are the list of problems I experienced.

Problem with dynamic array:

  1. Using the line number as the index and create sufficient space to ensure array length is always more or equal to the highest line number. Waste a lot of memory for un-used index.
  2. Printing out the data would be an issue. Eg. index 0-9 doesn't have memory allocated. Accessing it will cause an error. Need to find ways to know which index are used?

Problem with linked list:

  1. I cannot jump index(not sure). How do identify which line comes after another and insert line in between easily?
like image 956
ysj Avatar asked Nov 06 '16 06:11

ysj


3 Answers

Let's assume the following about your requirements:

  1. No strong real time. (I.e. it's not for high frequency trading, or controlling machinery.)

  2. It runs on a relatively contemporary PC (RAM measured in GB, CPU frequency in GHz). In particular it does not run on an embedded system.

  3. The data is no more than a few ten thousand lines.

Then you can use almost any data structure you like; it won't matter with respect to memory or run time behavior.

For example, in order to find the point of insertion in a linked list, just iterate that list. PCs are fast enough to iterate tens of thousands of times before you finished blinking.

Or just allocate an array of 100,000 lines of 80 characters each. No problem whatsoever. Or of a million lines. Still no problem. Or of 10 million lines, still no problem. You see my point? (In an array you'll need a marker to mark unused lines. I would use a struct line { bool used; char text[80]; } or the like. You can also cater to arbitrarily long lines — and save memory — by having just a char *text member and allocating dynamically, or defining the text as a linked list of chunks.)

The choice therefore boils down to what's easiest for you to use. Could be the array.

like image 77
Peter - Reinstate Monica Avatar answered Sep 23 '22 19:09

Peter - Reinstate Monica


I'll give the two solutions I could come up with, but this question is possibly open-ended.

  1. Use a hash table. Keys are line numbers. Values are (string, pointer to next line's value). This makes both random and linear access fast. Edit: Insertion is still O(n) with this. It'll only help with access time, which will be O(1). The second solution has O(1) insertion.

  2. Assuming you don't have wildly spaced out line numbers: Use a singly linked list L to store strings. Also create a separate array P containing a pointer to every k-th node in the list. To access line i, check P[floor(i/k)], jump to the node it points to in L, and jump forward i mod k times to reach your string. Access time is therefore O(k). Insertion time is O(1). Space usage for n strings is O(n + max{i}/k).

The one thing that makes this relevant to C... is that there's no built-in hash table, of course! So #2 may be easier to implement.

like image 34
sudo Avatar answered Sep 20 '22 19:09

sudo


I know you're looking for a specialized data structure, but how about instead using a simple data structure but sorting it lazily? You could append new lines to a dynamic array and then sort the array (with qsort) when you need to print them.

I think that this would be better because printing all lines is probably done much less frequently than adding/inserting lines. Therefore you should make adding lines cheap (in this case, O(1) amortized), and printing can be more expensive (in this case, O(n log n)). This also keeps your data structures simple and lets the C standard library handle the complicated parts.

You could make this a bit better still by keeping a flag that tracks whether all of the data is already known to be sorted; that way repeatedly printing (or, presuming you're trying to write a BASIC interpreter, repeatedly running) will be cheap too. Such a flag also might be helpful if you expect that lines are usually entered in order; then as each line is added:

alreadySorted = alreadySorted && (new_line_number > last_line_number)

I'll note that you have not specified what happens if a line is added that reuses an existing line number. If you wish to replace the old line, then you could tweak this approach by using a stable sort and afterward iterating over the lines to remove lines with duplicate numbers, keeping only the last one.

(If you want to make qsort stable for this case, instead of storing just a string for each line, you could store some extra metadata with it (any monotonically increasing counter would do, such as the current time, or just the total number of lines at the time the line was added). Then the comparison function you give to qsort would just need to use that extra data to resolve ties from duplicate line numbers.)

One disadvantage to this approach is that removing lines either won't be fast or won't reclaim memory immediately. However, you haven't specified whether line removal is a requirement; even if it is, it is likely to be a rare operation (so being a bit more time-inefficient or a bit more space-inefficient might be acceptable).

like image 27
jamesdlin Avatar answered Sep 23 '22 19:09

jamesdlin