Preventing memory issues when handling large amounts of text

Tags:

c#

I have written a program which analyzes a project's source code and reports various issues and metrics based on the code.

To analyze the source code, I load the code files that exist in the project's directory structure and analyze the code from memory. The code goes through extensive processing before it is passed to other methods to be analyzed further.

The code is passed around to several classes when it is processed.

The other day I was running it on one of the larger projects my group has, and my program crashed because there was too much source code loaded into memory. This is a corner case at this point, but I want to be able to handle this issue in the future.

What would be the best way to avoid memory issues?

I'm thinking about loading the code, doing the initial processing of the file, then serializing the results to disk, so that when I need to access them again, I do not have to go through the process of manipulating the raw code again. Does this make sense? Or is the serialization/deserialization more expensive than processing the code again?

I want to keep a reasonable level of performance while addressing this problem. Most of the time, the source code will fit into memory without issue, so is there a way to only "page" my information when I am low on memory? Is there a way to tell when my application is running low on memory?

Update: The problem is not that a single file fills memory; it's that all of the files in memory at once fill memory. My current idea is to rotate them off to the disk drive as I process them.


asked Sep 15 '09 14:09

Dan McClain

2 Answers

1.6GB is still manageable and by itself should not cause memory problems. Inefficient string operations might do it.

As you parse the source code you probably split it apart into certain substrings: tokens or whatever you call them. If your tokens combined account for the entire source code, that doubles memory consumption right there. Depending on the complexity of the processing you do, the multiplier can be even bigger. My first move here would be to take a closer look at how you use your strings and find a way to optimize it, e.g. discarding the original after the first pass, compressing the whitespace, or using indexes (pointers) into the original strings rather than actual substrings. There are a number of techniques which can be useful here.
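To illustrate the index idea, here is a minimal sketch, assuming a simple whitespace-based split stands in for your real tokenizer. The names `TokenSpan` and `SpanTokenizer` are hypothetical and not from your program. Each token stores only an offset and a length into the original string, so no substring is allocated until the text is actually requested.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type: records a position in the source text, not a copy of it.
public struct TokenSpan
{
    public readonly int Start;
    public readonly int Length;

    public TokenSpan(int start, int length)
    {
        Start = start;
        Length = length;
    }

    // Allocates a string only when the token text is actually needed.
    public string GetText(string source)
    {
        return source.Substring(Start, Length);
    }
}

public static class SpanTokenizer
{
    // Splits on whitespace (an assumption for this sketch) and records
    // offsets into the original string instead of creating substrings.
    public static List<TokenSpan> Tokenize(string source)
    {
        var tokens = new List<TokenSpan>();
        int i = 0;
        while (i < source.Length)
        {
            // Skip whitespace between tokens.
            while (i < source.Length && char.IsWhiteSpace(source[i])) i++;
            int start = i;
            // Advance to the end of the current token.
            while (i < source.Length && !char.IsWhiteSpace(source[i])) i++;
            if (i > start) tokens.Add(new TokenSpan(start, i - start));
        }
        return tokens;
    }
}
```

Each `TokenSpan` costs two integers regardless of how long the token is, which is where the saving over a list of substrings would come from. The trade-off is that the original string has to stay alive for as long as the spans are in use.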

If none of this helps, then I would resort to swapping them to and from the disk.
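Swapping to disk could look roughly like the following sketch. `DiskBackedStore` is a hypothetical helper, and it assumes the processed result for each file can be represented as a string; a real implementation would likely serialize whatever structure your processing produces.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keeps processed text in temp files instead of in memory.
public sealed class DiskBackedStore : IDisposable
{
    private readonly string _directory;
    // Maps a caller-chosen key (e.g. the source file name) to its temp file path.
    private readonly Dictionary<string, string> _paths = new Dictionary<string, string>();

    public DiskBackedStore()
    {
        _directory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(_directory);
    }

    // Writes the processed text to disk so the in-memory copy can be released.
    public void Save(string key, string processedText)
    {
        string path;
        if (!_paths.TryGetValue(key, out path))
        {
            path = Path.Combine(_directory, _paths.Count + ".txt");
            _paths[key] = path;
        }
        File.WriteAllText(path, processedText);
    }

    // Reads the processed text back only when it is needed again.
    public string Load(string key)
    {
        return File.ReadAllText(_paths[key]);
    }

    // Removes the temp directory and everything written to it.
    public void Dispose()
    {
        if (Directory.Exists(_directory)) Directory.Delete(_directory, true);
    }
}
```

Only the key-to-path dictionary stays in memory here. Whether reading back from disk beats reprocessing the raw file is something you would have to measure for your workload.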


answered Oct 16 '22 22:10

mfeingold

If the problem is that a single copy of your code is enough to fill the available memory, then there are at least two options:

Serialize to disk.
Compress files in memory. If you have a lot of CPU to spare, it can be faster to zip and unzip information in memory than to cache it to disk.
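As a rough sketch of the in-memory compression option, the following uses `GZipStream` from `System.IO.Compression`. The `TextCompressor` class name is made up for the example, and UTF-8 is an assumed encoding.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: holds source text as gzip-compressed bytes in memory.
public static class TextCompressor
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the buffer,
            // otherwise the final compressed block may not be flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Source code tends to be repetitive, so it usually compresses well, but the actual ratio and the CPU cost depend on your files and are worth measuring before committing to this approach.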

You should also check if you are disposing of objects properly. Do you have memory problems due to old copies of objects being in memory?

answered Oct 16 '22 23:10

Shiraz Bhaiji
