Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plot a CDF chart by Microsoft Excel

I'm not quite sure if I can ask this question here or on SuperUser,

I want to know how can I plot a CDF chart out of my excel data. My data is something like this (my real data have 22424 records):

1   2.39E-05
1   2.39E-05
1   2.39E-05
2   4.77E-05
2   4.77E-05
2   4.77E-05
4   9.55E-05
4   9.55E-05
4   9.55E-05
4   9.55E-05
4   9.55E-05
4   9.55E-05
8   0.000190931
8   0.000190931
like image 771
Am1rr3zA Avatar asked Dec 30 '10 10:12

Am1rr3zA


2 Answers

This answer is how to create an 'empirical distribution function', which is what many people really have in mind (myself included) when they say CDF... https://en.wikipedia.org/wiki/Empirical_distribution_function

Assuming the second column of the sample data starts in cell B1, in cell C1, type:

=SUM(IF($B$1:$B$14<=B1,1,0))/COUNT($B$1:$B$14)

then press Shift+Enter, to enter it as an array formula. It will now look like this in the formula bar:

{=SUM(IF($B$1:$B$14<=B1,1,0))/COUNT($B$1:$B$14)}

Copy the cell down to cover C1:C14. Then make Scatter plot with B1:B14 as X, C1:C14 as Y. It will show four points.

  • Don't need to sort or remove duplicates
  • Use range names, or take advantage of Excel table capabilities, to manage the input ranges more automatically
  • It is a single-cell array formula, so depending on how you copy-and-paste, you will get a message "Cannot change part of an array". If you use Copy-Paste, copy cell C1, then select cells C2:c14 and Paste.
  • Ideally, the graph should be presented as a step function, but I didn't have time to figure out any way (good or bad) to do that.
like image 147
Knom Avatar answered Oct 23 '22 18:10

Knom


You can use the NORMDIST function and set the final parameter to true:

As an example, suppose I have 20 data points from 0.1 to 2.0 in increments of 0.1 i.e. 0.1, 0.2, 0.3...2.0.

Now suppose that the mean of that dataset is 1.0 and the standard deviation is 0.2.

To get the CDF plot I can use the following formula for each of my values:

=NORMDIST(x, 1.0, 0.2, TRUE) -- where x is 0.1, 0.2, 0.3...2.0

alt text


To remove duplicate entries from your data and sum values that are the same you can use the following code.

  1. In excel, place you data in sheet1, starting in cell A1
  2. Press ALT + F11 to open VBE
  3. Now Insert > Module to place a module in the editor
  4. Cut and paste code below into module
  5. Place cursor anywhere in RemoveDuplicates and Press F5 to run the code

As a result, your unique, summed results will appear in Sheet2 in your workbook.

Sub RemoveDuplicates()
    Dim rng As Range
    Set rng = Range("A1:B" & GetLastRow(Range("A1")))

    rng.AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Worksheets("Sheet2").Range("A1"), Unique:=True

    Dim filteredRng As Range
    Dim cl As Range

    Set filteredRng = Worksheets("Sheet2").Range("A1:A" & GetLastRow(Worksheets("Sheet2").Range("A1")))

    For Each cl In filteredRng
        cl.Offset(0, 1) = Application.WorksheetFunction.SumIf(rng.Columns(1), cl.Value, rng.Columns(2))
    Next cl
End Sub

Function GetLastRow(rng As Range) As Long
    GetLastRow = rng.End(xlDown).Row
End Function
like image 20
Alex P Avatar answered Oct 23 '22 19:10

Alex P