I want to compare costs of CPT codes from two different claims payers. Both have par and non-par priced providers. I am using dplyr and modeest::mlv, but it's not working out as anticipated. Here's some sample data:
source CPTCode ParNonPar Key net_paid PaidFreq seq
ABC 100 Y ABC100Y -341.00 6 1
ABC 100 Y ABC100Y 0.00 2 2
ABC 100 Y ABC100Y 341.00 6 3
XYZ 103 Y XYZ103Y 740.28 1 1
XYZ 104 N XYZ104N 0.00 2 1
XYZ 104 N XYZ104N 401.82 1 2
XYZ 104 N XYZ104N 726.18 1 3
XYZ 104 N XYZ104N 893.00 1 4
XYZ 104 N XYZ104N 928.20 2 5
XYZ 104 N XYZ104N 940.00 2 6
And the code:
str(data)
View(data)
## Expand frequency count to individual observations
n.times <- data$PaidFreq
dataObs <- data[rep(seq_len(nrow(data)), n.times),]
## Calculate mean for each CPTCode (for mode use modeest library)
library(dplyr)
library(modeest)
dataSummary <- dataObs %>%
group_by(ParNonPar, CPTCode) %>%
summarise(mean = mean(net_paid),
median=median(net_paid),
mode = mlv(net_paid, method=mfv),
total = sum(net_paid))
str(dataSummary)
I thought I could call mlv inside summarise alongside the mean and median, but this formulation errors out with:
Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'
Without mlv I get a data frame like the one below, but what I want is all the stats for a payer and CPT code on one line. Once I have what I need on one row, I envision graphing it in boxplots by limiting the x and y segments.
The inadequate result is this (I forgot to include the payer name here!):
ParNonPar CPTCode mean median(net_paid) total
N 0513F 0.000000 0.000 0.00
N 0518F 0.000000 0.000 0.00
N 10022 0.000000 0.000 0.00
N 10060 73.660000 90.120 294.64
N 10061 324.575000 340.500 1298.30
N 10081 312.000000 312.000 312.00
Thanks very much for your time and effort.
R does not have a standard built-in function to calculate the statistical mode, so we create a user function for it. This function takes a vector as input and returns the most frequently occurring value as output.
Note that base R's mode() function does something different: it returns the storage mode (type) of an object, not the most frequently occurring value.
The summarise() function from the dplyr package collapses a data frame to one row of summary values per group. Observations are first grouped by one or more categorical variables with group_by(), and summarise() then computes each statistic within every group.
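As a minimal sketch of such a user function, using base R only: the helper name most_frequent and the sample vector are made up for illustration. It returns every value tied for the highest count, and the last line shows what base R's own mode() reports for the same vector.

```r
# Hypothetical helper (not from any package): returns every value
# that is tied for the highest count in x
most_frequent <- function(x) {
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[counts == max(counts)]         # keep all values that reach the maximum count
}

x <- c(2, 3, 3, 5, 5, 7)  # illustrative data
print(most_frequent(x))   # the values tied for most frequent
print(mode(x))            # base R's mode(): the storage mode of x, not a statistic
```

Returning all tied values leaves the decision about ties (first value, average, or all of them) to the caller.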
I use this approach:
df <- data.frame(groups = c("A", "A", "A", "B", "B", "C", "C", "C", "D"), nums = c("1", "2", "1", "2", "3", "4", "5", "5", "1"))
which looks like:
groups nums
A 1
A 2
A 1
B 2
B 3
C 4
C 5
C 5
D 1
Then I define:
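mode <- function(codes){
  # table() counts each distinct value; which.max() picks the first most frequent one.
  # The result is returned as a character string. Note that this masks base::mode().
  names(which.max(table(codes)))
}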
and do the following:
library(dplyr)
mds <- df %>%
  group_by(groups) %>%
  summarise(mode = mode(nums))
giving:
groups mode
A 1
B 2
C 5
D 1
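You need to make two changes to your code for mlv to work. First, pass the method name as a quoted string ('mfv'): the unquoted mfv is a function object, which is what triggers the "cannot coerce type 'closure'" error. Second, extract the M element from the object that mlv returns, so that summarise gets a single number.
Try: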
dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median = median(net_paid),
            mode = mlv(net_paid, method='mfv')[['M']],
            total = sum(net_paid))
to get:
> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar
  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28
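The mode of 622.7333 for CPT 104 may look odd, since it is not one of the paid amounts. Here is a rough check in base R, using only the XYZ/104/N rows from the question (the variable names are mine). It suggests that three amounts tie for most frequent and that the reported value is consistent with their average. Whether mlv averages ties this way may depend on the installed modeest version, so treat that part as an assumption.

```r
# Sample rows for XYZ / CPT 104 / non-par, copied from the question
net_paid  <- c(0.00, 401.82, 726.18, 893.00, 928.20, 940.00)
paid_freq <- c(2, 1, 1, 1, 2, 2)

# Expand the frequency counts into individual observations
obs <- rep(net_paid, times = paid_freq)

# Count occurrences of each amount and keep every value tied for the maximum
counts <- table(obs)
tied_modes <- as.numeric(names(counts)[counts == max(counts)])

print(tied_modes)        # the amounts tied for most frequent
print(mean(tied_modes))  # their average, to compare with the mode column
print(sum(obs))          # to compare with the total column
```

If a single observed amount is needed instead of an average, the ties have to be resolved explicitly, for example by taking the first or the largest of the tied values.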
Hope that helps you move forward.