How do I determine if I am getting record duplication in MYSQL?

I'm running a query to find out how much estimated work was done on a factory floor and how much time was actually tracked in comparison to the amount of hours that station has available.

I'm doing this to determine which machines we need to purchase more of. Any operation with a usage factor over 100% is one where we're over capacity.

The issue is that I'm getting astronomically high numbers for some operations. It is impossible for 5 men, each working at a machine, to track more than 120 hours; however, the result I am getting is well over a thousand.

What I do in the query is take all the batches, which have tasks, and sum the estimated time of each task. I sum all of the time_elapsed in the batch_log, and I multiply hours_open by the number of machines for that operation.

Because of this, deburr should have a max of 120 hours as they are open 24 hours a day and there are 5 deburring stations. Does anything glaring jump out when looking at this query?

Please let me know if you need more info.

SELECT
  DATE(bl.start_time) AS date_tracked,
  o.name AS operation,
  SUM(TIME_TO_SEC(bl.time_elapsed) / 3600) AS time_elapsed,
  SUM(t.estimated_nonrecurring + t.estimated_recurring) / 3600 AS estimated,
  o.hours_open AS hours_open,
  COUNT(DISTINCT m.id) AS machine_count,
  o.hours_open * COUNT(DISTINCT m.id) AS total_hours,
  (SUM(TIME_TO_SEC(bl.time_elapsed)) / 3600) / COUNT(DISTINCT m.id) AS time_elapsed_usage
FROM batches b
INNER JOIN tasks t ON b.id = t.batch_id
INNER JOIN batch_log bl ON b.id = bl.batch_id
INNER JOIN operations o ON b.operation_id = o.id
INNER JOIN machines m ON b.operation_id = m.operation_id
WHERE bl.time_elapsed < '8:00:00'
GROUP BY
  b.operation_id,
  DATE(bl.start_time)
ORDER BY date_tracked, o.id

So I've started again, and once I get to this point I seem to have duplication in the time elapsed:

SELECT
  batches.operation_id,
  DATE(batch_log.start_time) AS date,
  SEC_TO_TIME(SUM(TIME_TO_SEC(batch_log.time_elapsed))) AS elapsed,
  SUM(tasks.estimated_nonrecurring + tasks.estimated_recurring) AS estimated_time
FROM batches
INNER JOIN batch_log ON batches.id = batch_log.batch_id
INNER JOIN tasks ON batches.id = tasks.batch_id
WHERE batches.id NOT IN (-1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14)
GROUP BY DATE(batch_log.start_time), operation_id
ORDER BY batch_log.start_time, batches.operation_id

EDIT: What am I doing wrong in the above? If I knew this I could be careful to structure queries better. Honestly, I haven't been able to find anything and I've been digging through SQL books. Even if I could get an answer on the smaller statement I could make some progress. Working on other stuff for now.
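EDIT 2: Here is a minimal reproduction of the duplication I'm seeing, using a toy sqlite schema (table names mirror mine, but the columns are simplified). One batch with 3 tasks and 2 log rows joins into 3 × 2 = 6 rows, so each sum is inflated by the other table's row count:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE batches   (id INTEGER PRIMARY KEY);
CREATE TABLE tasks     (batch_id INTEGER, estimated INTEGER);
CREATE TABLE batch_log (batch_id INTEGER, time_elapsed INTEGER);
INSERT INTO batches VALUES (1);
INSERT INTO tasks VALUES (1, 100), (1, 100), (1, 100);   -- 3 tasks, 300 total
INSERT INTO batch_log VALUES (1, 60), (1, 60);           -- 2 log rows, 120 total
""")

# Joining both child tables to the same parent cross-multiplies them:
# every log row repeats once per task, and every task once per log row.
row = con.execute("""
SELECT COUNT(*), SUM(bl.time_elapsed), SUM(t.estimated)
FROM batches b
INNER JOIN tasks t      ON b.id = t.batch_id
INNER JOIN batch_log bl ON b.id = bl.batch_id
""").fetchone()

print(row)  # (6, 360, 600) -- elapsed should be 120, estimated should be 300
```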

asked Dec 04 '22 19:12 by davidahines

1 Answer

Usually, when I want to do something complex like this, I build it one chunk at a time and check the data so I know what to expect.

So to develop, I start with SELECT * and work out my joins first. If it is a large dataset, I may filter with a WHERE clause to select only one set of records (a batch, in your case). Then I can go back to the raw data one table at a time and check my results.

Once I know that I haven't accidentally made the number of records too big, I start adding in the real columns I want. Instead of formulas, I first select the actual columns that will go into the formulas; this helps me develop the formulas accurately. Then I add in the formulas and any necessary grouping.
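That row-count check can be scripted. A minimal sketch with a toy sqlite schema (table names mirror yours, but the columns are made up): after each join you add, compare the result's row count against the "many" side of the relationship.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE batches   (id INTEGER PRIMARY KEY);
CREATE TABLE batch_log (batch_id INTEGER);
INSERT INTO batches VALUES (1), (2);
INSERT INTO batch_log VALUES (1), (1), (2);
""")

base = con.execute("SELECT COUNT(*) FROM batch_log").fetchone()[0]
joined = con.execute("""
SELECT COUNT(*)
FROM batches b
INNER JOIN batch_log bl ON b.id = bl.batch_id
""").fetchone()[0]

# A one-to-many join should leave the "many" side's row count unchanged;
# if joined > base, the join you just added is multiplying rows.
print(base, joined)  # 3 3
```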

P.S. I suspect they are right that machines is causing the issue, but I wanted to give you a set of tools for figuring out future problems.
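One common way to avoid the multiplication, sketched here against a toy sqlite schema (assuming your real tables are shaped similarly), is to aggregate each child table in its own derived table before joining, so each batch contributes exactly one row per side:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE batches   (id INTEGER PRIMARY KEY);
CREATE TABLE tasks     (batch_id INTEGER, estimated INTEGER);
CREATE TABLE batch_log (batch_id INTEGER, time_elapsed INTEGER);
INSERT INTO batches VALUES (1);
INSERT INTO tasks VALUES (1, 100), (1, 100), (1, 100);   -- 300 total estimated
INSERT INTO batch_log VALUES (1, 60), (1, 60);           -- 120 total elapsed
""")

# Aggregate each child table down to one row per batch *before* joining,
# so neither sum is multiplied by the other table's row count.
elapsed, estimated = con.execute("""
SELECT bl.elapsed, t.estimated
FROM batches b
INNER JOIN (SELECT batch_id, SUM(time_elapsed) AS elapsed
            FROM batch_log GROUP BY batch_id) bl ON bl.batch_id = b.id
INNER JOIN (SELECT batch_id, SUM(estimated) AS estimated
            FROM tasks GROUP BY batch_id) t ON t.batch_id = b.id
""").fetchone()

print(elapsed, estimated)  # 120 300
```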

answered Dec 08 '22 14:12 by HLGEM