 

(relational) database performance for a date/time point/interval

So I am doing a project in Access SQL and it has come along nicely. I have learned a lot about Access and VBA and this site has been helpful in the process.

Now I am facing a performance problem, and since I have little experience with this kind of SQL work, I have come here for some thoughts.

I have a ~20 table relational database for around 100 sections which represent parts of a route. The Access database is essentially a map on which I drew several routes (via lines) that can be coloured dynamically: the colour is determined by the specific question and calculated from the database.

Here is a picture which explains it better. You cannot click on lines in Access, so the buttons are set to be identical in colour and width to the lines and are clickable for more information. [image: "a thing"]

The user can choose a date and it will display the progress of the route according to the question asked. Up to now, these questions were always binary, "yes or no" (green or red).

I have found that, because of the complexity of the queries, I pretty much have to prepare a temporary database for each query at startup; otherwise it is not possible to scroll through dates smoothly.

So anyway here is my specific problem:

Each section of the route can be in different phases (think construction) at a certain date, from "phase 0" to "done".

A new line is to be implemented which represents phases of a project. There are around 8 possible phases for all sections, which can happen at different times and - here is the thing - in a different order for each section AND not all phases happen on all sections.

What I have in the database are only starting dates, not ending dates, for each phase. The order of the phases pretty much has to be determined by the order of the starting dates. At least each phase can only happen once for each section, so there is that. As you can see this is a shitty thing for this kind of performance-centric program.

I am certain it will involve one or several temporary databases. My ideas:

  1. Aggregate all dates into one row of a new table. Since the number of phases is set, there are columns for each phase: whether it is needed, when it starts and when it ends. A loop then needs to go through each phase and check which one the user's date falls into. So: "SectionID - phase1needed phase1start phase1end ....."
    Advantage:
    • One can confirm the data manually and display it well in secondary forms
    • It keeps the database small
    Disadvantage:
    • The actual loop needs to go through (at worst) all phases to find the correct one
  2. Calculate a new database which is just "IdSection - Date - Phase" and calculate a phase for each section and EVERY day in an interval.
    Advantage:
    • This keeps the runtime calculations to one query per section
    • Access should work with large amounts of data
    Disadvantage:
    • I cannot manually check if what I did was correct for all sections
    • It will take long at startup, like really long
    • It will take a lot of entries in that db
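
To make idea 2 concrete, here is a rough sketch of the per-day expansion, written in Python rather than VBA purely for illustration. The sample start dates, the `expand_daily` name and the "a phase is active from its start date onwards" rule are all assumptions, not something taken from the real database.

```python
from datetime import date, timedelta

# Hypothetical start dates per section: {section: [(start_date, phase), ...]}
starts = {
    1: [(date(2012, 11, 7), 7), (date(2012, 12, 14), 2)],
    2: [(date(2012, 11, 4), 1)],
}

def expand_daily(starts, first_day, last_day):
    """Return {(section, day): phase} for every day in [first_day, last_day]."""
    table = {}
    for section, events in starts.items():
        events = sorted(events)  # order phases by start date
        day = first_day
        while day <= last_day:
            # Assumption: the active phase is the latest one that
            # started on or before this day.
            current = None
            for start, phase in events:
                if start <= day:
                    current = phase
                else:
                    break
            # Days before the first phase start get no entry at all.
            if current is not None:
                table[(section, day)] = current
            day += timedelta(days=1)
    return table

daily = expand_daily(starts, date(2012, 11, 1), date(2012, 12, 31))
print(len(daily), "rows for two sections over two months")
print(daily[(1, date(2012, 12, 15))])
```

The lookup at runtime is then a single key access per section, at the cost of one row per section per day, which is exactly the trade-off listed above.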

Now I ask which you would prefer, or whether there is a different method altogether? I cannot really change much about the points of data I have.

In short I have to display intervals of time of different phases and in the database I only have starting points of time, no complete order of the phases.
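
As a sketch of what "intervals from starting points" could look like, the snippet below derives an end date for each phase as the next start date within the same section. It uses Python with an in-memory SQLite database as a stand-in for Access; the table name `SECTION_PHASES`, its columns, the sample rows and the ISO text dates are assumptions made for the example.

```python
import sqlite3

# Hypothetical data: one row per (section, phase) with only a start date.
rows = [
    (1, 7, "2012-11-07"),
    (1, 2, "2012-12-14"),
    (1, 3, "2012-12-28"),
    (2, 1, "2012-11-04"),
    (2, 9, "2012-12-30"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE SECTION_PHASES (SECTION INTEGER, PHASE INTEGER, START_DATE TEXT)"
)
con.executemany("INSERT INTO SECTION_PHASES VALUES (?, ?, ?)", rows)

# A phase is taken to end when the next phase of the same section starts.
# The last phase of a section has no successor, so its END_DATE is NULL.
INTERVALS_SQL = """
SELECT p.SECTION, p.PHASE, p.START_DATE,
       (SELECT MIN(n.START_DATE)
          FROM SECTION_PHASES AS n
         WHERE n.SECTION = p.SECTION
           AND n.START_DATE > p.START_DATE) AS END_DATE
  FROM SECTION_PHASES AS p
 ORDER BY p.SECTION, p.START_DATE
"""
intervals = con.execute(INTERVALS_SQL).fetchall()
for row in intervals:
    print(row)
```

The order of the phases never has to be stored: it falls out of sorting the start dates per section, and the open-ended last phase shows up as a missing end date.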

Thank you for your thoughts; any experience with this sort of thing will help.

asked Dec 21 '12 07:12 by IMA




1 Answer

If I understand you properly, you have a series of data similar to the form:

Section 1, Phase 7, Start Date = 11/07/2012
Section 1, Phase 2, Start Date = 12/14/2012
Section 1, Phase 3, Start Date = 12/28/2012
Section 2, Phase 1, Start Date = 11/04/2012
Section 2, Phase 9, Start Date = 12/30/2012
Section 3, Phase 4, Start Date = 11/19/2012
Section 3, Phase 5, Start Date = 12/06/2012
Section 3, Phase 3, Start Date = 12/11/2012

and you want to answer a question like "What phase is each section in on 12/15/2012?", is that correct?

The answer in this case should look something like the form:

Section 1, Phase 2
Section 2, Phase 1
Section 3, Phase 3

In order to do this, I'll assume you have a table called SECTION_PHASES with the following fields:

SECTION    Number
PHASE      Number
START_DATE Date/Time

What you need to do is figure out the latest start date for each section that falls before your input date, because the phase that started on that date is the one still active until the next phase change. Once you do that, you can join that information back into your main table to determine which phase began on that date.

First, create a query named SECTION_MAX_DATES with the following code in its SQL View:

SELECT [SECTION_PHASES].SECTION, Max([SECTION_PHASES].START_DATE) AS target_date
FROM SECTION_PHASES
WHERE [SECTION_PHASES].START_DATE<#12/15/2012#
GROUP BY [SECTION_PHASES].SECTION
ORDER BY [SECTION_PHASES].SECTION;

Once you have that query saved, you can join it as a subquery back to your original table. Now, make another query SECTION_PHASE_AT_DATE which includes your original table and the previous query, then enter the following code in its SQL View:

SELECT SECTION_PHASES.SECTION, SECTION_PHASES.PHASE, SECTION_PHASES.START_DATE
FROM SECTION_MAX_DATES INNER JOIN SECTION_PHASES ON (SECTION_MAX_DATES.target_date=SECTION_PHASES.START_DATE) AND (SECTION_MAX_DATES.SECTION=SECTION_PHASES.SECTION)
ORDER BY SECTION_PHASES.SECTION;

That query will give you the result you are after, if I understand your question correctly. There is no need to calculate the end dates, assuming that the start date of a new phase marks the end of whichever phase was current before it.

You'll still have a few edge cases to work out, like what happens if a section doesn't have a phase registered yet prior to the given date. I'll also leave it to you to figure out how to parameterize the date in the WHERE clause of the 1st of the two queries, which is probably trivial for you given the progress you made already! However, I think this is the SQL structure you were looking for to solve the data/calculation part of your problem.
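
As a rough sketch of those two loose ends, the snippet below passes the date in as a parameter and keeps sections that have no phase yet. It is written in Python against an in-memory SQLite database, not Access, so the syntax will differ there (Access uses `#...#` date literals and its own query parameters). The `SECTIONS` table, the extra section 4, the ISO text dates and the `phase_at` helper are assumptions added for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SECTIONS (SECTION INTEGER PRIMARY KEY);  -- hypothetical list of all sections
CREATE TABLE SECTION_PHASES (SECTION INTEGER, PHASE INTEGER, START_DATE TEXT);
INSERT INTO SECTIONS VALUES (1), (2), (3), (4);
INSERT INTO SECTION_PHASES VALUES
  (1, 7, '2012-11-07'), (1, 2, '2012-12-14'), (1, 3, '2012-12-28'),
  (2, 1, '2012-11-04'), (2, 9, '2012-12-30'),
  (3, 4, '2012-11-19'), (3, 5, '2012-12-06'), (3, 3, '2012-12-11'),
  (4, 1, '2013-01-10');  -- section 4 has no phase before the test date
""")

# The "?" placeholder replaces the hard-coded date. The LEFT JOIN keeps
# every section, so one without an earlier start date gets a NULL phase.
# The strict "<" mirrors the WHERE clause of SECTION_MAX_DATES.
PHASE_AT_DATE_SQL = """
SELECT s.SECTION, p.PHASE
  FROM SECTIONS AS s
  LEFT JOIN SECTION_PHASES AS p
    ON p.SECTION = s.SECTION
   AND p.START_DATE = (SELECT MAX(m.START_DATE)
                         FROM SECTION_PHASES AS m
                        WHERE m.SECTION = s.SECTION
                          AND m.START_DATE < ?)
 ORDER BY s.SECTION
"""

def phase_at(date_iso):
    """Return [(section, phase or None), ...] for the given ISO date string."""
    return con.execute(PHASE_AT_DATE_SQL, (date_iso,)).fetchall()

result = phase_at("2012-12-15")
print(result)
```

With this sample data, sections 1 to 3 come back with phases 2, 1 and 3, matching the expected answer above, while section 4 comes back with a phase of None and could be drawn in a neutral colour.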

answered Nov 15 '22 03:11 by Lluluien