I am developing a script in my localhst using PHP and mysql and I am dealing with large data (about 2 millions of records for scintific research)
some queries I need to call once in a life (to analyse the data and prepare some data); however it takes very long time for example: now my script is analysing some data for more than 4 hours
I knew I might have some problems in the optimization of my database I am not an expert
for example I just figured out that "indexing" can be useful to speed up the queries however even with indexing some columns my script is still very slow
any idea how to speed up my script (in PHP and mysql)
I am using XAMPP as a server package
Thanks a lot for help
best regards
update 1:
$sql = "select * from urls";//10,000 record of cached HTML documents
$result = $DB->query($sql);
while($row = $DB->fetch_array($result)){
$url_id = $row["id"];
$content = $row["content"];
$dom = new DOMDocument();
@$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$row = $xpath->evaluate("/html/body//a");
for($i = 0; $i < $row->length; $i++) {
// lots of the code here to deal with the HTML documents and some update and insert and select queries which query another table which has 1 million record
}
update 2:
I do not have "JOIN" in my quires or even "IN"
they are very simple queries
and don't know! and I don't know how to know which causes the slowness?
is it the PHP or the MYSQL?
First of all, to be able to optimize efficiently, you need to know what it taking time :
With those informations, you can then try to figure out :
Still : this is quite a specific question, and the answers will be probably be pretty specific too -- which means more informations might be necessary if you want more than general answer...
Edit after your edits
As you only have simple queries, things might be a bit easier... Maybe.
select * from a where x = 12
" and "select * from a where x = 14
" are of the same type : same select, same table, same where clause -- only the value changesEXPLAIN
will help
microtime
from PHP will help you find out which ones those are
Before that, to find out if PHP is working too much, or if it's MySQL, a simple way is to use the "top" command on Linux, or the "process manager" (I'm not on windows, and don't use it in english -- the real name might be something else).
If PHP is eating 100% of CPU, you have your culprit. If MySQL is eating all CPU, you have your culprit too.
When you know which one of those is working too much, it's a first step : you know what to optimize first.
I see from your portion of code that your are :
If you have a multi-core CPU, an idea (that I would try if I see that PHP is eating lots of CPU) would to to parallelize.
For instance, you could have two instances of the PHP script running at the same time :
select * from urls where id < 5000
"select * from urls where id >= 5000
"You will get a bit more concurrency on the network (probably not a problem) and on the database (a database knows how to deal with concurrency, and 2 scripts using it will generally not be too much), but you'll be able to process almost twice the same amount of documents in the same time.
If you have 4 CPU, splitting the urls-list in 4 (or even more ; find out by trial and error) parts would do too.
Since your query is on one table and has no grouping or ordering, it is unlikely that the query is slow. I expect the issue is the size and number of the content fields. It appears that you are storing the entire HTML of a webpage in your database and then pulling it out every time you want to change a couple of values on the page. This is a situation to be avoided if at all possible.
Most scientific webapps (like BLAST for example) have the option to export the data as a delimited text file like a csv. If this is the case for you, you might consider restructuring your url table so that you have one column per data field in the csv. Then your update queries will be significantly faster as you will be able to do them entirely in SQL instead of pulling the entire url table into PHP, accessing and pulling one or more other records for each url record and then updating your table.
Assumably you have stored your data as webpages so you can dump the content easily to a browser. If you change your database schema as I've suggested, you'll need to write a webpage template that you can plug the data into when you wish to output it.
Knowing queries and table structures it would be easier.
If you cant give them out check if you have IN operator. MySQL tends to slow too much in there. Also try to run
EXPLAIN yourquery;
and see how it is executed. Sometimes sorting takes too much time. Try to avoid sorting on non-index columns.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With