Async I/O-intensive code is running slower than non-async code. Why?

I am refactoring an ASP.NET MVC application and adding an asynchronous version of an existing function to improve performance. I understand that asynchronous functions carry some overhead, but I expected that, with enough iterations, the I/O-intensive work of loading the data from the database would more than compensate for that overhead and give me a significant performance gain.

The TermusRepository.LoadByTermusId function loads data by retrieving several DataTables from the database (using ADO.NET and the Oracle Managed Client), populates a model, and returns it. TermusRepository.LoadByTermusIdAsync is similar, except that it works asynchronously and, when there are multiple DataTables to retrieve, it starts all of the download tasks up front and processes each one as it completes.

public async Task<ActionResult> AsyncPerformanceTest()
{
    var vm = new AsyncPerformanceTestViewModel();
    Stopwatch watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < 60; i++)
    {
        TermusRepository.LoadByTermusId<Termus2011_2012EndYear>("1");
        TermusRepository.LoadByTermusId<Termus2011_2012EndYear>("5");
        TermusRepository.LoadByTermusId<Termus2011_2012EndYear>("6");
        TermusRepository.LoadByTermusId<Termus2011_2012EndYear>("7");
    }
    watch.Stop();
    vm.NonAsyncElapsedTime = watch.Elapsed;
    watch.Reset();
    watch.Start();
    var tasks = new List<Task<Termus2011_2012EndYear>>();
    for (int i = 0; i < 60; i++)
    {
        tasks.Add(TermusRepository.LoadByTermusIdAsync<Termus2011_2012EndYear>("1"));
        tasks.Add(TermusRepository.LoadByTermusIdAsync<Termus2011_2012EndYear>("5"));
        tasks.Add(TermusRepository.LoadByTermusIdAsync<Termus2011_2012EndYear>("6"));
        tasks.Add(TermusRepository.LoadByTermusIdAsync<Termus2011_2012EndYear>("7"));               
    }
    await Task.WhenAll(tasks.ToArray());
    watch.Stop();
    vm.AsyncElapsedTime = watch.Elapsed;            
    return View(vm);
}

public static async Task<T> LoadByTermusIdAsync<T>(string termusId) where T : Appraisal
{
    var AppraisalHeader = new OracleCommand("select tu.termus_id, tu.manager_username, tu.evaluee_name, tu.evaluee_username, tu.termus_complete_date, termus_start_date, tu.termus_status, tu.termus_version, tn.managername from tercons.termus_users tu left outer join tercons.termus_names tn on tu.termus_id=tn.termus_id where tu.termus_id=:termusid");
    AppraisalHeader.BindByName = true;
    AppraisalHeader.Parameters.Add("termusid", termusId);
    var dt = await Database.GetDataTableAsync(AppraisalHeader);
    T Termus = Activator.CreateInstance<T>();
    var row = dt.AsEnumerable().Single();
    Termus.TermusId = row.Field<decimal>("termus_id").ToString();
    Termus.ManagerUsername = row.Field<string>("manager_username");
    Termus.EvalueeUsername = row.Field<string>("evaluee_username");
    Termus.EvalueeName = row.Field<string>("evaluee_name");
    Termus.ManagerName = row.Field<string>("managername");
    Termus.TERMUSCompleteDate = row.Field<DateTime?>("termus_complete_date");
    Termus.TERMUSStartDate = row.Field<DateTime>("termus_start_date");
    Termus.Status = row.Field<string>("termus_status");
    Termus.TERMUSVersion = row.Field<string>("termus_version");
    Termus.QuestionsAndAnswers = new Dictionary<string, string>();

    var RetrieveQuestionIdsCommand = new OracleCommand("select termus_question_id from tercons.termus_questions where termus_version=:termus_version");
    RetrieveQuestionIdsCommand.BindByName = true;
    RetrieveQuestionIdsCommand.Parameters.Add("termus_version", Termus.TERMUSVersion);
    var QuestionIdsDt = await Database.GetDataTableAsync(RetrieveQuestionIdsCommand);
    var QuestionIds = QuestionIdsDt.AsEnumerable().Select(r => r.Field<string>("termus_question_id"));

    // There are about 60 questions/answers, so this results in about 60 database calls. It would be a good spot to combine them into a single DB call, but I left it this way so I could see, for learning purposes, whether async would speed it up.
    var DownloadAnswersTasks = new List<Task<DataTable>>();
    foreach (var QuestionId in QuestionIds)
    {
        var RetrieveAnswerCommand = new OracleCommand("select termus_response, termus_question_id from tercons.termus_responses where termus_id=:termus_id and termus_question_id=:questionid");
        RetrieveAnswerCommand.BindByName = true;
        RetrieveAnswerCommand.Parameters.Add("termus_id", termusId);
        RetrieveAnswerCommand.Parameters.Add("questionid", QuestionId);
        DownloadAnswersTasks.Add(Database.GetDataTableAsync(RetrieveAnswerCommand));
    }
    while (DownloadAnswersTasks.Count > 0)
    {
        var FinishedDownloadAnswerTask = await Task.WhenAny(DownloadAnswersTasks);
        DownloadAnswersTasks.Remove(FinishedDownloadAnswerTask);
        var AnswerDt = await FinishedDownloadAnswerTask;
        var Answer = AnswerDt.AsEnumerable().Select(r => r.Field<string>("termus_response")).SingleOrDefault();
        var QuestionId = AnswerDt.AsEnumerable().Select(r => r.Field<string>("termus_question_id")).SingleOrDefault();
        if (!String.IsNullOrEmpty(Answer))
        {
            Termus.QuestionsAndAnswers.Add(QuestionId, System.Net.WebUtility.HtmlDecode(Answer));
        }
    }
    return Termus;
}

public static async Task<DataTable> GetDataTableAsync(OracleCommand command)
{
    DataTable dt = new DataTable();
    using (var connection = GetDefaultOracleConnection())
    {
        command.Connection = connection;
        await connection.OpenAsync();
        dt.Load(await command.ExecuteReaderAsync());
    }
    return dt;
}

public static T LoadByTermusId<T>(string TermusId) where T : Appraisal
{
    var RetrieveAppraisalHeaderCommand = new OracleCommand("select tu.termus_id, tu.manager_username, tu.evaluee_name, tu.evaluee_username, tu.termus_complete_date, termus_start_date, tu.termus_status, tu.termus_version, tn.managername from tercons.termus_users tu left outer join tercons.termus_names tn on tu.termus_id=tn.termus_id where tu.termus_id=:termusid");
    RetrieveAppraisalHeaderCommand.BindByName = true;
    RetrieveAppraisalHeaderCommand.Parameters.Add("termusid", TermusId);
    var AppraisalHeaderDt = Database.GetDataTable(RetrieveAppraisalHeaderCommand);
    T Termus = Activator.CreateInstance<T>();
    var AppraisalHeaderRow = AppraisalHeaderDt.AsEnumerable().Single();
    Termus.TermusId = AppraisalHeaderRow.Field<decimal>("termus_id").ToString();
    Termus.ManagerUsername = AppraisalHeaderRow.Field<string>("manager_username");
    Termus.EvalueeUsername = AppraisalHeaderRow.Field<string>("evaluee_username");
    Termus.EvalueeName = AppraisalHeaderRow.Field<string>("evaluee_name");
    Termus.ManagerName = AppraisalHeaderRow.Field<string>("managername");
    Termus.TERMUSCompleteDate = AppraisalHeaderRow.Field<DateTime?>("termus_complete_date");
    Termus.TERMUSStartDate = AppraisalHeaderRow.Field<DateTime>("termus_start_date");
    Termus.Status = AppraisalHeaderRow.Field<string>("termus_status");
    Termus.TERMUSVersion = AppraisalHeaderRow.Field<string>("termus_version");
    Termus.QuestionsAndAnswers = new Dictionary<string, string>();

    var RetrieveQuestionIdsCommand = new OracleCommand("select termus_question_id from tercons.termus_questions where termus_version=:termus_version");
    RetrieveQuestionIdsCommand.BindByName = true;
    RetrieveQuestionIdsCommand.Parameters.Add("termus_version", Termus.TERMUSVersion);
    var QuestionIdsDt = Database.GetDataTable(RetrieveQuestionIdsCommand);
    var QuestionIds = QuestionIdsDt.AsEnumerable().Select(r => r.Field<string>("termus_question_id"));
    // There are about 60 questions/answers, so this results in about 60 database calls. It would be a good spot to combine them into a single DB call, but I left it this way so I could see, for learning purposes, whether async would speed it up.
    foreach (var QuestionId in QuestionIds)
    {
        var RetrieveAnswersCommand = new OracleCommand("select termus_response from tercons.termus_responses where termus_id=:termus_id and termus_question_id=:questionid");
        RetrieveAnswersCommand.BindByName = true;
        RetrieveAnswersCommand.Parameters.Add("termus_id", TermusId);
        RetrieveAnswersCommand.Parameters.Add("questionid", QuestionId);
        var AnswersDt = Database.GetDataTable(RetrieveAnswersCommand);
        var Answer = AnswersDt.AsEnumerable().Select(r => r.Field<string>("termus_response")).SingleOrDefault();
        if (!String.IsNullOrEmpty(Answer))
        {
            Termus.QuestionsAndAnswers.Add(QuestionId, System.Net.WebUtility.HtmlDecode(Answer));
        }
    }
    return Termus;
}

public static DataTable GetDataTable(OracleCommand command)
{
    DataTable dt = new DataTable();
    using (var connection = GetDefaultOracleConnection())
    {
        command.Connection = connection;
        connection.Open();
        dt.Load(command.ExecuteReader());
    }
    return dt;
}

public static OracleConnection GetDefaultOracleConnection()
{
    return new OracleConnection(ConfigurationManager.ConnectionStrings[connectionstringname].ConnectionString);
}

Results for 60 iterations:

Non-async: 18.4375460 seconds
Async:     19.8092854 seconds

The results of this test are consistent. No matter how many iterations of the for loop I run in the AsyncPerformanceTest() action method, the async version is about 1 second slower than the non-async one. (I run the test several times in a row to account for JIT warm-up.) What am I doing wrong that makes the async version slower? Am I misunderstanding something fundamental about writing asynchronous code?

Asked Feb 16 '15 at 15:02 by mason


1 Answer

The asynchronous version will always be slower than the synchronous version when there is no concurrency. It's doing all of the same work as the non-async version, but with a small amount of overhead added to manage the asynchrony.
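As a minimal sketch of the "no concurrency" case, the C# 9+ top-level program below uses Thread.Sleep and Task.Delay as hypothetical stand-ins for a 50 ms database round trip; they are not the Oracle calls from the question, and LatencyMs/Calls are illustrative values. Awaiting one call at a time frees the thread during each wait, but the waits still run back to back, so both loops take roughly Calls × LatencyMs.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one database round trip: about 50 ms of pure waiting.
const int LatencyMs = 50;
const int Calls = 4;

// 1. Synchronous: the calling thread blocks for each call in turn.
var sw = Stopwatch.StartNew();
for (int i = 0; i < Calls; i++)
{
    Thread.Sleep(LatencyMs);
}
TimeSpan syncTime = sw.Elapsed;

// 2. Asynchronous but sequential: each call is awaited before the next one starts,
//    so the waits still add up. The thread is free while waiting, but nothing else
//    is running that could use it, and the async state machine adds a little work.
sw.Restart();
for (int i = 0; i < Calls; i++)
{
    await Task.Delay(LatencyMs);
}
TimeSpan asyncSequentialTime = sw.Elapsed;

Console.WriteLine($"sync:             {syncTime.TotalMilliseconds:F0} ms");
Console.WriteLine($"async sequential: {asyncSequentialTime.TotalMilliseconds:F0} ms");
```

Exact timings depend on timer resolution, and the extra cost of the async machinery may be too small to see at this scale; the point is that awaiting does not make an individual wait any shorter.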

Asynchrony improves performance through improved availability, not through faster individual operations. Each individual request will be slower, but if you make 1000 requests at the same time, the asynchronous implementation will be able to handle them all more quickly (at least in certain circumstances).

This happens because the asynchronous solution lets the thread that was allocated to handle the request go back to the pool and handle other requests, whereas the synchronous solution forces that thread to sit idle while it waits for the I/O operation to complete. Structuring the program so that the thread can be freed up has a cost, but the benefit is that the thread can go and do other work. In your program there is no other work for the thread to do, so the result is a net loss.
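To make the thread-availability argument concrete, here is a toy model under stated assumptions, not a description of how ASP.NET actually schedules requests: a hypothetical server limited to four threads (a SemaphoreSlim plays the role of the thread pool), 20 requests arriving at once, and a 50 ms simulated query (Thread.Sleep and Task.Delay again stand in for the real database call). The blocking handler keeps its thread for the whole wait, so requests go through in batches of four; the awaiting handler hands its thread back during the wait, so all of the waits overlap.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Toy model: the server may use at most ThreadBudget threads at once (a SemaphoreSlim
// stands in for the thread pool), and every request spends IoMs waiting on a
// hypothetical database call.
const int ThreadBudget = 4;
const int Requests = 20;
const int IoMs = 50;

// Synchronous style: the request holds its thread for the entire I/O wait.
async Task BlockingRequest(SemaphoreSlim threads)
{
    await threads.WaitAsync();   // get a thread
    try
    {
        Thread.Sleep(IoMs);      // thread sits idle until the "query" returns
    }
    finally
    {
        threads.Release();
    }
}

// Asynchronous style: the thread goes back to the pool while the I/O is in flight,
// and a thread is taken again only to process the result.
async Task AwaitingRequest(SemaphoreSlim threads)
{
    await threads.WaitAsync();   // get a thread to start the request
    threads.Release();           // hand it back before waiting
    await Task.Delay(IoMs);      // "query" in flight, no thread held
    await threads.WaitAsync();   // get a thread to finish the request
    threads.Release();
}

// Fire all requests at once and measure how long it takes to serve them all.
async Task<TimeSpan> ServeAll(Func<SemaphoreSlim, Task> handler)
{
    var threads = new SemaphoreSlim(ThreadBudget);
    var sw = Stopwatch.StartNew();
    await Task.WhenAll(Enumerable.Range(0, Requests).Select(_ => Task.Run(() => handler(threads))));
    return sw.Elapsed;
}

TimeSpan blockingTotal = await ServeAll(BlockingRequest);
TimeSpan awaitingTotal = await ServeAll(AwaitingRequest);

Console.WriteLine($"blocking: {blockingTotal.TotalMilliseconds:F0} ms");
Console.WriteLine($"awaiting: {awaitingTotal.TotalMilliseconds:F0} ms");
```

With these numbers the blocking run needs roughly five 50 ms rounds, while the awaiting run finishes in not much more than a single wait. With only one request in flight, both styles take about one wait and the async version merely adds a little bookkeeping.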

Answered Sep 18 '22 at 11:09 by Servy