I want to know the relation between the training data set, the testing data set, and ground truth. I know the meaning of each one separately, but I cannot see the relation between them, especially between ground truth and training data.
Your training data is what you train your classifier on.
You then test the accuracy of your model on your test set.
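Ground truth refers to the label of each training sample you have, i.e. you know which category/outcome each training sample belongs to. The test samples carry ground-truth labels as well: accuracy is measured by comparing the model's predictions on the test set against those labels.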
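Here is a minimal sketch of how the three pieces fit together in plain Python. It assumes each sample is stored as a `(features, label)` pair whose label is its ground truth. The helpers `split_dataset` and `accuracy` are hypothetical names for illustration, not functions from any particular library. The training split's ground-truth labels are what a classifier learns from, and the test split's labels are what its predictions are scored against.

```python
import random
from typing import List, Tuple

# Each sample pairs a feature vector with its ground-truth label.
Sample = Tuple[List[float], str]


def split_dataset(samples: List[Sample], test_fraction: float = 0.25,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the labeled samples and split it into (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted: List[str], ground_truth: List[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("need equal-length, non-empty predictions and labels")
    correct = sum(p == t for p, t in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Toy usage: 8 labeled samples, 2 of them held out for testing.
data = [([float(i)], "a" if i < 4 else "b") for i in range(8)]
train, test = split_dataset(data)
print(len(train), len(test))  # 6 2
print(accuracy(["a", "b", "b"], ["a", "b", "a"]))  # 0.6666666666666666
```

The split happens once, before any training, so the model never sees the test labels while learning. In practice, scikit-learn's `train_test_split` does the same job.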
Suppose we need to train a machine to tell apples from oranges. The machine-learning way is to "show" the machine some examples of oranges and apples (the training set), based on which it identifies the rest as either oranges or apples (restrict yourself to apples and oranges only!). Here, the ground truth is the labels you assigned as apples or oranges in the training set.
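The apples-and-oranges example can be sketched with a nearest-centroid classifier, a deliberately simple technique chosen here only for brevity. Several things are made up for illustration: the two features (an "orange hue" score and a "skin roughness" score, both from 0 to 1), the numbers in `TRAINING_SET`, and the helper names `train_centroids` and `classify`. None of these are real measurements or a library API. The string attached to each training sample is the ground truth a person assigned to it.

```python
import math
from typing import Dict, List, Tuple

Features = Tuple[float, float]

# Hypothetical features: (orange-hue score, skin-roughness score), both 0..1.
# The label next to each feature pair is its human-assigned ground truth.
TRAINING_SET: List[Tuple[Features, str]] = [
    ((0.10, 0.20), "apple"),
    ((0.20, 0.10), "apple"),
    ((0.15, 0.15), "apple"),
    ((0.90, 0.80), "orange"),
    ((0.85, 0.90), "orange"),
    ((0.95, 0.85), "orange"),
]


def train_centroids(samples: List[Tuple[Features, str]]) -> Dict[str, Tuple[float, ...]]:
    """Average the feature vectors of each ground-truth class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(centroids: Dict[str, Tuple[float, ...]], features: Features) -> str:
    """Predict the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = train_centroids(TRAINING_SET)
print(classify(centroids, (0.8, 0.75)))  # orange
print(classify(centroids, (0.2, 0.3)))   # apple
```

Both made-up features share the same 0-to-1 scale. Real features on different scales (e.g. weight in grams) would usually be normalized first so that one feature does not dominate the distance.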
Ground Truth is factual data that has been observed or measured, and can be analyzed objectively. It has not been inferred. If the data is based on an assumption, subject to opinion, or up for discussion, then, by definition, that is not Ground Truth data.
Your ability to solve a problem using data science depends tremendously on how you frame the problem and on whether you can establish Ground Truth without ambiguity. More information is available in the article "The Importance of Ground Truth in Data Science".