- labelled data for Machine Learning tasks
- time- and resource-intensive, and therefore costly, to produce
- might need experts in the domain
- multiple experts label each data instance to improve reliability and minimize bias
- labelling (annotation) guidelines help keep the labels consistent across experts
- even with guidelines, experts can label the same thing differently because of personal bias. Statistical measures of inter-rater reliability can be used to judge the quality of such gold standard data.
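
As a rough illustration of an inter-rater reliability measure, the sketch below computes Cohen's kappa for two annotators who labelled the same instances. The notes do not say which measure is meant, so Cohen's kappa is an assumption here, and the function name, variable names, and example labels are all made up for illustration. Kappa compares the observed agreement with the agreement expected by chance from each annotator's label frequencies.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same instances.

    Hypothetical helper for illustration; not taken from the notes.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set")

    n = len(labels_a)

    # Observed agreement: fraction of instances with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for everything.
        return 1.0

    return (observed - expected) / (1 - expected)


# Made-up labels from two annotators for eight instances.
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance. With more than two annotators, a measure such as Fleiss' kappa or Krippendorff's alpha would be used instead.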
Alternative
since labelled data is expensive to produce, other methods can be used to acquire less-than-perfect data