Performance evaluation
How well is a system doing against an ideal state (benchmark)?
- often use a benchmark data set (Gold Standard Data)
See also Evaluation of ML systems
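A minimal sketch of scoring a system against Gold Standard Data, assuming a classification-style task where each system output is compared with one gold label. The function names, the labels and the data are hypothetical and only illustrate the idea.

```python
def accuracy(predictions, gold):
    """Fraction of system outputs that match the gold-standard labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold standard must not be empty")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def precision_recall(predictions, gold, positive):
    """Precision and recall for one label treated as the positive class."""
    pairs = list(zip(predictions, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold standard and system output.
gold = ["spam", "ham", "spam", "ham", "ham"]
predictions = ["spam", "ham", "ham", "ham", "ham"]

acc = accuracy(predictions, gold)
precision, recall = precision_recall(predictions, gold, positive="spam")
print(f"accuracy={acc:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Which metric is appropriate depends on the task; accuracy alone can mislead when the gold labels are imbalanced.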
Olympic judging
Performance evaluation when Gold Standard Data is absent: a committee of judges determines whether a system's output is relevant, or how close it is to the desired response.
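One possible way to combine the judges' verdicts, sketched under assumptions the notes do not state: relevance is decided by simple majority, and closeness scores are averaged after dropping the single highest and lowest score (a convention borrowed from some judged sports). The function names and the data are hypothetical.

```python
from statistics import mean


def majority_relevant(votes):
    """True when more than half of the judges voted 'relevant'."""
    if not votes:
        raise ValueError("need at least one vote")
    return sum(votes) > len(votes) / 2


def trimmed_mean_score(scores):
    """Average of the judges' scores after dropping one highest and one lowest."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both extremes")
    return mean(sorted(scores)[1:-1])


# Hypothetical panel of five judges rating one system response.
votes = [True, True, False, True, False]
scores = [7.5, 8.0, 6.0, 9.5, 8.0]

is_relevant = majority_relevant(votes)
panel_score = trimmed_mean_score(scores)
print(is_relevant, round(panel_score, 2))
```

Trimming the extremes reduces the influence of a single unusually harsh or lenient judge; a plain mean or a median are equally defensible choices.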
Adequacy evaluation
- Is the system fit for purpose?
- Does it do what the user wants (within cost, time)?
- evaluation as seen by users
- Can I carry out some task effectively and productively?
- Context of use is critical
Adaptability: system can be used, without modification, in new applications.
Efficiency: use of system resources
Correctness: the degree to which a system is free from faults in its specification, design and implementation
Accuracy: degree to which a system is free from error; how well it does the job
Usability: how easy it is to learn and use a system. UI/UX.
Reliability: the ability of a system to operate without failure; commonly measured as mean time between failures (MTBF)
Robustness: degree to which a system continues to function in the presence of invalid inputs
Integrity/Security: prevention of unauthorized or improper access to a program and its data
Explainability: ability to explain the decision-making process of an AI model in terms understandable to the end user, giving a clear and intuitive explanation of the decisions made.
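The reliability entry above refers to mean time between failures. A minimal sketch of that calculation, assuming the common definition MTBF = total operating time / number of failures and assuming each recorded operating period ended in a failure. The function name and the figures are hypothetical.

```python
def mean_time_between_failures(uptime_hours):
    """MTBF from a list of operating periods, each of which ended in a failure."""
    if not uptime_hours:
        raise ValueError("need at least one operating period")
    return sum(uptime_hours) / len(uptime_hours)


# Hypothetical log: hours of operation before each of three failures.
uptimes = [120.0, 95.5, 140.5]
mtbf = mean_time_between_failures(uptimes)
print(f"MTBF = {mtbf:.1f} hours")
```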
Diagnostic evaluation
- Answers questions such as: are there any side effects from recent updates?
- evaluation as seen by developers
- internal quality of software
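The question about side effects from recent updates can be approached with a regression check: run the old and the new version on the same fixed inputs and list every disagreement. The sketch below is illustrative only; `find_regressions`, `normalize_v1` and `normalize_v2` are hypothetical names, not part of any real system.

```python
def find_regressions(old_version, new_version, inputs):
    """Return (input, old_output, new_output) for every input where outputs differ."""
    changed = []
    for x in inputs:
        before, after = old_version(x), new_version(x)
        if before != after:
            changed.append((x, before, after))
    return changed


def normalize_v1(text):
    # Hypothetical original behaviour: trim the ends and lowercase.
    return text.strip().lower()


def normalize_v2(text):
    # Hypothetical update: also collapses internal whitespace,
    # which callers may not have expected.
    return " ".join(text.split()).lower()


inputs = ["  Hello ", "A  B", "ok"]
regressions = find_regressions(normalize_v1, normalize_v2, inputs)
for x, before, after in regressions:
    print(f"{x!r}: {before!r} -> {after!r}")
```

A reported difference is not necessarily a fault; it marks a behaviour change that a developer has to confirm as intended or treat as a side effect.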
Portability: the extent to which you can modify a system to operate in a different environment
Reusability/Modularity: the extent to which you can use parts of a system in other systems
Maintainability: the ease with which you can modify a system
Flexibility: the extent to which you can modify a system for uses other than those originally specified
Testability: the degree to which you can unit-test and system-test a system
Understandability: the ease with which you comprehend a system (design)
Readability: the ease with which you can read and understand the source code of a system
Interpretability: model transparency → e.g. an AI model is transparent in its operation and provides information about the relationships between inputs and outputs; typically comes at the cost of predictive performance