Performance evaluation

How well is a system doing against an ideal state (benchmark)?

See also Evaluation of ML systems

Olympic judging

Performance evaluation when gold-standard data is absent: a committee of judges determines whether a system's output is relevant, or how close it is to the desired response.
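As a rough illustration of how a panel's judgments might be combined, here is a minimal sketch. The aggregation rules (majority vote for relevance, a trimmed mean for closeness scores) and the function names `panel_relevance` and `panel_score` are assumptions made for this example; the notes above do not prescribe any particular rule.

```python
from statistics import mean


def panel_relevance(votes):
    """Majority vote over boolean relevance judgments.

    Assumption: a tie counts as "not relevant".
    """
    if not votes:
        raise ValueError("at least one judge is required")
    return sum(votes) > len(votes) / 2


def panel_score(scores):
    """Average closeness score across judges.

    Assumption: with three or more judges, the single highest and
    single lowest scores are dropped to dampen outlier judges.
    """
    if not scores:
        raise ValueError("at least one judge is required")
    ordered = sorted(scores)
    if len(ordered) >= 3:
        ordered = ordered[1:-1]
    return mean(ordered)


if __name__ == "__main__":
    print(panel_relevance([True, True, False]))
    print(panel_score([1, 5, 3, 4, 2]))
```

Other aggregation rules, such as requiring unanimity or weighting judges by expertise, would fit the same shape.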

Adequacy evaluation

  • Is the system fit for purpose?
  • Does it do what the user wants (within cost, time)?
  • evaluation as seen by users
    • Can I carry out some task effectively and productively?
    • Context of use is critical

Adaptability: system can be used, without modification, in new applications.

Efficiency: use of system resources

Correctness: the degree to which a system is free from faults in its specification, design and implementation

Accuracy: degree to which a system is free from error; how well it does the job
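When gold-standard labels are available, one common way to quantify accuracy is the fraction of outputs that match them. The sketch below assumes exact-match comparison; the function name `accuracy` and the sample labels are illustrative only.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on empty data")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical system outputs compared against hypothetical gold labels.
    print(accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
```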

Usability: how easy it is to learn and use a system. UI/UX.

Reliability: the ability of a system to operate without failure over time; commonly measured as mean time between failures (MTBF)
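Mean time between failures is usually computed as total operating time divided by the number of failures observed in that time. A minimal sketch, where the function name `mtbf` and the sample figures are made up for illustration:

```python
def mtbf(total_operating_hours, failure_count):
    """Mean time between failures: operating time divided by failure count."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operating_hours / failure_count


if __name__ == "__main__":
    # Hypothetical: 1000 hours of operation with 4 failures.
    print(mtbf(1000.0, 4))
```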

Robustness: degree to which a system continues to function in the presence of invalid inputs
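One rough way to probe robustness is to feed a component deliberately invalid inputs and measure how often it either succeeds or rejects the input cleanly. The sketch below assumes that "clean rejection" means raising one of a declared set of exception types; `robustness_rate` and `parse_age` are hypothetical names used only for this example.

```python
def robustness_rate(func, inputs, expected_errors=(ValueError,)):
    """Fraction of inputs that func handles without an unexpected crash.

    An input counts as handled if func returns normally or raises one
    of expected_errors (a clean rejection). Any other exception counts
    as a failure to cope with the input.
    """
    if not inputs:
        raise ValueError("at least one input is required")
    handled = 0
    for value in inputs:
        try:
            func(value)
            handled += 1
        except expected_errors:
            handled += 1  # rejected cleanly
        except Exception:
            pass  # unexpected crash
    return handled / len(inputs)


def parse_age(text):
    """Hypothetical component under test: parse a non-negative age."""
    age = int(text)  # raises ValueError for "abc", TypeError for None
    if age < 0:
        raise ValueError("age must be non-negative")
    return age


if __name__ == "__main__":
    # None triggers a TypeError, which parse_age does not guard against.
    print(robustness_rate(parse_age, ["abc", "-5", None, "42"]))
```

Here the `None` input exposes a gap: `parse_age` validates the value but not the type, so one of the four probes ends in an unexpected exception.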

Integrity/Security: prevention of unauthorized or improper access to a program and its data

Explainability: the ability to explain an AI model's decision-making process in terms the end user can understand, i.e. to give a clear and intuitive explanation of the decisions made.

Diagnostic evaluation

  • Answers questions such as: are there any side effects from recent updates?
  • evaluation as seen by developers
  • internal quality of software

Portability: the extent to which you can modify a system to operate in a different environment

Reusability/Modularity: the extent to which you can use parts of a system in other systems

Maintainability: the ease with which you can modify a system

Flexibility: the extent to which you can modify a system for uses other than those originally specified

Testability: the degree to which you can unit-test and system-test a system

Understandability: the ease with which you can comprehend a system (its design)

Readability: the ease with which you can read and understand the source code of a system

Interpretability: model transparency → e.g. an AI model is transparent in its operation and exposes the relationships between its inputs and outputs; often comes at the cost of predictive performance