Background: Errors and adverse events in the operating room (OR) are associated not only with poor technical performance but also with deficits in nontechnical skills (NTS). Numerous tools have been developed to assess NTS in the OR. Our aim was to conduct a systematic review of observational tools and report on their implementation and psychometric properties to guide healthcare professionals, educators, and researchers in tool selection and use. Methods: A systematic literature search (January 1, 1990–May 28, 2019) was conducted across four databases (MEDLINE, Embase, CINAHL, and PsycINFO) and the reference lists of included studies. Reviewers independently screened articles for inclusion, assessed study quality, and extracted data. Results: Thirty-one tools were identified across 88 studies. Studies were most commonly conducted in a real-world OR (n = 50) and most commonly involved two observers (n = 50). The NTS of individuals (n = 62) were assessed more often than those of subteams (n = 21) or entire teams (n = 20). The NOn-Technical Skills for Surgeons demonstrated content validity, concurrent validity, predictive validity, and face validity across a range of studies. Oxford NOn-TECHnical Skills demonstrated content validity, concurrent validity, and predictive validity, with good inter-rater reliability and test-retest reliability. Conclusions: The NOn-Technical Skills for Surgeons has the strongest evidence of validity and reliability for assessing individuals, whereas Oxford NOn-TECHnical Skills is the most robust tool for evaluating teams. We recommend continued investigation of the feasibility of these observational tools and the reproducibility of their methods. Further research is needed to determine the training requirements for observers and the potential of video and audio recordings in the OR.