<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd"><identifier identifierType="DOI">10.7910/DVN/GM8T8Q</identifier><creators><creator><creatorName nameType="Personal">zhao, zhilong</creatorName><givenName>zhilong</givenName><familyName>zhao</familyName><affiliation>https://ror.org/0530pts50</affiliation></creator></creators><titles><title>Replication Data for: Automated Quality Assessment for LLM-Based Complex Qualitative Coding: A Confidence-Diversity Framework</title></titles><publisher>Harvard Dataverse</publisher><publicationYear>2025</publicationYear><subjects><subject>Computer and Information Science</subject><subject>Social Sciences</subject></subjects><contributors><contributor contributorType="ContactPerson"><contributorName nameType="Organizational">zhao, zhilong</contributorName></contributor></contributors><dates><date dateType="Submitted">2025-08-26</date><date dateType="Updated">2025-08-28</date></dates><resourceType resourceTypeGeneral="Dataset"/><sizes><size>124689</size></sizes><formats><format>application/zip</format></formats><version>1.1</version><rightsList><rights rightsURI="info:eu-repo/semantics/openAccess"/><rights rightsURI="http://creativecommons.org/publicdomain/zero/1.0">CC0 1.0</rights></rightsList><descriptions><description descriptionType="Abstract">This replication package contains all code and data necessary to reproduce the results presented in "Cross-Domain Quality Assessment for Complex Qualitative Analysis: Validating Confidence-Entropy Signals Across Legal, Political, and Medical Tasks".

Research Context: This study moves beyond accessible coding tasks to validate automated quality assessment for complex qualitative analysis that requires domain expertise and interpretive judgment, across legal, political, and medical domains.

Package Contents:
- Core Scripts: reproduce_all_results.py (main reproduction script), generate_synthetic_data.py (data generator), validate_reproduction.py (result validation)
- Data Files: Synthetic datasets matching the statistics reported in the paper for SCOTUS legal reasoning (390 cases), Hyperpartisan political analysis (644 cases), and MTSamples medical classification (1,000 cases)
- Expected Outputs: All LaTeX tables (Tables 1-5), validation reports, and cross-domain statistical analyses

Key Findings Reproduced:
- Cross-domain signal effectiveness (Table 1): Correlations reproduced across all domains to within ±0.005
- Dual-signal weight optimization (Table 2): Improvements of 6.6-113.7% over single-signal baselines
- Cross-domain transferability (Table 3): 88.9% success rate for weight transfer across domains
- Intelligent triage efficiency (Table 5): 45.4% vs. 44.6% effort reduction (a 0.8 percentage-point difference)
- Domain-specific patterns: Confidence signals are stronger in legal contexts, while entropy signals are more reliable in political and medical domains
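
The dual-signal idea behind Tables 2 and 5 can be illustrated with a minimal sketch. It assumes that each case is coded several times by an LLM, that the confidence signal is the mean self-reported confidence, and that the diversity signal is the Shannon entropy of the sampled labels, normalized by log(n). The function names, the equal default weights, and the 0.5 threshold are hypothetical illustrations, not the package's implementation; reproduce_all_results.py remains the authoritative source for the actual weights and procedure.

```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Shannon entropy of the sampled labels, scaled into [0, 1] (assumed normalization)."""
    n = len(labels)
    if n == 1:
        return 0.0
    counts = Counter(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)  # log(n) is the largest entropy n samples can reach


def quality_score(labels, confidences, w_conf=0.5, w_ent=0.5):
    """Hypothetical weighted combination: high confidence and low label entropy score high."""
    mean_conf = sum(confidences) / len(confidences)
    return w_conf * mean_conf + w_ent * (1.0 - normalized_entropy(labels))


def triage(items, threshold=0.5, w_conf=0.5, w_ent=0.5):
    """Flag cases whose score falls below the threshold for human review."""
    flagged = []
    for item_id, labels, confidences in items:
        if threshold > quality_score(labels, confidences, w_conf, w_ent):
            flagged.append(item_id)
    effort_reduction = 1.0 - len(flagged) / len(items)  # share of cases not sent to humans
    return flagged, effort_reduction


# Toy data: (case id, labels from three LLM runs, self-reported confidences)
items = [
    ("case-1", ["A", "A", "A"], [0.90, 0.95, 0.90]),
    ("case-2", ["A", "B", "C"], [0.60, 0.50, 0.55]),
    ("case-3", ["B", "B", "A"], [0.80, 0.70, 0.75]),
]
flagged, reduction = triage(items, threshold=0.5)
print(flagged, round(reduction, 3))
```

With these toy values only case-2, where all three runs disagree and confidence is low, is routed to a human coder. Optimizing w_conf and w_ent per domain is where the confidence-versus-entropy differences described above would show up.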

Validation Status: The package reproduces all core findings, with statistical significance maintained across the complex analytical tasks, and demonstrates that automated quality assessment is viable for scaling complex qualitative research beyond accessible coding tasks.

Usage: Run ./run_complete_reproduction.sh for complete reproduction, or python3 reproduce_all_results.py to generate individual tables. All dependencies are included.</description></descriptions><geoLocations/></resource>