AURALROOTS: Cross-modal Interaction and Learning