c-SNE: Emotion-based Deep Cross-modal Retrieval by using Stochastic Neighbor Embedding

Cross-modal retrieval based on subjective information aims to enable flexible media retrieval services, for example by letting users specify a text and/or an image to search for audio clips. The retrieved audio clips should convey an impression similar to that of the specified text or image. Existing methods focus on building cross-modal relationships using objective information from …