Abstract: Multimedia content comprising both text and images is growing at a rapid pace. There is a body of work on summarizing text content, but to the best of our knowledge, no method has been developed to summarize multimedia content. We propose two methods for summarizing multimedia content: a graph-based method and a modification of the submodular approach. Our approach explicitly recognizes two desirable, normative characteristics of a summary: good coverage and diversity of its text and of its images, and coherence between the text and the images. We also propose a metric for the quality of a multimedia summary that captures the coverage and diversity of the text and images as well as the coherence between them. We experimentally demonstrate that the proposed methods produce good-quality multimedia summaries.