To summarize information and sentiment from videos, we recommend a text classification approach, similar to the one used for audio streams.
First, generate as many features as possible from the video stream by asking questions about it. For example: ‘What is the sentiment of the video?’, ‘What object is shown?’, or ‘How many people are in the image?’
Then, build a table of metadata from the answers and store it in a database.
Finally, look up the relevant information in the database and format it into text.
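The three steps above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `answers` list stands in for whatever model or service answers the questions about the stream, and the table name, column names, and helper functions (`build_metadata_table`, `summarize`) are hypothetical. An in-memory SQLite database is assumed purely for convenience.

```python
import sqlite3

# Hypothetical per-segment answers to the feature questions. In practice these
# would come from the model or service that analyzes the video stream.
answers = [
    {"segment": 0, "sentiment": "positive", "object": "dog", "people": 2},
    {"segment": 1, "sentiment": "neutral", "object": "car", "people": 0},
    {"segment": 2, "sentiment": "positive", "object": "dog", "people": 1},
]


def build_metadata_table(rows):
    """Store the extracted features as a metadata table in a database."""
    conn = sqlite3.connect(":memory:")  # assumption: in-memory DB for the sketch
    conn.execute(
        "CREATE TABLE video_metadata ("
        "segment INTEGER PRIMARY KEY, sentiment TEXT, object TEXT, people INTEGER)"
    )
    conn.executemany(
        "INSERT INTO video_metadata VALUES (:segment, :sentiment, :object, :people)",
        rows,
    )
    conn.commit()
    return conn


def summarize(conn):
    """Look up the stored metadata and format it into a short text summary."""
    sentiment, count = conn.execute(
        "SELECT sentiment, COUNT(*) AS n FROM video_metadata "
        "GROUP BY sentiment ORDER BY n DESC, sentiment LIMIT 1"
    ).fetchone()
    total = conn.execute("SELECT COUNT(*) FROM video_metadata").fetchone()[0]
    objects = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT object FROM video_metadata ORDER BY object"
        )
    ]
    max_people = conn.execute(
        "SELECT MAX(people) FROM video_metadata"
    ).fetchone()[0]
    return (
        f"Dominant sentiment: {sentiment} ({count} of {total} segments). "
        f"Objects seen: {', '.join(objects)}. "
        f"Most people in one segment: {max_people}."
    )


if __name__ == "__main__":
    print(summarize(build_metadata_table(answers)))
```

Keeping the features in a table means the summary text can be regenerated or rephrased with a cheap database lookup, without analyzing the video again.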
Note: We don't recommend running queries that examine every picture, because doing so is not cost-effective.
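One possible way to act on this note is to analyze a sample of frames instead of every frame. The sketch below assumes a fixed sampling interval. The function name `sample_frame_indices` and its parameters are hypothetical, and the right interval depends on the content and budget.

```python
def sample_frame_indices(total_frames, fps, seconds_between_samples):
    """Return the indices of frames to analyze, one per sampling interval.

    Assumption: a fixed time interval between samples is acceptable for the
    video being summarized.
    """
    # Convert the time interval to a frame step, never smaller than one frame.
    step = max(1, round(fps * seconds_between_samples))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(total_frames=300, fps=30, seconds_between_samples=2))
```

For a 300-frame clip at 30 fps, sampling every 2 seconds selects 5 frames rather than 300.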