To create the JSON format this tool expects, use this LLM command:
llm -m gemini-2.5-flash \
  -a path-to-audio \
  --schema-multi 'timestamp:mm:ss,speaker:best guess at name,text'
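
As a rough illustration of the output shape, the sketch below parses a sample response and checks that each entry has the three fields named in the schema. It assumes that `--schema-multi` wraps the records in a top-level "items" array. The sample values and the `load_segments` helper are hypothetical and not part of this tool.

```python
import json

# Hypothetical sample of what the command might return; the values are invented.
SAMPLE = """
{"items": [
  {"timestamp": "00:00", "speaker": "Host", "text": "Welcome to the show."},
  {"timestamp": "00:07", "speaker": "Guest", "text": "Thanks for having me."}
]}
"""

# Field names taken from the --schema-multi argument above.
REQUIRED_KEYS = ("timestamp", "speaker", "text")


def load_segments(raw):
    """Parse transcript JSON and return its segments, raising ValueError on a bad shape."""
    data = json.loads(raw)
    # Assumption: the records sit in a top-level "items" array.
    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError('expected an object with an "items" array')
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"item {index} is missing {missing}")
    return items


segments = load_segments(SAMPLE)
for segment in segments:
    print(f'{segment["timestamp"]} {segment["speaker"]}: {segment["text"]}')
```

Running a check like this on the saved output before loading it into the tool can catch a response that is missing a field.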