instantVideo editor
ins video is a markdown based video editor. It works by transcribing your video to a markdown file, which you can edit using any text editor, and then render the edited markdown back to a video.
You can cut out parts of the video, add title cards, text overlays and more using simple markdown syntax.
Quick start
Requirements
- ins
- uv
- ffmpeg
- chromium
Setup
ins video setup This should download the required data. This can be quite a lot of data as WhisperX is used for transcription.
Edit videos
Assuming the video file you want to edit is called input.mp4
ins video convert input.mp4This produces input.video.md file which contains a transcript of the video You can now edit the input.video.md file using any text editor.
ins video render input.video.mdConverts the markdown file back into a video
Editing
Here is an example of how a markdown file generated from a video looks like:
---
video:
hash: '4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9'
name: 'distro.mkv'
source: '/home/benjamin/Videos/distrotest/distro.mkv'
transcript:
source: './insvideodata/4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9.srt'
generated_at: '2025-12-21T00:37:04.161349597+00:00'
---
`00:00.0-00:03.4` quick brown fox jumps over the lazy dog consectetur adipiscing elit
`00:03.4-00:13.8` elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
`00:14.6-00:19.3` ut enim ad minim veniam quis nostrud exercitation ullamco laboris
`00:24.9-00:26.2` nisi ut aliquip ex ea commodo consequat duis aute irure dolor
`00:26.4-00:27.0` in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
`00:27.2-00:28.7` excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deserunt
`00:28.9-00:30.1` mollit anim id est laborum sed ut perspiciatis unde omnis iste natus
...Cuts
Each line represents a segment of the video with timestamps and the transcribed text. You can comment out or delete lines to cut those segments from the video. You can also rearrange lines. You can not edit the text itself, edits will be ignored, and you cannot cut within a line.
Slides
You can put arbitrary markdown between the transcript lines to create slides. This can contain text, images, code blocks, latex and more.
---
video:
hash: '4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9'
name: 'distro.mkv'
source: '/home/benjamin/Videos/distrotest/distro.mkv'
transcript:
source: './insvideodata/4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9.srt'
generated_at: '2025-12-21T00:37:04.161349597+00:00'
---
`00:00.0-00:03.4` quick brown fox jumps over the lazy dog consectetur adipiscing elit
`00:03.4-00:13.8` elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
Hi, this is the first slide
`00:14.6-00:19.3` ut enim ad minim veniam quis nostrud exercitation ullamco laboris
`00:24.9-00:26.2` nisi ut aliquip ex ea commodo consequat duis aute irure dolor
`00:26.4-00:27.0` in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
`00:27.2-00:28.7` excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deserunt
Hi, this is the second slide
`00:28.9-00:30.1` mollit anim id est laborum sed ut perspiciatis unde omnis iste natus
`00:30.6-00:33.2` error sit voluptatem accusantium doloremque laudantium totam rem aperiam
`00:33.6-00:35.0` eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae
---
`00:35.3-00:37.4` dicta sunt explicabo nemo enim ipsam voluptatem quia voluptas sit aspernatur
`00:38.0-00:39.9` aut odit aut fugit sed quia consequuntur magni dolores eos qui ratione
`00:40.3-00:47.0` voluptatem sequi nesciunt neque porro quisquam est qui dolorem ipsum quia dolor sit ametSlides will start being shown during the segment immediately before their content. They will stay on screen until the next slide or until a separator line --- is reached.
You can also pause the video entirely by placing slides between two separator lines. Multiple slides can be shown while the video is paused by placing separator lines between them. The amount of time a slide is shown is calculated based on the number of words it contains.
---
video:
hash: '4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9'
name: 'distro.mkv'
source: '/home/benjamin/Videos/distrotest/distro.mkv'
transcript:
source: './insvideodata/4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9.srt'
generated_at: '2025-12-21T00:37:04.161349597+00:00'
---
`00:00.0-00:03.4` quick brown fox jumps over the lazy dog consectetur adipiscing elit
`00:03.4-00:13.8` elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
---
Hi, this is the first slide, source the video is paused
---
Hi, this is the second slide, after this the video will continue
---
`00:14.6-00:19.3` ut enim ad minim veniam quis nostrud exercitation ullamco laboris
`00:24.9-00:26.2` nisi ut aliquip ex ea commodo consequat duis aute irure dolor
`00:26.4-00:27.0` in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
`00:27.2-00:28.7` excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deseruntSlides will be styled differently depending on their content. If they just contain a single header (h1, h2 ...) they will act as title cards. If they contain lots of content, they wil not be center aligned and the layout is adjusted to make the larger amount of content more readable.
Background music
---
video:
hash: '4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9'
name: 'distro.mkv'
source: '/home/benjamin/Videos/distrotest/distro.mkv'
transcript:
source: './insvideodata/4b59412a676f5994b916dd983e3315cb8ebfed8f99cf1b609310d6dda39db3d9.srt'
generated_at: '2025-12-21T00:37:04.161349597+00:00'
---
`00:00.0-00:03.4` quick brown fox jumps over the lazy dog consectetur adipiscing elit
`00:03.4-00:13.8` elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
```music
https://www.youtube.com/watch?v=example
```
`00:14.6-00:19.3` ut enim ad minim veniam quis nostrud exercitation ullamco laboris
`00:24.9-00:26.2` nisi ut aliquip ex ea commodo consequat duis aute irure dolor
`00:26.4-00:27.0` in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
`00:27.2-00:28.7` excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deseruntYou can put a file or music url in a code block with the language set to music to set the music for the video. The music will be looped if it is shorter than the video, and will play until the next music block or the end of the video. Setting music to none or an empty code block will disable music from that point.
Limitations
You can not edit the content of the words being said. The only AI being used is for speech detection (and optionally auphonic for automatic EQ and background noise removal), meaning this can not insert words into the video which were never said.
I sincerely believe there is no value in AI image, video and audio generation outside of moodboards, so this will not be supported anytime soon.
As of now, only editing a single video file is supported.
Current notes
As of now (Tue Oct 7 03:28:14 PM CEST 2025) there is an issue with WhisperX which needs a workaround for it to run on Arch Linux. Locate the libtranslate 4.4.so. It's in the uv cache and might not be called exactly that. Once you found it, run the patchelf --clear-execstack command on it.
patchelf --clear-execstack ~/.cache/uv/archive-v0/NjcJvBGqk4Zf5Cd4pRw0E/ctranslate2.libs/libctranslate2-d3638643.so.4.4.0