Tailored in its functionality to teaching in design degree programs... an interface for modern teaching
MARII is an intelligent collaborative interface that replaces traditional computer setups with a collaboration-focused table system featuring gesture tracking, voice input, and an LLM that doesn't interrupt you.
NextCloud folder: https://cloud.hs-augsburg.de/s/wYsejfXJHMkc9o8
GitHub repo: https://github.com/FelixBae/autonomous-maritime-fleet-control
Miro Board: https://miro.com/app/board/uXjVGprIFMM=/
MARII is designed to tackle a major headache in AI collaboration: "ping-pong prompting" and the complex problem of agent task delegation. To test our solution, we created a maritime simulation where multiple operators sit around a table, collaborating on a mission with the help of LLMs.
Here is how the project evolved from a hardware challenge into an intelligent software solution:
Initially, we planned on using a large touchscreen table for the operators. When we found out we couldn't get our hands on one, we had to think outside the box. Our workaround? A standard screen paired with a camera system that tracks hand gestures. Switching from a touchscreen to a camera-based system gave us even more room for different gestures.
To make the normal screen feel interactive, we built a gesture recognition system. A static index finger held for 0.3 seconds emulates a touch click, a circular hand motion selects an object, and a quick shake deletes it.
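The dwell-to-click rule can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes a hand tracker already delivers one fingertip position per frame, and the class name `DwellClickDetector` and the jitter tolerance `radius_px` are hypothetical. Only the 0.3-second hold time comes from the description above.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class DwellClickDetector:
    """Reports a click when the fingertip stays nearly still for dwell_s seconds."""

    dwell_s: float = 0.3      # hold time taken from the text
    radius_px: float = 15.0   # assumed jitter tolerance (hypothetical value)
    _anchor: Optional[Point] = None
    _since: float = 0.0
    _fired: bool = False

    def update(self, tip: Optional[Point], t: float) -> Optional[Point]:
        """Feed one fingertip sample (None if no finger is visible) at time t in seconds.

        Returns the click position once per dwell, otherwise None.
        """
        if tip is None:
            # Finger lost: forget the current dwell.
            self._anchor = None
            self._fired = False
            return None
        if self._anchor is None or math.dist(tip, self._anchor) > self.radius_px:
            # Finger appeared or moved too far: restart the timer at the new spot.
            self._anchor, self._since, self._fired = tip, t, False
            return None
        if not self._fired and t - self._since >= self.dwell_s:
            self._fired = True  # fire only once until the finger moves away
            return self._anchor
        return None
```

Calling `update` once per camera frame with the tracked fingertip and a timestamp is enough. The circular-motion and shake gestures would need a short history of positions rather than a single anchor point.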
To get the camera and screen on the same page, we borrowed an idea from robotics: ArUco markers. When the calibration key is pressed, the screen displays these markers, allowing the camera to map its position relative to the screen.
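One way to turn detected markers into a camera-to-screen mapping is a planar homography fitted from four point pairs. The sketch below assumes the marker centers have already been found in the camera image (for example with OpenCV's ArUco module) and that their on-screen positions are known. It uses plain Python so it runs without dependencies, and the function names are hypothetical. The project may well solve this differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(camera_pts: Sequence[Point], screen_pts: Sequence[Point]) -> List[float]:
    """Return [h11, h12, h13, h21, h22, h23, h31, h32] with h33 fixed to 1."""
    if len(camera_pts) != 4 or len(screen_pts) != 4:
        raise ValueError("exactly four correspondences are expected")
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(camera_pts, screen_pts):
        # Two linear equations per correspondence (direct linear transform).
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def camera_to_screen(h: Sequence[float], p: Point) -> Point:
    """Map a camera pixel (e.g. a fingertip) to screen coordinates."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)
```

After calibration, every tracked fingertip position can be passed through `camera_to_screen` before it is handed to the gesture logic. With more than four markers, a least-squares fit would be more robust against detection noise.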
With the hardware sorted, we integrated live voice transcription so operators can speak directly to the AI. However, since our system is built for multi-person collaboration, the last thing we wanted was an AI that constantly interrupts human conversation.
To solve this, we built an intelligent orchestrator architecture. The orchestrator listens and decides exactly when the AI needs to speak out loud. If a verbal reply isn't necessary, the AI stays quiet but still executes its tasks in the background, like updating a summary right on the screen.
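The speak-or-stay-quiet routing might look roughly like this. It is a simplified sketch under stated assumptions: the names `Decision`, `Orchestrator`, and `keyword_decide` are hypothetical, and the keyword rule is only a stand-in for the LLM-based decision step, whose details are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Decision:
    speak: bool                      # should the assistant answer out loud?
    reply: str = ""                  # spoken text, used only when speak is True
    background_tasks: List[str] = field(default_factory=list)


@dataclass
class Orchestrator:
    decide: Callable[[str], Decision]   # stand-in for the LLM-based decision step
    say: Callable[[str], None]          # text-to-speech hook
    run_task: Callable[[str], None]     # e.g. refresh the on-screen summary

    def handle(self, transcript: str) -> Decision:
        """Process one transcribed utterance."""
        decision = self.decide(transcript)
        # Background work always runs, whether or not the assistant speaks.
        for task in decision.background_tasks:
            self.run_task(task)
        if decision.speak and decision.reply:
            self.say(decision.reply)
        return decision


def keyword_decide(transcript: str) -> Decision:
    """Toy rule (assumption): reply aloud only when addressed by name."""
    tasks = ["update_summary"]
    if "marii" in transcript.lower():
        return Decision(speak=True, reply="On it.", background_tasks=tasks)
    return Decision(speak=False, background_tasks=tasks)
```

Keeping the decision separate from the speech and task hooks means the silent path and the spoken path share the same background work, which matches the behaviour described above.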
Which inputs do we need and what are the possible ways of giving these inputs to the system?
Our new ideas, adapted in response to the feedback from the last meeting with the professors
The updated bill of materials (BOM)
This is the feedback from the professors on the group presentation.