Recently a client asked us to tackle an unusual request. Many of their clients are Mixtec, or Mixteco, people. The Mixtecs (/ˈmiːstɛks, ˈmiːʃtɛks/)[1], or Mixtecos, are indigenous Mesoamerican peoples of Mexico inhabiting the region known as La Mixteca of Oaxaca and Puebla, as well as the Región Montañas and Región Costa Chica of the state of Guerrero. The name Mixtec comes from Nahuatl words meaning 'cloud' and 'inhabitant of place of', or more simply “Cloud People”.
The difficulty was that their language has no written component. So getting a performance “scripted” was going to be a real challenge (not to mention finding a voice talent who spoke the language). The client was looking to implement a simple four-question survey, so we had to figure out a way not only to capture a performance in an uncommon language with no written equivalent, but also to communicate via a touchscreen using only iconography. In the age of emojis, the on-screen communication was relatively simple, and we were able to capture responses using traditional color-coded smiley (and not-so-smiley) faces that the user would simply touch to respond. Well, that was one problem solved. Now for the tricky part…
In a typical production pipeline for the PRSONAS character line, we write a script, get approvals on everything and then set up a recording session to capture the voice over and, more importantly, capture the mouth and facial movements of the artist via our motion capture package. The conundrum lay in the fact that 1) we did not have access to a native speaker of the language within our region of the world and 2) the area where our talent resided did not have access to a high-quality studio in which to perform the facial capture. So how did we solve this catch-22?
Turns out we had to do a little digging. We finally settled on a software package called “Lip Sync Pro,” which was intended to provide very basic lip sync animation for video games. So, we hired a video crew, sent them to the talent and recorded that person in 4K video while the voice over was being performed. With video in hand we set up a specially rigged character and attached the new animations generated by Lip Sync Pro and… crap. We got “results,” but not the ones we wanted. Turns out that our demands for facial animation were a bit beyond what LSP was initially designed to do. Thankfully, our team of artists took it all in stride and was able to solve the problem. LSP is flexible enough that we were able to create our own mouth shapes for each of the phonemes[2] that were used.
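For readers curious what "a mouth shape per phoneme" looks like in practice, here is a minimal Python sketch of the general idea. It is not Lip Sync Pro's API or our production code: the phoneme labels, shape names, and the `build_keyframes` helper are all hypothetical, chosen only to illustrate mapping timed phonemes to mouth-shape keyframes.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-mouth-shape table. The labels are illustrative
# only, not the phoneme set or shapes used in the actual project.
PHONEME_TO_SHAPE = {
    "a": "open",
    "i": "wide",
    "u": "round",
    "m": "closed",
    "s": "narrow",
}
REST_SHAPE = "rest"  # fallback for any phoneme missing from the table


@dataclass(frozen=True)
class Keyframe:
    time: float  # seconds from the start of the audio track
    shape: str   # name of the mouth shape to show from this time on


def build_keyframes(timed_phonemes):
    """Turn (start_time, phoneme) pairs into mouth-shape keyframes.

    Unknown phonemes fall back to the rest shape, and consecutive
    phonemes that share a shape are merged into one keyframe.
    """
    keyframes = []
    for start, phoneme in sorted(timed_phonemes):
        shape = PHONEME_TO_SHAPE.get(phoneme, REST_SHAPE)
        if keyframes and keyframes[-1].shape == shape:
            continue  # mouth is already in this shape, no new keyframe
        keyframes.append(Keyframe(start, shape))
    return keyframes


if __name__ == "__main__":
    for kf in build_keyframes([(0.0, "m"), (0.1, "a"), (0.2, "a"), (0.3, "x")]):
        print(kf)
```

Adding a custom shape in a scheme like this is just another entry in the table, which is roughly why a flexible tool makes the approach workable.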
That worked well, but we were still not quite at the quality level we were looking for in the animation. Fortunately, that was fairly easy to solve. With a little judicious use of the animation timeline, the animations were spot on and generated directly from the audio tracks captured in the video recording.
With all of the animations in hand and a practical way to capture responses, we prepared the database and other required back-end software and deployed the unit. At last report the client and the users are completely thrilled and, I believe, impressed that we were able to put together a simple yet elegant system to converse and capture information in a very unusual set of circumstances.
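As a rough illustration of the response-capture side, the sketch below stores face touches for a four-question survey in SQLite. Everything in it is an assumption made for the example: the face names, the scores, the table layout, and the helper functions are hypothetical and do not describe the back end we actually deployed.

```python
import sqlite3

# Hypothetical mapping from the touched face to a numeric score.
# The face names and values are assumptions for this example.
FACE_SCORES = {"green_smile": 3, "yellow_neutral": 2, "red_frown": 1}


def open_store(path=":memory:"):
    """Open (or create) the response store; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        " id INTEGER PRIMARY KEY,"
        " question INTEGER NOT NULL CHECK (question BETWEEN 1 AND 4),"
        " score INTEGER NOT NULL,"
        " recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_touch(conn, question, face):
    """Store one answer; raises KeyError for an unknown face name."""
    score = FACE_SCORES[face]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO responses (question, score) VALUES (?, ?)",
            (question, score),
        )
    return score


def average_scores(conn):
    """Return {question_number: average_score} for answered questions."""
    rows = conn.execute(
        "SELECT question, AVG(score) FROM responses"
        " GROUP BY question ORDER BY question"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = open_store()
    record_touch(store, 1, "green_smile")
    record_touch(store, 1, "yellow_neutral")
    print(average_scores(store))
```

Because the user only ever touches a face, the stored data needs no text in any language, which keeps the survey usable regardless of literacy.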
1. "Mixtec". Oxford English Dictionary (3rd ed.). Oxford University Press. September 2005.
2. pho·neme /ˈfōnēm/ any of the perceptually distinct units of sound in a specified language that distinguish one word from another, for example p, b, d, and t in the English words pad, pat, bad, and bat.