Refining Physiology of Phonetics: Research Direction

We’ve narrowed the focus of our research to tracking the biomechanical properties of the voice from general speech to the singing process.

We’re aiming to track the following three facets:

1) Facial Muscles
Sensor 1: Geometric 3D, realtime, video tracking (Kinect).
Sensor 2: Topical sensors. [Currently researching multiple topical sensor options; tilt switches, flex sensors, membrane potentiometers, etc. are on the table]

2) Air Flow and Breathing
Sensor 1: Barometric pressure sensor.
[Partially inspired by the following podcast on the human adaptation of speech control mechanisms, http://www.npr.org/templates/story/story.php?storyId=129083762, we’ve realized the importance of tracking air flow. A barometric sensor may not be the ideal air flow tracker, so we are looking into alternatives.]

3) Frequency and Pitch
Sensor 1: Direct-to-computer microphone.
Sensor 2: Sinusoidal partial editing analysis and resynthesis. [http://www.klingbeil.com/spear/]
Sensor 2 [ALT]: Simple tuner program.

Data To Be Gathered:
The airflow and facial positions associated with different frequencies. The goal is to associate particular pitches with particular facial muscle orientations. Airflow data will likely be used as an input in the application stage.
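
The pitch side of this data could be extracted in many ways. Below is a minimal sketch only, assuming a mono frame of microphone samples is already available as a list of floats. The function names, the 80 to 1000 Hz search range and the choice of a plain autocorrelation peak are illustrative assumptions, not the method used by SPEAR or by any particular tuner program.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame from its autocorrelation peak.

    Hypothetical helper: returns the pitch in Hz, or None if no positive peak exists.
    """
    min_lag = max(1, int(sample_rate / fmax))                  # shortest period searched
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)   # longest period searched
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    return sample_rate / best_lag


def sine_frame(freq_hz, sample_rate, n_samples):
    """Synthetic test tone standing in for a recorded sung note."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


frame = sine_frame(200.0, 8000, 2000)
pitch = estimate_pitch_hz(frame, 8000)
print(pitch)
```

Each pitch estimate could then be stored next to the facial-tracking and airflow readings captured at the same moment. A real voice is far less clean than a sine wave, so a sturdier estimator would likely be needed in practice.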

Final Application:
A musical interface that can reverse the data flow. Exact form and function of the interface will be informed by the research and testing findings.

Monitoring the muscular, structural and vocal cord movements and positions

A project by Michelle Cortese and Anne-Marie Lavigne

[TEMPORARY THESIS]

The voice is an interdisciplinary mechanism: a key piece of the human body whose study involves everything from semantics to biomechanics to Roman war history (origins of phonetic text) to acoustics. Many relevant fields have independently engaged with the intricacies of the voice, but it’s not often that a research project can weave the study of musculature, acoustics and the Gabor algorithm together with equal focus. We’re looking to combine physical and visual face mapping with the science and art of the voice. We will be monitoring the muscular, structural and vocal cord movements and positions of professional vocalists.

The Mechanism of the Human Voice

When The Mechanism of the Human Voice was written in 1940, the book was truly one of a kind. “Voice and speech are of such importance to mankind that one might well envisage an enormous literature dealing with the subject in all its complex details. Such a literature does exist, but the numerous writings are scattered throughout the pages of divers journals; indeed, the science and art of vocalization is so many-sided that few persons possess the knowledge necessary for a complete understanding of its many problems.” (Curry, Foreword). Despite the book’s near-ancient status in the world of science, Curry raises an incredibly relevant and thought-provoking hypothesis: the science of the voice is an interdisciplinary study, a pursuit that encompasses everything from phonetics to musculature to semantics to acoustics to mechanics and beyond.

The Mechanism of the Human Voice breaks the voice down into the anatomy of the vocal organs, general vocal acoustics, the physiology of phonation and the mechanisms of speech. Additional, smaller chapters include explorations into hearing, singing, vocal disorder, personality and experimental study. Although he identifies the vocal cords as the primary component of frequency vibration of the voice in nearly every chapter, Curry still takes the time to provide a full anatomical explanation of the voice (face, neck, head, chest, etc.) and of how each element interacts, as well as an exploration of frequency relationships and natural human ranges.

Curry’s most innovative offering lies in the Physiology of Phonation chapter: a truly interdisciplinary look at the mechanism of speech. This chapter introduces the idea that speech is both proprioceptive and exteroceptive, meaning that it is produced by internal neural sensations but also mediated by external factors. External factors like listening inform much of our vocal discussions, which is why perfecting intonation can be quite difficult for deaf speakers attempting to learn audible speech. Continuing with the interdisciplinary nature of speech study, Curry explains that phonation duration is controlled by laryngeal muscular action and the state of the glottis in the expiratory breath. He argues that one can look to respiratory action in expiration to better understand the speech process. Vocal pitch is divided into the low-pitch range, the ‘covered’ voice, falsetto and the whistle voice (whistle of the vocal cords, not the lips); pitches are a product of the vocal cords, laryngeal muscles and glottis.
Curry, Robert Oswald Leonard. The Mechanism of the Human Voice. New York, Toronto: Longmans, Green & co., 1940.

Speech and Voice Science

Beginning with the physics of sound, extensively covering breathing, phonation, resonance and articulation (consonants vs. vowels), and ending with research topics in production and perception, Behrman is beyond comprehensive. This book is less the pushy interdisciplinary effort of Curry and more of an eighth-grade science textbook’s take on the science of speech: simple, comprehensive and occupied by dozens upon dozens of diagrams illustrated by Maury Aaseng.

The beauty of Behrman’s simple, diagram-laden take on the mechanics of speech is that she offers plenty of suggestions for continued scientific pursuit, in particular suggested instrumentation for monitoring every factor of speech she lists. Especially notable is the entire chapter devoted to measurement and instrumentation of phonetics. “Measurement of [frequency] and intensity for clinical and research purposes can be divided generally into three categories of measurement; levels of habitual use of the voice, levels of maximum performance, and degree of regularity.” (Behrman, 181). Instrumentation relevant to the visual, audible and physical facial tracking we will be tackling includes: microphone frequency tests, awareness of jitter, measurement of airflow and intraoral pressure, various types of vocal cord monitoring (stroboscopy, laryngeal imaging, etc.), and awareness of the four recognized vocal registers (noted above).

Behrman, Alison. Speech and Voice Science. San Diego: Plural Publishing, 2007.
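
The jitter mentioned above can be quantified once the durations of successive vocal cord cycles are known. The sketch below uses one common definition, local jitter: the mean absolute difference between consecutive periods divided by the mean period. This definition is an assumption on our part rather than a formula quoted from Behrman, and the period values are made up for illustration.

```python
def local_jitter(periods):
    """Cycle-to-cycle period variation, as a fraction of the mean period.

    `periods` is a list of consecutive cycle durations in seconds (hypothetical input).
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods to measure jitter")
    # Absolute difference between each period and the one that follows it.
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


steady_voice = [0.005] * 10                    # perfectly regular 200 Hz cycles
wobbly_voice = [0.010, 0.012, 0.010, 0.012]    # alternating cycle lengths

print(local_jitter(steady_voice))
print(local_jitter(wobbly_voice))
```

A perfectly regular voice gives a jitter of zero. The larger the value, the less regular the vibration, which relates to the "degree of regularity" category in the quote above.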

 

3-D Facial Tracking

Facial recognition systems are “an important modality of modern human computer interaction”. Most current research is done for medical and diagnostic purposes, for artistic creation or for animation production. We have identified three different approaches:

  • Top-down or analytic approaches: They use a combination of points related to the main features or organs of the face, such as the mouth, the eyes, the nose and the eyebrows. These are the fiducial points. The points are connected together, and the distances and angles between them are used to create the facial recognition image.
  • Bottom-up approaches: They use parts of the facial organs combined with the position of the organs. The tracking is then optimized further.
  • Holistic approaches: Instead of using points or organs, they use the whole face to produce an image. In those cases, “the normalization on face size and rotation is a really important pre-processing to make the recognition robust”.
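
To make the analytic approach more concrete, here is a minimal sketch of how distances and angles between fiducial points could be computed. The point names and coordinates are hypothetical 2-D values chosen for illustration; a real tracker would supply its own landmarks, possibly in 3-D.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance between every pair of named fiducial points."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }


def angle_at(points, a, vertex, b):
    """Angle in degrees formed at `vertex` by the segments towards `a` and `b`."""
    ax, ay = points[a][0] - points[vertex][0], points[a][1] - points[vertex][1]
    bx, by = points[b][0] - points[vertex][0], points[b][1] - points[vertex][1]
    cos_angle = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Hypothetical landmark positions, in arbitrary image units.
face = {"left_eye": (0.0, 0.0), "right_eye": (4.0, 0.0), "nose_tip": (2.0, 2.0)}

distances = pairwise_distances(face)
nose_angle = angle_at(face, "left_eye", "nose_tip", "right_eye")
print(distances[("left_eye", "right_eye")], nose_angle)
```

Comparing how these distances and angles change from one sung pitch to another is one possible way to describe facial muscle orientation numerically.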

The main issue faced by all of those approaches is that face recognition depends on the appearance of the points on the projective surface. The 3-D modeling of the face thus varies with the pose, the illumination and the expression of the subject. This is what researchers call the PIE problem.

Here is a review of three facial tracking systems: two from medical research and one programmed by an artist.

Medical research
Facial recognition for medical purposes is important because it provides information on the state and physical condition of patients. It can therefore enhance pain recognition. We have found two interesting articles reporting the results of two different approaches to the PIE problem: a multi-camera system and a combination of 2-D and 3-D imagery.

A New Multi-Camera Based Facial Expression Analysis Concept by Niese, R., Al-Hamadi, A. and Michaelis, B. in Campilho, A. and Kamel, (Eds), ICIAR 2012, Part II, LNCS 7325, pp. 64-71

To avoid dealing with complex lighting methods, the model proposed here automatically adapts a generic model to the current face under observation. The authors combine live capture imagery of the subject with a mesh model developed from a series of stereoscopic scans of several subjects. For every subject, the mesh model is adapted live to the person’s facial features. The adaptation process requires a frontal image and uses information on the positions of the eyes, the lips and the nose. Those are then aligned with the X-axis of the mesh model.

After the frontal image has been paired with the mesh model, a cross-correlation is done to find the points on the images captured by the two other cameras.

This model reaches an average classification rate of 81.5%.

Pros and Cons

  • This model does not represent facial-expression-specific 3-D shape details, only the general face form based on the positions of four features of the face.
  • The projective properties of the image capturing device must be taken into account properly. The camera model parameters are gained in a calibration step that has to be very precise to avoid distortions.
  • The results have shown a deviation of 7 degrees for the rotations and 8 cm for the translations of the head.

 

Combined Online and Offline information for Tracking Facial Feature Points by Wang, X., Zhang, Y. and Chunlei, C., in C.-Y. Su, S. Rakheja and H. Liu (Eds), ICIRA 2012, Part I, LNAI 7506, pp. 196-206.

This approach combines offline information on the movement constraints in 3-D space with online frame-to-frame (25 fps) imagery created using the Gabor wavelet algorithm. Both the offline and online methods are integrated with a bundle adjustment method. The tracking process is made of three steps:

  • define the facial points and construct the initial keyframe;
  • estimate the current frame’s feature points and set the previous frame’s feature points;
  • optimize the current frame’s feature points with the integrated tracking method.

They use 14 obvious points of the human face to locate the corresponding points produced by the algorithm. With the frame-by-frame method, they can predict feature points on the following frames.

They then take a 30 x 30 pixel image of the area around every point and transform the pixels to get the image. They do so because “only using spatial and temporal continuity information between successive frames to track often leads to error accumulation and gradually causes the drift”.
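
To get a feel for what a Gabor transform of a patch involves, here is a rough sketch in plain Python. It builds one real-valued Gabor kernel and applies it to a 30 x 30 patch cut around a feature point. The parameter values (wavelength, sigma, orientation) and the helper names are assumptions made for illustration and are not taken from the article. Trackers of this kind typically use a whole bank of kernels at several scales and orientations.

```python
import math


def gabor_kernel(size=30, wavelength=8.0, theta=0.0, sigma=5.0, gamma=1.0, psi=0.0):
    """Real Gabor kernel: a cosine wave windowed by a Gaussian envelope."""
    half = (size - 1) / 2.0
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Rotate coordinates so the wave runs along direction `theta`.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel


def extract_patch(image, cx, cy, size=30):
    """Cut a size x size patch around pixel (cx, cy) from a list-of-rows image."""
    half = size // 2
    top, left = cy - half, cx - half
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("patch falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]


def gabor_response(patch, kernel):
    """Single filter response: element-wise product of patch and kernel, summed."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# Hypothetical 64 x 64 grayscale image and one feature point at its centre.
image = [[float((x + y) % 7) for x in range(64)] for y in range(64)]
kernel = gabor_kernel()
patch = extract_patch(image, 32, 32)
print(gabor_response(patch, kernel))
```

Comparing such responses around a point in successive frames is one way a tracker can recognise the same facial feature by its local texture rather than by position alone.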

Pros and cons

  • The system avoids the jitter and drift phenomena.
  • Applying the Gabor wavelet algorithm is very complex and requires software we do not have access to.

 

Artistic creation

As soon as the Kinect camera was launched for the Xbox in 2010, digital artists started to use its depth sensor to build interactive interfaces. FaceOSC, a facial tracking system, was developed by the artist Kyle McDonald based on the work of Jason Saragih. It is an add-on for openFrameworks. Here is a video of McDonald explaining the original code of Saragih.

Here is an interview with Kyle McDonald explaining the algorithm of FaceOSC.

FaceOSC is based on a deformable model fitting technique: it takes the form of a face and then pushes it until it fits a target (a photo or a camera feed, using landmarks). The algorithm uses points on the face to create areas that are then deformed to fit the model.

FaceOSC can be easily connected to digital audio interfaces.

Pros and Cons

  • FaceOSC is an application that can be paired with Processing, Max/MSP and Ableton Live.
  • Template code is available online.
  • The application does not track the details of the face, but the face in general.

 

 

 

Response to Biomechanical Energy Harvesting

For almost three years, I lived and worked in Southern Africa (South Africa and Mozambique). Like most Westerners who discover the continent, one of the first things I noticed was that most people walk very slowly. For the first months of my stay I was surprised by this fact, which unfortunately leads to a lot of prejudice and judgement. In North American or European cities, people walk fast, because they are busy, because they want to make the most of their day, and so on. So when you go to Africa for the first time, you think, “Well, people here are not in a hurry to do anything”, which obviously is not true. People in Africa do not walk slowly because they have nothing to do or because they are lazy. They walk slowly because they are smart. They know that under a 42-degree Celsius sun, walking fast is a bad idea: shortly after you start, you are sweating, your clothes are wet and therefore do not look clean, and most importantly, you are thirsty. Walking fast in Africa means that you lose energy faster. If you slow your pace, you do not sweat as much and you can walk longer. You go through your day and get tons of things done without feeling as exhausted.

This article was really interesting as it uses and explains the concept of COH, cost of harvesting:

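$$\mathrm{COH} = \frac{\Delta\,\text{metabolic power}}{\Delta\,\text{electrical power}} = \frac{P_{\text{metabolic}}^{\text{harvesting}} - P_{\text{metabolic}}^{\text{non-harvesting}}}{P_{\text{electrical}}^{\text{harvesting}} - P_{\text{electrical}}^{\text{non-harvesting}}}$$

$$\mathrm{COH} = \frac{1}{\text{device efficiency} \times \text{muscle efficiency}}$$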

When designing a wearable device, or any mechanical device that harvests energy, it is very important to measure the COH. We have to measure the efficiency of the device in various modes and then compare them. This comparison allows us to understand the efficiency of the whole structure. A device that produces a lot of energy but demands a lot of effort is not worth it, as the energy needed to activate it cancels out the energy produced.
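
As a small worked example of the two COH formulas above, the sketch below computes both values in Python. The function names are our own and all the wattages and efficiencies are hypothetical numbers chosen to keep the arithmetic easy; they are not measurements from the article.

```python
def cost_of_harvesting(metabolic_on_w, metabolic_off_w, electrical_on_w, electrical_off_w=0.0):
    """COH = change in metabolic power / change in electrical power (both in watts)."""
    delta_electrical = electrical_on_w - electrical_off_w
    if delta_electrical == 0:
        raise ValueError("no extra electrical power was harvested")
    return (metabolic_on_w - metabolic_off_w) / delta_electrical


def coh_from_efficiencies(device_efficiency, muscle_efficiency):
    """COH = 1 / (device efficiency * muscle efficiency), efficiencies as fractions."""
    return 1.0 / (device_efficiency * muscle_efficiency)


# Hypothetical walker: 300 W metabolic power normally, 310 W while harvesting 5 W.
measured = cost_of_harvesting(310.0, 300.0, 5.0)
# Hypothetical device at 50% efficiency driven by muscle at 25% efficiency.
from_efficiencies = coh_from_efficiencies(0.5, 0.25)

print(measured)           # 2.0
print(from_efficiencies)  # 8.0
```

In this made-up case the wearer spends 2 extra watts of metabolic power for every watt of electricity produced. The lower the COH, the less effort the device costs its user.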

The researchers in this study had the great idea of deconstructing a motion into various parts and measuring the COH for every one of them. The results showed that it is more efficient to activate the power harvesting in only one of the walking phases. Using the whole walking cycle would not be efficient, as it would demand too much power from the walker to produce energy. Having identified this, the researchers will rework the design of the wearable.

This study is really important in the creation process because it will allow them to create a wearable that is lighter and can be used for longer by the user.

 

Projects – Ideas

Controlling music with the body

An interface that uses Processing, the Kinect and motion tracking. The dancers control music with their movements.

http://goo.gl/YLA6J

I am very interested in using the depth camera to create an interface that allows body movements to control either the ambient lighting or the sound.

Imagine if this dancer could create the music with his movements.

 

Making a robotic arm

I would be interested in building a robotic arm using servos. Something like this.

Response to “The effect of a new syringe design on the ability of rheumatoid arthritis patients to inject a biological medication”

The article presents the process of user testing a new product. The N syringes are evaluated by rheumatoid arthritis patients and compared to the most common model already in use. The tone of the article made it clear to me that it was presenting the last step of the testing process. The article does not explain on what basis the design was conceived, but rather uses an approach that aims to validate a new design. The results of the article are also constrained by the limitations of the study, which examined only one position and had the patients inject the medication into a skin pad rather than into themselves. The fact that one of the authors is the owner of the design company which produced the N syringes should be stated at the end of the article in the “Conflict of interest” section. The tone of the article is academic, but it seems that its goal is to validate a design produced by the company of one of the authors. The method used is limited in its scope and sounds more like a company product testing process than academic research.

Nonetheless, the article gives us an insight into how biomechanics can and should be used to conceive and test a product. Ergonomics is the key to creating efficient physical interfaces, products and any interactive tool. I had always thought I would use sensors to track or activate interactive interfaces, but this article made me realize I can use them to help me design an interactive object. They are very useful for measuring the force and the grip of users.

Also, the article shows us how important it is to organize a well-conducted user testing phase. For any interactive device that would be displayed in a public space, a variety of participants should be involved to make sure the device can be used by everyone. It is also important to analyze an interactive interface using a biomechanical approach, as it can reveal the physical limits within which users can operate it.

 

Dr. William A Sands and the Biomechanics of Gymnastics

For my Biomechanics class, I had to find and select a biomechanics expert and report on his work. Biomechanics? Motions of the body? Gymnastics. For more than 9 years (that is to say, most of my childhood) I trained as a gymnast an average of 15 hours per week. By the end of my “career”, I was far from reaching the Olympic level, but could pride myself on winning the provincial competition, doing the same type of skills performed by my idol of the time, Nadia Comaneci. This assignment brought me for the first time to a scientific analysis of the movements and skills I performed as a kid without understanding how complex they were. I discovered a whole field of research I never thought I had been applying myself.

Dr. Sands is the author of many publications on the biomechanics of gymnastics. He contributed to building a biomechanical semantics specific to this sport of high physical performance, in which the merit of the athletes is based on the subjective perceptions of judges. Sands and his research team have used biomechanics to build an objective way to score the performances of the athletes. They used video digitizing, infrared timers and a computer algorithm to build a scientific understanding of the skills performed.

Sands finds that analyzing gymnastics with biomechanics is hard due to the rapid evolution of the sport: “Due to the constant progress of gymnastics skills and the skill specificity of biomechanical analyses, any biomechanical summary will be constrained by the timing and contemporary state of gymnastics performance” (1). The characteristics of the apparatus constantly change and bring a variety of skills. It is a challenge to apply the knowledge of the science to a discipline using tools that are external to the body, unlike diving for instance. Dr. Sands studies the impact of the technical developments of the apparatus on the gymnasts’ performances and bodies. Here is a video showing the analysis of a certain type of spring floor.

His research shows that on the vault, for example, contrary to what we might think, the velocity and score of a gymnast’s performance are not so much based on the speed of the run-up towards the vault. They depend more on the transition she makes between the run-up and the take-off board through a really simple move, the hurdle. What matters is the gymnast’s ability to control the slowing down of her speed rather than the speed itself. You can see here the vault performances of the London 2012 Olympics. Here is the schematic of a Yurchenko.

My understanding of Sands’s work is that his analyses of gymnastics skills have helped coaches and athletes perfect their approaches. He has contributed a lot to the understanding of how the speed and motion of every skill can be deconstructed into small parts and thus improved by focusing on the specific gesture or timing that will push the gymnast towards perfection and, very important to him, fewer injuries. Paradoxically, the biomechanics of gymnastics is a complex field because of the sport’s constant improvements, which are mainly due to the application of biomechanics to the sport.

Here is a video (very old school) demonstrating some basic analyses of skills using a biomechanical approach.

(1) W. A. Sands in W. A. Sands, D. J. Caine, J. Borms: Scientific Aspects of Women’s Gymnastics, Medicine and Sport Science, Vol. 45, page 6.