ENHANCEMENT OF EAR RECOGNITION USING AN INTEGRATED METHOD OF ICP AND SCM TECHNIQUES

Chapter 2: Literature Review
Identification using biometric methods is becoming popular, particularly in the medical field and in forensics. With the aid of computers, biometrics is expected to become one of the most efficient ways of identifying individuals in the future: complex algorithms can be executed almost instantaneously to automate the identification process. Two factors are necessary to create a biometric method: first, a subject to be identified, whose essential characteristics needed for identification should remain largely static; and second, a set of algorithms that will assess, compare, and match patterns or distinct characteristics of the subject being identified against data stored in databases. An example of a subject that can be used in biometrics is the human ear, and possible algorithms include the Iterative Closest Point (ICP) algorithm and the Stochastic Clustering Method (SCM). This chapter discusses in detail the related literature that uses these techniques, including the rationale for using the human ear as the subject for identification.
2.1 The human ear
The human ear can be used in biometric recognition methods. There are two major reasons why the human ear is a feasible subject for identification: it has an almost static size and an almost static structure throughout an individual's lifespan. This means that each individual has a distinct ear structure, and this structure remains almost constant throughout his or her life. The use of the human ear for identification and recognition purposes can be traced back to as early as the 1890s and the French criminologist Bertillon (1890). It was later refined by Iannarelli, who identified seven ear features on the outside of the ear that are distinct for each individual (Iannarelli, 1989). Ear recognition also allows a non-contact method of identification, unlike fingerprinting, face recognition, or eye recognition. Nevertheless, Victor, Bowyer & Sarkar (2002) noted that the accuracy of face recognition is much superior to that of ear recognition. However, their result may have been erroneous because they assumed that the left ear is similar to the right ear as long as both ears belong to the same person: in their experiment, data from the left ears were stored in databases while data from the right ears were used for querying. More recent studies, such as the research conducted by Burger et al (2012), prove otherwise.
In other words, ears can be used for non-invasive identification or recognition. Ear identification usually involves taking photos of the ear from different angles and under different lighting conditions. Some processes also require the use of different cameras so as to standardize the identification process. There are, however, certain factors that should be held constant during image capture in order to achieve higher rates of successful identification. One of these factors is that there should be no obstructions such as hats, hair, or other objects covering the ear. Any background clutter can result in unsuccessful identification. Nevertheless, software engineers and programmers are developing various ways to achieve high rates of successful identification despite objects partially covering the ear surface. At present there are different initiatives by software engineers and computer programmers to make ear identification more robust, so that the identification process remains efficient despite possible obstructions during image capture of the ear. Studies have shown that neural networks can be used to create faster and more efficient ear identification methods (Bustard & Nixon, 2010).
Dinkar & Sambyal (2012) studied the uniqueness of the ear features of 400 different individuals. Results from their study showed that ears can indeed be used to recognize or identify people with high levels of accuracy. In their experiment they used two methods to determine the uniqueness of ear patterns for different individuals: Pattern Recognition by Neural Network and a Weighted Scoring System. Using these methods they showed that there are ten (10) external ear features that are unique for each individual, and that these ear features can be further expanded into thirty-seven (37) sub-features. These identified features were converted into numeric scores with the use of the Weighted Scoring System. They also showed that there are 80 ear features that can be found to be similar across different individuals, and they converted these similarities into their numerical counterparts. Using the pattern-recognition neural network software which they created, they determined the locations of the distinct and similar ear patterns relative to the center of the ear for each individual. They then evaluated the average of these distances for each individual and found that no two individuals have the same average. They concluded that ear features can indeed be used to recognize and identify individuals, and they reported an accuracy of 99.9999% in their measurements, making their findings highly reliable. Nevertheless, it should be noted that Dinkar and Sambyal (2012) did not study the effects of facial expressions or of muscle movements near the ear area during their experiment. They also did not study the effects of possible sources of error in practical applications of ear recognition, such as objects covering parts of the ear.
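The core of this scoring idea can be illustrated with a short sketch. The code below is a hedged, hypothetical illustration only: the feature coordinates, the weights, and the exact scoring formula of Dinkar & Sambyal (2012) are not reproduced here. It simply reduces each detected feature to a location and a weight and computes a per-subject signature as the weighted average distance of the features from the ear center.

```python
import numpy as np

# Hypothetical illustration only: each detected ear feature is reduced to a
# 2D location (pixel coordinates) and a weight from a weighted scoring system.
features = np.array([[12.0, 30.0], [25.0, 41.0], [8.0, 55.0], [33.0, 19.0]])
weights = np.array([0.9, 0.6, 0.8, 0.7])

ear_center = features.mean(axis=0)                  # reference point of the ear
distances = np.linalg.norm(features - ear_center, axis=1)

# Per-subject signature analogous to the "average distance of distinct
# features from the ear center"; the claim is that no two subjects share it.
signature = float(np.average(distances, weights=weights))
print(round(signature, 3))
```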
2.2 Limiting factors for the accuracy of ear recognition
As pointed out earlier, there are several factors that limit the accuracy of an ear recognition method. Bustard & Nixon identified five factors, namely: background, occlusion, lighting, pose, and camera. Background pertains to the difficulty of identifying a significant portion of the ear due to possible cluttering of objects near the ear area. Occlusion pertains to the difficulty of locating the ear when the area is obscured by hair, earrings, or hats. Lighting refers to variation in the intensities and frequencies of the light irradiating the ear; this factor is a common source of deviation between experimental results and actual application, since light intensity and frequency vary from place to place and a single ear may be exposed to multiple frequencies of light. Pose, or the angle of the image, refers to the angle at which the ear is viewed. In order to make an extensive comparison between queried values and database values, enough ear patterns must be identified, and the number of identifiable ear patterns depends on how much of the ear is captured in the image, which in turn is highly dependent on the angle at which the image is taken. The ideal angle is one at which the entire outer ear, or pinna, is fully visible. The fifth factor is the quality of the camera used: the higher the resolution of the camera, the more accurate the recognition will be, and vice versa. Resolution is not the only concern when selecting a camera to take the ear image; the experimenter should also consider its field of view, color sensitivity, sensing resolution, and its capacity to suppress background image noise.
Figure 1: Samples of ear images of varying qualities that should be considered in conducting ear recognition research experiments: (a) good quality (b) occluded with hair (c) insufficient lighting (d) occluded with jewelry (Source: Pflug & Busch, 2011)
2.3 The use of neural networks in ear recognition
It was Bernard Widrow who pioneered the use of neural networks during the 1950s. During that period, neural networks were extensively developed for voice recognition, and they have since proven useful in industrial robotics, medical imaging, aerospace applications, data mining, and image recognition. Essentially, a neural network is a type of artificial intelligence which can be used to imitate how the human brain works. A neural network is a connection of processing elements such as data acquisition elements and data processing elements; these processing elements are compared to the neurons of the human brain, hence the term neural network. The concept of the neural network has been widely researched since its conception, and many researchers have combined it with other methods in order to identify and recognize different images such as ear images. In 2008, Wang, Mu and Zen proposed the use of the Haar Wavelet Transform in order to study ear images more carefully and accurately. Daramola & Oluwaninyo (2011) used a Back Propagation Neural Network (BPNN) in tandem with an energy-edge density ear recognition system which uses the Haar Wavelet Transform. The procedure is composed of seven (7) stages. The first stage is taking an ear picture or image, followed by the decomposition of the image into four sub-bands designated as HH, HL, LH, and LL. After obtaining the four sub-bands, density features and discriminative textural energy are extracted from the three detail layers, designated HH1, HL1, and LH1. The two features – density features and discriminative textural energy – are then fused to form a more robust feature. This robust feature is stored in a database while a copy undergoes the Neural Network training algorithm. After the Neural Network training algorithm is performed on the image, a query is conducted against the database to check whether the training algorithm is successful in its assessment, which is read as the output. The summary of the procedure is given in Figure 2, and a simplified sketch of the decomposition step follows the figure.
(1) Ear image
(2) Decompose the image into four sub-bands using the Haar Wavelet Transform: HH, HL, LH, and LL
(3) Extract the density features and discriminative textural energy features: HH1, HL1, and LH1
(4) Fuse the density features and discriminative textural energy features (the fused feature is stored in the database, step 6, and also passed to step 5)
(5) Undergo the Neural Network training algorithm
(6) Database
(7) Output class (the trained network queries the database to produce the output)
Figure 2: Ear recognition process using Neural Network Training Algorithm (Source: Daramola & Oluwaninyo, 2011).
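To make the decomposition step of Figure 2 concrete, the following Python/NumPy sketch performs a single-level Haar-style decomposition of a grayscale ear image into the LL, LH, HL, and HH sub-bands and computes a simple energy value per sub-band. It is only an illustrative approximation under assumed conventions; the exact wavelet normalization and the density and discriminative textural-energy features of Daramola & Oluwaninyo (2011) are not specified here and would differ.

```python
import numpy as np

def haar_decompose(img):
    """Single-level 2D Haar-style decomposition into LL, LH, HL, HH sub-bands."""
    img = np.asarray(img, dtype=float)
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2]   # crop to even dimensions
    # Horizontal pass: average and difference of adjacent column pairs.
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Vertical pass on each half gives the four sub-bands.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

def band_energy(band):
    """Normalized energy of a sub-band, a simple texture-energy style feature."""
    return float(np.sum(band ** 2) / band.size)

# Hypothetical usage with a stand-in grayscale ear image.
ear = np.random.default_rng(0).random((64, 64))
ll, lh, hl, hh = haar_decompose(ear)
features = [band_energy(b) for b in (lh, hl, hh)]   # features from the detail bands
```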
Results from their experiment are promising. They showed that the Back Propagation Neural Network has a higher rate of successful identification than a method called Euclidean Distance (ED). Nevertheless, it should be noted that their research is not conclusive as to whether occlusion of the ear image still results in successful recognition, because their experimental method did not explore testing different ear images of the same person. Moreover, the main weakness of this technique comes from its use of 2D images: in 2D images, objects covering the ear are counted as parts of the ear itself, adding to the features that need to be evaluated. This source of error can be minimized, though, through the use of 3D images and their respective algorithms such as the Iterative Closest Point (ICP) algorithm, and by using Stochastic Clustering Methods (SCM).
Note that the use of neural networks for ear recognition requires them to perform a pattern identification task. Kamruzzaman & Hasan (2005) explained that pattern identification by a neural network can be made easier by decreasing the number of patterns that need to be identified, which can be achieved using different algorithms. In their study, they used a pruning algorithm to reduce the number of patterns that a neural network needs to assess.
In recent years several neural network models have been proposed and created for different purposes, such as pattern classification, regression analysis, and function approximation (Islam et al, 2002). This research will use a neural network for classification or identification purposes, more particularly in ear recognition. An example of such a class of neural networks is the multi-layer feed-forward network. There are also neural networks that use standard back propagation; this class of networks performs efficiently on weight-space portions of the pattern with a fixed topology. Generally, a relatively small network may not be able to learn the input problem well, while a relatively large network can result in overfitting, ultimately leading to incorrect generalization. Despite this limiting factor involving size, artificial neural networks are still considered universal approximators and efficient computing models, making them ideal for ear recognition. The accuracy of neural networks is predicted to be higher than that of human experts. Nevertheless, it is not yet well established how a neural network actually solves certain problems and arrives at a particular solution, due to the complexity of its architecture.
As mentioned previously, the topology and size of the neural network determine its efficiency (Kamruzzaman & Hasan, 2005). It is therefore necessary to first determine the appropriate network topology for the neural system that is to be built for ear recognition, and this can be done using the pruning technique. This technique involves first training a neural network that is estimated to be larger than necessary; in other words, a large network is created first. After that, the removal of unnecessary connections or neurons proceeds until the right size is reached. Usually, those parts of the network that show redundancy in operation are removed first. Another factor that is checked during the pruning process is the weight of the links between neurons. The "weight" pertains to how much significance, and corresponding priority, each neuron gives to the information on each channel connecting it to other neurons. One way of simplifying these channels is to add a penalty term to the error function. With such penalty terms, unnecessary connections receive lower priority than necessary ones. This results in a hierarchical consideration of the information that passes through different channels, and the neural network becomes simplified.
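One common form of such a penalty is a simple weight-decay term added to the training error. The sketch below is only an illustration of that general idea, written in Python/NumPy with hypothetical network sizes; the specific penalty used by Kamruzzaman & Hasan (2005) may differ.

```python
import numpy as np

def penalized_error(predictions, targets, weights, lam=0.01):
    """Mean squared error plus a weight penalty.

    Adding lam * sum(w^2) discourages large connection weights, so channels
    that contribute little end up with near-zero weights and become
    candidates for pruning.
    """
    mse = np.mean((predictions - targets) ** 2)
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return mse + penalty

# Example: two weight matrices of a small two-layer network (hypothetical sizes).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(4, 1))]
print(penalized_error(rng.normal(size=10), rng.normal(size=10), weights))
```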
Kamruzzaman & Hasan (2005) studied the effect of pruning techniques on the simplification of a complex neural system used for object shape identification. The factors considered in the pruning process were: the number of neural connections, the number of neurons, and the penalties of the error functions associated with each neuron. All in all, the pruning process which they used involved six (6) steps (a sketch of the overall loop is given after the list):
Step 1: Create a large neural network for object shape identification with one hidden unit. The neural network is then randomly initialized with varying ranges of connection (channel) weights.
Step 2: The neural network is partially trained for the task it has to perform on a training set comprising a certain number of training epochs. Training algorithms are used in this stage. Note that the number of training epochs, designated by the variable τ, was set by the user.
Step 3: The elimination of weights then follows. The elimination is done with the aid of a weight-eliminating algorithm which essentially comprises "if – else" rules.
Step 4: After the neural network has been pruned based on the weights of its channels, performance testing follows in order to evaluate the efficiency of the network in performing the work it was designed for – shape identification. The performance of the pruned neural network is checked against established or designated standards. Another hidden unit is then added and Step 2 is re-performed.
Step 5: The removal of nodes with insignificant weights is conducted.
Step 6: The performance of the pruned neural network is tested again in order to evaluate its efficiency in performing the task it was designed to perform. If the performance converges, the pruning process is terminated; otherwise steps 1 through 6 are re-performed.
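The overall loop can be sketched as follows. This is a hedged skeleton only: the training, evaluation, and initialization callables, the threshold, and the convergence test are hypothetical placeholders, and the actual algorithm of Kamruzzaman & Hasan (2005) (including how hidden units are added) is more elaborate.

```python
import numpy as np

def prune_network(train_fn, evaluate_fn, init_weights_fn, tau=50,
                  weight_threshold=1e-2, tol=1e-3, max_rounds=10):
    """Sketch of the iterative pruning loop described above (hypothetical API).

    train_fn(weights, epochs)  -> partially trained weights   (Step 2)
    evaluate_fn(weights)       -> validation error            (Steps 4 and 6)
    init_weights_fn()          -> oversized random network    (Step 1)
    """
    weights = init_weights_fn()                      # Step 1: large random network
    previous_error = np.inf
    for _ in range(max_rounds):
        weights = train_fn(weights, epochs=tau)      # Step 2: partial training
        # Steps 3 and 5: eliminate small-magnitude connections ("if-else" rule).
        weights = [np.where(np.abs(w) < weight_threshold, 0.0, w) for w in weights]
        error = evaluate_fn(weights)                 # Steps 4 and 6: performance check
        if abs(previous_error - error) < tol:        # convergence: stop pruning
            break
        previous_error = error
    return weights
```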
In their experiment using the above steps for pruning a neural network, Kamruzzaman & Hasan (2005) were able to create a neural network with minimal connections that at the same time performed better than its original, unpruned form. Nevertheless, their experiment did not evaluate the pruning process for neural networks used for regression problems and function approximation, which are also necessary for ear recognition.
2.4 Previous works and methods used for ear recognition
As pointed out previously, one of the earliest works that investigated the potential of ear recognition for biometric purposes was Iannarelli (1989). His work involved measurements of landmark points in the ear, which were determined manually, so the method was highly reliant on accurate segmentation and positioning of the ear landmark points. The potential of ear recognition as a biometric has been studied extensively since the late 1990s, and most of the methods created during this period exploit the ear's 2-dimensional image. One of the first experiments performed for ear recognition was conducted by Burge & Burger (2009). In their study they used adjacency graphs to model ears; these adjacency graphs were calculated from the Voronoi diagrams of the ear curves, and they obtained promising results as to the feasibility of using ears for biometrics. Owing to significant improvements in computational capability, Hurley, Nixon & Carter (2005) were able to obtain a significantly high level of ear recognition by using a force field feature extraction method which maps the structure of the ear through an energy field. Hurley, Nixon and Carter (2005) thus created a novel method for using ear images to identify different individuals. Their experiment involved the use of field transformations to extract certain features of a 2D ear image; accordingly, they mapped the wells and channels of the ear as energy fields. In their experiment they made the following assumption: the pixels of the image exert a mutual attraction that is proportional to their respective intensities and inversely proportional to the square of the distance between them. They showed that there is a high rate (greater than 90%) of successful identification and recognition when the two methods were combined. Owing to their study's success, many subsequent studies followed the assumptions and methods they used. Burge et al in 2005 created a biometric system for ear recognition whose matching is based on graphs created from ear images; the graphs were formed from Voronoi diagram curves previously extracted from the ear images. Rahman et al in 2007 performed a similar experiment, but this time using pre-determined points corresponding to different lengths and angles in the ear image; they measured the geometric relations between these points on the ear images in their database and on the ear images used to perform a query. It should be pointed out that the experiments of Burge et al (2005) and Rahman et al (2007) have lower rates of successful identification than the experiment conducted by Hurley, Nixon and Carter (2005). This lower accuracy can be attributed to the fact that the query images may have been taken in a different environment compared to those found in their databases; graphical methods are more affected by such factors – lighting conditions and the angle of the image – than field transformation methods. Nevertheless, it should be noted that the use of neural networks proved useful in the identification process in all these experiments, and it is therefore retained in subsequent work. The 3-dimensional image of the ear can also be used in ear recognition.
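The attraction assumption just described translates almost directly into code. The following is a naive, hedged sketch rather than Hurley et al.'s actual implementation (which uses far more efficient formulations): every pixel attracts every other pixel with a force proportional to its intensity and inversely proportional to the square of their distance, and the summed force vector per pixel forms the force field from which wells and channels can later be derived.

```python
import numpy as np

def force_field(image):
    """Naive force-field transform: every pixel attracts every other pixel
    with a force proportional to its intensity and inversely proportional to
    the square of the distance between them (O(N^2), small images only)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    intensities = image.ravel().astype(float)
    field = np.zeros_like(coords)
    for i, p in enumerate(coords):
        diff = coords - p                    # vectors from pixel i to all other pixels
        dist2 = np.sum(diff ** 2, axis=1)
        dist2[i] = np.inf                    # a pixel exerts no force on itself
        # unit direction * intensity / distance^2  ==  diff * intensity / distance^3
        field[i] = np.sum(diff * (intensities / dist2 ** 1.5)[:, None], axis=0)
    return field.reshape(h, w, 2)

# Hypothetical usage on a small stand-in image.
sample_field = force_field(np.random.default_rng(0).random((16, 16)))
```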
Naseem, Togneri & Bennamoun (2008) suggested the use of sparse representation, which had previously been applied successfully in face recognition. Accordingly, they used 3D ear images for ear recognition and showed promising results, achieving approximately an 80% rate of successful ear recognition. The use of 3-dimensional (3D) images for ear recognition was further studied by Yan & Bowyer (2007). Using a range scanner, which is a 3D scanner, they obtained images of the ear and then segmented the 3D images. After which, they used the Iterative Closest Point (ICP) algorithm to recognize the ears from a database of 415 individuals, achieving 97.8% successful ear recognition. Chen & Bhanu (2007) proposed the integration of ICP with other methods such as the local surface descriptor; using this combination of identification strategies they were able to obtain a higher percentage of successful ear recognition than Yan and Bowyer (2007). From these two separate studies it can be seen that the use of 3D images and the integration of several identification methods, such as ICP and the local surface descriptor, can increase the accuracy and the percentage of successful ear recognition. It should be noted, though, that the use of 3D images for ear recognition in the field of forensics may not be a practical approach because most surveillance image capturing devices take only 2D pictures. This research will use the ICP algorithm and combine it with another identification method called the Stochastic Clustering Method (SCM). This method was used by Nixon et al (2009) for ear recognition using 2D ear images. They employed point-models, which are series or arrangements of distinct ear structures. They noted in their study that this type of modeling is highly advantageous compared with complete 3D modeling: while the latter necessitates complete exposure of the ear to the image capturing device, the former requires only a significant portion of the ear in order to perform the ear recognition task. In real-life situations, particularly in forensics, it is rare that a complete image of the ear can be taken from an individual, unless that individual interacts actively with the image capturing device; the ear is almost always occluded with hair or other foreign materials such as hats and jewelry. The main idea of their research is to find a pattern of ear structures from any part of the human ear and use at least one of these patterns to identify an individual from a database. The patterns of ear structures, which they termed constellations of point structures, were derived using the Stochastic Clustering Method (SCM). Accordingly, they used the Scale Invariant Feature Transform (SIFT) to identify the point-model parts, and then assessed the patterns of arrangement of these model parts using SCM. Note that one limitation of this method is that other sets of information, such as biological information about ear morphology, were necessary for the construction of the point models. This research will adapt the methods used by Nixon et al (2009) in the use of SCM; it will also adapt the method used by Yan & Bowyer (2007), particularly the use of ICP. These two methods will be integrated for ear recognition.
By integrating these two methods of ear recognition it is expected that a more accurate and more robust ear recognition method will be created. Note that Bustard & Nixon (2010) were successful in creating a 3D model of the ear. Using this model it may be possible to determine the possible spatial orientations of the point-models generated through SCM, and the spatial orientation of these point-models can be determined using ICP. Hence, the integration of ICP and SCM is expected to allow successful ear recognition using ear images taken from different angles, thereby relaxing the constraints on ear detection. Note that in the study conducted by Nixon et al (2009), only three points on the ear's 2D image were necessary to form a constellation used in the recognition or identification process.
2.5 The application of Stochastic Clustering Method (SCM) in ear recognition
It can be deduced from the research discussed above that an efficient neural network is indeed necessary to produce an efficient ear recognition system. Moreover, it can also be deduced that different evaluation methods can be employed in order to increase the efficiency of the ear recognition system being developed. Note that from the experiments of Gardiner (2012) it can be seen that image patterns can be converted into numerical data sets, and that these data sets can be evaluated in groups or clusters, since Dinkar & Sambyal (2012) have shown that the 10 distinct ear features can be further expanded into 37 sub-features. Furthermore, they have shown that the average numerical values of these clustered geometric values are distinct for each individual. Nevertheless, Tan, Ting and Teng (2011) recognized that finding clusters in numerical data, such as the geometric values of the distinct features, is not a straightforward task; there should be an efficient method for determining relevant data clusters. They explained that, given a particular set of data, it is essential to know how many natural clusters are hidden in it, and the difficulty of finding these natural clusters increases when no additional information about the data set is given. In their research they showed that the stochastic clustering method can be used to determine the number and the patterns of these natural clusters without inputs other than the data set itself. Gdalyahu (1999) explained that cluster analysis, an essential task in machine learning, is an effective way to identify inherent patterns or structures contained within a given data set. Accordingly, cluster analysis is useful in partitioning image pixels into meaningful portions; these portions can then correspond to different parts of an object, such as the parts of the human ear. The identification of these clusters can be done stochastically, hence the term stochastic clustering method. The stochastic clustering method involves measuring, for every pair of visual entities, the likelihood that they belong to the same pattern or structure, and aggregating visual entities that come from the same structure into clusters.
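The essence of this idea can be conveyed with a much-simplified sketch. Gdalyahu's stochastic clustering operates on randomized graph cuts and is considerably more sophisticated; the Python code below only illustrates the underlying intuition under that simplification, using repeated randomized k-means runs to estimate, for every pair of data points (for example, geometric ear-feature values), the probability that they fall into the same cluster.

```python
import numpy as np

def kmeans(points, k, iters=20, rng=None):
    """Plain k-means with random initialization (helper for the sketch)."""
    rng = rng or np.random.default_rng()
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def co_clustering_probabilities(points, k=3, runs=50, seed=0):
    """For every pair of points, estimate how often randomized clustering runs
    place them in the same cluster; blocks of consistently high probability
    indicate the 'natural' clusters hidden in the data."""
    rng = np.random.default_rng(seed)
    n = len(points)
    co = np.zeros((n, n))
    for _ in range(runs):
        labels = kmeans(points, k, rng=rng)
        co += labels[:, None] == labels[None, :]
    return co / runs

# Hypothetical usage: two loose groups of 2D feature values.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
probs = co_clustering_probabilities(data, k=2, runs=30)
```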
2.6 The application of the Iterative Closest Point (ICP) algorithm in ear recognition
The Iterative Closest Point (ICP) algorithm is extensively used for recognizing similarities between 3D models. The ICP algorithm is used to minimize the statistical differences between two sets of data points, commonly referred to as point clouds. This method is usually used to reconstruct 2-dimensional (2D) and 3-dimensional (3D) surfaces from multiple scans; in medical science it is often used for the construction of virtual bone models. ICP uses a relatively simple algorithm to process 2D inputs into 2D or 3D outputs. The algorithm usually has four parts performing four particular tasks (a minimal code sketch follows the list):
• Association of data points using the nearest neighbour criterion
• Estimation of the transformation parameters through the use of a mean square cost function
• Transformation of the points to 2D or 3D surfaces using the estimated parameters
• Iteration, or the re-association of the points (Besl & McKay, 1992).
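To make these four steps concrete, the following is a minimal point-to-point ICP sketch in Python (NumPy and SciPy), assuming both ears are already represented as point clouds on a comparable scale. It is only an illustrative sketch; production pipelines such as that of Yan & Bowyer (2007) add coarse pre-alignment, outlier rejection, and richer error metrics that are omitted here.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (SVD/Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp(source, target, iterations=30, tol=1e-6):
    """Minimal point-to-point ICP following the four steps listed above.
    Returns the aligned source cloud and the final mean squared error, which
    can serve as a dissimilarity (registration-error) score between two ears."""
    src = np.asarray(source, dtype=float).copy()
    tgt = np.asarray(target, dtype=float)
    tree = cKDTree(tgt)
    prev_err = np.inf
    for _ in range(iterations):
        dists, idx = tree.query(src)                # 1. nearest-neighbour association
        R, t = best_rigid_transform(src, tgt[idx])  # 2. mean-square transform estimate
        src = src @ R.T + t                         # 3. apply the estimated transform
        err = float(np.mean(dists ** 2))            # 4. iterate until the error settles
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return src, err
```

The mean squared error returned by this sketch plays the role of the registration error discussed below: the smaller it is, the better the two ear models align.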
Note that ICP was specifically designed for 3D modeling, not 2D; nevertheless, as pointed out earlier, this method can be used in tandem with SCM to recognize 2D images using 3D models as references. The main advantage of using ICP in tandem with SCM is that in 2D ear recognition the identification process is hindered by pose variation, camera position variation, and out-of-plane rotations, problems which remain unsolved to this day. Integrating ICP with SCM to eliminate these limitations would therefore be a major milestone in the field of structure modeling and identification, if proven successful.
Note that ICP was originally created and designed for image registration; in other words, it measures the dissimilarity between two images, where the measure of dissimilarity is equivalent to the registration error between them. Being a method for image registration, it is acknowledged to be robust against image rotations and translations. One limitation of ICP, as pointed out by Pflug & Busch (2012), is that it can stop executing too early; they explained that this early stoppage occurs because the algorithm gets stuck in a local minimum. In order to lift this limitation, Pflug & Busch (2012) suggested that two models be used in its execution, coarsely pre-aligned before the refinement procedure takes place using ICP. In the study conducted by Chen & Bhanu (2005), point clouds were extracted for the contour of the ear's outer helix and then registered against the reference model using ICP. In later experiments, Chen & Bhanu (2007) used local surface patches (LSP) over the point clouds present on the ear's outer helix, following the same procedure for registering the LSP with the reference model using ICP. This change greatly improved the performance of the ICP-backed ear recognition process: from 93.3% for point clouds to 96.63% for LSP. Islam et al (2008) performed an interesting method similar to that of Nixon et al (2009) when they connected the point clouds to form point-models. They then reduced the number of points by removing those not included in the constellation, hence reducing the number of "faces in the mesh." These simplified ear image meshes were then used to query their database for identification: the meshes were aligned with candidate meshes in the database, and the degree of fit or alignment between the two meshes was determined using the ICP algorithm. In a later experiment, Islam et al (2011) used patches instead of point clouds and treated them as features of the respective ears; these patches were also extracted from 3D images using the LSP. A point of reference, such as the midpoint of the pinna, was then determined, and the distances of the structures from this point, including their angles of separation, were measured. These geometric values were then compared with those present in the database to facilitate ear recognition; the matching of angles and distances was done using ICP. Comparing the recognition rates of the two experiments conducted by Islam et al in 2008 and 2011, the former produced the higher recognition rate: 93.98% versus 93.5%. Note, however, that all four experiments did not consider pose variation and different scaling systems, which prevents any conclusive statement as to the efficiency of ICP in tandem with other methods in ear recognition.
While the studies of Chen & Bhanu (2007), Islam et al (2008), and Islam et al (2011) used 3D images for both database ear images and query ear images, Cadavid, Mahoor & Abdel-Mottaleb (2009) used 2D query images and 3D database images in the ear recognition process. Accordingly, they proposed a real-time ear recognition methodology in which a system reconstructs 3D ear models from 2D images taken by CCTV cameras. They explained that the different shades of the different portions of the ear image can be considered a measure of depth. After reconstructing the 3D ear model, it is compared with the 3D models in the database using ICP, the measure of recognition being the degree of alignment between the two models. Their work gave a 95% successful identification rate. Note, however, that one possible weakness of this method is variation in the intensity and frequency of the light the ear is exposed to; for example, if two light sources hit the ear from two different directions, errors will result in the construction of the model. Note also that occlusions of the ear can result in significant errors because the method is heavily reliant on the shading of the image. Nevertheless, this study can be considered a major milestone in turning ear recognition into a biometric that can be used in forensics, owing to its use of 2D CCTV images as a means of identification. A similar experiment was performed at a much earlier date by Liu & Yan (2007). In their study they reconstructed 3D ear models from 2D ear images obtained with a stereo vision camera. The major difference between the work of Liu & Yan (2007) and that of Cadavid, Mahoor & Abdel-Mottaleb (2009) is that the former used meshes of the reconstructed 3D image during the querying process. Accordingly, they identified distinct ear structures and removed the rest. In other words, the basis of identification is not the degree of alignment between the two 3D images (the database image and the query image) but the degree of overlap between the meshes, which was determined using ICP.
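The general idea of lifting a 2D probe image into 3D before ICP matching can be caricatured in a few lines. The sketch below is only a toy illustration of "shading as depth" and is not the shape-from-shading reconstruction actually used by Cadavid, Mahoor & Abdel-Mottaleb (2009); the function name and depth scaling are hypothetical.

```python
import numpy as np

def intensity_to_point_cloud(image, depth_scale=10.0):
    """Toy 2D-to-3D conversion: treat normalized pixel intensity as a depth
    proxy (a crude stand-in for shape-from-shading) and emit an (N, 3) point
    cloud that could then be aligned against gallery ear models with ICP."""
    img = np.asarray(image, dtype=float)
    img = (img - img.min()) / (np.ptp(img) + 1e-9)   # normalize intensities to [0, 1]
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    return np.stack([xs.ravel(), ys.ravel(), depth_scale * img.ravel()], axis=1)
```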
Passalis et al (2007) also proposed the creation of a real-time ear recognition system for CCTV images. Their study, however, involved the use of a standard model of the human ear. This standard model serves as the reference ear model for both the ear images in the database and the query ear images; it is created by averaging ear images, in other words, it is an ear representative. Note that averaging the human ear characteristics for the standard model is necessary in order to ensure that it is not similar to any real human ear. The reconstruction of the 3D model to be used for the query was done in a manner similar to the method used by Cadavid, Mahoor & Abdel-Mottaleb (2009); that is, 2D ear images were used to generate a 3D ear model. After the reconstruction, the model is deformed and translated so that it fits the standard model, and the history of deformation and translation that the query model has undergone is recorded. The suspected matches from the database then undergo the same process of deformation and translation until they, too, fit the standard model. Their respective histories of deformation and translation are then compared with the query model's history, and the degree of variation between histories is evaluated using ICP. This system of ear recognition resulted in a 94.4% successful identification rate. Note that, due to the robustness of ICP, only one fit of deformation is necessary to establish an authentic recognition. Nevertheless, this particular method may require a considerable database size and high-quality processing software and hardware in order to handle the operations performed on both the query materials and the database materials. This system of ear identification is time consuming, requires extensive work, and is predicted to be expensive compared with other methods of ear recognition. Another unique idea for ear recognition was proposed by Heng & Zhang (2011). Accordingly, they proposed that a 3D ear model be constructed and then sliced into increments of thickness along the axis orthogonal to the longest distance between the uppermost part of the helix and the lobule. The curvature information of each incremental slice is then determined and serves as the basis for the identification procedure. This method resulted in 94.5% successful ear recognition. Note, however, that their experiment did not consider different angles of rotation of the 3D image. If all possible rotations were considered, a very large number of curves would be generated; hence this method is tedious if it is to be used in the field of forensics, since the number of possible rotations is practically unlimited.
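Only as an illustration of the slicing step (not of the full curve-matching pipeline), the sketch below bins a 3D ear point cloud into slices of equal thickness along one axis and summarizes each slice. The slicing axis, slice count, and the crude per-slice descriptor are assumptions for illustration; Heng & Zhang's (2011) curvature-based slice descriptors are not reproduced here.

```python
import numpy as np

def slice_descriptors(points, axis=1, n_slices=20):
    """Cut a 3D ear point cloud into slices of equal thickness along one axis
    and reduce each slice to a crude descriptor (its in-plane extent). A
    curvature-based descriptor per slice curve would be richer than this."""
    pts = np.asarray(points, dtype=float)
    coord = pts[:, axis]
    edges = np.linspace(coord.min(), coord.max(), n_slices + 1)
    descriptors = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_slice = pts[(coord >= lo) & (coord < hi)]
        if len(in_slice) == 0:
            descriptors.append((0.0, 0.0))
        else:
            in_plane = np.delete(in_slice, axis, axis=1)   # drop the slicing axis
            descriptors.append(tuple(np.ptp(in_plane, axis=0)))
    return np.array(descriptors)
```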
From all these methodologies and systems of identification, the important role of ICP cannot be overemphasized. It can also be observed that ICP is efficient in recognizing patterns and deviations from these patterns, making it a robust method for ear recognition.
2.7 Challenges for future research on ear recognition
Pflug & Busch (2012) explained that, despite the numerous methods developed from the early 1900s up to the present for using the ear as a means of identification or recognition, there is still a need for extensive research on devising efficient ear recognition methods that are not limited by diverse factors such as light intensity, so that they can be used in real-life situations such as forensics and medicine. It follows, therefore, that future studies should be geared towards this aim. An efficient ear recognition system is thus one that integrates different methods, such as ICP and SCM, into a neural network. This neural network should be able to perform logical, real-time assessment from minimal information inputs, and it should be affected only minimally by the different factors that influence image quality. Moreover, pruning methods for neural networks used for ear recognition constitute an entirely new topic which may need more complex pruning techniques than the technique used by Kamruzzaman & Hasan (2005), as it entails a more complicated image recognition process – this research will use 3D image models and 2D images.
2.8 Summary of research gaps
To gain a better understanding of the weaknesses and limitations of the related literature discussed previously, it is essential to create a summary of research gaps. Table 1 tabulates these research gaps.
Table 1: Summary of the studies performed from 1989 to 2012 with their respective methodologies and weaknesses & limitations
Research Author(s) | Research Title | Methods Used | Weaknesses & Limitations
Iannarelli (1989) | Ear identification | Manual measurement of the ear's distinct structures | Time consuming; low accuracy; not applicable in real-life situations such as forensics and medicine
Hurley, Nixon & Carter (2005) | Force field feature extraction for ear biometrics | 2D image input; mapped the 2D ear structure into an energy field using the Force Field Feature Extraction Method | Highly affected by image lighting conditions and the angle at which the image was taken
Chen & Bhanu (2005) | Contour Matching for 3D Ear Recognition | 2D image input; point cloud extraction; ICP | Limited by lighting conditions and the angle at which the image was taken
Chen & Bhanu (2007) | Human ear recognition in 3D | 3D image input; LSP; ICP | Not applicable in practical forensic and medical settings; time consuming
Rahman et al (2007) | Person Identification Using Ear Biometrics | Mapped 2D ear structures relative to a pre-determined point in the ear | Highly affected by image lighting conditions and angle; low accuracy
Yan & Bowyer (2007) | Biometric recognition using 3D ear shape | 3D image input; 3D image segmentation; ICP | Did not study the effect of the image's angle of rotation; no conclusive evidence that the method can be used for images taken from different angles
Chen & Bhanu (2007) | Contour matching for 3D ear recognition | 3D image input; Local Surface Descriptor; ICP | The use of 3D image input limits its practical application in forensics or medicine; expensive
Passalis et al (2007) | Towards Fast 3D Ear Recognition For Real-Life Biometric Applications | 2D image input; standard reference model of the human ear; history of model deformation and translation; ICP | Tedious; requires a large database to hold all reference data
Naseem, Togneri & Bennamoun (2008) | Sparse representation for ear biometrics | Mapped 3D ear model using Sparse Representation | Low accuracy
Islam et al (2008) | A Fully Automatic Approach for Human Recognition from Profile Images Using 2D and 3D Ear Data | 2D and 3D image inputs used to create simplified ear image meshes; LSP; ICP | Time consuming; did not consider pose variation and different scaling systems; no conclusive evidence of applicability to real-life situations
Burge & Burger (2009) | Ear biometrics | 2D ear models built from adjacency graphs derived from Voronoi diagrams | Time consuming; not applicable to real-life situations such as forensics and medicine; limited by image lighting and angle
Nixon et al (2009) | On use of biometrics in forensics: gait and ear | 2D image input; SCM | Limited by lighting conditions and the angle at which the image was taken; low accuracy; limited practical application
Cadavid, Mahoor & Abdel-Mottaleb (2009) | Multi-Modal Biometric Modeling and Recognition of the Human Face and Ear | 2D image inputs; 3D models created via multi-modal biometric modeling; ICP | Expensive
Heng & Zhang (2011) | Fast 3D Point Cloud Ear Identification by Slice Curve Matching | 2D image input; constructed a 3D model which is then sliced into incremental thicknesses; contour comparison using ICP | Requires a large database for keeping references; tedious
Dinkar & Sambyal (2012) | Person identification in Ethnic Indian Goans using ear biometrics and neural networks | 2D image input; SCM | Time consuming; low accuracy
Pflug & Busch (2012) | Ear Biometrics: A Survey of Detection, Feature Extraction and Recognition Methods | 3D image input; ICP | ICP algorithm terminates early; time consuming
It can be seen from Table 1 that a significant number of the past studies have low accuracy due to the adverse effects of image lighting conditions and the angle at which the images were taken (image pose). Note also that the majority of the studies which are not affected by these two factors require methods that are tedious or time consuming to perform. Another observation worth noting concerns the limitations and weaknesses of using 2D versus 3D images as inputs. When the input is a 2D image, the result is usually affected by the two limiting factors, lighting and angle of the image; on the other hand, when the input is a 3D image, processing usually takes longer than for 2D images and the method is usually hard to apply in real-life situations. Note, however, that it may be possible to use 2D inputs and still reduce the limiting power of image lighting and angle by integrating two methods, namely ICP and SCM. Previous research has repeatedly shown the reliability of the ICP algorithm in recognition querying, while SCM has proven efficient in finding patterns among statistical data sets. It is expected in this research that, by integrating these two methods, ear recognition will become less time consuming due to the use of 2D image inputs and less affected by limiting factors such as lighting and the angle of the image, while maintaining a high percentage of successful ear recognition due to the use of 3D models created from 2D inputs and the robustness of ICP.
2.9 Chapter summary
This chapter has discussed the efficacy of using the human ear for identification purposes. A review of related literature examining the possibility of using ears was conducted, leading to the conclusion that the ear can indeed be used to identify individuals, as it possesses distinct characteristics inherent to its structure that are unique to each individual. This chapter has also discussed the use of ear recognition in different fields of expertise such as medicine and forensics. Another topic discussed in this chapter is the range of ear recognition methods used from 1989 to 2012, in the hope of finding relevant methodologies that can be adapted in the conduct of this research. Based on the findings of the related literature, there is a significant probability that the efficacy of a given ear identification method will increase if it is combined with other methods. This research will integrate ICP and SCM; the efficiency and robustness of these two methods were discussed accordingly. The advantages and disadvantages of using 2D and 3D images were also discussed, as were the research gaps. Furthermore, the use of neural networks and methods for improving their robustness and performance, such as pruning, were also discussed. Lastly, this chapter discussed the aspects of ear recognition methods that need to be tackled in future research.
References
Besl, P. J. and McKay, N.D. (1992). A Method for Registration of 3-D Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.14(2): 239 – 256.
Burge, M. and Burger, W. (1998). Ear biometrics. In A. Jain, R. Bolle, and S. Pankanti, editors, BIOMETRICS: Personal Identification in a Networked Society, pp. 273 – 286.
Bustard, J. D. and Nixon, M. S. (2010). 3D morphable model construction for robust ear and face recognition. San Francisco, CA: CVPR
Bustard, J.D. and Nixon, M.S. (2010). Toward Unconstrained Ear Recognition From Two-Dimensional Images. Systems, Man and Cybernetics, Vol.40(3): 486 – 494.
Cadavid, S., Mahoor, M.H. and Abdel-Mottaleb, M. (2009). 'Multi-Modal Biometric Modeling and Recognition of the Human Face and Ear'. In: IEEE International Workshop on Safety, Security and Rescue Robotics (SSRR), pp. 1 – 6.
Chen, H. and Bhanu, B. (2005). 'Contour Matching for 3D Ear Recognition'. In: Proceedings of the Seventh IEEE Workshop on Applications of Computer Vision.
Chen, H. and Bhanu, B. (2007). Human ear recognition in 3D. IEEE TPAMI, Vol.29(4): 718 – 737.
Daramola, S.A. and Oluwaninyo, O.D. (2011). Automatic Ear Recognition System using Back Propagation Neural Network. International Journal of Video & Image Processing and Network Forensic. Vol.11(1): 28 – 32.
Dinkar, A.D. and Sambyal, S.S. (2012). Person identification in Ethnic Indian Goans using ear biometrics and neural networks. India: PubMed.
Gardiner, M. (2012). Progress In Unconstrained Ear Recognition. Retrieved on February 21, 2013 from: http://www.science20.com/beachcombing_academia/progress_unconstrained_ear_recognition-93596.
Gdalyahu, Y. (1999). Stochastic Clustering and its Applications to Computer Vision. Jerusalem: Senate of the Hebrew University.
Heng, L. and Zhang, D. (2011). 'Fast 3D Point Cloud Ear Identification by Slice Curve Matching'. In: 3rd International Conference on Computer Research and Development (ICCRD), p. 224.
Hurley, D. J., Nixon, M. S. and Carter, J. N. (2005). Force field feature extraction for ear biometrics. Computer Vision and Image Understanding, Vol.98(1): 491 – 512.
Iannarelli, A.V. (1989) Ear identification. Paramont Publishing Company.
Islam, S.M.S., Bennamoun, M., Mian, A.S. and Davies, R. (2008). 'A Fully Automatic Approach for Human Recognition from Profile Images Using 2D and 3D Ear Data'. In: Proceedings of 3DPVT – the Fourth International Symposium on 3D Data Processing, Visualization and Transmission.
Kamruzzaman, S. M. and Hasan, A.R. (2005). Pattern Classification using Simplified Neural Networks with Pruning Algorithm. Retrieved on February 24, 2013 from: http://arxiv.org/ftp/arxiv/papers/1009/1009.4983.pdf
Liu, H. and Yan, J. (2007). 'Multi-view Ear Shape Feature Extraction and Reconstruction'. In: Third International IEEE Conference on Signal-Image Technologies and Internet-Based Systems (SITIS), pp. 652 – 658.
Lowe, D. G. (2004). Distinctive image features from scale invariant key points. International Journal of Computer Vision, Vol.60(2): 91 – 110.
Meyer, C.D. and Wessell, C.D. (2012). Stochastic data clustering. US: North Carolina State University Press.
Naseem, I., Togneri, R. and Bennamoun, M. (2008). Sparse representation for ear biometrics. In: ISVC'08, Las Vegas, Nevada, pp. 336 – 345.
Nixon, M.S., Bouchrika, I., Arbab-Zavar, B., and Carter, J.N. (2009). On use of biometrics in forensics: gait and ear. UK: University of Southampton
Passalis, G., Kakadiaris, I.A., Theoharis, T., Toderici, G. and Papaioannou, T. (2007). 'Towards Fast 3D Ear Recognition For Real-Life Biometric Applications'. In: IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS 2007), pp. 39 – 44.
Pflug, A., and Busch C. (2012). Ear Biometrics: A Survey of Detection, Feature Extraction and Recognition Methods. Germany: CASED.
Rahman, M., Islam, R., Bhuiyan, N.I., Ahmed, B. and Islam, A (2007). Person Identification Using Ear Biometrics. International Journal of The Computer, The Internet and Management, Vol.15(1): 1 – 8.
Setiono, R. and Liu, H. (1995). 'Understanding Neural Networks via Rule Extraction'. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 480 – 485.
Victor, B., Bowyer, K. and Sarkar, S. (2002). An evaluation of face and ear biometrics. International Conference on Pattern Recognition (ICPR). vol.1(1): p. 429 – 432.
Yan, P. and Bowyer, K. W. (2007). Biometric recognition using 3D ear shape. IEEE TPAMI, Vol.29(8): pp.1297 – 1308.