Face Recognition in Challenging Situations

Eilidh Clare Noyes

Doctor of Philosophy
University of York
Psychology

May 2016

Abstract

A great deal of previous research has demonstrated that face recognition is unreliable for unfamiliar faces and reliable for familiar faces. However, such findings typically came from tasks that used 'cooperative' images, where there was no deliberate attempt to alter apparent identity. In applied settings, images are often far more challenging in nature. For example, multiple images of the same identity may appear to be different identities, due to either incidental changes in appearance (such as age- or style-related change, or differences in image capture) or deliberate changes (evading one's own identity through disguise). At the same time, images of different identities may look like the same person, due to either incidental changes (natural similarities in appearance) or deliberate changes (attempts to impersonate someone else, as in the case of identity fraud). Thus, past studies may have underestimated the applied problem. In this thesis I examine face recognition performance for these challenging image scenarios and test whether the familiarity advantage extends to these situations. I found that face recognition was indeed even poorer for challenging images than previously found using cooperative images. Familiar viewers were still better than unfamiliar viewers, yet familiarity did not bring performance to ceiling level for challenging images as it had done in the cooperative tasks of the past. I investigated several ways of improving performance, including image manipulations, exploiting perceptual constancy, crowd analysis of identity judgments, and viewing by super-recognisers. This thesis provides interesting insights into theory regarding what it is that familiar viewers learn when they become familiar with a face. It also has important practical implications, both for improving performance in challenging situations and for understanding deliberate disguise.
Table of Contents

Abstract ...................................................................... 2
Table of Contents ....................................................... 3
List of Tables ............................................................... 6
List of Figures .............................................................. 7
Acknowledgements .................................................. 15
Declaration ............................................................... 16
Chapter 1 – General Introduction ............................. 17
1.1 Introduction: Why Face Recognition is Important ................................................. 17
1.2 Familiarity & Face-Matching .................................................................................. 21
1.3 What Information is used to Recognise a Face? .................................................... 30
1.4 Learning Variability ................................................................................................ 36
1.5 Face Recognition in Challenging Situations ........................................................... 40
1.6 Overview of Current Work ..................................................................................... 44

Chapter 2 – Familiarity & Challenging Faces ............ 48
2.1 Chapter Summary .................................................................................................. 48
2.2 Introduction ........................................................................................................... 48
2.3 Celebrity Faces & Celebrity Lookalikes .................................................................. 54
2.4 Experiment 1: Lookalike Task ................................................................................. 55
2.5 Experiment 2: Mid Pixelation ................................................................................. 65

2.6 Experiment 3: Coarse Pixelation ............................................................................ 74
2.7 Between Experiments Analysis .............................................................................. 81
2.8 General Discussion ................................................................................................. 82

Chapter 3 – Improving Performance ........................ 86
3.1 Chapter Summary .................................................................................................. 86
3.2 Introduction ........................................................................................................... 86
3.3 Experiment 4: Blurring Pixelated Images ............................................................... 91
3.4 Experiment 5: Crowd Analysis ............................................................................... 97
3.5 Experiment 6: Observer Factors .......................................................................... 107
3.6 General Discussion ............................................................................................... 117

Chapter 4 – Changing Camera-to-Subject Distance 120
4.1 Chapter Summary ................................................................................................ 120
4.2 Introduction ......................................................................................................... 120
4.3 Experiment 7: Facial Configuration Measurements ............................................. 124
4.4 Experiment 8: Face-Matching & Camera-to-Subject Distance ............................. 127
4.5 Experiment 9: Perceptual Constancy for Face Shape ........................................... 134
4.6 General Discussion ............................................................................................... 140

Chapter 5 – Matching Disguised Faces ................... 144
5.1 Chapter Summary ................................................................................................ 144
5.2 Introduction ......................................................................................................... 144
5.3 Existing Disguise Face Databases ......................................................................... 152
5.4 FAÇADE Database ................................................................................................ 156
5.5 Experiment 10: Unfamiliar Viewers ..................................................................... 166
5.6 Experiment 11: Unfamiliar (Informed) Viewers ................................................... 171

5.7 Experiment 12: Familiar Viewers ......................................................................... 176
5.8 General Discussion ............................................................................................... 184

Chapter 6 – Understanding Disguise ...................... 191
6.1 Chapter Summary ................................................................................................ 191
6.2 Introduction ......................................................................................................... 192
6.3 How do People Disguise Themselves? ................................................................. 199
6.4 Can Viewers Predict by Eye which Disguises will be Effective? ........................... 205
6.5 What do Viewers Believe Makes for an Effective Disguise? ................................ 208
6.6 Experiment 13: Do Social Inferences Change for Disguise? ................................. 213
6.7 General Discussion ............................................................................................... 221

Chapter 7 – General Discussion .............................. 224
7.1 Overview of Findings ........................................................................................... 224
7.2 Relation to Previous Research ............................................................................. 228
7.3 Theoretical Implications ...................................................................................... 231
7.4 Practical Implications ........................................................................................... 235
7.5 Future Directions ................................................................................................. 237

References .............................................................. 241
List of Tables

Table 3.1 Crowd analysis results broken down by trial type (same identity, different identity) .......................................................................................................................... 104
Table 3.2 Crowd analysis results broken down by trial type (same identity, different identity) .......................................................................................................................... 106
Table 3.3 Performance accuracy broken down by viewer group and trial type .............. 115
Table 3.4 Crowd analysis results broken down by trial type (same identity, different identity) for the mid pixelation (control) Experiment 2 ................................................... 116
Table 3.5 Crowd analysis results broken down by trial type (same identity, different identity) for the SRs ......................................................................................................... 116
Table 4.1 Mean measurements for each photograph condition. EN stands for the eye-to-nose measurement, and NM represents nose-to-mouth. The letters following denote the side of the image on which the measurement was taken: L = left, R = right & C = centre. Average measurements are calculated for week 1 (Avg1) and week 2 (Avg2) at both near (AvgN) and far (AvgF) distances ....................................................................................... 127

Table 6.1 Social inference comparisons for impersonation similar images ..................... 219
Table 6.2 Impersonation random disguises: mean and median results for each analysis ............................................................................................................................ 220

List of Figures

Figure 1.1 Image showing the face of the real suspect, Hussain Osman, on the left and the face of Jean Charles de Menezes on the right ............................................................. 18
Figure 1.2 Face-matching performance of passport officers in the study by White et al. (2014). Some officers performed with very high accuracy regardless of their employment duration ............................................................................................................................. 24
Figure 1.3 Example stimuli taken from the Bruce et al. (1999) line-up task. The correct match for the target face is face number 3 ....................................................................... 25
Figure 1.4 Examples of face pairs taken from the GFMT (Burton et al., 2010). The top row shows different identity pairs; the bottom row shows same identity pairs ...................... 27
Figure 1.5 Examples of face pairs in the matching task involving poor quality CCTV footage in the experiments conducted by Burton et al. (1999). Familiar viewers performed with high accuracy when matching the face pairs ......................................... 28
Figure 1.6 Example stimuli: the full-face image on the left was to be matched with either of the internal feature images on the right. Familiarity aided matching accuracy on this task ..................................................................................................................................... 29
Figure 1.7 Example of test stimuli used by Tanaka & Farah (1993), showing examples of isolated features, intact faces and scrambled faces ......................................................... 32
Figure 1.8 Example images of one identity with features changed in configuration across the different images, taken from the study by Haig (1984). The features themselves remain intact, although the exact distances between features are changed ................... 33
Figure 1.9 Example stimuli from Hole et al. (2002) showing the original image, and vertical stretches of 150% and 200% ................................................................................ 34
Figure 1.10 Example stimuli from Collishaw & Hole (2000). The top row, from left to right, shows the following image conditions: intact, inverted, scrambled. The bottom row shows: blurred, blurred and scrambled, blurred and inverted .......................................... 35

Figure 1.11 Example of the card-sorting scenario. The figure shows 20 images of two different identities. Familiar viewers find this task easy, whereas unfamiliar viewers generally believe that there are more than two identities present ................................... 38
Figure 1.12 Example stimuli in the Dhamecha et al. (2014) face-matching task ............... 44
Figure 2.1 Example of full face and internal feature stimuli (left) and full face and external feature stimuli (right) viewed as part of Clutterbuck & Johnston's (2002) face-matching task ..................................................................................................................................... 53
Figure 2.2 Face-matching task image examples. The pairs on the left (A) show different identities (with the imposter face on the right); the pairs on the right (B) show same identity pairs ...................................................................................................................... 57
Figure 2.3 Photograph of one participant's use of the familiarity scale, taken immediately after completion in the experimental setting. The far left side of the scale indicates that the face was completely unfamiliar, and the far right depicts extreme familiarity with the face ..................................................................................................................................... 59
Figure 2.4 Percentage of correct responses in the face-matching task (using fine quality 200x300 pixel images) for each familiarity quintile: 1 (0-19), 2 (20-39), 3 (40-59), 4 (60-79), 5 (80-100), with Band 1 being completely unfamiliar and Band 5 being extremely familiar. Error bars show standard error of the mean ..................................................................... 60
Figure 2.5 Pairwise comparisons showing at which familiarity levels performance was significantly better than at the other familiarity levels ................................................... 61
Figure 2.6 Graph showing the pattern of correct responses for same identity pairs and different (lookalike) identity pairs. Error bars show standard error of the mean .............. 62
Figure 2.7 Example of an actual image issued by the police to the public to assist with identification of a man caught on CCTV (Howarth, 2016). The image has a pixelated appearance ........................................................................................................................ 65
Figure 2.8 Example of stimuli used in the face-matching task created by Bindemann et al. (2013) ................................................................................................................................ 67

Figure 2.9 Example of the image appearance for Experiment 2 (top pair) compared with the fine version of the same image as used in Experiment 1 (bottom pair). These are different image pairs of Al Gore, with the lookalike appearing on the right ...................... 69
Figure 2.10 Graph showing the graded effect of familiarity on participants' face-matching task performance. Error bars show standard error of the mean ....................................... 71
Figure 2.11 Face-matching performance broken down into correct Same and Different identity trials. Error bars show standard error of the mean ............................................. 72
Figure 2.12 Example of coarsely pixelated image stimuli used in Experiment 3 ............... 76
Figure 2.13 Percentage of correct responses for each of the three levels of familiarity in the 20x30 pixel condition. Error bars show standard error of the mean .......................... 77
Figure 2.14 Percentage of correct responses in the face-matching task (using poor quality 20x30 pixel images) by familiarity, broken down into same (dotted line) and different (dashed line) correct trials. Error bars show standard error of the mean ......................... 78
Figure 3.1 Face shape classification examples provided by Towler et al. (2014) .............. 87
Figure 3.2 Graph from the White et al. (2014) passport officer paper showing the officers' performance accuracy on the GFMT alongside their employment duration. Some passport officers performed very highly on the GFMT; these high scores can be found at both ends of the employment duration axis ..................................................................... 90
Figure 3.3 Identical images of Al Gore (left) and Gary Barlow (right) shown as they were presented in each experimental condition. For each identity, the image on the left shows the coarsely pixelated image as presented in Experiment 3; the image on the right shows the same image after undergoing blurring, as presented in Experiment 4 ...................... 93
Figure 3.4 Graph showing performance accuracy on the blurred pixelated task, split by same and different person trials. Error bars show standard error of the mean ................ 94
Figure 3.5 Percentage of correct responses in the face-matching task for each familiarity band (low familiarity, mid familiarity, high familiarity), for Experiments 3 (black line) & 4 (blue line). Error bars show standard error of the mean ................................................... 95
Figure 3.6 Mean performance on items of the GFMT according to different crowd sizes (White et al., 2013). Graph shows performance accuracy broken down by trial type, with results analysed for crowd sizes of 1, 2, 4, 8, 16, 32 and 64 ........................................... 100
Figure 3.7 Graph showing the mean accuracy score for crowd sizes of 1, 3, 5 and 15 for the coarsely pixelated lookalike task, Experiment 3. Error bars show standard error of the mean ............................................................................................................................... 103
Figure 3.8 Graph showing the mean accuracy score for crowd sizes of 1, 3, 5 and 15 for the blurred version of the coarsely pixelated lookalike task (blue line) and the coarsely pixelated lookalike task (black line) ............................................................................... 106
Figure 3.9 Example trials from the PLT. Images on the left show different identities (with the imposter face on the right). Images on the right show the same identity ................ 112
Figure 3.10 Performance of police super-recognisers and comparison viewers. Performance of super-recognisers (SR1–4; black) and comparison viewers (white) on three different tests of face recognition: the GFMT (left column), the MFMT (middle column), and the PLT (right column). Vertical lines indicate the range of scores for comparison groups, the deleted portion of the line shows the standard deviation, and the horizontal notch shows the mean. In all three tasks, chance performance is 50% ......... 113
Figure 4.1 Changes in face shape resulting in differing weight judgments as photographs were taken from far distance (left) to near distance (right); example taken from Harper & Latto (2001) .................................................................................................................... 121
Figure 4.2 Example of a measurement figure taken from Burton et al. (2015). Images are standardised so that interocular distance is the same. Metric distances are expressed as proportions of standardised interocular distance ......................................................... 122
Figure 4.3 Example of the measurements taken for two of the photos of one model. Measurements taken were the distances between: left eye to nose, right eye to nose, left nose to mouth corner, right nose to mouth corner and centre of nose to mouth .......... 126
Figure 4.4 Example of one identity with each of their four image identity pairings shown. The first column shows image pairs of the same identity and the second column shows different identity pairs. The first row shows same camera-to-subject distance pairs and the bottom row shows pairs where the images are of different camera-to-subject distances ......................................................................................................................... 129
Figure 4.5 Effect of changing camera-to-subject distance on performance accuracy in the face-matching task for familiar (F) and unfamiliar viewers (U), for same and different identity trials, at both same and different distances. Error bars show standard error of the mean ............................................................................................................................... 132
Figure 4.6 Example of congruent and incongruent face image pairs (with distance cues) for same and different identities ................................................................................... 137
Figure 4.7 Graph showing the percentage of correct responses for both congruent and incongruent image pairs, broken down by same and different identity trials. Error bars show standard error of the mean .................................................................................. 138
Figure 5.1 Images from the IDV1 database. Props include glasses, fake beards and moustaches, medical masks, hats and turbans .............................................................. 148
Figure 5.2 Images from the AR database. Disguise manipulations are limited to a change of expression or the addition of sunglasses or a scarf ................................................... 153
Figure 5.3 A sample of images from the National Geographic Database (Ramanathan et al., 2004) ......................................................................................................................... 154
Figure 5.4 Example images taken from the TarrLab face database ................................ 155
Figure 5.5 Examples from the synthetic face disguise database (Singh et al., 2009) ...... 156
Figure 5.6 Sample of props used to create the disguise face database .......................... 159
Figure 5.7 Image taken during the stimuli selection process .......................................... 161
Figure 5.8 Example pairs for each condition. The top row shows same identity pairs; the lower two rows show different identity pairs. Pairs in the first column are in no disguise. Pairs in the second column are in disguise. All 26 models were photographed in each of the conditions ................................................................................................................. 162
Figure 5.9 Selection of images taken from the disguise face database to create a wheel of disguise. Images with the same colour frame show the same identity. Images with different colour frames are of different identities ......................................................... 163
Figure 5.10 Performance accuracy for unfamiliar viewers for evasion, impersonation similar and impersonation random pairs when the images consisted of no disguise or disguise pairs. Error bars show standard error of the mean ........................................... 169
Figure 5.11 Performance accuracy of unfamiliar viewers who were aware of the disguise component of the face-matching task. Error bars show standard error of the mean ..... 174
Figure 5.12 Performance accuracy in the face-matching task for viewers who were familiar with the models whose images featured in the task. Error bars show standard error of the mean ........................................................................................................... 180
Figure 5.13 Graph showing performance accuracy for Disguise face pairs for each of the three experiments: Unfamiliar uninformed (Experiment 10), Unfamiliar informed (Experiment 11), Familiar (Experiment 12) ..................................................................... 181
Figure 6.1 Two different disguise images for the same identity. The image on the left occludes the top part of the face, and the image on the right occludes the bottom part of the face ........................................................................................................................... 198
Figure 6.2 Word cloud showing the most frequently stated words for creating an Evasion disguise ........................................................................................................................... 201
Figure 6.3 Word cloud showing the most frequently stated words for creating an Impersonation Similar disguise. All words represent similarities with the target face except where specified as differences ............................................................................ 202

Figure 6.4 The model (shown right in this image pair) shaved his beard to better match the appearance of his target (left) .................................................................................. 203
Figure 6.5 The model (right) has copied the eyebrows of the target (left), using makeup to alter eyebrow shape .................................................................................................. 203
Figure 6.6 Word cloud showing the most frequently stated words for creating an Impersonation Random disguise. All words represent similarities with the target face except where specified as differences ............................................................................ 204
Figure 6.7 Graph showing the correlation between effectiveness rating and percentage of errors made for each Evasion disguise item. Data points are spread horizontally if they would otherwise overlap ............................................................................................... 204
Figure 6.8 Graph showing the correlation between effectiveness rating and percentage of errors made for each Impersonation Similar disguise item. Data points are spread horizontally if they would otherwise overlap ................................................................. 207
Figure 6.9 Graph showing the correlation between effectiveness rating and percentage of errors made for each Impersonation Random disguise item. Data points are spread horizontally if they would otherwise overlap ................................................................. 207
Figure 6.10 Bar graph showing the most frequent forms of disguise for Evasion (blue), Impersonation Similar (red) and Impersonation Random (grey). Evasion changes capture differences in appearance from the reference photograph, whereas Impersonation changes represent similarities ....................................................................................... 210
Figure 6.11 Image example where social inferences were reported to differ between the reference model image (left) and the model in evasion disguise (right) ........................ 211
Figure 6.12 The model (left) copied the distinguishing feature (a mole on the left side of the image, under the mouth) of the target (right) by using makeup ............................. 212
Figure 6.13 Example illustration of the distance calculations made for the Evasion disguise condition. Distance moved for incidental change was compared with distance moved for disguise change for each of the three disguise conditions (Evasion, Impersonation Similar, Impersonation Random) ................................................................................................. 218
Figure 7.1 Schematic representations of the disguise manipulations with regard to face space. Each bubble represents one individual's face space ........................................... 232

Acknowledgements

First, I would like to express my sincere gratitude to my supervisor Dr Rob Jenkins, for his continued guidance and support, as well as for his enthusiasm for each and every project throughout my PhD.

I would also like to thank the members of the University of York FaceLab, as well as my friends and family, for all of their encouragement and helpful advice.

Finally, thank you to my disguise models – so much of this thesis would not have been possible without the phenomenal effort and dedication you put into creating your disguises. For that I am incredibly grateful. Your involvement helped make this PhD so enjoyable.

Declaration

I declare that this thesis is my own work, carried out under normal terms of supervision. This work has not previously been presented for an award at this, or any other, university. All quotations in this thesis have been distinguished by quotation marks and attributed to the original source. All sources are acknowledged as References.

Chapter 1 – General Introduction

1.1 Introduction: Why Face Recognition is Important

Face recognition refers to the ability to correctly verify the identity of a specific individual, often by comparing a 'target' face against other face images. Accurate facial identification is important because facial image comparison is the basis of many security infrastructures, such as passport control. Successful face recognition is often critical in the identification of criminal suspects, and in avoiding miscarriages of justice.
	
The Applied Problem

The need for face recognition research became apparent following several high profile cases of mistaken identity. In England in the 1890s, Adolf Beck was convicted of fraud and imprisoned as a result of erroneous face recognition. Beck was first imprisoned in 1896, after being repeatedly picked out in a police line-up of face photographs as the man responsible for defrauding over 20 women in different attacks. All of the women had reported that their attacker was a grey-haired man with a moustache, and Beck's face was the only one in the line-up to have both of these features. After serving his sentence Beck was released, but he was soon imprisoned again after being identified (based on eyewitness testimonies and line-up scenarios) for more attacks similar in nature to those prior to his prison sentence. It was only when these similar attacks continued during Beck's second prison sentence that the real culprit, 'Smith', whose real name was Meyer, was finally caught. Three of the five women who had identified Beck as the culprit at the hearing prior to his second imprisonment were called in to view photographs of Meyer. All admitted their mistake in their previous recognition of Beck and agreed that Meyer was the true attacker. Meyer had been following the original Beck case and had moved away to America until Beck served his sentence, returning to the UK and to his previous fraudulent activity following Beck's release, unaware that Beck had been convicted a second time (Coats, 1999).
Similar high profile cases of mistaken identity still occur in the United Kingdom. On the 22nd July 2005, a case of mistaken identity cost Jean Charles de Menezes his life. The Metropolitan Police mistook Jean Charles de Menezes, a Brazilian electrician, for Hussain Osman, a suspect involved in the failed bomb attacks that had been carried out in London the previous day. Several incidents led to the police shooting Jean Charles de Menezes eight times, believing that they were faced with Hussain Osman at the time of the shooting. One of the contributing factors was erroneous face-matching: police had been given a photograph of Osman, and mistook Jean Charles de Menezes for him (see Figure 1.1 for example images of the suspect and the mistaken suspect) (BBC News, 2005; Cowan, 2005). This tragedy underlined that when face recognition goes wrong, very serious consequences can follow.
Figure 1.1 Image showing the face of the real suspect, Hussain Osman, on the left and the face of Jean Charles de Menezes on the right.

	
In addition to these famous cases, several studies highlight the weight that eyewitness face recognition can carry in a jury verdict. Wells et al. (1998) found that of the first 40 cases in which later DNA evidence excluded wrongfully convicted suspects as the culprit of a crime, 90% of the wrongful convictions had been strongly based on evidence from erroneous eyewitness testimonies. Additionally, Huff (1987) found that mistaken eyewitness identification of a convicted culprit occurred in 500 wrongful convictions that were later investigated.

Loftus & Doyle (1992) tested the weight given to eyewitness testimonies experimentally by creating a mock trial situation. Two separate sets of jurors took part; each set was presented with the same evidence, but one set of jurors was also given evidence from an eyewitness testimony, whereas the other set received no eyewitness testimony. When no eyewitness testimony was presented, 18% of jurors gave a guilty verdict. This jumped to 72% for those who received the eyewitness testimony. Moreover, even when the eyewitness testimony was said to be inaccurate, 68% of jurors delivered a guilty verdict. Jurors tend to overestimate the accuracy of eyewitness testimonies (Brigham & Bothwell, 1983). It is important to understand whether these cases of mistaken identity are unfortunate anomalies in face recognition or whether they are indicative of a very serious problem: people may not be as good as they think they are at recognising faces. A great deal of research has been done on this question, which will be discussed later in this chapter.
	
Solutions	to	the	Face	Recognition	Problem	
The examples and studies discussed above clearly demonstrate that there are situations in which face recognition has been inaccurate. Several procedures and technologies exist to aid identification. For example, passports, driving licences and university student cards feature an image of the holder's face. This is intended to allow a viewer to make an identity judgment by comparing the face of the physically present cardholder with the image on the card. If the two are considered a match, the holder is granted the access or purchase they seek. If they are not considered to match, the implication is that the holder is using fraudulent identification, and the request is refused.
	
Additionally, police line-up procedures exist to reduce the number of erroneous face identifications in an investigation. In a line-up, witnesses of a crime view the faces of several people, often of similar appearance, one of whom is the suspect in a criminal case. The witness is tasked with picking out the suspect from the line-up array of faces. Line-up errors occurred in several of the cases discussed above, demonstrating that this method of facial identification does not always yield accurate results.
	
Closed-circuit television (CCTV) cameras have become more prevalent in recent years, meaning that more facial images are stored now than ever before. CCTV footage is used in many forensic investigations, as the images can both capture a suspect's appearance and track the suspect's movements. However, CCTV footage is not always of good quality, and the images available for comparison may come from multiple cameras, with breaks in the footage where CCTV is not in operation.
	
Demands	Experienced	by	Human	Viewers	
Although several methods are in place that aim to ensure accurate identification of an individual, each of them places demands on the human viewer who is required to make the identity judgment from the visual information available. Document-holder to card-image comparisons and CCTV footage evaluation require a viewer to make an identity judgment by comparing the images presented before them to determine whether they show the same person or different people. Line-up scenarios include a memory component: a witness has to compare a previously seen face, stored in memory, with the images seen before them. This memory component may affect face recognition accuracy. Human ability to meet each of these demands is assessed below.
	
Human	Face	Memory	Ability	
The famous cases of mistaken identity discussed earlier involved aspects of erroneous face memory. To test face memory experimentally, Bruce (1982) investigated performance for identical images, images changed in either pose or expression, and images changed in both pose and expression. Participants viewed images of 24 people and were told to try to remember the face of the person in each image, as they would be asked to identify the faces in a later recognition task, in which an identity might be shown in a changed image. Recognition accuracy was highest for the unchanged images (90% hit rate), followed by images changed in pose or expression (76% hit rate) and finally images changed in both pose and expression (60.5% hit rate). It is important to note that in real-world face recognition tasks the images for comparison are always changed between learning a face and viewing it at a later stage. This study provides experimental evidence of poor face memory for changed face images. Taken together with the famous cases of mistaken identity discussed above, it is clear that face recognition involving face memory is problematic.
	
Perceptual	Matching	
Initially, memory was thought to be the root of the face recognition problem. In recent years, however, identification by face-matching has become important. In face-matching, the task is to compare two images, both of which are physically present, in order to make an identity judgment. This is the scenario used in passport control, in other identity checks involving photographic identification, and also by police piecing together an incident through comparison of multiple CCTV images. There is no memory component in this task, as all of the information needed to make a decision is available in the images presented. Nevertheless, it soon became clear that the problems of face recognition extend beyond erroneous memory.
	

1.2 Familiarity	&	Face-Matching	
Face-matching accuracy differs greatly between faces that a viewer is familiar with and faces that are unfamiliar to the viewer (e.g. Burton, Wilson, Cowan & Bruce, 1999). This section examines face-matching performance for unfamiliar and familiar viewers in turn, and then explores the idea that familiarity is more than a binary variable, reviewing research that treats familiarity as a graded concept (Clutterbuck & Johnston, 2002, 2004, 2005).
	
Unfamiliar	Face-Matching	
People are extremely accurate at matching two identical images of a face (Bruce, 1982). Face recognition, as required by the security and forensic situations outlined above, instead relies on the ability to match, or identify a mismatch between, different images of faces. The literature covers several unfamiliar face-matching scenarios, including person-to-photograph matching (Kemp, Towell & Pike, 1997; White, Kemp, Jenkins, Matheson & Burton, 2014), police line-up scenarios (Burton, Wilson, Cowan & Bruce, 1999) and paired image comparisons (Megreya & Burton, 2006, 2007; Burton, White & McNeill, 2010).
	
Unfamiliar	Live	Face-Matching	
The comparison of a face photograph to a physically present face is the method of identity verification used at passport control, for building access, in shop sales and in many other security situations. There is no memory component in such a task, as both the physically present cardholder and the identification card are available for direct comparison. Kemp et al. (1997) tested accuracy for matching an ID card holder (physically present) to the photograph on the card. The experiment took place in a large supermarket on the outskirts of London, and the participants were experienced supermarket cashiers. The cashiers' task was to determine whether the photograph on each identification card portrayed its holder. Each cashier judged physical appearance against identity cards for an average of 44 shoppers with whom they interacted at the checkout (interaction was as in a normal shopping scenario). Identification performance was poor: the cashiers performed with 67% accuracy, and more than half of the fraudulent cards were accepted as showing the holder's true identity. In this experiment the identification card photographs were taken just six weeks before the experiment. In the UK, passport photographs are valid for up to ten years before they need updating. This means that the cashier task may have been easier than real-life cardholder-to-image matching, as a face may change even more over longer periods of time.
	
While performance was poor for the supermarket cashiers, one might expect trained security personnel to excel at a live person-to-image matching task. White, Kemp, Jenkins, Matheson & Burton (2014) recently tested passport officers' face recognition ability in a task similar to that of Kemp and colleagues (1997). Passport officers had to match live subjects to a passport-style photograph that showed either the same identity or a different identity. Just 16 female and 16 male student volunteers took part as the photo holders and identities for this experiment. Different-person trials were created by pairing together the most similar images from those provided by the volunteers. As the volunteer pool was small, this greatly limited the ability to provide an extremely convincing foil. Additionally, all photographs were taken just two weeks before the experiment. These conditions may make the experiment easier than a real-life situation: in real cases of identity fraud people may have a large pool of different-person passports to choose from, and in same-person trials people may look more different in their real passport photograph, as more time will have passed since the photograph was taken, leaving more time for natural changes through ageing. Nevertheless, performance in the matching task was poor. On average, passport officers made errors in 10% of cases, with 14% of different-identity pairs being mistakenly accepted as the same person. Years of experience and training in matching faces made no difference to performance. Despite overall poor performance, some individuals performed very well on the task (see Figure 1.2). A photo-to-photo matching test was then conducted to compare passport officer performance with that of undergraduate student participants. The students' performance was indistinguishable from that of the passport officers. This study highlights the difficulty of live person-to-image matching, and also demonstrates that passport officers, a highly trained group, are no better than undergraduates at face-matching.
	
Figure 1.2 Face-matching performance of passport officers in the study by White et al. (2014). Some officers performed with very high accuracy regardless of their employment duration.

Line-ups	
In addition to live person-to-image matching, poor face-matching performance has also been reported in experiments that replicate the police line-up scenario. Bruce, Henderson, Greenwood & Hancock (1999) created a face-matching task based on a police line-up. Participants had to match a given photograph of a face to one of 10 face images presented in an array below the target face (see Figure 1.3). Participants were told that the target face may or may not be present in the array, as would be the case in a real criminal line-up. Notably, unlike a real line-up, no memory component was involved for the viewer. All face photographs were taken on the same day, in the same lighting, and in the same pose. The only deliberate difference was that the target photograph was taken with a different camera from the array photographs. Despite the consistency between the target photograph and the correct match in the array, performance was poor: for both target-present and target-absent arrays, accuracy was 70%. Accuracy for matching a target to the same identity in the array remained poor even when participants knew that the target face was present (Bruce et al., 1999; Bruce, Henderson, Newman & Burton, 2001).
	
Figure 1.3 Example of line-up stimuli taken from the Bruce et al. (1999) line-up task. The correct match for the target face is face number 3.
	

Paired	Face-Matching	
More recently, work on face-matching ability has focused on paired matching paradigms, in which viewers make identity judgments for two faces presented simultaneously: the task is to say whether the two faces depict the same person or different people. Paired image matching is a good model of real-world tasks and has important practical applications. For example, police investigations often require accurate matching of an image taken from CCTV footage to another image of the suspect, such as a mug shot. Alternatively, two face images taken from different CCTV cameras may be compared in order to track a person's movements. As in the experiments described above, this paradigm involves no memory component, as both images are viewed simultaneously.
	
To provide a standardised measure of face-matching ability, Burton, White & McNeill (2010) designed a task known as the Glasgow Face Matching Test (GFMT). The images used to construct this test were taken at a constant viewing distance and camera angle, under constant lighting conditions, and with the expression of the face unchanged across photographs of the same identity (see Figure 1.4 for examples). Photographs of the same individual were taken only minutes apart, with no deliberate attempt to change the model's appearance in any way between photographs. The images should therefore have been in optimal condition to allow accurate comparison. Different-person trials were created by pairing each face with the most similar other person's face image available. Two versions of the GFMT were created: the full version (160 items) and a short version consisting of the 40 most difficult items from the full version. Participants made errors on over 10% of trials in the full version. This performance is considered low, especially given the steps taken to make the images as similar to each other as possible and given that participants completed the task with no time limit. The short version yielded poorer results still, with participants making nearly 20% errors.
	
Figure 1.4 Examples of face pairs taken from the GFMT (Burton et al., 2010). The top row shows different-identity pairs; the bottom row shows same-identity pairs.

	
In summary, these studies have repeatedly demonstrated the real-world problem of poor face-matching in a lab environment. However, these findings go against most people's intuition: people tend to believe that they are good at face recognition (Jenkins & Burton, 2011). It is important to remember that the vast majority of the face recognition people perform on a daily basis involves familiar faces. People experience regular and repeated success in recognising family members and friends, and do so with very little difficulty.
	
Familiar	Face	Matching	
People are extremely good at matching familiar faces, even when image quality is poor. To test this, Burton, Wilson, Cowan & Bruce (1999) designed a face-matching task using poor-quality CCTV footage (see Figure 1.5). The task was presented both to people who were familiar with the faces in the footage and to a group of unfamiliar viewers. Familiar viewers matched the faces with almost perfect accuracy, whereas unfamiliar viewers performed at chance level or highly inaccurately on the exact same task (Bruce, Henderson, Newman & Burton, 2001; Burton, Wilson, Cowan & Bruce, 1999).
	
Figure	 1.5	 Examples	 of	 face	 pairs	 in	 the	 matching	 task	 involving	 poor	 quality	 CCTV	 footage	 in	 the	
experiments	 conducted	 by	 Burton	 et	 al.	 (1999).	 Familiar	 viewers	 performed	 with	 high	 accuracy	 when	
matching	the	face	pairs.				

	
This familiarity advantage can be achieved quickly. Megreya and Burton (2006) familiarised participants with faces by showing them 30-second video clips of 40 identities, which participants were told to try to learn as they would see the faces again in a subsequent face-matching task. Even after this brief familiarisation, participants performed better on trials involving familiarised faces than on trials involving novel faces. However, this advantage existed only for upright faces: when the familiarised face images were inverted, participants performed better on upright unfamiliar faces than on the inverted versions of the familiarised faces. This is likely due to the range of experience that a viewer has with a face, a topic addressed later in this chapter. Megreya & Burton (2006) concluded that unfamiliar faces are "processed for identity in a qualitatively different way than are familiar faces". The authors suggested that unfamiliar faces are processed in a similar way to other objects, whereas familiar faces engage a special processing system that develops as a face becomes familiar.
	
Familiarity	as	a	Graded	Concept	
Many past studies have classified faces as simply familiar or unfamiliar. More recent experiments have demonstrated a graded nature to familiarity, with familiarity increasing as exposure to a face increases. To test this, Clutterbuck & Johnston (2002) divided the faces viewed in their study into three levels of familiarity rather than treating familiarity as a binary variable: faces were described as highly familiar, of medium familiarity, or unfamiliar. Familiarity distinctions for these celebrity face images were made by a group of eight independent raters, and the high and medium familiarity groups were created on the basis of the mean familiarity ratings for each face. Unfamiliar faces were faces that were not famous and were not known to the participants. The results demonstrated that participants were fastest at accurately matching full faces to internal facial features for the highly familiar faces in the study (Figure 1.6), and speed of correct response decreased as familiarity level decreased across the three familiarity groups. All of the faces in the high and medium familiarity groups belonged to celebrities. By breaking familiarity into three levels this study begins to separate levels of familiarity among celebrities, addressing the issue that not all celebrities are equally well known. However, the method does not address the fact that different participants may be more or less familiar with different faces.
	
Figure 1.6 Example stimuli: the full-face image on the left was to be matched with one of the internal-feature images on the right. Familiarity aided matching accuracy on this task.
	
Clutterbuck & Johnston (2004) also demonstrated a graded familiarity advantage for accuracy in a gender judgment task. The authors again distinguished between three levels of familiarity, but constructed the familiarity groups differently from their previous (Clutterbuck & Johnston, 2002) study. Faces in the gender judgment task were either previously familiar (known to the participant before the experiment), of acquired familiarity, or unfamiliar (never seen before the testing phase of the experiment). Acquired familiarity was achieved by having participants view sets of previously unfamiliar images for two-second periods, with each face image viewed a total of ten times. Participants performed best on the gender judgment task for faces with which they were previously familiar (familiarity was measured by eight independent raters, as in Clutterbuck & Johnston, 2002), and performance was more accurate for acquired-familiarity faces than for unfamiliar faces. This acquired-familiarity method gave a similar pattern of results for face-matching speed (Clutterbuck & Johnston, 2005): participants were fastest at correctly rejecting mismatching faces when they were previously familiar with them, followed by faces they had learnt as part of the experiment, and slowest for faces that were unfamiliar at test. Together, the experiments by Clutterbuck & Johnston (2002, 2004, 2005) suggest that face recognition is influenced not only by whether a face is familiar, but also by the level of the viewer's familiarity with the face.
	

1.3 What	Information	is	used	to	Recognise	a	Face?	
The literature shows that people get better at recognising faces as they learn them, but it is unclear what information from an image viewers actually use to make identity judgments.
	
There has been much debate in the face recognition literature over which information from a face allows it to be recognised. The main lines of debate focus on featural versus configural processing accounts. Featural processing involves scanning the face for features (e.g. eyes, nose, mouth) and then using the detail of these features to decide whether the face is known. The configural processing account, on the other hand, holds that the spatial layout of the features is what matters for successful face recognition. According to the holistic processing account, faces are identified 'as a whole entity', with a match in both configural and featural information being important for recognition (Sergent, 1984; Bruce, Hellawell & Hay, 1987). Studies have generally taken the approach that if a manipulation applied to a face image slows down and/or impairs face recognition, then whatever was changed about the face was important for the face recognition process (Donders, 1868/1969).
	
Featural	Accounts	
Bradshaw & Wallace (1971) provided one of the first studies to suggest that face recognition is based primarily on the recognition of facial features. Their experiment used images of Identikit faces: faces made up of individual feature elements that can be combined into a whole face, a method used by police to create an image of a suspect from a witness's description. Participants viewed pairs of Identikit faces and judged whether each pair showed the same face or different faces; the different-person pairings varied in the number of features the two images shared. For different-person trials, accurate identity judgments were made more quickly as the number of featural differences between the faces increased. On the basis of this result it appeared that participants scanned the face images until they found a mismatching feature across the pair, and it was concluded that the results were best explained by a featural account of face recognition. An earlier experiment by Smith & Nielsen (1970) supports this result: they reported that as the number of featural differences between pairs of faces increased, the time needed to tell the two faces apart decreased, suggesting that participants were using a feature-comparison strategy to make their identity judgments. These studies examined performance only for unfamiliar faces, so they provide no information about what is learnt during familiarisation with a face.
	
Figure	 1.7	 Example	 of	 test	 stimuli	 used	 by	 Tanaka	 &	 Farah	 (1993),	 which	 shows	 examples	 of	 isolated	
features,	intact	faces	and	scrambled	faces.			

	
Further evidence for a featural processing account comes from studies involving scrambled faces. Scrambled face images contain all of the featural information of an intact face image, but with the features appearing in a jumbled order. Tanaka & Farah (1993) tested whether participants were able to recognise learned identities from scrambled face images, or even from viewing a single feature in isolation from the whole face. They reported that participants made identity judgments with moderate accuracy both in the context of a scrambled face and when features belonging to a previously seen identity were viewed in isolation from the rest of the face (see Figure 1.7). This suggests that features, regardless of their configuration or even the presence of the rest of the face, provide information that aids face recognition (Bruyer & Coget, 1987; Tanaka & Farah, 1993).
	
Configural	Processing	
In addition to the research arguing for the importance of featural information in face recognition, there is a wealth of research suggesting that configural processing is important. Theories of configural processing propose that the layout of features within a face (first-order configural information) and, more specifically, the metric distances between those features (second-order configural information) are learnt and used in face recognition. For example, Haig (1984) tested the effect on recognition performance of digitally editing the distances between the features of a face without editing the features themselves in any way. Haig (1984) found that even when featural information was present in a highly accessible form (see Figure 1.8 for example stimuli), recognition of unfamiliar faces was severely impaired when subtle changes were applied to the layout of the facial features. This supports theories of configural processing.

	
Figure 1.8 Example images of one identity with the features changed in configuration across the different images, taken from the study by Haig (1984). The features themselves remain intact, although the exact distances between them are changed.

	
Many studies on this topic are based on a similar logic: any manipulation that reduces recognition accuracy or speed is taken to have disrupted information that is important for the recognition process. Configural information is believed to be harder to access in an upside-down image than in the upright form that people have everyday experience with (Yin, 1969). Familiar orientation had already been found to be important in studies of object and letter recognition (Henle, 1942; Ghent, 1960), and it was argued that this familiarity with the upright orientation also exists for face recognition (Yin, 1969). Therefore, if configural information is important for face recognition, performance accuracy should drop in an inverted face memory task, as configural information is less accessible in an inverted than in an upright face image. Yin (1969) found exactly that result. In general, people show a bias for remembering upright objects, with various upside-down objects proving more difficult to remember (Goldstein, 1965; Hochberg & Galper, 1967); however, faces are disproportionately affected by inversion, with poorer memory for inverted faces than for inverted objects (Maurer, Le Grand & Mondloch, 2002; Rossion, 2009). Poorer recognition of inverted than upright faces suggests a role for configural information in face recognition. Again, this finding was for unfamiliar faces.
	
Despite the arguments for configural processing provided by these studies, the usefulness of configural information has been questioned. Several studies show that the configural information in a face can be changed and the face still successfully recognised. Hole, George, Eaves & Rasek (2002) investigated whether familiar face recognition was affected by drastic changes to the featural layout of the face. They reported that faces could undergo vertical stretching to twice their original height with no change in recognition performance (see Figure 1.9 for examples of image stretching). Bindemann, Burton, Leuthold & Schweinberger (2008) added to this finding by demonstrating that the N250r ERP response to faces is unaffected by stretching of the face. These findings suggest that the neural processes involved in face recognition, as well as behavioural face recognition, are unaffected by the configural changes caused by stretching, implying that consistent configural information is not essential to these processes.
	
Figure 1.9 Example stimuli from Hole et al. (2002) showing the original image and vertical stretches of 150% and 200%.
	

The configural account has also fared poorly in practical application. Configural processing failed to provide adequate identification results in early automated face recognition systems (Kelly, 1970; Kanade, 1973), and more recently Kleinberg, Vanezis & Burton (2007) presented evidence that anthropometric methods fail to identify targets in forensic investigations.
	
Are	Both	Important?	
Collishaw & Hole (2000) investigated the effect of a range of face image manipulations on recognition performance for both familiar and unfamiliar faces (Figure 1.10). Two of these manipulations, scrambling and inversion (turning the image upside down), disrupted the configural information in an image but kept the features themselves intact. Another, blurring, made it very difficult to access the featural information in a face. The identities in the images, regardless of the viewer's familiarity with them, could be recognised 'much of the time' in all of these conditions, suggesting that as long as one of these processing routes is available, face recognition can take place. Further image conditions, blurred-and-scrambled and blurred-and-inverted, were recognised at around chance levels. This finding reaffirms that at least one of these routes must be available for successful recognition to occur.
	
		
	
	
	
	
	
Figure 1.10 Example stimuli from Collishaw & Hole (2000). Top row, from left to right, shows the following image conditions: intact, inverted, scrambled. Bottom row shows: blurred, blurred and scrambled, blurred and inverted.
In summary, evidence points to neither the featural nor the configural account providing a full answer to the face recognition problem for either unfamiliar or familiar face recognition.
Images	 can	 be	 difficult	 to	 recognise	 when	 featural	 information	 is	 present	 but	
configuration	is	changed,	and	recognition	performance	can	also	be	poor	when	configural	
information	 is	 intact,	 such	 as	 in	 photo-negatives	 and	 sketch	 drawings	 (Galper,	 1970;	
Davies,	 Ellis	 &	 Shepherd,	 1978;	 Rhodes,	 Brennan	 &	 Carey,	 1987;	 Bruce,	 1992;	 Kemp,	
1996).	 It	 is	 also	 possible	 that	 recognition	 relies	 somewhat	 on	 texture	 information	 –	
something	ignored	by	the	configural	and	featural	accounts	(Kemp,	1996;	Bruce,	1992).		
	

1.4 Learning	Variability		
It is possible that what is being learnt when becoming familiar with a face is not in fact a specific factor, such as the exact details of the features themselves or of the facial configuration, but instead an accepted range of appearances for any given face. People are very good at
recognising	 face	 images	 for	 people	 they	 are	 familiar	 with,	 providing	 any	 new	 image	 is	
similar	to	those	previously	experienced,	but	are	poor	at	recognising	people	in	situations	
beyond	this	experience	(Bruce,	1994).	People	vary	in	appearance	naturally	across	multiple	
images.	 This	 can	 be	 a	 result	 of	 changes	 in	 expression,	 pose,	 hairstyle,	 skin	 complexion,	
clothing and many other variables. Bruce (1982) demonstrated that changes in pose or expression between items learned and those presented at a later recognition test lowered recognition accuracy for unfamiliar viewers. Unfamiliar viewers could rely only on the
information	 from	 their	 one	 previous	 exposure	 with	 the	 face	 image,	 and	 could	 not	
generalise	 this	 exposure	 to	 changes	 of	 expression	 and	 viewpoint.	 Familiar	 viewers	
performed	 accurately	 on	 the	 task	 despite	 these	 image	 changes,	 as	 they	 could	 call	 on	 a	
large	 number	 of	 previous	 encounters	 of	 the	 face	 under	 different	 conditions	 to	 aid	 their	
identity	 judgments	 (Bruce,	 1982;	 Burton	 et	 al.	 1999).	 Familiar	 viewers	 cannot	 however	
generalise	 past	 experience	 with	 a	 face	 to	 aid	 recognition	 in	 the	 case	 of	 novel	 face	
transformations.	 For	 example,	 familiar	 viewers	 have	 been	 found	 to	 perform	 poorly	 in	
recognition	tasks	of	photo	negative	faces	and	images	changed	in	pigmentation	(Bruce	&	
Langton,	1994)	–	these	changes	fell	outwith	the	previous	range	of	experience	acquired	for	
the	familiar	faces.		
Further evidence to suggest that familiar viewers store an accepted range of experiences of a face comes from studies on facial configuration. Bruce, Doyle, Dench & Burton
(1991)	 presented	 participants	 with	 a	 series	 of	 face	 images	 which	 were	 generated	 using	
facial	image	software	Mac-a-Mug.	Different	versions	of	the	same	face	image	were	created	
to	provide	various	configurations	of	the	same	face	(e.g.	the	features	could	be	moved	up,	
down,	further	apart	or	closer	together).	Participants	were	later	presented	with	a	series	of	
test	image	pairs,	and	asked	to	identify	which	image	from	each	of	the	pairs	was	identical	to	
an image that they had seen earlier. Accuracy was at ceiling level for recognition of an identical face. Faces whose configurations lay outwith those of the previously seen face images were not recognised at test. Notably, the central image of those previously viewed (although never itself presented) was also rated as highly familiar, demonstrating
that	familiar	viewers	would	accept	face	images	which	fell	inside	the	range	of	face	images	
they	had	experienced	for	a	face,	but	reject	any	faces	which	fell	outside	this	experienced	
range	(Bruce	et	al.,	1991).	More	recently,	Sandford	and	Burton	(2014)	asked	participants	
to	resize	an	altered	image	of	a	face	to	make	the	face	‘look	right’.	Familiar	viewers	were	no	
more	 accurate	 than	 non-familiar	 viewers	 at	 this	 task.	 If	 configuration	 does	 not	 remain	
constant	for	faces	across	various	images,	then	it	makes	sense	that	there	would	not	be	just	
one	accurate	configuration,	but	a	range	of	accepted	configurations	that	could	be	stored.			
	
The effect that learning variability has on face-matching accuracy is demonstrated in a study by Jenkins, White, Van Montfort & Burton (2011). First they highlighted the difficulty that variability between photographs of the same face causes in unfamiliar face-matching. Jenkins and colleagues (2011) asked participants to sort a deck of face-image cards into piles according to identity, with images of the same identity placed in the same pile. There were in fact only two identities, with 20 images of each. The two identities were Dutch celebrities who were well known in the Netherlands but unfamiliar to British viewers. Unfamiliar (British) participants’ average estimate of the number of identities present among the cards was 7.5. Familiar viewers (Dutch participants) were then asked to do the exact same task, and easily and accurately separated the faces into the two identities present. This study highlights the range of variation that familiar viewers have
stored for an identity and the advantage this holds in identity matching. Figure 1.11 demonstrates some of the many ways that the same faces can vary across images.

Figure	 1.11	 Example	 of	 the	 card-sorting	 scenario.	 Figure	 shows	 20	 images	 of	 two	 different	 identities.	
Familiar	viewers	find	this	task	easy	whereas	unfamiliar	viewers	generally	believe	that	there	are	more	than	
two	identities	present.			

Face	Space	Theory	
One	 influential	 concept	 for	 understanding	 face	 recognition	 by	 variation	 is	 the	 theory	 of	
face	space.	It	is	possible	that	when	a	new	face	is	encountered,	or	a	previously	viewed	face	
takes	on	an	appearance	different	to	the	way	it	looked	before,	the	mental	representation	
for	that	face	image	is	then	stored	in	that	person’s	‘face	space’.	Valentine	(1991)	was	first	
to	 propose	 the	 face	 space	 model	 for	 faces.	 Valentine	 argued	 for	 a	 single,	 although	
multidimensional,	face	space,	in	which	faces	for	all	identities	are	stored.	Faces	could	be	
stored	as	a	single	averaged	image	for	each	identity,	or	individual	face	images	could	all	be	
stored	 within	 this	 face	 space.	 Valentine	 believed	 that	 faces	 were	 organised	 within	 this	
space	by	dimensions	specific	to	faces.	Specifically,	faces	were	proposed	to	cluster	within	
face	space	in	ways	which	explain	phenomena	of	face	recognition	including	the	own	race	
bias,	 and	 typicality	 effects	 (Valentine,	 1991).	 There	 are	 however	 shortcomings	 of	 this	
proposal,	 for	 example	 Burton	 &	 Vokey	 (1998)	 argue	 that	 most	 of	 the	 faces	 within	
Valentine’s proposed face space are located away from the centre of face space, and therefore there are very few ‘typical’ faces.
	
An	 alternative	 formulation	 of	 face	 space	 would	 be	 that	 instead	 of	 one	 face	 space	
encapsulating	all	faces,	in	order	to	account	for	variation	within	each	individual	face,	each	
face may have its own face space. Each face space could be thought of as a multidimensional ‘bubble’ that encapsulates the full range of ways that one face is accepted to look. Any new image would then be compared against the representation
held	 in	 face	 space,	 for	 that	 face,	 and	 against	 all	 other	 stored	 identities’	 face	 spaces,	 in	
order	to	make	an	identity	judgment.	Expertise	would	be	acquired	one	face	at	a	time,	not	
for	all	faces	as	a	class	of	stimuli.		
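One way to make this formulation concrete is as a toy computational sketch. The class below, its `margin` parameter, and the Euclidean distance metric are my own assumptions for illustration, not part of any published model: each identity keeps its own exemplars, and a probe is accepted only if it falls within that identity’s experienced range.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class FaceSpace:
    """Toy identity-specific face space: each identity stores its own
    exemplars, and a probe is accepted if it falls within that
    identity's experienced range (here: the maximum exemplar-to-centroid
    distance, plus a small margin)."""
    def __init__(self, margin=0.1):
        self.exemplars = {}   # identity -> list of feature vectors
        self.margin = margin

    def learn(self, identity, vector):
        self.exemplars.setdefault(identity, []).append(vector)

    def accepts(self, identity, probe):
        vecs = self.exemplars[identity]
        centroid = [sum(c) / len(vecs) for c in zip(*vecs)]
        radius = max(dist(v, centroid) for v in vecs) + self.margin
        return dist(probe, centroid) <= radius

space = FaceSpace()
for v in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]):   # varied encounters
    space.learn("alice", v)
print(space.accepts("alice", [0.4, 0.4]))   # True: inside the learned range
print(space.accepts("alice", [5.0, 5.0]))   # False: far outwith it
```

Adding more varied encounters reshapes the accepted region for that identity alone, capturing the idea that expertise is acquired one face at a time rather than for faces as a class.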
	
This	 formulation	 is	 in	 line	 with	 the	 familiarity	 advantage,	 as	 familiar	 viewers	 have	 had	
more	 encounters	 with	 a	 face	 or	 been	 exposed	 to	 the	 face	 across	 more	 varied	 viewing	
conditions	 than	 an	 unfamiliar	 viewer.	 Thus,	 when	 a	 person	 is	 becoming	 familiar	 with	 a	
face	they	are	learning	the	different	appearances	that	the	specific	face	can	take,	and	this	
refines	 the	 face	 space	 held	 for	 that	 individual.	 If,	 as	 I	 suggest,	 identity	 judgments	 are	
made	by	comparing	a	new	image	with	the	representation	of	a	face	stored	in	that	person’s	
face	space,	then	identification	accuracy	should	increase	in	alignment	with	refinement	of	
face	space,	i.e.	familiar	viewers	will	be	more	accurate	as	they	have	a	more	refined	face	
space than unfamiliar viewers. This means that, for familiar viewers compared with unfamiliar viewers, false match (different identity) items are more likely to fall outwith the range of accepted faces for the target face, and true match (same identity) images of the target are more likely to fall within that accepted range. It appears that exposure to
greater	variation	with	a	face	increases	the	likelihood	of	later	correct	identity	judgments	
being	made	(Menon,	White	&	Kemp,	2015).		
	
Within-person variability is a strong component of this face space model, as an identity-specific face space would encompass the idea that people’s appearance naturally changes
across	 images.	 Sometimes	 people	 can	 look	 unlike	 themselves,	 or	 similar	 to	 another	
person,	 just	 by	 chance.	 This	 incidental	 variation	 between	 faces	 is	 very	 different	 to	
deliberate change in appearance. Whilst people can look incidentally unlike themselves, or like another person, they can also make a deliberate attempt to look unlike themselves (evasion disguise) or like another person (impersonation disguise). These appearance changes, both incidental and deliberate (disguise), will be a key theme of the challenging
images	explored	in	this	thesis.		
	

1.5 Face	Recognition	in	Challenging	Situations	
Past	 research	 has	 demonstrated	 that	 people	 are	 very	 good	 at	 matching	 two	 identical	
images	 of	 a	 face	 and	 that	 problems	 instead	 arise	 when	 dealing	 with	 matching	 multiple	
images	of	the	same	face	(Bruce,	1982).	When	the	target	faces	are	unfamiliar	to	viewers,	
performance	on	such	tasks	is	very	poor	(Kemp	et	al.	1997;	Burton	et	al.	1999;	Megreya	&	
Burton,	2006,2007;	White	et	al.	2014).	However	familiarity	with	the	target	faces	aids	face	
recognition	 accuracy	 in	 a	 range	 of	 situations	 (Burton	 et	 al.	 1999,	 Bruce	 et	 al.	 2001,	
Megreya	 &	 Burton,	 2006,	 Jenkins	 &	 Burton,	 2011).	 The	 studies	 discussed	 up	 until	 this	
point	 have	 tested	 face	 recognition	 where	 the	 people	 who	 are	 photographed	 to	 provide	
images	for	identity	comparisons	cooperate	with	the	identity	effort,	i.e.	for	same	person	
trials	the	models	made	no	deliberate	attempt	to	change	appearance	across	the	different	
images taken. Furthermore, different person trials were created by selecting the most similar face images from a small number of available images; the different identity images may therefore not look all that similar to each other. Face-matching performance
could	be	even	worse	for	more	challenging	identity	matching	scenarios.	
	
As demonstrated above, people can look very different to themselves across different images (Jenkins et al. 2011). There are also situations where people naturally look very similar
in	appearance	to	another	person.	In	addition	to	incidental	change,	in	real	world	scenarios	
people	 may	 have	 strong	 incentives	 to	 create	 deliberate	 changes	 to	 their	 appearance	 -	
either	 to	 evade	 their	 own	 identity	 (evasion)	 or	 to	 impersonate	 someone	 else	
(impersonation).	 There	 may	 be	 reasons	 for	 a	 person	 to	 hide	 their	 identity	 (evasion)	 for	
example	if	they	had	been	banned	from	a	place	but	wanted	to	gain	access.	Impersonation	
disguise	 is	 also	 a	 serious	 issue.	 Criminal	 activities	 including	 illegal	 immigration,	 the	
smuggling	of	drugs,	weapons	or	stolen	goods,	human	trafficking	and	terrorist	activity	may	
involve	 the	 use	 of	 stolen	 identity	 documentation	 in	 order	 to	 cross	 borders	 as	 someone	
else. People might choose a stolen passport based on natural similarities with the photographed face. It is also
possible	that	the	new	document	holder	will	make	an	effort	to	look	like	the	photographed	
face	 on	 the	 document	 in	 order	 to	 reduce	 suspicion	 and	 successfully	 pass	 borders	 using	
the	stolen	identity.	Such	attempts	are	made	quite	frequently,	for	example,	in	2010,	359	
people	were	found	guilty	in	the	UK	for	possessing	false	or	improperly	obtained	ID.	Under	
the	 2006	 Fraud	 Act	 an	 additional	 7,319	 were	 charged	 for	 dishonestly	 making	 a	 false	
representation	to	make	gain	for	oneself	or	another	(Home	Office,	2012).	There	have	been	
some	 attempts	 to	 investigate	 the	 effect	 that	 i)	 natural/incidental	 image	 manipulations;	
and	 ii)	 deliberate	 disguise	 manipulations,	 have	 on	 face	 recognition	 performance.	 These	
will	be	discussed	below.		
	
Natural	Image	Manipulations	
It	is	likely	that	there	are	certain	image	manipulations	that	increase	the	likelihood	of	these	
challenging	 image	 situations	 arising.	 Manipulations	 including	 changes	 in	 pose	 and	
expression	 have	 already	 been	 found	 to	 make	 identity	 judgments	 more	 difficult	 across	
images (Bruce, 1982).
	
It	 is	 possible	 that	 another	 simple	 image	 manipulation,	 changing	 camera-to-subject	
distance	between	comparison	images,	could	make	same	person	identity	judgments	more	
difficult	 than	 comparing	 images	 taken	 at	 the	 same	 distance.	 Harper	 &	 Latto	 (2007)	
showed	 that	 changing	 camera-to-subject	 distance	 changed	 perceived	 weight	 judgments	
of a face. This manipulation has not yet been experimentally explored with reference to face recognition ability; however, these results suggest that images differing in camera-to-subject distance would be more difficult to ‘tell together’, as perceptions of the face change as a result of camera-to-subject distance change.
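Why camera-to-subject distance should alter the facial configuration measured from an image follows from simple pinhole-camera geometry: features that protrude toward the camera (such as the nose) are magnified relative to features lying in the face plane at near distances, and this distortion shrinks as the camera moves away. A back-of-envelope Python sketch with purely illustrative numbers (my own, not taken from Harper & Latto, 2007):

```python
def projected_size(true_size, depth, camera_distance, focal=1.0):
    """Pinhole projection: image size of a feature of `true_size`
    lying `depth` units in front of the face plane, viewed from
    `camera_distance` (assumes camera_distance > depth)."""
    return focal * true_size / (camera_distance - depth)

def nose_to_eye_ratio(camera_distance, nose_w=3.0, eye_sep=6.0, nose_depth=2.0):
    """Projected ratio of nose width (protrudes toward the camera)
    to interocular distance (in the face plane); sizes in cm."""
    return (projected_size(nose_w, nose_depth, camera_distance)
            / projected_size(eye_sep, 0.0, camera_distance))

near = nose_to_eye_ratio(30.0)    # roughly portrait distance, 30 cm
far = nose_to_eye_ratio(300.0)    # 3 m away
print(round(near, 3), round(far, 3))  # prints: 0.536 0.503
```

So the measured ratio between the same two features genuinely differs between a close-up and a distant photograph, even though nothing about the face has changed.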
Deliberate	Disguise		
Most	of	the	research	on	recognising	disguised	faces	has	focused	on	changes	to	featural	
information	in	the	face,	with	reference	to	evasion	disguise.	Patterson	&	Baddeley	(1977)	
investigated	 whether	 face	 recognition	 was	 affected	 by	 disguise	 presence.	 In	 this	 study	
participants	viewed	(with	the	intent	of	learning)	a	series	of	face	images.	Participants	were	
later	shown	a	series	of	test	images,	some	of	which	showed	the	previously	seen	identities	
in	 disguise,	 and	 were	 tasked	 with	 recognising	 previously	 seen	 identities	 from	 these	
images.	 Disguise	 manipulations	 reduced	 recognition	 accuracy	 to	 approximately	 chance.	
This	 task	 was	 an	 image	 memory	 task,	 where	 the	 exact	 images	 were	 used	 at	 test	 and	
learning,	except	for	those	images	in	the	disguise	condition.	Disguised	images	included	the	
addition	of	props	to	a	face,	such	as	wigs,	glasses,	fake	moustaches	and	beards.		
	
Terry	 (1993)	 found	 that	 a	 more	 specific	 and	 simple	 form	 of	 disguise,	 the	 addition	 of	
glasses	to	a	face,	was	in	itself	detrimental	to	face	recognition	performance.	However,	this	
study	 reports	 an	 effect	 of	 eyeglasses	 on	 face	 image	 memorability,	 rather	 than	 an	
experimental	 disguise	 manipulation.	 Participants	 viewed	 images	 of	 faces	 (some	 with	
glasses	 and	 some	 without)	 and	 were	 later	 asked	 to	 identify	 the	 previously	 seen	 images	
amongst	distractor	images.	There	was	no	addition	or	removal	of	glasses	between	images.	
This	 was	 an	 image	 memory	 task	 rather	 than	 a	 face	 recognition	 experiment.	 In	 a	 later	
experiment	Terry	(1994)	approached	the	eyeglasses	manipulation	from	a	more	controlled	
angle. This time, participants learnt face images and were then tested on face images in which glasses or beards could have been added or removed. The removal of glasses from a person who had initially been presented wearing glasses, and the addition of a beard, lowered recognition accuracy (Terry, 1994).
	
Furthermore,	 Righi,	 Peissig	 &	 Tarr	 (2002)	 suggested	 that	 some	 disguise	 manipulations	
were	 more	 detrimental	 to	 memory	 performance	 than	 others.	 They	 reported	 that	
recognition	performance	was	hindered	if	the	face	image	changed	in	any	way	between	the	
learning	and	test	phase.	However,	recognition	was	significantly	worse	when	the	disguise	
manipulation	involved	a	change	of	hairstyle	or	removal	of	glasses,	compared	to	when	just	
glasses	were	added	to	a	face.	
	
The	 effectiveness	 of	 different	 types	 of	 disguise	 may	 depend	 on	 the	 viewer’s	 familiarity	
with	 the	 disguised	 face.	 External	 features	 of	 a	 face	 have	 been	 found	 to	 be	 of	 particular	
importance	 in	 unfamiliar	 face	 recognition	 (Bruce	 et	 al.	 1999;	 Bonner,	 Burton	 &	 Bruce,	
2003;	Megreya	&	Bindemann,	2009),	whereas	internal	features	may	be	of	greater	use	in	
familiar	face	recognition	(Ellis,	Shepherd	&	Davies,	1979;	Young,	Hay,	McWeeny,	Flude,	&	
Ellis,	 1985).	 To	 date,	 disguise	 studies	 have	 not	 looked	 at	 the	 internal/external	 feature	
manipulation	in	terms	of	effectiveness	of	disguise	type	and	familiarity	with	the	disguised	
face.		
	
The	 studies	 of	 disguise	 discussed	 so	 far	 have	 all	 involved	 a	 face	 memory	 component.	
Dhamecha,	 Singh,	 Vatsa	 &	 Kumar	 (2014)	 provide	 the	 only	 published	 study	 to	 date	 to	
investigate people’s ability to face-match disguised faces. Photographic models in the
study	were	given	a	range	of	accessories	which	they	could	use	as	they	wished	to	disguise	
themselves.	Participants	then	completed	a	printed	questionnaire,	which	showed	pairs	of	
face	images,	and	were	tasked	with	deciding	whether	the	images	in	the	pair	were	of	the	
same	person	or	not.	Participants	viewed	faces	of	both	the	same	and	different	ethnicity	to	
themselves,	and	also	familiar	and	unfamiliar	faces.	Highest	accuracy	rates	were	found	for	
same ethnicity, familiar faces. This study provides a very interesting first look at face-matching ability for disguised faces; in particular, it is interesting that familiarity aided recognition performance. Although this study provides a recent focus on the problem of face-matching with disguised faces, the stimuli in the study may not be realistic (the sorts of manipulations that a person would naturally choose to disguise themselves), and the disguises are certainly not undetectable (see Figure 1.12). Many of the disguise manipulations
occluded features of the face. Occlusion disguise would not be effective in evading identity in all situations; for example, passport security checks often request that occluding items are removed from the face during an identity check. Furthermore, this study focuses exclusively on evasion disguise. This is a very interesting question to address, as
people may have very strong reasons not to be identified as their true self; however, evasion disguise only covers half of the disguise problem. If a person is travelling on a
stolen	 passport,	 it	 is	 more	 likely	 that	 they	 would	 attempt	 to	 make	 themselves	 look	
specifically	 like	 the	 person	 on	 the	 passport	 rather	 than	 only	 trying	 to	 hide	 their	 own	
identity.	Given	the	threat	of	successful	disguise	to	security	it	would	be	useful	for	studies	
to	investigate	realistic	disguise	for	both	evasion	and	impersonation	situations.	

	
Figure	1.12	Example	stimuli	in	the	Dhamecha	et	al.	(2014)	face-matching	task.		

	
In	summary,	where	disguised	face	recognition	performance	has	been	investigated,	focus	
has	been	exclusively	on	evasion	disguise	(when	a	person	changes	their	own	appearance	
to look unlike themselves). These studies have used simple disguise manipulations, mostly the addition of props to occlude parts of the face. Furthermore, experiments have compared disguised face recognition performance with exact image matching, which does not address the question of whether disguise impairs performance compared to performance for recognising across different undisguised images of a face.
	

1.6 	Overview	of	Current	Work	
There has been a great deal of previous work on human face-matching performance; however, these studies have generally looked at performance in tasks where the models
used	 to	 create	 same	 identity	 stimuli	 have	 cooperated	 with	 the	 identity	 effort	 and	
different	 person	 image	 pairs	 have	 been	 constructed	 from	 a	 small	 number	 of	 available	
images.	Performance	could	be	even	worse	for	more	challenging	stimuli.	There	has	been	
some past work on the challenging case of deliberately disguised faces, but this has been limited to the case of evasion disguise, generally involving only the addition of props to a face.
	
This	thesis	will	investigate	face-matching	performance	for	challenging	images,	including	a	
more	 thorough	 examination	 of	 deliberate	 disguise.	 Past	 research	 has	 established	 that	
people	 perform	 poorly	 when	 matching	 unfamiliar	 faces,	 and	 familiar	 viewers	 perform	
very	 well	 even	 when	 images	 are	 of	 degraded	 quality.	 These	 findings	 come	 from	
experiments	that	have	tested	performance	accuracy	for	matching	cooperative	stimuli.	In	
reality,	there	are	situations	when	people	naturally	look	a	lot	like	somebody	else,	and	also	
situations	 where	 people	 look	 naturally	 different	 across	 multiple	 images	 of	 themselves.	
The images that result from these instances of natural similarities between identities or natural differences in the appearance of one’s own identity would create a far more challenging task than the cooperative-stimuli tasks created to date. Further to these naturally
occurring	instances,	people	can	make	deliberate	attempts	to	evade	their	own	identity	or	
to	impersonate	another	identity.	The	effect	of	both	incidental	and	deliberate	changes	in	
own	 appearance,	 and	 incidental	 and	 deliberate	 similarities	 with	 someone	 else’s	
appearance	 will	 be	 tested	 in	 this	 thesis.	 Ways	 to	 improve	 performance	 will	 also	 be	
investigated	with	the	effect	of	familiarity	on	face	matching	performance	remaining	a	key	
theme	 throughout.	 Image	 manipulations	 and	 performance	 on	 face-matching	 tasks	 for	
challenging	stimuli	will	help	to	address	theoretical	questions	regarding	the	methods	used	
for	identity	judgments.		
	
The	second	chapter	of	this	thesis	explores	whether	face-matching	accuracy	is	even	poorer	
for	 challenging	 stimuli	 (ambient	 same	 identity	 images,	 and	 very	 similar	 different	 person	
image	 trials)	 than	 the	 poor	 performance	 found	 previously	 for	 matching	 tasks	 based	 on	
cooperative	 stimuli.	 To	 increase	 task	 difficulty	 and	 match	 the	 challenging	 image	
conditions	 often	 encountered	 in	 forensic	 investigations,	 images	 are	 degraded	 in	 quality	
through	image	pixelation	across	a	series	of	three	experiments.	The	effect	of	familiarity	on	
task	performance	is	explored	and	treated	as	a	graded	rather	than	binary	variable.			
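The pixelation used to degrade the images can be thought of as block averaging: each block of pixels is replaced by its mean, discarding fine-scale information while preserving coarse structure. A minimal illustrative Python sketch (my own, not the exact procedure used in the experiments; larger block sizes correspond to coarser pixelation):

```python
def pixelate(image, block):
    """Coarse pixelation of a greyscale image (list of pixel rows) by
    block averaging: every block x block patch is replaced with its
    integer mean, so fine-scale detail within each patch is lost."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            patch = [image[y][x]
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            mean = sum(patch) // len(patch)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = mean
    return out

img = [[0, 10, 100, 110],
       [20, 30, 120, 130]]
print(pixelate(img, 2))   # each 2x2 block collapses to a single value
```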
	
Chapter	 three	 acknowledges	 the	 problem	 of	 reducing	 image	 quality	 on	 face-matching	
accuracy	 that	 is	 established	 in	 Chapter	 two,	 and	 investigates	 ways	 of	 improving	
performance.	Three	techniques	are	tested,	i)	image	manipulations;	ii)	data	analysis;	and	
iii)	observer	effects.	These	techniques	are	first	investigated	in	terms	of	any	improvement	
gain	they	bring	when	used	alone.	The	effect	of	combining	methods	is	also	examined.		
	
Chapter four investigates whether camera-to-subject distance changes influence face recognition, with the aim of examining the theory behind facial recognition, whether face-matching performance is impaired, and finally whether perceptual constancy methods
exist	to	facilitate	identification	across	distances.	The	chapter	tests	the	effect	of	changing	
camera-to-subject	 distance	 on	 the	 facial	 configuration	 of	 a	 face	 as	 measured	 from	 an	
image,	and	also	whether	any	changes	result	in	difficulties	in	identity	judgments.	This	is	an	
incidental	 appearance	 manipulation,	 and	 may	 make	 the	 same	 identity	 look	 different	
across	multiple	images,	and	perhaps	even	different	identities	appear	even	more	different.	
Performance	 accuracy	 for	 both	 familiar	 and	 unfamiliar	 viewers	 is	 again	 tested	 and	
perceptual	constancy	to	help	deal	with	distance	induced	changes	to	a	face	is	explored.				
	
The	final	chapters	directly	address	deliberate	changes	in	appearance	through	the	creation	
of	 a	 new	 disguise	 face	 database,	 which	 encapsulates	 both	 evasion	 and	 impersonation	
disguises	and	also	non-disguised	images	of	these	faces.	Chapter	five	tested	the	effect	of	
disguise	 on	 face	 matching	 performance	 and	 established	 effects	 of	 disguise	 type.	 The	
effect of familiarity was investigated, as was whether unfamiliar viewers’ performance accuracy could be improved if they were informed that disguises might be present.
Chapter	six	is	an	exploratory	chapter,	which	aims	at	further	understanding	what	people	
do	 to	 create	 evasion	 and	 impersonation	 disguises,	 and	 which	 of	 these	 approaches	 are	
effective.	 This	 chapter	 concludes	 with	 an	 experiment	 that	 tests	 for	 differences	 in	
perceived	personality	judgments	across	non-disguised	and	disguised	faces.		
	
Taken	 together,	 performance	 accuracy	 for	 the	 challenging	 stimuli	 investigated	 is	 even	
worse	 than	 performance	 in	 previous	 face	 matching	 tasks	 that	 were	 constructed	 from	
cooperative	 stimuli.	 There	 are	 important	 distinctions	 between	 natural	 and	 deliberate	
efforts	to	not	look	like	oneself	and	to	look	like	another	person.	This	is	particularly	evident	
in	 the	 study	 of	 deliberate	 disguise.	 Evasion	 and	 impersonation	 disguise	 cause	 different	
levels	of	face	matching	difficulty	and	the	disguises	themselves	are	achieved	through	the	
use of different methods. Familiarity aids performance in all tested challenging face-matching tasks, and there are several other methods that can be used to improve performance which do not rely on familiarity with the faces concerned.
Chapter	2	–	Familiarity	&	Challenging	Faces	
2.1 Chapter	Summary	
This	 chapter	 looks	 at	 the	 role	 of	 familiarity	 in	 a	 naturally	 difficult	 identification	 task	 –	
specifically	 matching	 similar	 faces.	 In	 Experiment	 1	 a	 graded	 familiarity	 advantage	 is	
reported,	with	participants	being	poor	at	the	matching	task	for	unfamiliar	faces	and	much	
better	for	familiar	faces.	This	graded	familiarity	advantage	survived	even	when	the	images	
were	pixelated	to	eliminate	fine	scale	information	in	the	images	(Experiment	2)	but	began	
to	break	down	under	coarse	pixelation	(Experiment	3).		
	
Pixelation	 makes	 featural	 and	 configural	 information	 difficult	 to	 access.	 The	 observed	
advantage	 of	 extremely	 familiar	 faces	 even	 under	 coarse	 pixelation	 suggests	 that	 other	
information	besides	fine	scale	information	in	the	images	was	being	used	to	support	the	
required	discriminations.		

	
2.2 Introduction	
Facial	identification	is	a	task	people	often	assume	they	are	good	at	(Bruce,	1988;	Jenkins	
&	 Burton,	 2011).	 This	 belief	 is	 likely	 held	 because	 our	 everyday	 experiences	 of	 face	
recognition	 with	 familiar	 faces	 cause	 us	 little	 difficulty.	 People	 can	 very	 easily	 identify	
family	 members,	 friends,	 and	 celebrities	 across	 a	 wide	 range	 of	 image	 conditions	
including	different	angles,	different	lighting,	changes	in	pose	and	even	over	images	taken	
years apart (Bruce, 1982; 1994; Jenkins, White, Van Montfort & Burton, 2011).
Confidence	in	human	face	recognition	ability	is	so	high	that	facial	identification	forms	the	
basis	 of	 many	 major	 security	 systems	 worldwide.	 For	 example,	 passport	 control	 verifies	
personal	identity	by	comparing	photographic	identification	documents	against	the	face	of	
the	holder	(physically	present).		
Experimental	 evidence	 suggests	 that	 this	 confidence	 is	 misplaced.	 Decades	 of	 research	
has	shown	that	people	make	frequent	errors	in	face	identification	tasks	when	the	tasks	
involve	 unfamiliar	 faces.	 This	 is	 problematic	 as	 the	 security	 situations	 relying	 on	 face	
recognition	 as	 an	 identity	 verification	 method	 generally	 involve	 unfamiliar	 faces,	 not	
familiar	faces.	Experiments	have	tested	human	face-matching	performance	in	a	variety	of	
ways,	 including	 paired	 face-matching	 tasks,	 line	 up	 arrays	 and	 live	 person	 to	 photo	
matching	 (Kemp	 et	 al.	 1997;	 Bruce	 et	 al.	 1999;	 Burton,	 White	 &	 McNeil,	 2010).	
Performance	on	all	of	these	tasks	has	been	found	to	be	highly	error	prone,	with	people	
making	around	10-20%	errors	in	identity	judgments	when	matching	faces,	depending	on	
the details of the task (Kemp et al. 1997; Bruce et al. 1999; Burton et al. 2010).
Furthermore,	experience	and	training	in	face	recognition	appears	to	make	no	difference	
to	 task	 performance	 -	 passport	 officers	 performed	 no	 better	 than	 a	 random	 sample	 of	
undergraduate	students	in	a	recent	face	identity	task	that	mimicked	the	passport	control	
face-matching	procedure	(White	et	al.	2014).	
	
The fact that people make around 1 in 5 errors in such tasks is particularly worrying, given that this performance level comes from tasks in which the people (or photographic face stimuli) viewed have been cooperating with the procedure, that is, they have not been trying to subvert the identification (Henderson, Bruce & Burton, 2001; Burton et al. 2010). The stimuli for face recognition experiments are often new, controlled photographs taken specifically for the study. For instance, face photographs are usually taken under consistent lighting, using good quality cameras, with a neutral expression posed, and captured from a front-on angle. When several different images of the same person
have	been	taken	for	a	study,	there	has	been	no	deliberate	attempt	to	make	images	of	the	
same	person	look	different	across	these	multiple	photographs.	Same	person	photographs	
have	mostly	been	captured	within	very	short	time	intervals	–	often	only	minutes	apart	–	
reducing	 natural	 variance	 in	 appearance	 due	 to	 change	 in	 style	 or	 age.	 It	 is	 thus	 very	
possible	 that	 performance	 on	 such	 tasks	 reflects	 a	 level	 of	 performance	 that	 is	 higher	
than	 would	 be	 achieved	 in	 less	 favourable	 but	 more	 realistic	 conditions	 that	 include	
incidental	image	variation.	For	example,	current	legislation	allows	a	passport	photograph	
to	have	been	taken	up	to	ten	years	prior	to	use.	Matching	across	this	decade	span	is	likely	
far	more	challenging	than	an	experimental	task	for	which	all	photographs	were	taken	just	
several	 weeks	 before	 testing	 (White	 et	 al.	 2014).	 Similarly,	 the	 image	 quality	 available	
from	CCTV	could	be	very	poor	compared	with	images	in	laboratory	studies.		
	
Another	 characteristic	 of	 previous	 face-matching	 studies	 is	 that	 they	 have	 drawn	 their	
mismatch	identities	from	a	rather	limited	pool.	Different	person	trials	have	typically	been	
created	by	pairing	together	people	whose	face	images	look	a	bit	similar	to	each	other	–	
perhaps	 due	 to	 similar	 hair	 colour	 or	 face	 shape.	 All	 false	 match	 images	 have	 to	 be	
selected	 from	 the	 available	 pool	 of	 photographs	 in	 the	 stimulus	 set.	 Images	 used	 in	
mismatch	trials	are	therefore	not	always	convincingly	similar	in	appearance,	even	for	
unfamiliar	viewers.	This	could	make	them	easy	to	reject.		
	
The	upshot	is	that	face-matching	ability	for	very	similar	faces	could	be	even	worse	than	in	
previous	 studies.	 In	 certain	 applied	 situations,	 for	 example,	 when	 using	 a	 fraudulent	
passport,	 people	 may	 have	 very	 strong	 incentives	 to	 use	 the	 identity	 of	 a	 person	 who	
looks	naturally	similar	to	them	in	appearance.	Yet	little	is	known	about	viewers’	ability	to	
discriminate	highly	similar	faces.		
	
Familiarity		
Familiarity	has	been	shown	in	past	studies	to	predict	performance	accuracy	for	both	same	
person	and	different	person	identification	(Burton	et	al.	1999,	Jenkins	et	al.	2011).	Given	
the	 influence	 of	 familiarity	 on	 past	 studies	 of	 face	 recognition,	 familiarity	 may	 be	 an	
important	 factor	 in	 face-matching	 where	 tasks	 include	 naturalistic	 same	 person	 image	
pairs	which	encompass	natural	within-person	variation,	and	also	different-identity	image	pairs	
which	are	of	extremely	similar	appearance.			
	

	

Experimental	 participants	 regularly	 perform	 at	 ceiling	 level	 in	 face-matching	 tasks	 when	
the	 images	 available	 for	 comparison	 are	 of	 faces	 that	 are	 familiar	 to	 them	 (Hancock,	
Bruce	&	Burton,	2000).	For	example,	highly	accurate	performance	is	achieved	even	when	
the	 image	 shown	 at	 testing	 differs	 from	 that	 shown	 at	 initial	 presentation	 in	 facial	
expression	 or	 in	 the	 photographed	 angle	 (Bruce,	 1982).	 Jenkins	 et	 al.	 (2011)	
demonstrated	 that	 the	 familiarity	 effect	 holds	 strong	 when	 many	 photographs	 are	
compared	 and	 when	 the	 images	 available	 for	 comparison	 are	 uncontrolled,	 and	 highly	
varied	on	conditions	including	pose,	lighting,	expression,	hairstyle,	and	age.	In	one	of	their	
studies,	participants	were	presented	with	40	shuffled	face	picture	cards.	Unknown	to	the	
participants,	this	card	deck	comprised	20	face	picture	cards	of	one	female	Dutch	celebrity	and	20	face	picture	cards	of	another	female	Dutch	celebrity.	
Both	 unfamiliar	 viewers	 (20	 British	 participants)	 and	 familiar	 viewers	 (20	 Dutch	
participants)	were	asked	to	sort	the	card	deck	by	identity,	grouping	together	photographs	
which	they	believed	showed	the	same	individual's	face.	Participants	were	given	no	time	restriction	and	could	make	as	many	or	as	few	groupings	as	they	felt	reflected	the	number	of	identities	present.	A	strong	familiarity	effect	was	found	in	this	experiment.	Unfamiliar	viewers	struggled	with	the	task,	dividing	the	deck	into	7.5	identity	piles	on	average,	but	familiar	viewers	easily	grouped	the	cards	into	the	two	correct	identities.	
Familiarity	with	the	faces	concerned	made	the	task	easy,	even	though	the	stimuli	were	highly	variable,	and	this	familiarity	advantage	extends	to	poor	quality	images	(Bruce	et	al.	2001,	Burton	et	al.	1999).	
	
Findings	 such	 as	 these	 highlight	 that	 familiarity	 can	 make	 face	 identity	 decisions	 easy,	
even	 when	 the	 same	 decisions	 are	 difficult	 for	 unfamiliar	 viewers.	 However,	 there	 has	
been	 very	 little	 research	 into	 whether	 familiarity	 helps	 with	 distinguishing	 extremely	
similar	 faces,	 such	 as	 the	 face	 of	 a	 disguised	 imposter	 from	 the	 true	 identity.	 Some	
support	 for	 the	 notion	 that	 familiarity	 could	 help	 comes	 from	 studies	 on	 telling	 twins	
apart.	Stevenage	(1998)	found	that	after	corrective	feedback	training,	participants	rated	
photographs	of	the	same	twin	to	be	more	similar,	and	images	of	different	twins	as	less	
similar.	Robbins	&	McKone	(2003)	also	report	training	participants	to	distinguish	identical	
twins,	 also	 using	 corrective	 feedback	 to	 aid	 learning.	 This	 study	 focussed	 primarily	 on	
holistic	processing	rather	than	identity	judgment.	Although	these	studies	did	not	test	the	
familiarity	effect	directly	and	in	isolation	(as	they	gave	corrective	feedback	between	trials)	
the	finding	that	identical	twin	faces	can	be	learnt	and	distinguished	after	much	exposure	
to	them	during	the	training	phase	provides	scope	for	a	potential	advantage	for	familiarity.	
If	familiarity	helps	in	distinguishing	the	faces	of	identical	twins,	there	is	reason	to	think	that	familiarity	with	a	face	provides	a	good	starting	point	for	investigating	performance	in	a	matching	task	involving	unrelated,	but	very	similar	faces.		
	
Familiarity:	a	Graded	Effect	
There	are	problems	in	defining	familiarity	with	a	face.	Experimentally,	face	groups	have	
often	 been	 divided	 into	 familiar	 and	 unfamiliar	 faces	 for	 all	 participants	 –	 for	 example	
celebrity	faces	as	the	familiar	face	set	(Clutterbuck	&	Johnston,	2002)	and	a	convenience	
sample	 of	 non-celebrity	 faces	 as	 the	 unfamiliar	 set	 (Burton	 et	 al.	 2010)	 or	 faces	 of	
celebrities	 from	 other	 countries	 who	 are	 unfamiliar	 to	 the	 participants	 being	 tested	
(Burton,	Kramer	&	Ritchie,	2015).	An	alternative	approach	has	been	to	create	a	familiar	
face	 set	 based	 on	 colleagues	 or	 classmates	 of	 the	 experimental	 participants	 and	 to	
compare	their	performance	with	another	group	of	participants	who	would	be	unlikely	to	
know	 the	 target	 faces	 (Burton	 et	 al.	 1999).	 Still	 other	 studies	 have	 familiarised	 viewers	
with	novel	faces	as	part	of	the	experiment	(Clutterbuck	&	Johnston,	2004,	2005).	These	
methods	 are	 not	 without	 their	 faults.	 For	 example,	 not	 all	 faces	 that	 are	 presented	 as	
familiar	(usually	celebrity	faces)	are	familiar	to	all	participants.	Even	among	the	faces	that	
are	known	to	the	participants,	it	is	unlikely	they	will	be	equally	familiar	to	them.	For	these	
reasons,	a	face	cannot	always	be	neatly	categorized	as	familiar	or	unfamiliar	to	a	single	
participant,	let	alone	to	a	group	of	participants.		
	
Clutterbuck	&	Johnston	were	among	the	very	first	to	demonstrate	the	graded	nature	of	
familiarity	experimentally	(Clutterbuck	&	Johnston,	2002;	2004;	2005).	In	the	first	paper	in	
this	series,	the	familiar	and	moderately	familiar	faces	used	in	the	study	were	all	celebrity	
face	 images,	 with	 each	 celebrity’s	 familiarity	 category	 chosen	 on	 the	 basis	 of	 familiarity	
ratings	provided	by	eight	independent	raters.	 An	increase	in	familiarity	led	to	a	decrease	
in	the	time	taken	to	match	a	full-face	image	to	images	showing	just	the	internal	features	
of	a	face	(see	Figure	2.1).	Participants	were	fastest	at	matching	a	picture	of	the	full	face	to	
images	of	internal	features	for	highly	familiar	faces,	slower	for	moderately	familiar	faces,	
and	slowest	for	unfamiliar	faces	(Clutterbuck	&	Johnston,	2002).	There	was	however	no	
significant	 difference	 in	 performance	 when	 matching	 full-face	 images	 of	 each	 of	 the	
categories	(highly	familiar,	moderately	familiar,	unfamiliar)	to	same	or	different	images	of	
the	external	features	of	the	face.	
	
	
	
	
	
	
	
	
Figure	2.1	Example	of	full	face	and	internal	feature	stimuli	(left)	and	full	face	and	external	features	(right)	
viewed	as	part	of	Clutterbuck	&	Johnston’s	(2002)	face-matching	task.	

	
Clutterbuck	and	Johnston	(2002)	assumed	that	the	faces	in	each	of	the	categories	were	of	
the	same	familiarity	level	for	all	participants	(i.e.	a	face	in	the	highly	familiar	category	was	
assumed	to	be	highly	familiar	for	all	participants).	However,	the	raters	who	rated	the	familiarity	of	the	faces	did	not	take	part	in	the	experiment	itself,	so	the	familiarity	bands	may	not	be	an	accurate	reflection	of	how	familiar	each	face	was	to	each	of	the	participants	in	the	main	study.	It	is	possible	that	some	participants	would	be	more	familiar	with	the	moderately	familiar	celebrities	than	with	the	highly	familiar	ones,	and	vice	versa.		

	

Clutterbuck	 and	 Johnston	 (2004;	 2005)	 carried	 out	 two	 additional	 studies	 that	 address	
these	concerns	to	some	extent.	They	used	celebrity	face	images	for	the	familiar	category,	
newly	learnt	faces	for	mid-level	familiarity	and	novel	(previously	unseen)	faces	to	provide	
the	 unfamiliar	 level.	 The	 graded	 familiarity	 advantage	 was	 also	 observed	 using	 this	
method	 of	 familiarity	 division.	 Performance	 improved	 as	 familiarity	 increased	 on	 tasks	
involving	 gender	 judgement	 (Clutterbuck	 &	 Johnston,	 2004)	 and	 face-matching	 speed	
(Clutterbuck	 &	 Johnston,	 2005).	 Taken	 together,	 these	 findings	 strongly	 support	
familiarity	as	a	continuous	variable	rather	than	a	binary	concept	–	people	can	be	more	or	
less	familiar	with	a	face,	and	this	level	of	familiarity	will	affect	performance	on	face-matching	tasks	in	a	graded	way.		
	
When	investigating	the	effect	of	familiarity	on	face-matching	ability	for	very	similar	faces,	
and	using	images	of	same	faces	that	incorporate	natural	variation,	it	will	be	important	to	
ensure	that	familiarity	is	measured	in	a	way	that	accounts	for	i)	familiarity	being	a	graded	
concept	and	ii)	the	notion	that	not	all	celebrity	faces	will	be	equally	familiar	to	all	viewers.		
	

2.3 Celebrity	Faces	&	Celebrity	Lookalikes	
Testing	for	a	familiarity	advantage	requires	faces	that	differ	in	their	degree	of	familiarity.	
Most	 people	 are	 highly	 familiar	 with	 their	 friends,	 colleagues	 and	 family	 members.	 The	
problem	with	this	is	that	familiarity	with	these	faces	is	very	specific	to	the	few	individuals	
who	know	them.	A	study	on	face	familiarity	requires	a	large	number	of	images	of	faces	
that	will	be	familiar	to	many	people	who	view	them	in	the	study,	and	also	a	large	number	
of	 images	 of	 faces	 that	 are	 unfamiliar	 to	 these	 people.	 Ideally	 each	 of	 the	 faces	 used	
across	the	experiment	will	be	familiar	to	some	participants	but	unfamiliar	to	others,	and	
all	of	the	faces	will	be	familiar	to	at	least	some	of	the	participants.	Celebrities	provide	a	
group	 of	 identities	 who	 are	 familiar	 to	 a	 very	 large	 array	 of	 people,	 and	 celebrity	 face	
images	are	easily	accessible	via	Internet	search.	An	additional	advantage	of	using	celebrity	
images	 is	 that	 there	 are	 many	 different	 categories	 of	 celebrities	 including	 pop	 stars,	
reality	 television	 stars,	 actors,	 politicians	 and	 sports	 personalities.	 This	 range	 gives	 the	
freedom	to	choose	images	of	celebrities	from	very	different	settings.	Based	on	personal	
interests,	each	participant	will	be	more	or	less	familiar	with	celebrities	from	each	of	these	
categories.	 If	 celebrity	 images	 are	 sampled	 from	 a	 wide	 range	 of	 categories	 (singers,	
actors,	politicians	etc.),	it	is	likely	that	each	participant	will	be	familiar	with	at	least	some	
of	these	celebrity	faces	and	less	familiar	with	others.		
	
Use	of	Lookalikes	as	a	Proxy	for	Imposters		
In	order	to	investigate	performance	on	the	challenging	face-matching	task	proposed	here,	it	is	
necessary	to	have	access	not	only	to	different	photos	of	the	same	face	that	vary	naturally	
in	 their	 appearance	 -	 such	 images	 are	 known	 as	 ambient	 images	 (Jenkins	 et	 al.	 2011;	
Sutherland	et	al.	2013)	-	but	also	to	photos	of	other	faces	that	look	extremely	similar	to	
the	targets.		
	
Conveniently,	there	is	a	ready	source	of	faces	that	closely	resemble	celebrities	and	that	is	
the	 celebrity	 lookalike	 industry.	 In	 the	 following	 experiments,	 I	 will	 use	 celebrities	 and	
their	professional	lookalikes	to	construct	highly	similar	face	pairs.		
	

2.4 Experiment	1:	Lookalike	Task	
The	purpose	of	Experiment	1	was	to	test	the	effect	of	familiarity	on	performance	accuracy	
in	a	challenging	face-matching	task.	All	of	the	face	stimuli	are	ambient	images	to	sample	
the	 natural	 variation	 in	 each	 person’s	 appearance.	 This	 will	 presumably	 result	 in	 more	
challenging	 same	 person	 trials	 than	 in	 past	 work	 that	 has	 used	 highly	 controlled	
cooperative	images.	The	use	of	celebrity	lookalikes	as	imposters	in	my	experiment	should	
allow	 for	 extremely	 difficult	 different-identity	 pairs,	 compared	 with	 those	 used	 in	
previous	experiments.	The	intention	here	is	to	model	real	life	situations	where	someone	
may	be	trying	to	impersonate	a	similar-looking	person	using	a	fraudulent	security	document.		

	

If	familiarity	helps	with	these	very	fine	distinctions,	then	matching	performance	should	be	
more	accurate	for	increasing	levels	of	familiarity.			
	
Method	
Participants	
30	undergraduate	students	(8	male;	mean	age	=	20.2	years)	at	the	University	of	York	volunteered	as	participants	for	this	project.	All	participants	were	paid	£3	or	given	half	an	hour	of	course	credit	in	return	for	their	participation.		
	
Stimuli	
Image	Selection	
Face	 images	 of	 30	 celebrity	 identities	 (three	 face	 images	 per	 celebrity),	 and	 one	 face	
image	 of	 a	 professional	 lookalike	 for	 each	 of	 these	 celebrities,	 were	 selected	 as	
experimental	 stimuli	 (120	 images	 in	 total).	 This	 number	 of	 images	 was	 necessary	 to	
provide	 two	 celebrity	 face	 photographs	 to	 constitute	 the	 same-person	 pair,	 and	 one	
additional	 celebrity	 face	 image	 to	 be	 paired	 with	 the	 lookalike	 image	 to	 create	 the	
different	 pair	 in	 the	 face-matching	 task.	 For	 a	 celebrity	 to	 be	 included,	 they	 needed	 to	
have	 at	 least	 one	 professional	 lookalike	 whose	 image	 was	 accessible	 from	 the	 Internet.	
For	 a	 lookalike	 image	 to	 be	 chosen	 as	 suitable	 for	 use	 in	 the	 study,	 a	 viewer	 who	 was	
extremely	 familiar	 with	 the	 celebrity	 in	 question	 approved	 their	 high	 level	 of	 visual	
similarity	to	the	celebrity.		
	
Following	approval	of	the	celebrity	and	lookalike	images	in	the	image	selection,	two	face	
image	pairings	were	created	for	each	celebrity	–	one	showing	two	different	images	of	the	
celebrity,	 and	 the	 other	 showing	 a	 third	 image	 of	 the	 celebrity	 paired	 with	 an	 image	 of	
that	celebrity’s	lookalike	(see	Figure	2.2).	The	lookalike	image	could	appear	on	either	side	
of	 the	 screen,	 and	 appeared	 on	 the	 left	 and	 right	 side	 equally	 often	 across	 the	
experiment.		
	
	
	
	
	
	
	
	
Figure	2.2	Face-matching	task	image	examples.	The	pairs	on	the	left	(A)	show	different	identities	(with	the	imposter	face	on	the	right	of	each	pair);	the	pairs	on	the	right	(B)	show	same	identity	pairs.		
	

Design		
This	experiment	was	conducted	using	a	within	subjects	design,	which	tested	the	effect	of	
five	levels	of	face	familiarity	on	the	dependent	variable,	which	was	performance	accuracy	
in	the	face-matching	task.	
		
A	novel	familiarity	scale	was	designed	for	use	at	the	end	of	the	experiment	to	assess	how	
familiar	each	of	the	celebrities	was	to	each	of	the	participants.	This	scale	was	set	across	the	desk	where	the	experiment	took	place	and	was	one	metre	long,	marked	from	0	to	100	cm,	with	0	representing	unfamiliar	faces	and	100	representing	faces	that	were	
extremely	familiar	(see	Figure	2.3).	This	scale	was	intended	to	address	two	limitations	of	
previous	 familiarity	 manipulations,	 i)	 not	 all	 familiar	 (celebrity)	 faces	 are	 familiar	 to	
everyone,	and	ii)	some	faces	will	be	better	known	than	others,	and	which	faces	these	are	differs	from	participant	to	participant.			
	

Procedure		
Face-Matching	Task		
Participants	took	part	in	a	face-matching	task	involving	60	image	pairs	(two	pairs	for	each	
of	 the	 30	 celebrity	 identities),	 viewed	 on	 a	 computer	 screen.	 The	 participants’	 task	 for	
each	 pair	 was	 to	 determine	 whether	 the	 two	 images	 showed	 the	 same	 identity	 or	
different	 identities	 (i.e.	 one	 of	 the	 images	 was	 of	 the	 lookalike).	 Two	 different	 random	
orders	of	image	pair	presentations	were	created;	each	participant	was	assigned	to	view	
one	 of	 these	 two	 random	 orders.	 Participants	 were	 informed	 that	 the	 lookalike	 images	
could	appear	on	either	side	of	the	screen	and	that	there	was	no	time	limit	for	completing	
the	task.		
	
Familiarity-Rating		
On	completion	of	the	face-matching	task	participants	were	given	photograph	cards	(size	
6cmx4cm)	of	each	of	the	celebrities	that	they	had	seen	in	the	face-matching	task	(N	=	30)	
and	 asked	 to	 rate	 them	 for	 how	 familiar	 they	 were	 with	 the	 celebrity’s	 face	 before	
completing	the	task.	Participants	received	just	one	of	the	three	true	celebrity	images	that	
they	had	viewed	in	the	face-matching	task	for	use	in	the	familiarity-rating	task.	The	image	
viewed	for	each	celebrity	was	selected	randomly	from	the	three	available,	and	the	chosen	
card	for	each	celebrity	remained	the	same	for	all	participants.	Participants	rated	the	faces	
for	familiarity	by	placing	them	on	a	scale	that	ran	from	0	(completely	unfamiliar)	to	100	
(extremely	familiar).	Faces	of	equal	familiarity	could	be	placed	down	vertically	one	above	
the	other	to	create	a	column	of	equally	familiar	faces;	this	was	particularly	useful	for	faces	
that	were	completely	unfamiliar	and	extremely	familiar	(see	Figure	2.3	for	image	of	the	
familiarity	 scale	 in	 the	 experimental	 setting).	 This	 approach	 allowed	 me	 to	 capture	 the	
relative	and	absolute	familiarity	of	the	celebrities	separately	for	each	participant.		

	


Figure	2.3	Photograph	of	one	participant’s	use	of	the	familiarity	scale	taken	immediately	after	completion	in	
the	experimental	setting.	The	far	left	side	of	the	scale	indicates	that	the	face	was	completely	unfamiliar,	and	
the	far	right	depicts	extreme	familiarity	with	the	face.		

	
Analysis	
The	main	measure	of	interest	was	the	percentage	of	correct	responses	in	the	face-matching	task.	To	examine	the	effect	that	familiarity	score	had	on	percentage	accuracy	in	the	face-matching	task,	I	grouped	the	raw	familiarity	ratings	into	5	familiarity	levels.	This	was	achieved	by	binning	each	participant's	face-matching	responses	into	5	familiarity	bands	(quintiles	of	the	rating	scale)	based	on	the	participant's	own	ratings.	In	total,	35%	of	faces	were	placed	in	Band	1,	9%	in	Band	2,	10%	in	Band	3,	8%	in	Band	4	and	39%	in	Band	5.	Interestingly,	although	the	majority	of	faces	were	placed	at	the	extremes	of	the	spectrum	(0-19	and	80-100),	all	participants	placed	some	faces	in	the	middle	familiarity	bands	(20-39,	40-59,	60-79).		
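As a concrete illustration of this analysis step, the short sketch below bins 0-100 familiarity ratings into the five bands and computes per-band accuracy. This is a minimal sketch only, not the analysis code used in the thesis, and the participant data in it are hypothetical.

```python
# Minimal sketch of the banding analysis described above (not the
# thesis's own analysis code). Ratings run 0-100; Band 1 = 0-19,
# Band 2 = 20-39, Band 3 = 40-59, Band 4 = 60-79, Band 5 = 80-100.

def familiarity_band(rating):
    """Map a 0-100 familiarity rating to Band 1..5."""
    return min(rating // 20, 4) + 1  # ratings of 80-100 all fall in Band 5

def accuracy_by_band(trials):
    """trials: (rating, correct) pairs for one participant.
    Returns percentage accuracy for each band that has trials."""
    hits = {b: 0 for b in range(1, 6)}
    counts = {b: 0 for b in range(1, 6)}
    for rating, correct in trials:
        band = familiarity_band(rating)
        counts[band] += 1
        hits[band] += int(correct)
    return {b: 100 * hits[b] / counts[b] for b in range(1, 6) if counts[b]}

# Hypothetical participant: two trials with faces rated 5 (Band 1),
# one trial with a face rated 95 (Band 5).
example = [(5, True), (5, False), (95, True)]
acc = accuracy_by_band(example)  # Band 1: 50.0%, Band 5: 100.0%
```

Binning within each participant in this way means that the same celebrity can contribute to different bands for different participants, which is exactly what the personalised familiarity scale was designed to capture.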
	
Results	
Percentage	accuracy	scores	were	entered	into	a	one-way	repeated	measures	ANOVA	to	
investigate	the	effect	of	familiarity	on	face-matching	performance	in	this	celebrity	versus	
lookalike	discrimination	task.	Percentage	scores	were	rounded	to	integers	throughout.	
	


Figure	2.4	Percentage	of	correct	responses	in	the	face-matching	task	(using	fine	quality	200x300	pixel	images)	for	each	familiarity	quintile:	1	(0-19),	2	(20-39),	3	(40-59),	4	(60-79),	5	(80-100),	with	Band	1	being	completely	unfamiliar	and	Band	5	extremely	familiar.	Error	bars	show	standard	error	of	the	mean.		
	

The	results	revealed	that	face-matching	performance	was	significantly	affected	by	familiarity	with	the	face	viewed,	F(2.98,	86.51)	=	10.07,	p	<	.001,	ηp²	=	.26	(Greenhouse-Geisser	corrected).	Accuracy	was	lowest	for	the	faces	that	were	most	unfamiliar	(Band	1,	M	=	72%,	SE	=	1.58,	CI	=	69-76),	and	highest	for	the	faces	that	were	most	familiar	(Band	5,	M	=	90%,	SE	=	1.95,	CI	=	86-94).	As	can	be	seen	in	Figure	2.4,	there	was	a	generally	graded	increase	in	performance	as	familiarity	increased	(Band	2:	M	=	70%,	SE	=	2.98,	CI	=	64-76;	Band	3:	M	=	77%,	SE	=	2.94,	CI	=	75-88;	Band	4:	M	=	81%,	SE	=	1.95,	CI	=	77-84).		
	

	

	
	
	
	
	
	
	
Figure	2.5	Pairwise	comparisons	showing	for	which	familiarity	levels	performance	was	significantly	better	than	for	the	other	familiarity	levels.	
	

Pairwise	comparisons	revealed	that	performance	was	significantly	better	for	faces	that	were	placed	in	familiarity	Band	5	than	for	faces	in	Bands	1,	2	and	3,	but	not	Band	4	(see	Figure	2.5).		
	
	

	


	
Figure	2.6	Graph	showing	the	percentage	of	correct	responses	for	Same	identity	pairs	and	Different	(lookalike)	identity	pairs.	Error	bars	show	standard	error	of	the	mean.	
	

I	then	analysed	accuracy	separately	for	Same	identity	and	Different	identity	trials	(Figure	
2.6).	For	both	types	of	trial	accuracy	increased	as	familiarity	increased.	As	expected	there	
was	 somewhat	 higher	 accuracy	 in	 the	 Same	 identity	 condition	 than	 in	 the	 Different	
identity	condition,	presumably	reflecting	a	tendency	to	judge	highly	similar	faces	as	the	
same	person.	A	5x2	within-subjects	ANOVA	with	the	factors	of	Familiarity	and	Pair	Type	confirmed	that	this	difference	was	significant:	overall	accuracy	was	significantly	higher	for	Same	trials	(M	=	81%,	SE	=	2.11,	CI	=	77-85)	than	for	Different	trials	(M	=	74%,	SE	=	1.84,	CI	=	71-78),	F(1,	29)	=	5.31,	p	<	.05,	ηp²	=	.23.	As	expected,	there	was	a	main	effect	of	familiarity,	F(4,	116)	=	11.10,	p	<	.001,	ηp²	=	.12,	indicating	a	graded	familiarity	advantage	for	Same	and	Different	identity	trials	alike.	There	was	no	interaction	between	Familiarity	and	Pair	Type	–	familiarity	had	no	more	of	an	effect	on	correct	performance	for	Same	identity	trials	than	for	Different	identity	trials,	F(4,	116)	=	.38,	p	=	.83,	ηp²	=	.01.	
	

Discussion		
Participants	 were	 able	 to	 make	 remarkably	 fine	 discriminations	 between	 extremely	
similar	 faces	 (celebrities	 and	 their	 lookalikes),	 and	 to	 integrate	 naturally	 varied	 same	
identity	images.	This	ability	was	underpinned	by	a	graded	effect	of	familiarity.	Accuracy	
was	 at	 its	 peak	 when	 the	 faces	 viewed	 were	 extremely	 familiar	 to	 the	 participants	
(performance	 accuracy	 90%),	 yet	 performance	 was	 much	 worse	 for	 the	 completely	
unfamiliar	faces	(accuracy	72%,	note	that	chance	performance	is	50%).	Task	performance	
generally	increased	with	a	progression	in	familiarity	(Band	2	M	=	70%,	Band	3	M	=	77%,	
Band	4	M	=	81%).	Importantly	these	results	are	based	on	personal	familiarity	scores	for	
each	of	the	celebrities.	This	method	of	analysis	allowed	a	graded	familiarity	effect	to	be	
teased	 out	 despite	 the	 fact	 that	 not	 all	 participants	 were	 familiar	 with	 the	 same	
celebrities.		
	
Performance	 for	 completely	 unfamiliar	 faces	 in	 the	 lookalike	 study	 (72%	 accuracy)	 was	
lower	than	identity	judgment	accuracy	found	in	past	studies	(e.g.	Bruce	et	al.	1999;	Megreya	&	Burton,	2007;	Burton	et	al.	2010).	When	compared	with	performance	on	the	GFMT,	a	standardised	test	of	face-matching	ability	that	comprised	cooperative	stimuli	and	a	limited	pool	of	false	match	images,	mean	performance	accuracy	is	lower	on	the	lookalikes	task.	This	is	the	case	for	overall	performance	accuracy	on	the	unfamiliar	face	images	(lookalikes	task	mean	performance	=	72%,	GFMT	=	89.9%	[long	version],	81.3%	[short	version]),	and	also	for	same	identity	trials	(lookalikes	=	74%,	GFMT	=	92%	[long	version],	79.8%	[short	version])	and	different	identity	trials	(lookalikes	=	70.7%,	GFMT	=	88%	[long	version],	82%	[short	version]).	Strikingly,	performance	is	nearly	20%	worse	for	
the	naturally	varied	images	of	the	same	celebrity	faces	in	my	study,	than	the	cooperative	
same	 person	 images	 used	 in	 the	 GFMT.	 Furthermore,	 performance	 is	 more	 than	 10%	
worse	for	different	person	trials	in	my	lookalikes	task	than	even	in	the	short	version	of	the	
GFMT,	which	includes	only	the	hardest	of	the	GFMT	items.	This	highlights	that	both	photographs	of	the	same	person	that	include	natural	variation,	and	very	similar	looking	different	identities,	cause	more	problems	for	face-matching	performance	than	previously	
captured	by	tasks	constructed	from	cooperative	stimuli.	This	is	concerning	given	that	both	
of	these	image	types	may	be	encountered	in	security	situations	and	attempts	of	fraud.		
	
As	in	past	face	recognition	research	which	has	shown	a	familiarity	advantage	for	both	face	
memory	and	face-matching	(Bruce,	1986,	Ellis	et	al.	1979,	Burton	et	al.	1999),	familiarity	
had	 a	 great	 impact	 on	 face-matching	 performance.	 The	 present	 study	 shows	 that	 the	
benefit	of	familiarity	extends	to	the	challenging	case	of	extremely	similar	lookalike	faces,	
and	 naturally	 varying	 images.	 Whereas	 past	 studies	 were	 often	 very	 easy	 for	 familiar	
viewers,	resulting	in	ceiling	performance	(Hancock,	Bruce	&	Burton,	2000),	the	lookalike	
task	 brought	 performance	 off	 ceiling,	 so	 that	 modulations	 in	 performance	 could	 be	
observed.	Even	with	these	very	challenging	viewing	conditions	accuracy	reached	90%	in	
the	highest	familiarity	band.	Thus,	it	seems	that	the	effect	of	familiarity	is	so	strong,	that	
even	 professional	 lookalikes	 are	 an	 unconvincing	 false	 match	 for	 viewers	 who	 are	
extremely	familiar	with	that	celebrity.		
		
My	 study	 broke	 down	 familiarity	 even	 further	 than	 in	 previous	 studies,	 by	 comparing	 5	
levels	 of	 familiarity	 rather	 than	 just	 3	 as	 used	 by	 Clutterbuck	 &	 Johnston	 (2002,	 2004,	
2005).	Assessing	performance	across	5	levels	of	familiarity	provided	a	more	realistic	and	
accurate	 categorisation	 of	 participants’	 familiarity	 with	 the	 faces,	 allowing	 a	 more	
detailed	exploration	of	the	extent	of	graded	effect	of	familiarity.	Familiarity	with	a	face	is	
not	an	all-or-nothing	phenomenon.	Different	viewers	are	familiar	with	different	faces	to	
differing	 degrees.	 Constructing	 the	 analysis	 around	 that	 insight	 reveals	 much	 finer	
structure	than	a	binary	familiar/unfamiliar	distinction	allows.		
	
Now	that	I	have	demonstrated	that	the	graded	familiarity	advantage	extends	to	the	
challenging	 case	 of	 imposter	 detection,	 I	 want	 to	 see	 how	 much	 further	 I	 can	 push	 the	
effect	of	familiarity.	So	far,	I	have	only	tested	the	effect	for	fine	quality	(un-manipulated)	
images.	 Although	 I	 have	 found	 that	 familiar	 viewers	 can	 tell	 apart	 a	 target	 face	 and	
imposter,	it	is	not	clear	what	they	have	learnt	about	the	familiar	face	which	has	allowed	
them	to	do	this.	Given	that	the	faces	in	the	task	were	very	similar	(i.e.	celebrities	and	their	
lookalikes),	 it	 seems	 likely	 that	 fine	 scale	 information	 in	 the	 image	 is	 critical	 for	 making	
the	 necessary	 discriminations.	 If	 so,	 then	 obscuring	 the	 fine-scale	 information	 should	
impair	performance,	even	for	familiar	viewers.	There	are	several	possible	techniques	for	
obscuring	image	detail.	Here	I	used	image	pixelation,	whereby	the	number	of	pixels	in	the	
image	 is	 reduced	 while	 image	 size	 is	 held	 constant.	 This	 technique	 has	 an	 interesting	
pedigree	in	the	psychology	literature	(Harmon	&	Julesz,	1973).	It	also	arises	in	the	context	
of	applied	face	identification	whenever	digital	images	are	enlarged	(Jenkins	&	Kerr,	2013).		
	

2.5 Experiment	2:	Mid	Pixelation	
In	Experiment	1	a	graded	effect	of	familiarity	on	face	matching	performance	was	evident	
for	both	integrating	different	images	of	the	same	face	and	telling	apart	very	similar	faces	
of	 different	 identities.	 In	 that	 experiment,	 fine	 quality	 images	 were	 used	 as	 stimuli.	
However,	 face	 images	 encountered	 in	 applied	 matching	 tasks	 are	 often	 not	 of	 good	
quality.	For	example,	coarsely	pixelated	images	may	be	obtained	by	zooming	in	on	Closed	
Circuit	Television	(CCTV)	footage	to	gain	an	image	of	a	suspect’s	face	(see	Figure	2.7	for	
applied	example).		
	
	
	
	
	
Figure	2.7	Example	of	actual	image	issued	by	the	police	to	the	public	to	assist	with	identification	of	a	man	
caught	on	CCTV	(Howarth,	2016).	This	image	has	a	pixelated	appearance.		
	

Face	recognition	for	unfamiliar	faces	is	poor	even	under	favourable	image	conditions	(see	
Burton	 et	 al.	 2010).	 The	 task	 is	 even	 more	 challenging	 when	 the	 images	 are	 of	 low	
resolution	as	pixelation	disrupts	the	information	that	is	available	from	an	image	(Harmon	
&	Julesz,	1973).	When	an	image	is	pixelated,	the	information	from	several	adjacent	pixels	
is	pooled	to	form	larger	pixels.	The	luminance	of	the	new	pixel	is	determined	by	averaging	
together	the	luminance	values	of	the	constituent	pixels.	High	spatial	frequency	information	
within	that	area	is	thus	lost.	In	tandem,	high	spatial	frequency	noise	is	introduced	at	the	
edges	of	the	new	pixels.	This	is	due	to	the	larger	changes	in	luminance	between	adjacent	
pixels	in	the	new	image	than	between	adjacent	pixels	in	the	original	image.	This	noise	is	
particularly	 disruptive	 to	 viewing,	 as	 the	 visual	 system	 is	 highly	 sensitive	 to	 lines	 and	
geometric	patterns,	making	the	new	pixel	boundaries	difficult	to	ignore	(Harmon	&	Julesz,	
1973).		
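The pooling operation described above can be made concrete with a short sketch. This assumes a greyscale image stored simply as a list of rows of luminance values; it is a minimal illustration of block averaging, not actual stimulus-preparation code, and the pixel values and block size are hypothetical.

```python
# Minimal sketch of pixelation by luminance pooling: each block x block
# region of a greyscale image is replaced by its mean luminance, so
# image size is held constant while the effective pixel count drops.

def pixelate(image, block):
    """Replace each block x block region with its average luminance."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            vals = [image[y][x] for y in ys for x in xs]
            mean = sum(vals) / len(vals)  # pooled luminance for this block
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

# Hypothetical 4x4 image pixelated with 2x2 blocks: detail within each
# block is averaged away, while abrupt luminance steps appear at the
# boundaries between blocks.
img = [[0, 0, 100, 100],
       [0, 40, 100, 60],
       [90, 90, 10, 10],
       [90, 50, 10, 50]]
coarse = pixelate(img, 2)  # top-left block becomes uniform 10.0
```

Each output block is uniform, so information finer than the block size is lost, while the luminance steps between neighbouring blocks create the sharp edges that Harmon and Julesz identified as disruptive high spatial frequency noise.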
	
As	 a	 result	 of	 this	 manipulation,	 it	 is	 difficult	 to	 extract	 exact	 information	 about	 a	 face	
from	 a	 pixelated	 image.	 In	 particular,	 when	 we	 view	 a	 pixelated	 face	 image,	 configural	
information	 (the	 metric	 distances	 between	 features)	 that	 can	 be	 extracted	 from	 a	 face	
becomes	 less	 precise,	 and	 the	 appearance	 of	 features	 becomes	 less	 detailed.	
Unsurprisingly,	 the	 pixelation	 manipulation	 has	 been	 shown	 to	 increase	 the	 difficulty	 of	
image	recognition,	compared	to	a	non-manipulated	version	of	the	image	(e.g.	Harmon	&	
Julesz,	 1973,	 and	 replicated	 by	 Morrone,	 Burr	 &	 Ross,	 1983;	 Sergent,	 1986;	 Bachmann,	
1991;	Costen,	Parker,	&	Craw,	1994,	1996;	Uttal,	Baruch	&	Allen,	1995).	However,	people	
can	name	pixelated	celebrity	images,	even	at	very	low	levels	of	pixelation	(Lander,	Bruce,	
&	 Hill,	 2001).	 The	 work	 of	 Lander	 and	 colleagues	 (2001)	 focused	 on	 pixelation	 from	 the	
perspective	 of	 identity	 protection	 issues	 –	 images	 are	 often	 pixelated	 by	 the	 media	 in	
order	to	try	and	protect	identity.	They	argued	that	pixelation	is	not	an	effective	method	of	
obscuring	 identity,	 as	 the	 face	 remains	 identifiable	 to	 a	 familiar	 viewer.	 Although	
recognition	ability	(being	able	to	say	who	the	face	is)	for	pixelated	images	of	famous	or	
learnt	 faces	 has	 been	 tested	 in	 the	 past,	 face-matching	 (determining	 whether	 pairs	 of	
images	depict	the	same	person	or	different	people)	with	pixelated	faces	has	received	little	
attention.		
	

	

Bindemann,	 Attard,	 Leach	 &	 Johnston	 (2013)	 were	 the	 first	 to	 test	 performance	 in	 a	
matching	 task	 using	 pixelated	 unfamiliar	 faces.	 In	 their	 study	 participants	 were	 tasked	
with	matching	a	pixelated	image	to	a	good	quality	image.	The	stimuli	used	in	this	study	
were	the	images	used	in	the	GFMT,	where	one	image	in	each	of	the	image	pairs	had	been	
pixelated	 (see	 Figure	 2.8).	 They	 found	 that	 participants	 were	 much	 better	 at	 matching	
two	 intact	 face	 images,	 than	 matching	 one	 intact	 face	 image	 to	 a	 pixelated	 image.	
Moreover,	performance	declined	as	the	level	of	pixelation	increased.	However,	in	some	
practical	 situations	 investigators	 may	 have	 to	 compare	 multiple	 pixelated	 images	 from	
different	CCTV	footage	to	try	and	piece	together	an	event	sequence.	I	have	been	unable	
to find any published research that has previously tested such performance. In addition, Bindemann et al. (2013) used staged face photographs, showing either posed profile or front-view face images (Burton et al. 2010). It is unlikely that such posed photographs would be available in forensic settings; images captured from CCTV footage, for example, may vary greatly in lighting and pose. For these reasons, it is important to measure face-matching ability for pixelated images using ambient images, as past research has shown that the same face can appear drastically different between ambient photographs (Jenkins et al. 2011).
	
	
	
	
	
	
Figure	2.8	Example	of	stimuli	used	in	the	face-matching	task	created	by	Bindemann	et	al.	(2013)	
	

These	issues	will	be	addressed	by	Experiment	2,	which	will	test	people’s	ability	to	match	
pairs	 of	 pixelated	 ambient	 images.	 This	 new	 task	 is	 more	 difficult	 than	 previous	
recognition and naming tasks because both the lookalike and the celebrity image map onto the same individual. For example, the viewer might associate both images with say,
Al	Gore,	but	that	is	not	enough	to	solve	this	task.	The	problem	is	to	decide	whether	both	
images	 actually	 show	 Al	 Gore,	 or	 whether	 one	 of	 the	 images	 shows	 an	 imposter	
(lookalike).	 Additionally,	 the	 nature	 of	 the	 paired	 matching	 task	 gives	 a	 baseline	 score.	
Unlike	in	a	naming	task,	chance	performance	is	known	to	be	50%	in	the	paired	matching	
task,	 so	 observed	 performance	 can	 be	 compared	 against	 this	 chance	 level.	 Finally,	 the	
ambient	 images	 used	 in	 this	 task	 provide	 us	 with	 a	 test	 more	 similar	 to	 the	 image	 type	
available	in	real	world	investigations.	
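Because each trial in the paired matching task is a binary same/different judgment, observed accuracy can be compared against the 50% chance baseline with an exact binomial calculation. A minimal stdlib sketch (the helper name is my own illustration, not part of the thesis analysis):

```python
from math import comb

def p_above_chance(correct, trials, chance=0.5):
    # One-tailed probability of scoring `correct` or more out of
    # `trials` if the viewer were simply guessing at the chance level.
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))
```

For example, 33 correct out of 60 trials is entirely consistent with guessing, whereas 45 out of 60 is not.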
	
To	 address	 all	 of	 these	 issues	 I	 repeated	 Experiment	 1,	 but	 this	 time	 replacing	 the	 fine	
quality ambient images with pixelated versions of these images. The aim was to establish whether the graded familiarity advantage observed for imposter detection in fine quality images extends to degraded images. If it is knowledge of exact facial configurations and
fine	featural	detail	that	differentiates	familiar	and	unfamiliar	viewers	in	the	lookalike	task,	
then	obfuscating	that	information	should	eliminate	the	familiarity	advantage.			
	
If	 familiar	 viewers	 are	 using	 the	 detail	 and	 small	 differences	 in	 faces	 to	 solve	 the	 task,	
then	 there	 should	 be	 little	 or	 no	 benefit	 of	 familiarity	 in	 face-matching	 performance	 in	
Experiment	 2,	 where	 this	 information	 is	 difficult	 to	 access.	 On	 the	 other	 hand,	 if	 the	
familiarity	advantage	survives,	that	would	suggest	that	other	information	is	being	used.	
	
Method	
Participants	
30 undergraduate students at the University of York (11 male, mean age = 19.7 years)
volunteered	as	participants	for	this	project.	Participants	received	payment	of	£3	or	a	half	
hour	course	credit.	None	of	the	participants	had	taken	part	in	the	previous	experiment.	
	

	

Design	&	Stimuli		
As	 in	 Experiment	 1,	 a	 within-subjects	 design	 was	 adopted	 to	 compare	 the	 effect	 of	
Familiarity	 (5	 levels)	 on	 the	 dependent	 variable,	 percentage	 of	 correct	 responses	 in	 the	
face-matching	task.		
	
The	image	pairings	of	the	30	celebrity	faces	and	their	30	lookalikes	faces	were	the	same	
as	Experiment	1.	However,	unlike	Experiment	1,	all	of	these	images	were	pixelated	to	a	
level of 30 pixels wide by 45 pixels high using Adobe Photoshop (CS6) (see Figure 2.9 for a
side	 by	 side	 example	 of	 the	 stimuli	 used	 in	 Experiment	 1	and	 the	 pixelated	 versions	 for	
use	in	Experiment	2).		
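The pixelation itself was applied in Photoshop, but the same mosaic effect can be approximated programmatically by averaging the image over a coarse cell grid, so that a 30x45 setting leaves only 30x45 blocks of information. A rough pure-Python sketch for a grayscale image stored as a 2D list (function and argument names are illustrative, not the actual stimulus-preparation code; assumes the image is at least as large as the grid):

```python
def pixelate(pixels, grid_w, grid_h):
    # Approximate a mosaic filter: average the image over a
    # grid_w x grid_h grid of cells, so the output keeps its size
    # but carries only grid_w x grid_h distinct blocks.
    h, w = len(pixels), len(pixels[0])
    out = [[0] * w for _ in range(h)]
    for gy in range(grid_h):
        y0, y1 = gy * h // grid_h, (gy + 1) * h // grid_h
        for gx in range(grid_w):
            x0, x1 = gx * w // grid_w, (gx + 1) * w // grid_w
            cells = [pixels[y][x]
                     for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(cells) // len(cells)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = mean
    return out
```

Lower grid values discard more fine-scale information, which is the manipulation varied between Experiments 2 (30x45) and 3 (20x30).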
	
	
	
	
	
	
	
	
	
	
Figure	2.9	Example	of	the	image	appearance	for	Experiment	2	(top	pair)	compared	with	the	fine	version	of	
the same image as used in Experiment 1 (bottom pair). These are different image pairs of Al Gore, with the lookalike appearing on the right.

	
	
	

	
Procedure		
The	procedure	was	the	same	as	in	Experiment	1,	except	that	this	time	participants	viewed	
the	newly	created	pixelated	versions	of	the	image	pairs.			
		
Following	 completion	 of	 the	 face-matching	 task,	 participants	 ranked	 the	 celebrity	 faces	
for familiarity using the same familiarity scale as in Experiment 1. Full-resolution (unmanipulated) images were used for the familiarity-ranking task.
	
Analysis	
As	 in	 the	 previous	 experiment,	 results	 were	 analysed	 by	 comparing	 the	 percentage	 of	
correct	 responses	 across	 the	 familiarity	 quintiles.	 In	 this	 experiment	 34%	 of	 faces	 were	
placed	in	Band	1,	12%	in	Band	2,	9%	in	Band	3,	10%	in	Band	4	and	34%	in	Band	5.	
	
Results		
A	 one-way	 repeated	 measures	 ANOVA	 was	 performed	 on	 the	 accuracy	 data,	 to	
investigate	 the	 effect	 of	 familiarity	 (5	 levels)	 on	 participants’	 face-matching	 ability	 for	
poor	quality	images.		

	

	
Figure	 2.10	 Graph	 showing	 the	 graded	 effect	 of	 familiarity	 for	 participants’	 face-matching	 task	
performance.	Error	bars	show	standard	error	of	the	mean.	
	

As in Experiment 1, face-matching performance was significantly affected by Familiarity with the face viewed, F(3.19, 92.48) = 2.96, p<.05, ηp² = .09 (Greenhouse-Geisser corrected). Figure 2.10 shows the predominantly graded effect of Familiarity over the 5 familiarity bands (Band 1: M = 64, SE = 2.08, CI = 59-68; Band 2: M = 66, SE = 3.22, CI = 59-73; Band 3: M = 70, SE = 2.96, CI = 64-76; Band 4: M = 69, SE = 4.07, CI = 60-77; Band 5: M = 76, SE = 2.79, CI = 70-82).
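Descriptive statistics of this kind (M, SE and CI per band) can be reproduced from per-participant accuracy scores with a few lines of stdlib Python. A sketch, assuming a normal-approximation 95% CI (an assumption on my part; the thesis does not state how its intervals were computed):

```python
from statistics import mean, stdev

def describe(scores):
    # Mean, standard error of the mean, and approximate 95% CI
    # (normal approximation) for a list of accuracy scores.
    m = mean(scores)
    se = stdev(scores) / len(scores) ** 0.5
    return m, se, (m - 1.96 * se, m + 1.96 * se)
```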
	
Pairwise	 comparisons	 revealed	 that	 accuracy	 for	 the	 highly	 familiar	 faces	 (Band	 5)	 was	
significantly	 better	 than	 performance	 for	 faces	 in	 familiarity	 Band	 1,	 mean	 difference	 =	
12.28,	SE	=	2.75,	CI	=	6.65-17.91,	p<.005	and	also	than	that	of	Band	2,	mean	difference	=	
9.75, SE = 3.96, CI = 1.64-17.85, p<.05. There were no significant differences in performance accuracy between each of the other familiarity bands (p>.05 for all
comparisons).		

	
Figure	2.11	Face-matching	performance	broken	down	into	correct	Same	and	Different	identity	trials.	Error	
bars	show	standard	error	of	the	mean.	

	
Accuracy	 data	 were	 also	 analysed	 according	 to	 the	 breakdown	 of	 Same	 identity	 and	
Different identity correct trials using a 2x5 ANOVA (see Figure 2.11). A significant main effect of Familiarity was observed [F(4,116) = 3.00, p<.05, ηp² = .09]. There was also a significant main effect of Trial Type: participants were overall more accurate on Same identity trials than Different identity trials [F(1,29) = 4.99, p<.05, ηp² = .15]. However, there was no significant interaction between Familiarity and Trial Type [F(4,116) = .79, p = .53, ηp² = .03].
	

	

Discussion	
Despite	the	pixelation	of	the	images,	accuracy	in	the	lookalike	task	improved	as	familiarity	
increased.		
	
My	 pixelated	 lookalike	 task	 was	 naturally	 more	 challenging	 than	 previous	 identification	
tasks involving pixelated images. Firstly, the lookalike matching task is more challenging than naming pixelated celebrity images (Lander et al., 2001), as in my task both the lookalike and the celebrity images could be mistaken for (and named as) the same celebrity. In addition, previous matching tasks have involved matching one good
quality	image	to	one	pixelated	image	(Bindemann	et	al.	2013),	whereas	my	task	required	
matching	 across	 two	 pixelated	 images.	 The	 findings	 from	 my	 challenging	 pixelated	
lookalike	experiment	hence	demonstrated	the	versatility	and	strength	of	familiarity	as	an	
aid	 to	 face	 recognition,	 as	 even	 though	 the	 celebrity	 and	 the	 lookalike	 images	 were	
pixelated,	the	graded	familiarity	effect	on	face-matching	performance	prevailed.	Finally,	
this	study	suggests	that	for	familiar	faces,	learnt	information	other	than	featural	details	
and	 configural	 information	 may	 be	 used	 to	 perform	 the	 task.	 Fine-scale	 image	
information	 was	 more	 difficult	 to	 access	 than	 in	 Experiment	 1,	 yet	 the	 familiarity	
advantage	survived	despite	this.		
	
At	the	current	level	of	pixelation	(30x45	pixels)	some	featural	and	configural	information	
is	still	visible.	This	raises	the	question	of	where	the	familiarity	advantage	will	break	down.	
Presumably	in	the	limiting	case	(1x1	pixel),	performance	on	this	task	would	be	at	chance	
for	 all	 familiarity	 bands.	 The	 graded	 familiarity	 effect	 in	 the	 present	 experiment	 implies	
that	 viewers	 were	 nowhere	 near	 their	 performance	 limit	 in	 this	 task.	 In	 the	 next	
experiment,	 I	 set	 out	 to	 push	 the	 familiarity	 advantage	 to	 its	 limits	 by	 pixelating	 the	
images	even	further.		
	
	

	

2.6 Experiment	3:	Coarse	Pixelation	
Past	research	has	shown	that	there	is	a	limit	to	people’s	ability	to	recognise	a	pixelated	
face – we can recognise pixelated faces but only up to a point. The lower the image
resolution	 the	 higher	 the	 error	 rate,	 with	 results	 eventually	 falling	 to	 chance	 level	
(Bachmann,	 1991;	 Costen	 et	 al.,	 1994,	 1996).	 It	 is	 believed	 that	 a	 familiar	 face	 can	 be	
recognised down to a horizontal pixelation level of 16 pixels per face; beyond this level there is a steep decline in performance (Bachmann, 1991; Costen et al., 1994, 1996).
Bindemann	et	al.	(2013)	reported	that	the	pixelation	threshold	was	in	fact	much	lower	for	
matching	 unfamiliar	 faces.	 Bindemann	 et	 al.’s	 (2013)	 study	 reported	 limits	 on	
participants’	abilities	to	match	two	side-by-side	images,	i.e.	the	scenario	in	the	lookalike	
experiment	series.	Bindemann	and	colleagues	(2013)	reported	that	a	large	drop	in	facematching	performance	accuracy	occurred	when	one	of	the	high-resolution	images	in	each	
of	 the	 image	 pairs	 was	 replaced	 with	 images	 of	 a	 horizontal	 resolution	 of	 20	 pixels.	
However, Lander et al. (2001) found that people could identify around half of the familiar face photographs presented to them when the images had a horizontal resolution of
only	 10	 pixels	 per	 face.	 In	 Bindemann	 et	 al.’s	 (2013)	 unfamiliar	 face-matching	 task	
performance	 was	 at	 around	 chance	 level	 for	 a	 horizontal	 pixelation	 of	 8	 pixels,	 even	
though	 the	 face-matching	 task	 consisted	 of	 co-operative	 stimuli	 (taken	 from	 the	 GFMT)	
presented	side	by	side.	Thus	these	previous	findings	indicate	that	pixelation	will	reach	a	
point	where	the	familiarity	advantage	no	longer	holds.			
	
So	far	the	graded	familiarity	effect	has	prevailed	for	ambient	images	of	extremely	similar	
faces,	even	when	these	images	were	degraded	using	pixelation.	I	previously	argued	that	
reducing	the	number	of	pixels	in	the	image	makes	configural	and	featural	information	in	
the	 image	 more	 difficult	 to	 access.	 The	 accuracy	 data	 from	 Experiment	 2	 suggests	 that	
some	critical	information	was	still	accessible	albeit	at	a	reduced	level	(which	could	explain	
the	overall	poorer	performance	in	Experiment	2).		
	

	

If	 the	 familiarity	 advantage	 remains	 for	 even	 coarser	 pixelation,	 it	 would	 suggest	 that	
information other than fine featural details and exact configurations supports high performance by familiar viewers in the lookalike task. If the familiarity advantage is eliminated, this would suggest that the advantage relied solely on the fine-scale information that is disrupted by coarse pixelation.
	
Method	
Participants	
30 undergraduate students at the University of York (8 male, mean age = 20.2 years)
volunteered	as	participants	for	this	project.	Participants	received	payment	of	£3	or	a	half	
hour	course	credit.	None	of	the	participants	had	taken	part	in	the	previous	experiment.	
	
Design		
As	 in	 the	 previous	 experiments,	 Experiment	 3	 was	 a	 within-subjects	 study,	 which	
investigated	the	effect	of	familiarity	on	performance	on	the	lookalike	matching	task.	The	
only	difference	between	this	experiment	and	the	preceding	experiments	was	the	level	of	
pixelation	in	the	stimulus	images.		
	
The	pixelation	level	chosen	for	this	experiment	was	20	pixels	wide	x	30	pixels	high.	This	
particular	 resolution	 was	 selected	 because	 Bindemann	 et	 al.	 (2013)	 reported	 a	 marked	
drop	 in	 performance	 accuracy	 for	 this	 level	 of	 pixelation,	 although	 accuracy	 was	 still	
above	chance	in	their	study.		
	
	
	
	
	

Stimuli		
The	 same	 image	 pairings	 of	 the	 30	 celebrity	 faces	 and	 lookalike	 faces	 for	 each	 of	 these	
celebrities	 were	 used	 as	 in	 Experiment	 1	 and	 Experiment	 2.	 This	 time	 the	 images	 were	
presented	at	a	pixelation	level	of	20x30	pixels	using	Adobe	Photoshop	(CS6).		
	
	
	
	
	
	
Figure	2.12	Example	of	coarsely	pixelated	image	stimuli	used	in	Experiment	3.		

Procedure	
The	procedure	for	this	experiment	was	the	same	as	for	Experiments	1	and	2,	except	that	
the	face-matching	task	now	involved	the	coarsely	pixelated	images	(20x30	pixels).		
	
Following	completion	of	the	face-matching	task	participants	used	the	familiarity	scale	as	
in	the	previous	experiments,	to	indicate	their	level	of	familiarity	with	each	of	the	celebrity	
faces.	As	in	both	previous	experiments,	the	good	quality	image	cards	were	used	for	the	
familiarity	judgement	task.		
	
Analysis	&	Results	
Experiments	 1	 and	 2	 established	 a	 graded	 effect	 of	 familiarity	 on	 task	 performance	 by	
examining	accuracy	at	each	of	five	familiarity	bands.	This	approach	was	not	possible	for	
Experiment 3 because participants used the middle range of the familiarity scale less frequently. Dividing the data into familiarity quintiles meant that the middle quintiles (2, 3 and 4) were too sparsely populated to allow meaningful statistical analysis. To circumvent this problem, and to obtain a reliable performance estimate for mid-level familiarity faces, data from familiarity bands 2, 3 and 4 were pooled into a single band. This resulted in 3 familiarity bands, as used in previous studies (e.g. Clutterbuck & Johnston, 2002, 2004, 2005). After pooling, 37% of faces were placed in Band 1 (Low), 20% in Band 2 (Mid) and 43% in Band 3 (High).
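This pooling step amounts to recoding each trial's quintile band onto three levels before computing percent correct. A sketch of that recoding, assuming trials are stored as (band, correct) pairs (the data structure and names are illustrative, not the actual analysis scripts):

```python
from collections import defaultdict

def pooled_accuracy(trials):
    # trials: list of (familiarity_band, correct) pairs, bands 1-5.
    # Bands 2-4 are pooled into a single mid-familiarity level,
    # then percent correct is computed per level.
    level = {1: "low", 2: "mid", 3: "mid", 4: "mid", 5: "high"}
    totals = defaultdict(lambda: [0, 0])  # level -> [n_correct, n_trials]
    for band, correct in trials:
        t = totals[level[band]]
        t[0] += int(correct)
        t[1] += 1
    return {lvl: 100 * c / n for lvl, (c, n) in totals.items()}
```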
	
A one-way repeated measures ANOVA was performed on the accuracy data, this time examining the effect of Familiarity (3 levels) on performance accuracy in the face-matching task.

Figure	 2.13	 Percentage	 of	 correct	 responses	 for	 each	 of	 the	 three	 levels	 of	 familiarity	 in	 the	 20x30	 pixel	
condition.	Error	bars	show	standard	error	of	the	mean.	
	

Once again, there was an overall effect of Familiarity for this task [F(2,58) = 9.44, p<.001, ηp² = .25] (see Figure 2.13). Mean performance accuracy was highest for the highly familiar faces, Band 3: M = 65% (SE = 1.78, CI = 62-69). Performance accuracy for Band 2 was M = 54% (SE = 2.84, CI = 48-60), and for Band 1, M = 58% (SE = 1.59, CI = 54-60).
	
Pairwise	 comparisons	 revealed	 that	 accuracy	 was	 significantly	 higher	 for	 highly	 familiar	
faces	(Band	3)	than	for	both	faces	of	mid	familiarity	(Band	2)	(mean	difference	=	11.22,	CI	
= 4.67-17.77, p<.005) and unfamiliar faces (Band 1) (mean difference = 7.70, CI = 3.57-11.83, p<.005). There was no significant difference between accuracy scores for faces in
familiarity	Band	1	and	Band	2.		

	
Figure	2.14	Percentage	of	correct	responses	in	face-matching	task	(using	poor	quality	20x30	pixel	images)	
by	familiarity	broken	down	into	same	(dotted	line)	and	different	(dashed	line)	correct	trials.	Error	bars	show	
standard	error	of	the	mean.	

	

	

A 2x3 ANOVA was conducted to break down results into Same identity and Different identity correct trials. This analysis revealed a significant main effect of Familiarity [F(2,58) = 6.39, p<.005, ηp² = .18]. There was, however, no significant main effect of Pair Type [F(1,29) = 1.09, p = .19, ηp² = .04]. There was a significant interaction between Pair Type and Familiarity [F(2,58) = 6.08, p<.005, ηp² = .17]. This is illustrated in Figure 2.14.
	
There was a significant simple main effect of Familiarity for Same identity trials, F(2,116) = 10.29, p<.001, ηp² = .15. There was no simple main effect of Familiarity for Different identity trials, F(2,116) = 2.55, p>.05, ηp² = .04.
	
As	a	significant	interaction	was	observed	between	Familiarity	and	Pair	Type,	Tukey	post	
hoc	tests	were	conducted	to	find	out	where	significant	differences	lay.	For	Same	identity	
trials,	 there	 was	 a	 significant	 difference	 in	 performance	 between	 familiarity	 bands	 Low	
(Band	1)	and	High	(Band	3),	and	between	Mid	(Band	2)	and	High	(Band	3).	There	were	no	
significant	differences	between	any	of	the	levels	of	Familiarity	for	Different	identity	trials.			
	
Discussion		
Although a familiarity advantage survived in Experiment 3, with people being significantly better at recognising the faces that were extremely familiar to them (M = 65%) than the least familiar faces (M = 58%), the graded effect seen in Experiments 1 and 2 did not emerge here, and accuracy in familiarity Bands 1 and 2 was numerically not much above
chance	 level	 (50%).	 It	 seems	 that	 by	 reducing	 image	 quality	 to	 20x30	 pixels	 we	 are	
approaching	the	limit	of	the	familiarity	advantage	in	this	situation.	Breakdown	of	results	
by	trial	type	revealed	that	this	familiarity	advantage	was	driven	by	improved	performance	
for	same	person	trials.	
	

	

It	 is	 interesting	 that	 a	 familiarity	 advantage	 still	 emerged	 when	 comparing	 extremely	
familiar	 faces	 to	 less	 familiar	 faces.	 This	 advantage	 suggests	 a	 role	 for	 coarse	 scale	
information	 even	 when	 discriminating	 extremely	 similar	 faces.	 Performance	 was	 at	
around	chance	level	unless	the	faces	were	extremely	familiar.	In	some	ways	it	may	not	be	
surprising	that	there	was	a	performance	advantage	for	highly	familiar	faces,	as	it	has	been	
previously	shown	that	people	can	recognise	associates	even	in	very	poor	quality	images	
(e.g.	 Burton	 et	 al.	 1999).	 However,	 the	 foil	 faces	 in	 previous	 studies	 have	 been	 generic	
similar	faces	that	merely	share	the	same	basic	description	(e.g.	young	male,	short	black	
hair).	 The	 important	 difference	 here	 is	 that	 my	 lookalike	 foils	 were	 themselves	
recognisable	 as	 the	 celebrities	 they	 were	 impersonating.	 The	 implication	 is	 that	 the	
lookalike	 faces	 differ	 from	 the	 celebrity	 faces	 only	 in	 subtle	 detail.	 Yet	 disrupting	 the	
subtle	 detail	 in	 the	 images	 was	 not	 catastrophic	 for	 familiar	 viewers.	 This	 suggests	 that	
the	 familiar	 viewers	 used	 other	 information	 to	 solve	 the	 task.	 One	 possibility	 is	 that	 at	
least	some	of	the	subtle	differences	are	carried	in	the	low	spatial	frequency	information	
that	is	intact	in	the	pixelated	images.				
	
Another	 cue	 comes	 from	 the	 pattern	 of	 breakdown	 of	 the	 familiarity	 advantage.	 In	
Experiment 3, familiarity did not improve performance on different identity pairs. It is thus possible that familiarity makes it easier to determine the identity of the target celebrity from the poor quality image; this identity decision could come from either the
celebrity	or	lookalike	image.	When	a	familiar	viewer	can	identify	the	celebrity,	they	then	
have	 access	 to	 all	 the	 representations	 that	 they	 have	 stored	 for	 that	 celebrity’s	 face.	
Familiar viewers are aware of many more forms the celebrity's appearance can take, and hence allow a greater range of variation in appearance for the face than an unfamiliar viewer would. Familiar viewers may therefore be more willing to respond 'same' to the image pairs in the matching task, even though the exact details cannot be extracted from
the	pixelated	images.	This	approach	would	improve	performance	for	same	identity	trials	
but	 lead	 to	 poorer	 performance	 for	 different	 person	 trials,	 and	 could	 thus	 explain	 the	
pattern	of	results	in	Experiment	3.		
	

	

It seems that faces are compared in different ways depending on our familiarity with the faces involved (e.g. Megreya & Burton, 2006). If people are unfamiliar with a face, they may match face images in a pattern-matching manner, similar to the method used to match images of objects (Hancock et al. 2000; Burton & Jenkins, 2011). When people are extremely familiar with a face, however, my findings suggest that they no longer rely on this pattern matching, but can use other information learnt about the familiar face to aid the matching task.
	

2.7 Between	Experiments	Analysis	
In	 order	 to	 compare	 the	 familiarity	 advantage	 across	 experiments	 (and	 across	 image	
quality),	 I	 next	 tested	 how	 the	 results	 of	 Experiment	 3	 compared	 with	 the	 results	 of	
Experiments	 1	 and	 2.	 As	 familiarity	 was	 assessed	 across	 three	 levels	 in	 Experiment	 3,	 I	
reanalysed the data from Experiments 1 and 2 in the same way to allow direct
comparison.		
	
A 3x3 mixed ANOVA was performed on the results of Experiments 1, 2 and 3. This compared Familiarity (3 levels) with Image Quality (3 levels: fine, mid and coarse pixelation, i.e. Experiments 1-3). The ANOVA revealed a main effect of Familiarity, such that face-matching accuracy increased with increasing familiarity when pooling over Image Quality (Band 1, Low: M = 64%; Band 2, Mid: M = 65%; Band 3, High: M = 77%), [F(2,174) = 35.24, p<.001, ηp² = .29]. Pairwise comparisons revealed that these differences lay between High familiarity and both Mid (mean difference = 11.33, CI = 7.78-14.49, p<.001) and Low familiarity (mean difference = 12.52, CI = 9.81-15.24, p<.001). There was no significant difference between Low and Mid familiarity, p>.05.
	
A significant main effect was also observed for Image Quality. Performance accuracy was highest for the Fine images (Experiment 1; M = 78%, SE = 1.65, CI = 74.78-81.36), then the Mid pixelation images (Experiment 2; M = 69%, SE = 1.65, CI = 65.51-72.08), which were both higher than performance for the Coarse pixelation images (Experiment 3; M = 58%, SE = 1.65, CI = 54.37-60.94) [F(2,87) = 39.37, p<.001, ηp² = .47].
	
There was also an interaction between Familiarity and Image Quality [F(4,174) = 2.94, p<.05, ηp² = .06]. This reflects the graded familiarity effect seen in Experiments 1 and 2 breaking down in Experiment 3 as a result of extreme image degradation.
	

2.8 General	Discussion	
It	is	evident	from	this	series	of	experiments	that	being	more	familiar	with	a	face	increases	
a	 person’s	 ability	 to	 tell	 that	 face	 apart	 from	 its	 lookalike,	 and	 also	 to	 ‘tell	 together’	
ambient	images	of	the	same	face.	Familiarity	aids	imposter	detection	even	in	the	case	of	
poor quality images, with performance increasing in a graded manner both when Fine quality 'standard' images were viewed (Experiment 1) and when Mid pixelation images (30x45 pixels) were used (Experiment 2). However, the findings of Experiment 3 illustrate that the graded familiarity advantage starts to break down as the pixelation becomes more Coarse (20x30 pixels). Not only does the graded pattern falter in the overall accuracy data, but
also	the	familiarity	advantage	is	lost	for	different	identity	trials.	
	
My	 findings	 underscore	 those	 of	 previous	 research	 and	 extend	 them	 in	 several	 ways.	 I	
created	a	challenging	face	matching	task	that	addressed	face-matching	performance	for	
ambient	celebrity	face	images	and	very	similar	celebrity	and	lookalike	faces.	Performance	
accuracy	on	this	task	was	even	poorer	than	had	previously	been	established	in	matching	
tasks	which	used	cooperative	stimuli.	In	my	lookalike	task	unfamiliar	viewers	performed	
with	72%	accuracy,	which	was	a	level	of	performance	much	poorer	than	in	the	GFMT	(M	=	
90%	 long	 version,	 81%	 short	 version)	 which	 is	 a	 standardised	 face-matching	 test	
consisting	of	cooperative	stimuli	(Burton	et	al.	2010).	Accuracy	dropped	even	more	with	
degraded image quality. These results show that human face-matching performance is even worse for challenging images than previously established in standardised cooperative matching tests.
	
My	 research	 also	 adds	 to	 the	 existing	 literature	 on	 familiarity	 as	 a	 graded	 concept	
(Clutterbuck	 &	 Johnston,	 2002,	 2004,	 2005).	 Clutterbuck	 &	 Johnston	 divided	 familiarity	
into	 just	 three	 bands	 (high,	 medium	 and	 low)	 with	 faces	 categorized	 into	 these	 bands	
according	 to	 ratings	 from	 an	 independent	 rater	 group.	 The	 current	 Experiments	 1	 &	 2	
provide	 more	 detailed	 insight	 into	 familiarity	 effects	 by	 tracking	 performance	 over	 five	
levels	 of	 familiarity	 instead	 of	 3.	 Importantly,	 in	 the	 present	 experiments,	 familiarity	
ratings	were	based	on	participants’	own	rankings	of	their	level	of	familiarity	with	a	face,	
unlike	 in	 previous	 studies	 which	 have	 assumed	 equal	 familiarity	 with	 the	 familiar	 face	
stimuli	 for	 all	 participants.	 I	 also	 tracked	 performance	 across	 changes	 in	 image	 quality,	
and	 found	 first	 that	 familiarity	 does	 help	 improve	 performance	 even	 for	 coarsely	
pixelated	 images,	 but	 there	 is	 a	 limit	 to	 the	 familiarity	 advantage,	 especially	 its	 graded	
nature. These findings fit with previous demonstrations that face recognition and face-matching performance decline as image resolution decreases, with performance eventually falling to chance (Harmon & Julesz, 1973; Bindemann et al. 2013).
		
The	experiments	add	to	the	existing	knowledge	of	the	familiarity	advantage	for	accurate	
face	 recognition.	 My	 study	 provides	 the	 first	 experimental	 investigation	 into	 the	
familiarity	 advantage	 for	 distinguishing	 between	 true	 match	 and	 lookalike	 faces,	 finding	
that	 familiarity	 does	 indeed	 aid	 this	 challenging	 task.	 I	 have	 shown	 that	 the	 graded	
familiarity	advantage	extends	to	a	more	detailed	breakdown	of	familiarity	levels	than	had	
been	 previously	 explored.	 This	 more	 detailed	 analysis	 was	 made	 possible	 by	
acknowledging	 the	 idiosyncratic	 nature	 of	 familiarity	 –	 different	 viewers	 know	 different	
faces	to	different	degrees	–	and	by	allowing	this	insight	to	inform	the	design	and	analysis.		
	
Too	 much	 pixelation	 destroys	 the	 graded	 familiarity	 advantage.	 Although	 an	 overall	
familiarity advantage did carry through to the case of coarsely pixelated images (20x30), this advantage was carried solely by the same identity pairs. This eventual breakdown
notwithstanding,	the	general	robustness	of	the	familiarity	effect	against	declining	image	
quality suggests that familiar viewers are using information other than purely fine details and precise configural information to support their face-matching decisions.

Celebrities were used as the target faces in the experiments. It is therefore possible that
the	 results	 of	 the	 study	 would	 differ	 at	 the	 extremes	 of	 familiarity	 if	 photographs	 were	
taken	 from	 people’s	 everyday	 encounters,	 rather	 than	 celebrities,	 to	 provide	 the	 target	
faces.	For	example,	performance	may	be	better	preserved	if	the	target	face	was	a	family	
member,	 or	 worse	 if	 the	 target	 face	 was	 someone	 who	 had	 never	 been	 seen	 before	
(participants	may	have	had	some	prior	exposure	to	the	celebrity	faces	even	if	they	were	
not	aware	of	it).			
	
As	 well	 as	 their	 theoretical	 interest,	 these	 findings	 may	 also	 have	 practical	 implications.	
Forensic	 investigations	 regularly	 rely	 on	 face	 images	 as	 a	 means	 of	 evidence	 (Loftus	 &	
Doyle,	1992).	I	found	that	the	more	familiar	a	viewer	was	with	the	target	face,	the	better	
was	their	ability	to	reject	similar	faces.	It	seems	logical	that	in	situations	involving	identity	
fraud	 or	 poor	 quality	 images	 that	 a	 viewer	 who	 is	 of	 the	 highest	 available	 level	 of	
familiarity	with	the	target	face	would	be	best	placed	to	judge	the	identity	of	the	person	
concerned	–	and	that	a	little	familiarity	may	be	better	than	none.		
	
In	 summary,	 increasing	 familiarity	 with	 a	 target	 face	 increases	 a	 viewer’s	 ability	 to	
integrate	 different	 images	 of	 the	 same	 person	 and	 to	 distinguish	 images	 of	 different	
people – even in the context of very similar faces and poor quality ambient images. However, I did identify a limit to this familiarity advantage, where increased familiarity cannot fully
compensate	for	reduced	image	quality.	In	the	next	chapter,	I	consider	how	performance	
on	this	difficult	face-matching	task	might	be	improved.		
		

	

There is also practical relevance to this experimental series. Performance was significantly poorer as a result of image quality degradation, yet there are situations
when	poor	quality	images	are	all	that	are	available	to	aid	an	investigation.	It	is	therefore	
of	interest	to	find	ways	of	improving	performance	for	the	reduced	quality	images	used	in	
these	experiments.	This	will	be	investigated	in	Chapter	3.				

	

	
	

Chapter	3	–	Improving	Performance	
3.1 Chapter	Summary	
In	this	chapter	I	test	several	ways	of	improving	the	poor	performance	for	pixelated	images	
seen	 in	 Chapter	 2.	 Pixelated	 images	 of	 faces	 are	 often	 encountered	 in	 forensic	
investigations when zooming in on digital images. Improving performance in this situation therefore has high applied relevance. I show that blurring the pixelated images and applying crowd analysis can both improve performance on a pixelated matching task. Moreover, the performance benefits of these two approaches were additive, meaning that both could be used together for even greater improvement. Finally, I found that super-recognisers outperformed control participants, even in the extremely challenging task of
imposter	detection	for	poor	quality	images.	This	is	the	first	time	that	super-recognisers’	
performance	 has	 been	 shown	 to	 extend	 beyond	 good	 quality	 images	 of	 cooperative	
stimuli.		
	

3.2 Introduction	
In	Chapter	2,	I	showed	that	familiarity	can	help	when	dealing	with	pixelated	images,	but	
this	familiarity	advantage	was	pushed	to	its	limit	when	dealing	with	the	coarsely	pixelated	
images	 in	 Experiment	 3	 (20x30	 pixels).	 In	 forensic	 situations,	 the	 problem	 of	 identifying	
pixelated	 faces	 is	 often	 encountered	 because	 this	 is	 the	 resulting	 image	 type	 from	
zooming	in	on	CCTV	footage	(Bindemann	et	al.,	2013)	or	other	digital	images	(Jenkins	&	
Kerr,	2013).	In	many	such	cases,	finding	a	viewer	who	is	familiar	with	the	faces	concerned	
is	 not	 a	 viable	 option.	 For	 example,	 passport	 security	 officers	 or	 club	 bouncers	 may	 be	
required	to	make	identity	judgments	concerning	many	individuals	in	a	very	short	period	
of	 time.	 With	 poor	 image	 quality	 being	 a	 very	 real	 problem	 in	 applied	 environments,	 it	
would	 be	 useful	 to	 find	 ways	 to	 improve	 human	 face	 recognition	 performance	 for	
pixelated	images	across	the	familiarity	continuum.	In	this	chapter	I	will	test	several	very	
different	 methods	 for	 improving	 pixelated	 face	 recognition,	 specifically	 image	
manipulations,	data	analysis	and	specialist	viewer	groups.			

	

86	
	

	
	
	
Most	previous	attempts	to	improve	face	recognition	performance	have	revolved	around	
training,	usually	using	good	quality	images.	However,	training	approaches	have	met	with	
little	success.	In	an	early	example,	Woodhead,	Baddeley	&	Simmonds	(1979)	evaluated	a	
long-standing three-day training course that aimed to improve attendees' face recognition performance. The course was intensive, consisting of lectures, demonstrations, discussion and practical work, with a specific focus on learning isolated features, which the course founders believed to be the key to successful face recognition. Before this study, the success of the programme had not
been	 measured.	 To	 measure	 its	 effectiveness,	 attendees	 were	 tested	 on	 face-matching	
and	 face	 memory	 tasks	 both	 before	 and	 after	 the	 three	 days	 of	 training.	 Their	
performance	on	these	tasks	was	compared	with	a	control	group	who	did	not	take	part	in	
the	 training	 course.	 It	 was	 concluded	 that	 undergoing	 training	 did	 not	 significantly	
improve	 performance	 in	 any	 of	 the	 test	 tasks,	 with	 both	 trainees	 and	 controls	 showing	
similar mean hit rates between .6 and .9, depending on task type. In one test, training actually led to a significant decrease in performance compared to controls. The authors explained this by suggesting that, in some instances, attention to isolated features may impair rather than improve face recognition (Woodhead et al., 1979). This explanation for poorer performance after training builds on the
research	of	Winograd	(1976)	who	found	that	when	participants	focused	their	attention	on	
one	specific	facial	feature,	their	memory	for	the	face	was	impaired.		

Figure	3.1	Face	shape	classification	examples	provided	by	Towler	et	al.	(2014).			
	

Overall, Woodhead et al. (1979) showed that attendance on a training course which focused on attention to isolated features did not improve face recognition accuracy.

	

	

	
	
However,	it	is	possible	that	training	in	other	methods	could	help.	Towler,	White	&	Kemp	
(2014)	assessed	the	face	shape	training	strategy,	which	focused	on	classifying	the	shape	
of	a	face	(e.g.	oval,	round,	square)	as	the	basis	for	successful	face	recognition	(see	Figure	
3.1). Unfortunately, this technique was also unsuccessful in improving face recognition: participants were no better at the GFMT after undergoing the face shape training than they were before it. The face shape training strategy seems fundamentally flawed: a face does not take on a consistent shape across images, and the categories themselves lack clear boundaries (e.g. what one person considers a round face, another may consider oval).
This	was	found	to	be	the	case	in	the	study,	as	different	images	of	the	same	individual’s	
face	 were	 frequently	 described	 as	 having	 different	 face	 shapes.	 During	 training	
participants	viewed	a	series	of	five	different	same	identity	photographs.	Each	identity	was	
judged	as	having	the	same	face	shape	across	all	five	photographs	in	only	7%	of	cases.	It	
was	 noted	 that	 the	 perceived	 shape	 of	 a	 face	 was	 not	 a	 diagnostic	 characteristic	 of	
identity,	 hence	 explaining	 why	 face	 shape	 training	 does	 not	 improve	 face	 recognition.	 I	
return	to	the	issue	of	face	shape	in	Chapter	4.		
	
Some	 training	 studies	 have	 shifted	 away	 from	 identifying	 specific	 aspects	 of	 a	 face,	 to	
more	 general	 strategies.	 White,	 Kemp,	 Jenkins	 &	 Burton	 (2014)	 showed	 slight	 but	
significant	 improvements	 in	 face-matching	 performance	 when	 participants	 received	
immediate	feedback	on	their	decisions.	Here	the	authors	were	not	concerned	with	how	
the	 viewers	 made	 their	 decisions,	 but	 instead	 focused	 on	 notifying	 viewers	 on	 whether	
identity judgments they made were right or wrong. Feedback took the form of a 'correct' or 'incorrect' statement presented immediately after participants submitted their answer on each trial of a face-matching task. The images
remained	on	the	screen	while	the	feedback	was	presented	and	improvement	generalised	
to new faces shown in the task, but it is not known how long-lasting the benefit of the feedback would be.
	

	

	

	
	
Taking	 a	 rather	 different	 approach,	 Dowsett	 &	 Burton	 (2014)	 also	 report	 some	 positive	
effects	 of	 working	 in	 pairs	 when	 making	 matching	 identity	 judgments.	 Individuals	 were	
tested	for	face-matching	performance,	and	then	tested	as	part	of	a	pair,	and	finally	tested	
individually	 again.	 In	 the	 pair	 judgment	 condition,	 the	 identity	 decision	 was	 made	
collaboratively after discussion. Working in a pair lifted performance to the level of the higher-performing individual member. Interestingly, the effect of pair working carried over to the later individual testing, improving performance especially for the weaker pair member. This study, along with the work of White et al. (2014), provides evidence for feedback as an important self-regulator of performance accuracy, which may improve face recognition in some situations. Although the performance improvements were statistically significant, they were numerically small in both studies.
	
Although training courses have been found not to improve face recognition performance, and recent lab-based studies showed only small improvements, it remains a widely held belief that people whose jobs rely on accurately identifying faces will be better at face recognition than people whose jobs do not.
	
In	 reality,	 it	 seems	 that	 highly	 trained	 officials	 are	 no	 better	 than	 untrained	 and	
inexperienced	others	at	matching	faces.	Burton	et	al.	(1999)	reported	that	police	officers	
performed	 at	 the	 same	 level	 of	 accuracy	 as	 undergraduate	 students	 in	 a	 task	 that	
involved	 matching	 poor	 quality	 CCTV	 footage	 images	 to	 comparison	 face	 images.	 More	
recently, passport officers have been tested on the GFMT. Passport officers performed
with	79.2%	accuracy	on	the	task,	whereas	mean	performance	for	the	general	population	
control	group	was	81.3%	(Figure	3.2).	There	was	no	significant	difference	between	these	
performance	scores.	It	could	be	argued	that	the	GFMT	task	does	not	mimic	the	identity	
matching	task	that	passport	officers	perform,	which	involves	comparison	of	a	photograph	
to	 a	 physically	 present	 face	 rather	 than	 to	 another	 static	 image.	 White	 et	 al.	 (2014a)	
addressed	this	by	testing	passport	officers	on	a	task	that	directly	mimicked	the	passport	
control scenario of matching an image to a physically present face, and also on image-to-image matching performance. Passport officers again performed no better than a control
	

	

	
	
group	of	undergraduate	students.	Relating	back	to	the	work	discussed	above	on	training,	
no	 relationship	 was	 found	 between	 number	 of	 years	 on	 the	 job	 and	 performance	
accuracy. These findings make sense with reference to previous research: if training holds little benefit elsewhere, there is no reason, other than perhaps increased motivation, to expect similar training to improve performance for people whose jobs rely on high face recognition accuracy.
	
	
	
	
	
	
Figure 3.2 Graph from the White et al. (2014) passport officer paper, showing officers' accuracy on the GFMT plotted against their employment duration. Some passport officers performed very highly on the GFMT; these high scores appear at both ends of the employment-duration axis.

	
Taken	 together,	 the	 previous	 research	 on	 improving	 face	 recognition	 performance	
suggests	that	training	is	unlikely	to	improve	face	recognition	performance	on	challenging	
pixelated images. In this chapter I will attempt to improve face recognition performance using three distinct approaches, none of which rely on training. First, I will investigate whether
image	manipulations,	in	the	form	of	blurring	pixelated	images,	can	improve	performance.	
Second, I will examine whether crowd analysis, a data analysis technique, can be applied to data that has already been collected to improve performance. Third, I will test
whether	super-recognisers	can	be	relied	upon	to	make	more	accurate	identity	judgments	
than controls for challenging images. Each of these approaches, and the past research involving them, will be described and discussed in detail within the body of this chapter.

	

	

	
	

3.3 Experiment	4:	Blurring	Pixelated	Images	
It was evident in Chapter 2 that although an advantage of extreme familiarity survived when dealing with coarsely pixelated image pairs, overall performance was poor compared to the earlier experiments with better quality images. Performance was at around chance level for unfamiliar faces, and even for extremely familiar faces it was poor relative to performance at this familiarity level in Experiments 1 and 2 of Chapter 2.
	
It	 has	 been	 noted	 by	 past	 researchers	 that	 ability	 to	 recognise	 a	 pixelated	 face	 can	 be	
improved	 by	 blurring	 the	 image.	 It	 may	 sound	 somewhat	 counterintuitive	 to	 blur	 a	
pixelated image, because blurring removes information from the image. Harmon & Julesz (1973) explain that when an image is pixelated, each square takes the average intensity of the pixels that made up that area in the original image. This can create a larger difference in amplitude between two adjacent squares than existed between neighbouring pixels in the original image. These abrupt transitions introduce high-frequency noise at the square edges, making it difficult to extract useful information. Configural information becomes less precise and featural information less detailed.
However,	 when	 a	 pixelated	 image	 is	 blurred,	 high	 frequency	 noise	 is	 removed	 and	
identity is easier to recover (Harmon & Julesz, 1973; Morrone et al., 1983). Blurring is essentially the same as low-pass filtering, in that both processes filter out high spatial frequency information.
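To make the pixelation-and-blur relationship concrete, here is a minimal sketch in Python. It is my own illustration, not the thesis pipeline: the actual stimuli were blurred in Photoshop with a Gaussian blur, whereas a crude box blur stands in here as the low-pass filter, and the "image" is a toy grayscale gradient stored as a list of lists.

```python
# Illustrative sketch only: box blur as a stand-in low-pass filter.

def pixelate(img, block):
    """Replace each block x block region with its mean intensity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            m = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = m
    return out

def box_blur(img, radius):
    """Average each pixel with its neighbours: a simple low-pass filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[yy][xx] for yy in ys for xx in xs]
            out[y][x] = sum(vals) / len(vals)
    return out

def max_edge_step(img):
    """Largest intensity jump between horizontally adjacent pixels."""
    return max(abs(row[x + 1] - row[x])
               for row in img for x in range(len(row) - 1))
```

Pixelating a smooth gradient introduces large intensity steps at the block edges (the high-frequency noise described above); even this crude blur shrinks the largest step substantially while leaving the coarse, low-frequency structure intact.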
	
Here	 I	 examine	 whether	 blurring	 the	 pixelated	 images	 used	 in	 Chapter	 2,	 Experiment	 3	
will	improve	performance	despite	the	extreme	similarity	of	the	face	images	in	each	pair.	If	
blurring	enhances	performance	on	the	task	then	this	technique	could	be	used	in	applied	
settings	to	aid	facial	identifications	in	forensic	investigations.		
	

	

	

	
	
I predict that removing high-frequency image noise by blurring the pixelated images will improve accuracy compared with the results of Experiment 3.
	
Method	
Participants	
A	group	of	30	undergraduate	students	(M=	6,	mean	age	=	19.7)	at	the	University	of	York	
(who	had	not	taken	part	in	any	of	the	previous	lookalike	tasks	involved	in	this	series	of	
experiments)	volunteered	as	participants	for	this	study.		
	
Design	and	Stimuli	
As	 in	 the	 previous	 experiments,	 this	 experiment	 adopted	 a	 within	 subjects	 design.	 The	
variable	 familiarity	 was	 examined	 at	 three	 levels	 in	 order	 to	 keep	 consistency	 with	 the	
design of Experiment 3. This allowed me to perform a between-experiments comparison of performance for the two image types (pixelated and blurred-pixelated), comparing the results of this experiment with those from Experiment 3.
	
Face-matching	Task		
The	 stimuli	 for	 this	 experiment	 were	 modified	 versions	 of	 the	 face	 images	 from	
Experiment	3.	To	create	the	stimuli	versions	necessary	for	this	new	experiment,	I	took	the	
pixelated	images	used	as	the	stimuli	in	Experiment	3	and	applied	a	blurring	technique	to	
these	images	using	Adobe	Photoshop	(CS6).	The	20x30	pixel	face	images	were	blurred	at	
a radius of 3.8 pixels using Photoshop's Gaussian blur function (see Figure 3.3 for a side-by-side comparison of the coarsely pixelated and blurred-pixelated stimuli). This blurring level was determined via pilot testing in which two raters adjusted the blur level on a sliding scale and decided by eye on a level that they believed made the image easier to identify.

	

	

	
	
	
	
	
	
	
	
Figure 3.3 Images of Al Gore (left pair) and Gary Barlow (right pair) as presented in each experimental condition. For each identity, the left image shows the coarsely pixelated version presented in Experiment 3, and the right image shows the same image after blurring, as presented in Experiment 4.
	

Procedure		
The	procedure	was	the	same	as	for	Experiment	3	except	that	the	images	viewed	were	a	
blurred	version	of	the	coarsely	pixelated	images.	As	in	the	prior	experiments,	participants	
rated each of the celebrity faces for familiarity using the familiarity scale (see the Procedure section of Experiment 1 for a detailed description of the face-matching task and familiarity rating scale used in all experiments in this series).
	
Results	
There	was	a	significant	main	effect	of	Familiarity	on	face-matching	performance	for	the	
blurred version of the task [F(2,58) = 6.07, p < .01, ηp² = .17]. Participants performed with
lowest	accuracy	for	familiarity	band	1	(M	=	62%,	SE	=	2.15),	with	accuracy	increasing	as	
familiarity	increased	for	band	2	(M	=	65.5%,	SE	=	2.5)	and	band	3	(M	=	72.3%,	SE	=	1.81).	
Performance for familiarity band 3 was significantly greater than for bands 1 and 2; no other differences were significant.
	

	

	

	
	
In	Experiment	3,	the	familiarity	advantage	was	found	only	for	same	identity	trials.	In	order	
to check whether blurring recovered the graded familiarity advantage for each trial type,
performance	accuracy	for	the	blurred	pixelated	faces	was	analysed	according	to	same	and	
different	trials	using	a	mixed	ANOVA.	

Figure	3.4	Graph	showing	performance	accuracy	on	the	blurred	pixelated	task	split	by	same	and	different	
person	trials.	Error	bars	show	the	standard	error	of	the	mean.	

	
For the Same and Different identity trial breakdown there was a significant main effect of Familiarity [F(2,58) = 6.02, p < .01, ηp² = .17] but no main effect of Trial Type [F(1,29) = .96, p = .33, ηp² = .03] and no interaction between Familiarity and Trial Type [F(2,58) = .13, p = .87, ηp² = .01]. This shows that for the blurred versions of the faces, familiarity improved performance on both same-person and different-person trials (see Figure 3.4).
	

	

	

	
	
Between	Experiment	Analysis	
To find out whether performance was better for blurred-pixelated faces than for pixelated faces, a between-experiments analysis was conducted using a two-way mixed ANOVA. The factors for the ANOVA were Familiarity (within-subjects) and Image Type, pixelated or blurred-pixelated (between-subjects).

	
Figure	3.5	Percentage	of	correct	responses	in	face-matching	task	for	each	familiarity	band	(low	familiarity,	
mid	 familiarity,	 high	 familiarity),	 for	 Experiments	 3	 (black	 line)	 &	 4	 (blue	 line).	 Error	 bars	 show	 standard	
error	of	the	mean.	

Overall, performance accuracy was significantly higher when the pixelated images were blurred (M = 66, SE = 1.36, CI = 64-70) than it had been in Experiment 3, which used the coarsely pixelated images (M = 59, SE = 1.36, CI = 56-62), [F(1,58) = 16.75, p < .001, ηp² = .22], see Figure 3.5.
	

	

	

	
	
Additionally, a main effect of Familiarity was evident from the results [F(2,116) = 12.53, p < .001, ηp² = .18]. Participants were better at recognising the faces they were most familiar with (M = 69, SE = 1.27, CI = 66-71) than the faces that were of Mid (M = 60, SE = 1.89, CI = 56-64) or Low familiarity (M = 59, SE = 1.34, CI = 57-62) to them. No interaction was observed between Familiarity and Experiment [F(2,58) = 1.33, p = .27, ηp² = .02].
	
To	keep	analysis	consistent	with	that	of	previous	experiments,	a	three	way	ANOVA	was	
conducted	for	Familiarity	(within-subjects,	3	levels	–	Low	familiarity,	Mid	familiarity,	High	
familiarity),	 Trial	 Type	 (within-subjects,	 2	 levels	 –	 same,	 different)	 and	 Experiment	
(between-subjects, 2 levels – Experiment 3, Experiment 4). The ANOVA revealed a significant main effect of Familiarity [F(2,116) = 11.2, p < .001, ηp² = .16]. As expected from Figure 2.11 and Figure 3.5, there was a significant three-way interaction between Familiarity, Trial Type and Experiment [F(2,116) = 3.53, p < .05, ηp² = .06], confirming that Familiarity affected performance according to Trial Type in different ways across the experiments. No other results were significant.
	
Discussion		
Blurring	the	pixelated	images	had	a	significant	positive	effect	on	performance	accuracy.	
Overall,	 performance	 was	 12%	 better	 when	 participants	 viewed	 blurred	 versions	 of	 the	
pixelated	 images	 (Experiment	 4)	 compared	 to	 when	 participants	 viewed	 the	 pixelated	
image	(Experiment	3).		
	
Previously,	 blurring	 a	 pixelated	 image	 had	 been	 found	 to	 aid	 identity	 recognition	 of	 a	
pixelated	 face	 (Harmon	 &	 Julesz,	 1973;	 Morrone,	 et	 al.,	 1983).	 My	 research	 found	 that	
this	blurring	advantage	extended	to	matching	involving	similar	images.	Blurring	the	image	
to	 a	 greater	 or	 lesser	 degree	 may	 have	 led	 to	 more	 of	 an	 improvement,	 but	 I	 was	

	

	

	
	
primarily concerned with whether or not any blurring would improve performance, which it
did.		
	
My experiment also has theoretical implications, helping to answer whether identity information is carried in high or low spatial frequencies in a face and, more specifically, how this information plays out in identifying similar faces. When a
pixelated	image	is	blurred,	the	high	spatial	frequency	information	in	an	image	is	removed.	
It	was	previously	reported	that	faces	could	be	identified	from	blurred	pixelated	images,	
which	 suggests	 that	 it	 is	 the	 low	 spatial	 frequency	 information	 which	 is	 important	 for	
identity (Harmon & Julesz, 1973; Morrone et al., 1983). My task involved very similar images, so any differences between the identities were subtle. As blurring improved
performance	 for	 these	 images,	 one	 interpretation	 of	 my	 findings	 is	 that	 some	 of	 the	
subtle	differences	between	identities	were	held	in	low	spatial	frequencies.		
	
The	findings	of	this	study	are	of	great	practical	relevance.	Pixelated	images	are	often	the	
image	 type	 which	 police	 have	 to	 deal	 with.	 It	 has	 been	 shown	 by	 previous	 studies	 that	
performance	 on	 unfamiliar	 face-matching	 tasks	 is	 extremely	 poor,	 especially	 when	 the	
images	are	of	poor	quality.	Experiments	1,	2	and	3	reiterated	this,	with	the	extreme	and	
more	 challenging	 case	 of	 imposter	 detection.	 I	 found	 that	 performance	 was	 particularly	
poor	 in	 the	 imposter	 detection	 task	 for	 the	 unfamiliar	 faces,	 and	 performance	
deteriorated as a function of image quality. Blurring pixelated images is a simple image manipulation that police could use to improve the likelihood of correctly detecting same-identity faces and distinguishing similar faces.
	

3.4 Experiment	5:	Crowd	Analysis	
In some situations, data on identity tasks has already been collected before the challenge posed by pixelation has been addressed through image manipulation. It is possible that more accurate identity judgments can be

	

	

	
	
obtained by reanalysing the existing data in a different way. We have already seen that performance is poor, even for familiar viewers, when the images are coarsely pixelated. Blurring the pixelated images, as above, did improve face recognition performance; however, accuracy remained lower than for good quality images. A technique called crowd analysis may improve performance accuracy, and could even help when no viewers are extremely familiar with the faces whose identities are in question.
	
Crowd analysis is based on the idea that a group performs better than an individual (Galton, 1907; Krause et al., 2011; White, Burton, Kemp & Jenkins, 2013). Rather than relying on a single person's decision, responses are pooled across the individuals in a group and the most commonly given answer is taken as the group's answer – simple majority rule. This crowd response is then compared against the correct answer in place of each individual response.
	
The	 power	 of	 the	 crowd	 over	 an	 individual	 has	 been	 acclaimed	 for	 quite	 some	 time;	
Aristotle	 addressed	 the	 wisdom	 of	 the	 crowds	 in	 his	 book	 Politics	 (Aristotle,	 published	
1920). Later, the statistician Sir Francis Galton (1907) demonstrated that a crowd could outperform the individual by calculating the median of the weight estimates in a guess-the-weight-of-an-ox competition at a fair. He found that the median estimate of a crowd of 800 people was within 1% of the true weight of the ox.
Since	Aristotle	and	Galton,	the	concept	of	wisdom,	or	power	of	crowds	has	been	referred	
to	by	many	different	names.	In	biological	terms,	this	phenomenon	is	commonly	referred	
to	as	swarm	intelligence,	and	refers	to	the	superior	ability	of	groups	in	solving	cognitive	
problems	over	the	individuals	who	make	up	that	group.	The	benefit	of	swarm	intelligence	
has been replicated more recently through experiments involving guessing the number of marbles or sweets in a jar (Krause et al., 2011; King et al., 2012). An interesting finding in these studies was that groups of low performers could outperform individual high performers. Krause et al. (2011) highlight that swarm intelligence is
	

	

	
	
beneficial in providing a more accurate answer to a sweets-in-a-jar question, but less beneficial for questions that require expert knowledge – this was tested with a question about the odds of a coin toss relative to the odds of winning the lottery. For the coin toss and lottery question, groups only outperformed experts if the group size was larger than 40 people. Group sizes varied between 2 and 80, with larger groups providing more accurate results.
	
King et al. (2012) also found that swarm intelligence improved the accuracy of sweet-guessing judgments, but then adapted the experiment to find out whether providing individual guessers with additional information (the average of the previous guesses, the guess of a randomly chosen previous person, the guess of the person immediately before them, or the best guess made before theirs) would sway their judgment. With no additional information, the crowd far outperformed the individuals. Knowledge of any of the previous guesses, except the best previous guess, reduced the effect of swarm intelligence. Access to the best guess led to more accurate results than in each of the other conditions, at both the individual and the small-group analysis level. This suggests that crowd analysis helps in all of these situations, but that the added information of an expert's guess can improve accuracy further.
	
White	et	al.	(2013)	were	the	first	to	have	used	the	wisdom	of	the	crowd	theory	to	address	
the	 problem	 of	 face	 identity	 judgments,	 and	 refer	 to	 the	 technique	 they	 use	 as	 crowd	
analysis.	The	principle	remains	the	same;	the	most	common	result	of	the	group	is	taken	
as	 the	 answer	 (majority	 vote)	 instead	 of	 averaging	 responses	 at	 an	 individual	 response.	
White	 and	 colleagues	 (2013)	 report	 that	 performance	 accuracy	 on	 the	 GFMT	 can	 be	
improved	by	analysing	the	results	of	the	GFMT	(Burton	et	al.	2010)	for	crowd	sizes	of	2,	4,	
8,	 16,	 32	 and	 64	 subjects,	 rather	 than	 looking	 at	 the	 mean	 overall	 response	 at	 an	
individual	 level	 as	 previously	 reported	 (see	 Figure	 2.3).	 It	 is	 important	 to	 note	 that	 the	
authors	 are	 dealing	 with	 the	 exact	 data	 collected	 from	 Burton	 and	 colleagues	 (2010)	
study,	where	mean	performance	for	individual	accuracy	was	89.9%,	it	is	only	the	type	of	
analysis	 which	 has	 changed.	 Crowd	 analysis,	 for	 groups	 as	 small	 as	 four	 people,	
	

	

	
	
outperformed	 mean	 individual	 accuracy.	 Group	 size	 was	 also	 important,	 with	 larger	
groups	providing	more	accurate	results	of	identity	matching	than	when	individual	results	
were	 pooled	 across	 smaller	 groups.	 The	 largest	 group	 size	 of	 64	 led	 to	 performance	
accuracy	 levels	 of	 99.2%.	 Further	 still,	 crowds	 outperformed	 the	 highest	 performing	
individual	when	the	data	was	aggregated	over	a	group	size	of	eight	or	more	(White	et	al.	
2013).		
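The way majority voting amplifies individual accuracy can be illustrated with a simple independence model. This is an idealisation of my own, not an analysis from White et al. (2013): it assumes voters err independently and that all items are equally difficult, so it overstates the real benefit.

```python
from math import comb

def majority_correct(p, n):
    """Probability that a majority of n independent voters, each correct
    with probability p, returns the correct binary answer (n odd)."""
    assert n % 2 == 1, "odd crowd sizes avoid tied votes"
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))
```

Under this idealised model, voters who are 90% accurate individually (roughly the GFMT level) exceed 99% accuracy as a crowd of nine, while at 60% individual accuracy the gain per added voter is much more modest.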
	
	
	
	
	
	
	
	
	
	
Figure 3.6 Mean performance on items of the GFMT according to different crowd sizes (White et al., 2013). The graph shows performance accuracy broken down by trial type, with results analysed for crowd sizes of 1, 2, 4, 8, 16, 32 and 64.

	
This	 study	 by	 White	 et	 al.	 (2013)	 showed	 for	 the	 first	 time	 that	 crowd	 analysis	 can	
improve	 performance	 on	 face	 identity	 judgments.	 The	 training	 techniques	 discussed	 in	
the introduction of this chapter focussed on improving performance at the individual level, but with little or no success. Data analysis techniques may allow us to improve the accuracy of identity decisions without the need for training, and even to improve accuracy retrospectively, on tasks for which data has already been collected. This

	

	

	
	
is	useful	for	new	studies	also,	as	data	collection	methods	would	not	need	to	change	in	any	
way.	 The	 crowd	 analysis	 technique	 has	 proved	 useful	 in	 tasks	 involving	 good	 quality	
images	(e.g.	White	et	al.	2013),	but	has	not	yet	been	applied	to	data	from	tasks	involving	
more	challenging	images.		
	
I	 will	 test	 whether	 performance	 can	 also	 be	 improved	 through	 crowd	 analysis	 for	 my	
pixelated	lookalike	task,	which	includes	very	similar	and	poor	quality	images.	It	is	not	yet	
known	whether	the	crowd	advantage	can	extend	to	help	in	this	very	challenging	case	of	
identity	judgment.	This	is	particularly	interesting	to	investigate,	as	a	more	challenging	task	
will	 allow	 a	 better	 understanding	 of	 the	 advantage	 that	 crowd	 analysis	 can	 hold.	 In	 the	
GFMT,	crowd	analysis	could	only	improve	individual	performance	by	a	maximum	of	10%,	
which	brought	performance	to	ceiling	level	(see	Figure	2.3).	It	is	unknown	whether	crowd	
analysis could improve performance by more than 10%. Performing crowd analysis on my more challenging task will allow this to be investigated, as the lower baseline performance level in this task leaves far more room for improvement than in the GFMT.
	
Based	on	the	success	of	crowd	analysis	in	previous	situations,	I	predict	that	performance	
accuracy	on	the	pixelated	lookalike	task	will	improve	as	crowd	size	increases.		
	
Method	
Data	
The dataset used in this study was the raw data from Experiment 3, the coarsely pixelated lookalike task (N = 30, M = 8, mean age = 20.2).
	
	
	

	

	

	
	
Analysis	
To	 carry	 out	 crowd	 analysis	 I	 calculated	 the	 most	 frequently	 given	 response	 (same	 or	
different)	for	each	item	of	the	coarsely	pixelated	face-matching	task,	Experiment	3,	across	
different	crowd	sizes	(i.e.	subgroups	of	participants)	and	then	used	this	response	as	the	
answer	for	that	item.	Crowd	responses	were	calculated	for	all	items	tested	to	determine	
overall percentage accuracy. Results were calculated for crowd sizes of 1, 3, 5 and 15; these crowd sizes were selected because they are factors of 30, allowing the 30 participants in the study to be split into equal-sized groups.
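The majority-vote computation can be sketched as follows. This is a minimal illustration: the response-matrix format, the function names and the random assignment of participants into groups are my own assumptions, not the analysis code used in the thesis (which used odd crowd sizes, so tied votes cannot arise).

```python
from random import Random
from statistics import mean

def crowd_accuracy(responses, answers, crowd_size, seed=0):
    """Mean majority-vote accuracy over disjoint crowds of a given size.

    responses: one list of 0/1 answers per participant (rows), one entry
    per item; answers: the correct 0/1 answer for each item.
    """
    order = list(range(len(responses)))
    Random(seed).shuffle(order)  # random assignment of participants to crowds
    crowds = [order[i:i + crowd_size]
              for i in range(0, len(order), crowd_size)]
    scores = []
    for crowd in crowds:
        correct = 0
        for item, answer in enumerate(answers):
            votes = sum(responses[p][item] for p in crowd)
            majority = 1 if votes * 2 > len(crowd) else 0  # odd sizes: no ties
            correct += (majority == answer)
        scores.append(correct / len(answers))
    return mean(scores)
```

With 30 participants, `crowd_accuracy(responses, answers, 15)` averages the majority-vote accuracy of the two resulting crowds of 15; `crowd_accuracy(responses, answers, 1)` reduces to mean individual accuracy.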
	
Results	&	Discussion	
To find out whether mean performance accuracy increased with increasing crowd size, individual performance (crowd size 1) was compared with the performance of crowds of 3, 5 and 15 (see Figure 3.7).
	

	
	
	
	
		
	
	

	

	

	
	

	
Figure	3.7	Graph	showing	the	mean	accuracy	score	for	crowd	sizes	of	1,	3,	5	and	15	for	the	coarsely	
pixelated	lookalike	task,	Experiment	3.	Error	bars	show	the	standard	error	of	the	mean.		
	

Individuals (M = 60.11% correct) were outperformed by each of the crowds, with higher performance for larger crowds (crowd size 3, M = 64.16% correct; crowd size 5, M = 65% correct; crowd size 15, M = 67.5% correct). The crowds were made up of the same people who contributed to the individual performance analysis; at no point did any individual become better at the task, as crowd scores were calculated after independent decisions had already been made. These findings show that crowd analysis did improve performance, with increasing crowd size leading to better performance.
	
In	 keeping	 with	 previous	 analysis,	 results	 were	 broken	 down	 into	 same	 and	 different	
trials.	 Increasing	 crowd	 size	 increased	 accuracy	 for	 Same	 identity	 trials	 but	 not	 for	

	

	

	
	
Different	identity	trials	(See	Table	3.1).	These	results	mirror	those	of	trial	type	reported	in	
Experiment	3.		
	

             1 (SE)       3 (SE)        5 (SE)       15 (SE)
Same         62.2 (.55)   66.7 (2.85)   70 (3.44)    73.3 (3.33)
Different    58 (.59)     61.7 (3.8)    60 (2.11)    57 (5)

Table 3.1 Crowd analysis results broken down by trial type (same identity, different identity). Cells show mean percentage correct, with the standard error in parentheses, for crowd sizes 1, 3, 5 and 15.

	
Overall,	 performance	 improved	 through	 crowd	 analysis,	 but	 did	 not	 exceed	 a	 10%	
increase	 in	 performance,	 which	 would	 have	 been	 possible	 given	 the	 task	 difficulty.	 The	
coarsely	 pixelated	 celebrity	 lookalike	 task	 was	 a	 very	 difficult	 task,	 which	 is	 shown	
through	 the	 highest	 group	 performance	 being	 just	 70%	 accuracy.	 It	 is	 possible	 that	
performance	could	be	improved	even	more	if	I	apply	the	crowd	analysis	technique	to	the	
results from the blurred pixelated faces, Experiment 4. I have shown that blurring improved performance on the lookalike task, so it is possible that crowd analysis on the slightly easier, blurred version may lead to even better performance than either crowd analysis applied to the coarsely pixelated faces, or blurring the faces, alone.
	
Combining	Improvement	Techniques	(Experiment	5b)	
Blurring pixelated images (Experiment 4) and applying crowd analysis to pixelated images (Experiment 5) have both improved performance accuracy compared with performance on the coarsely pixelated celebrity lookalike task (Experiment 3). Both blurring and crowd analysis have proved successful in improving face-matching accuracy in past studies, yet no work has attempted to combine these techniques, i.e. to perform crowd analysis on the blurred pixelated image data.
	
I predict that crowd analysis will have a similar effect on the blurred pixelated data as it did on the coarsely pixelated data: increasing crowd size for the blurred pixelation data will lead to more accurate results.

	

Method	
Data	
The dataset used in this study was the raw data from Experiment 4, the blurred coarsely pixelated lookalike task (N = 30, M = 6 male, mean age = 19.7).
	
Analysis	
Crowd analysis was carried out on data from the blurred coarsely pixelated celebrity lookalike task, Experiment 4. The analysis procedure was carried out in the same way as the crowd analysis in Experiment 5 above.
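The crowd procedure referred to here can be made concrete: for every trial, take the modal (majority) response of a randomly sampled crowd, score it against ground truth, and average over many sampled crowds. The sketch below is my own minimal simulation under assumed numbers (30 viewers, 60 trials, roughly 65% individual accuracy), not the thesis data or analysis code:

```python
import random
from statistics import mean

random.seed(1)
n_participants, n_trials = 30, 60
truth = ["same" if t < 30 else "different" for t in range(n_trials)]

# Simulate individually noisy viewers who are each correct ~65% of the time
# (all numbers here are illustrative assumptions, not the thesis data).
responses = [
    [ans if random.random() < 0.65 else ("different" if ans == "same" else "same")
     for ans in truth]
    for _ in range(n_participants)
]

def crowd_accuracy(responses, truth, crowd_size, n_samples=1000):
    """Mean accuracy of the modal (majority) answer of randomly drawn crowds."""
    accs = []
    for _ in range(n_samples):
        crowd = random.sample(responses, crowd_size)  # draw one crowd
        correct = 0
        for t, target in enumerate(truth):
            votes = [member[t] for member in crowd]
            majority = max(set(votes), key=votes.count)  # modal response
            correct += majority == target
        accs.append(correct / len(truth))
    return mean(accs)

for size in (1, 3, 5, 15):  # odd sizes, so majority votes cannot tie
    print(size, round(100 * crowd_accuracy(responses, truth, size), 1))
```

With viewers who are individually noisy but better than chance, accuracy rises with crowd size, mirroring the pattern in Tables 3.1 and 3.2; odd crowd sizes are used so that the majority vote is always defined.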
	
Results	&	Discussion	
	
To assess the effect of crowd analysis on the blurred pixelated images, crowd accuracy was calculated and compared across increasing crowd sizes. Crowd analysis improved upon individual performance, and the improvement grew as crowd size increased. The crowd size of 1 (M = 67.28% correct) was outperformed by crowds of 3 (M = 72% correct), 5 (M = 75% correct) and 15 (M = 80% correct). These results are illustrated alongside the results of Experiment 5 in Figure 3.8.
	
Crowd	analysis	was	broken	down	by	trial	type	for	the	blurred	images	(Table	3.2).	When	
the	 images	 were	 blurred,	 crowd	 analysis	 improved	 performance	 for	 both	 same	 and	
different	trials.		
	
	
	

	


Crowd size    1 (SE)        3 (SE)        5 (SE)        15 (SE)
Same          64.3 (.67)    68 (2.29)     70.56 (2.5)   73.3 (3.33)
Different     70.2 (.46)    75 (1.87)     80 (2.11)     87 (0)

Table 3.2 Crowd analysis results broken down by trial type (same identity, different identity) for the blurred images, Experiment 5b.

	
Next,	 results	 of	 blurred	 crowd	 analysis	 were	 compared	 with	 the	 results	 of	 the	 coarsely	
pixelated	experiment	crowd	analysis.		
	

	
Figure 3.8 Graph showing the mean accuracy score for crowd sizes of 1, 3, 5 and 15 for the blurred version of the coarsely pixelated lookalike task (blue line) and the coarsely pixelated lookalike task (black line). Error bars show the standard error of the mean.

	
Performance gain due to blurring (the difference between the black and blue lines in Figure 3.8) and performance gain due to crowd effects (represented by the slope of the

	

lines in Figure 3.8) are additive. This pattern suggests that the two gains have independent causes (as demonstrated by the results of Experiments 3 & 4 broken down by Trial Type). It also implies that both techniques can be used in combination to secure the benefits of both. Indeed, using both techniques together leads to a large (20%) accuracy gain in performance, from 60% accuracy for a crowd size of 1 in the coarsely pixelated condition, to 80% accuracy for a crowd size of 15 in the pixelated and blurred condition. The significance of this gain is apparent when it is compared with the gain from previous techniques such as training, which have improved accuracy on face-matching tasks by only around 5%. Overall, crowd analysis and blurring together raised the mean performance of individual viewers from 60%, a level that was low compared both to unfamiliar viewers on the easier, unpixelated versions of the task and to familiar viewers on any of the tasks, up to the mean accuracy of viewers who were fairly familiar with the faces concerned in the easiest, unpixelated version of the task (Experiment 1). This highlights that blurring and crowd analysis applied in combination greatly improve performance for difficult images.
	
My results are important with regard to methods for improving face-matching accuracy. In past research, large increases in performance have been linked to the use of familiar viewers in a task (e.g. Burton et al. 1999). The success of the blurring and crowd analysis techniques does not rely on familiarity with the faces in the matching task. Crowd analysis, combined with blurring where appropriate, could thus aid very difficult identity decisions and provide a solution to improving face recognition performance that does not rely on familiarity with a face.
	

3.5 Experiment	6:	Observer	Factors		
Unfamiliar	 face	 recognition	 ability	 is	 generally	 poor.	 We	 have	 seen	 this	 both	 in	 past	
research	(Burton	et	al.	1999;	Bruce	et	al.	2001;	Kemp	et	al.	1997)	and	in	the	results	of	the	
thesis	 up	 to	 this	 point.	 There	 is	 however	 large	 variation	 at	 an	 individual	 level	 in	 face	

	

recognition	 performance	 –	 some	 people	 consistently	 perform	 with	 very	 high	 levels	 of	
accuracy,	some	with	very	poor	levels	of	accuracy,	and	others	at	intermediate	levels.	This	
was	 seen	 in	 the	 study	 by	 White	 and	 colleagues	 (2014),	 which	 tested	 face-matching	
performance	 for	 passport	 officers.	 It	 was	 reported	 that	 although	 training	 and	 years	 of	
experience	made	no	difference	to	performance,	there	was	a	huge	amount	of	variation	in	
performance	 amongst	 the	 passport	 officers	 themselves	 (White	 et	 al.	 2014a).	 Face	
recognition	is	therefore	generally	thought	of	as	an	ability	that	lies	on	a	spectrum.	At	the	
extreme	 lowest	 end	 of	 the	 face	 recognition	 ability	 spectrum	 are	 people	 with	 congenital	
prosopagnosia	 (Duchaine,	 2011).	 Congenital	 prosopagnosics	 have	 no	 identified	 brain	
deficits,	yet	experience	clinical	level	of	difficulty	with	recognising	faces	(Berham	&	Avidan,	
2005;	 Duchaine	 &	 Nakayama,	 2006).	 At	 the	 other	 end	 of	 the	 spectrum	 are	 people	 who	
consistently	 perform	 with	 exceptionally	 high	 accuracy.	 These	 people	 have	 been	 termed	
‘super-recognisers’	 (Russell,	 Duchaine	 &	 Nakayama,	 2009).	 Super-recognisers	 may	
provide	a	solution	to	the	face	recognition	problem	in	practical	settings.	In	theory,	superrecognisers	 could	 help	 to	 eliminate	 the	 number	 of	 face-matching	 errors	 made,	 in	
situations	 where	 face-matching	 is	 of	 high	 importance.	 If	 super-recognisers	 truly	 make	
fewer	errors	than	others,	which	is	what	these	past	studies	have	shown	(e.g.	Russell	et	al.	
2009),	 then	 employers	 should	 consider	 testing	 candidate’s	 face	 recognition	 ability,	 and	
making	super-recogniser	performance	a	requirement	for	the	job	roles	for	which	accurate	
face	identification	carries	high	importance.		
		
Members of a highly specialised expert forensic group in the USA have, however, been found to outperform controls on tasks involving face recognition. The ability of these group members has only recently been identified. Indeed, White et al. (2015b) showed that forensic examiners in the USA performed more accurately in three face-matching tasks than control groups of trained experts in biometric systems (referred to in the study as ‘controls’) and undergraduate students (referred to as ‘students’). The trained experts in biometric systems were highly motivated in the task, suggesting that differences in performance were not due to differences in motivation. The face-matching tasks used were the GFMT and two new tests, one specially designed to be challenging to both computers and humans [EFCT] and the other made of stimuli which
	

contained cues that would be of use to humans only [PICT]. It is so far unclear whether people gravitate towards the job of forensic examiner as a result of innate face recognition ability, or whether the extremely specialised training helps these forensic examiners to perform well on face recognition tasks. As highlighted earlier in this chapter, training has a poor track record for improving face identification performance (Woodhead et al., 1979; Towler et al., 2014). The lack of success of previously evaluated training methods would incline one to believe that an innate ability for face recognition is key. The findings suggest that the forensic examiners were using different techniques from the comparison groups to make their identity judgments. This could be a result of innate ability or training. It would be interesting to find out exactly what the training for these forensic examiners involved; this knowledge may help to answer whether the examiners’ face recognition ability is innate or learnt.
	
Additionally, further testing and analysis of the data collected in White and colleagues’ (2014) study on passport officers’ face-matching performance identified a subset of the passport officers as particularly high performers. These high-performing officers were in a specific branch of the passport office, in a job role where the ability to successfully match true match faces and identify false matches was of paramount importance. Thus, high face recognition ability was even more important for their job role than for the roles held by the other passport officers tested (White et al. 2015a). It remains unknown whether these officers came to their specific role because they were naturally best suited to it, or whether they learnt the skills required on the job. This has not been investigated directly to date. As years of training were recorded, and found to have no effect on performance accuracy, it may be more likely that innate ability guided these passport officers into their specialised job, where accurate face recognition is extremely important. As in the case of the US forensic examiners, more information about the training programmes, and ideally also new data that specifically tracks the members of these highly specialised expert groups over the course of training and job experience, could help to answer whether their ability is innate, learnt, or a combination of both.

	

The performance of super-recognisers had, until recently, not been directly compared to that of controls on standardised tests. Instead, super-recognisers were identified from large groups of people tested on standardised tests, as those with the very highest scores. Bobak, Dowsett & Bate (2016) specifically tested the performance of self-identified super-recognisers to find out whether they were better than a student control group at two standardised tests: the GFMT and a face-matching task involving models. The models task is challenging, as the same models can look very different across different images. All of the model faces were unfamiliar to the viewers, so this study tested purely unfamiliar face recognition performance in super-recognisers. The authors reported that their group of 7 self-identified super-recognisers were better than a student control group at the two tasks. This finding suggests that people who have exceptional face recognition ability are in fact aware of this ability, as their superior performance was evident when they were tested on standardised face identification tasks.
	
Super-Recognisers	within	the	MET	Police	
The evidence presented overwhelmingly supports the use of super-recognisers as a possible solution to the face identity problem. It therefore seems sensible for face identity professionals to consider employing super-recognisers. This advice has been taken by The Metropolitan Police, who have recently established a super-recogniser team within their force. The team was created by internally recruiting officers who have a particular interest in face recognition and performed well in the force’s undisclosed face recognition test. These officers regularly try to identify faces from pixelated CCTV photographs. During the establishment of the super-recogniser team I was given the opportunity to test four of these MET police super-recognisers in their normal working environment, to see whether they outperformed our undergraduates on the lookalike task. The lookalike test is particularly relevant for measuring the face recognition ability of the police super-recognisers as it tests performance for faces of differing levels of familiarity, from unfamiliar to extremely familiar. The police have to deal with cases across this spectrum, depending on whether the individual involved in a crime is a known repeat offender or committing their first crime. Due to time constraints I could test performance on just one

	

version of the test. I wanted to test performance on one of the pixelated lookalike tasks, as Chapter 1 demonstrated that even familiar undergraduate viewers do not score perfectly on these tasks. Using pixelated images would therefore avoid a ceiling effect for matching trials. Pixelated images are also an image type that the police have to deal with as part of their investigations, meaning that the task holds practical relevance. The mid pixelation level (30x45 pixels) was chosen so that performance was predicted to be above chance on the basis of previous findings (the higher pixelation level reduced performance to chance for all but extremely familiar faces in our experiment on undergraduate students). This choice also guaranteed an already existing comparison group whose performance was above floor (>50% accuracy).
	
Participants	
Participants were 4 super-recognisers (M = 4 male, mean age = 40) from the Metropolitan Police force super-recogniser team, New Scotland Yard Central Forensic Image Unit, London. Performance was compared with that of the 30 undergraduate students tested in Experiment 2 (M = 11 male, mean age = 19.7).
	
Design	&	Stimuli		
The stimuli were images of the 30 celebrities, and lookalikes for these celebrities, used in Experiment 2. As the comparison viewers were generally younger than the super-recognisers, I took care to ensure that any differences in performance could not arise through some celebrities being more familiar to one group than the other.
	
As in Experiment 2, all images were presented in low resolution (30 pixels wide x 45 pixels high). This resampling also served to reduce performance on matching trials to a level at which we expected to avoid ceiling effects. Examples of the stimuli are shown in Figure 3.9.

	

Figure	3.9	Example	trials	from	the	PLT.	Images	on	the	left	show	different	identities	(with	the	imposter	face	
on	the	right).	Images	on	the	right	show	the	same	identity.		

	
Procedure	
The procedure was similar to that used in Experiment 2. The police super-recognisers viewed the images in a printed booklet rather than on a computer screen. Super-recognisers were presented with a printed booklet containing 60 trials, half of which showed the same identity and half of which showed different identities. They were asked to make a same/different judgement for each trial. Following the face-matching task, participants used a numerical scale to indicate their level of familiarity with each celebrity whose face had been viewed in the task (from 1 [completely unfamiliar] to 10 [extremely familiar]).
	
Police	 super-recognisers	 were	 also	 tested	 on	 the	 GFMT	 and	 the	 models	 face-matching	
task	 (MFMT)	 which	 were	 designed	 and	 administered	 by	 other	 researchers	 from	 the	
University	of	York	Facelab.		
	
	
	

	

Results	

	
Figure	 3.10	 Performance	 of	 police	 super-recognisers	 and	 comparison	 viewers.	 Performance	 of	 super-	
recognisers	(SR1–4;	black)	and	comparison	viewers	(white)	on	three	different	tests	of	face	recognition—the	
GFMT	 (left	 column),	 the	 MFMT	 (middle	 column),	 and	 the	 PLT	 (right	 column).	 Vertical	 lines	 indicate	 the	
range	of	scores	for	comparison	groups,	the	deleted	portion	of	the	line	shows	the	standard	deviation,	and	
the	horizontal	notch	shows	the	mean.	In	all	three	tasks,	chance	performance	is	50%.		
	

In Experiment 2, results were broken down according to 5 levels of familiarity, with results analysed according to these familiarity quintiles. Due to the small number of super-recognisers that I was able to test, this time I broke familiarity down into two groups (highly familiar faces [8–10 on the familiarity scale] and less familiar faces [all other ratings]) to ensure that sufficient numbers of faces were placed in each familiarity bin.
	
Analysis was broken down according to the two familiarity bands, with the first tests performed on the subset of faces that viewers rated as highly familiar. Interestingly, super-recognisers gave high familiarity ratings (8–10) to a far higher proportion of faces than controls did (70% of faces for super-recognisers; 37% of faces for controls).

	

The police super-recognisers consistently performed with far greater accuracy than the student participants. Overall, for highly familiar faces the controls scored just 76% correct, whereas the police super-recognisers scored 93% correct in the lookalike test. All super-recognisers performed much better than the control mean, with one super-recogniser performing perfectly. This performance is shown in Figure 3.10 (for full results of the GFMT and MFMT see Robertson, Noyes, Dowsett, Jenkins & Burton, 2016).
	
To	better	understand	the	super-recogniser	advantage,	I	conducted	two	further	analyses.	
First,	I	compared	accuracy	of	super-recognisers	and	controls	on	faces	that	they	rated	as	
less	 familiar	 (0–7	 on	 the	 10-point	 scale;	 i.e.	 those	 not	 included	 in	 the	 above	 analysis).	
Police	 super-recognisers	 outperformed	 controls	 on	 these	 faces	 too	 (76%	 accuracy	 for	
super-recognisers;	66%	accuracy	for	controls),	implying	that	their	performance	advantage	
holds	across	the	whole	familiarity	continuum.	Second,	I	analysed	the	control	participants’	
data	for	an	association	between	i)	the	proportion	of	faces	that	were	given	high	familiarity	
ratings	 and	 ii)	 the	 level	 of	 accuracy	 on	 those	 highly	 familiar	 faces.	 A	 significant	 positive	
correlation	was	found	between	these	two	measures	[r(28)	=	0.39,	p	<	.05],	such	that	the	
highest	 performing	 controls	 were	 qualitatively	 similar	 to	 the	 super-recognisers.	 These	
analyses	 support	 the	 original	 comparison,	 i.e.	 the	 super-recognisers	 are	 performing	 at	
well	above	the	levels	of	controls,	even	when	group	differences	in	famous	face	familiarity	
are	taken	into	account.		
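The correlation analysis above can be reproduced in principle with a plain Pearson computation. The sketch below uses invented illustrative data, not the thesis data; only the formula and the degrees-of-freedom convention (df = N - 2 = 28 for the reported r(28) with N = 30) come from the text:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented illustrative data for 30 hypothetical controls: the proportion of
# faces each rated highly familiar, and their accuracy on those faces.
prop_high = [0.2 + 0.02 * i for i in range(30)]
accuracy = [0.60 + 0.004 * i + (0.03 if i % 3 == 0 else -0.01) for i in range(30)]

r = pearson_r(prop_high, accuracy)
df = len(prop_high) - 2  # degrees of freedom, reported as r(28) for N = 30
print(round(r, 2), df)
```

A positive r here, as in the control data, indicates that participants who rated more faces as highly familiar also tended to be more accurate on those faces.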
	
Breaking results down by trial type demonstrated that the SRs outperformed controls on both same and different identity pairs, for both the low and high familiarity groups (see Table 3.3).
	
	
	

	

	

             % Accuracy             % Accuracy
             Low Familiarity (SE)   High Familiarity (SE)
SRs
  Overall    75.72 (8.52)           93 (5.12)
  Same       83.65 (11.78)          95 (2.84)
  Different  67.79 (19.09)          90 (3.14)
Controls
  Overall    66.4 (1.73)            76.36 (2.81)
  Same       68.98 (2.17)           81.34 (3.61)
  Different  64.10 (2.97)           70.18 (3.54)

Table 3.3 Performance accuracy broken down by viewer group and trial type.
	

Crowd	Analysis	on	Super-Recogniser	Data	
Earlier in this chapter I demonstrated that crowd analysis improved performance accuracy on the coarsely pixelated face-matching task. I also confirmed that combining methods of improvement had additive benefits for performance accuracy. To continue the investigation of the effect of combining improvement methods, I performed crowd analysis on the super-recogniser data. This analysis incorporated all items, regardless of familiarity with the item (as in the crowd analysis earlier, Experiment 5). Mean performance accuracy for a crowd size of 1 (individual analysis) was 80%. Crowd analysis improved performance, with a mean performance accuracy of 88% for a crowd size of 4 (see Table 3.5). This performance cannot be directly compared to the crowd analysis in Experiment 5, as the police super-recognisers completed the mid-level pixelation task rather than the coarsely pixelated version. Instead, a new crowd analysis of Experiment 2 (the mid pixelation level face-matching task) was conducted. Crowd results for the mid pixelation group, Experiment 2, are shown in Table 3.4. Increasing crowd size increased performance accuracy, from 62% for a crowd size of 1, to 83% for a

	

crowd	 size	 of	 15.	 These	 results	 highlight	 the	 exceptional	 face	 recognition	 ability	 of	 the	
super-recognisers,	 as	 a	 crowd	 size	 of	 just	 4	 super-recognisers	 outperformed	 the	 largest	
crowd	size	of	15	controls	(Table	3.5).		
	

Crowd size    1 (SE)         3 (SE)        5 (SE)        15 (SE)
Same          66.33 (2.14)   81.33 (3.56)  83.88 (3.26)  88.3 (5)
Different     58.77 (2.1)    69 (5.24)     72.2 (6.81)   78.3 (1.67)

Table 3.4 Crowd analysis results broken down by trial type (same identity, different identity) for the mid pixelation (control) group, Experiment 2.
	

Crowd size (SRs)   1 (SE)         4
Same               86.66 (3.85)   100
Different          74.12 (7.67)   76.67

Table 3.5 Crowd analysis results broken down by trial type (same identity, different identity) for the SRs.

Discussion		
This is the first time that police super-recognisers’ performance has been assessed using standardised tests. The superior performance of super-recognisers in the pixelated lookalike task suggests two important conclusions. First, the testing systems used by the police to recruit super-recognisers are indeed successful in selecting people with exceptional face recognition ability. Second, these super-recognisers’ ability is not limited to good quality images; super-recognisers are superior to others in face recognition ability even in extremely challenging image conditions involving extreme similarity between target faces and foils. The performance of super-recognisers can be improved even further through crowd analysis. The largest crowd of 4 super-recognisers was relatively small, but outperformed even the largest crowd size of 15 control participants.

	

3.6 General	Discussion	
I have shown in this chapter that the poor performance found for the highly challenging pixelated stimulus images can be improved upon through a variety of techniques. Performance accuracy was improved by blurring pixelated images, by combining results using the crowd analysis technique, and by using the results of super-recognisers. Blurring and crowd analysis, and also super-recognisers and crowd analysis, can be used in combination for additional benefit.
	
Each of the techniques that I investigated had shown promise in improving performance in previous face recognition tasks. However, none of the techniques had been applied to such a challenging identity matching scenario. Blurring had been studied in terms of face recognition rather than identity matching (Harmon & Julesz, 1973). The only study of face recognition to incorporate the crowd analysis technique analysed results from a cooperative face-matching task involving good quality images (the GFMT). Super-recogniser performance has also been studied only in terms of good quality images, for unfamiliar faces (Bobak et al., 2016). Past studies have also looked at each of these techniques exclusively for unfamiliar faces. My findings support the body of existing research, adding that the benefits of all of these techniques extend to very challenging pixelated image conditions, regardless of familiarity with a face. I also examined the effect of combining techniques: blurring combined with crowd analysis, and super-recognisers combined with crowd analysis. As far as I know, these techniques had never before been applied in combination. Combining the techniques led to further improvements over using any of the techniques in isolation. This suggests that each technique improves performance in a way that is different from the other methods. Blurring reduces high spatial frequency noise in a pixelated image; crowd analysis relies on the performance of the crowd being more accurate than that of the individual; and super-recognisers perform highly, though the methods that super-recognisers use are unknown and may link to innate ability. As each of these methods aids the face

	

recognition	problem	in	a	different	way,	when	methods	are	combined,	results	reflect	the	
additive	benefit	of	each.	
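The claim that blurring removes high spatial frequency noise from a pixelated image can be illustrated with a minimal low-pass filter. The sketch below (a 3x3 box blur on a toy grayscale array) shows the principle only; it is not the blurring procedure used to create the experimental stimuli:

```python
# 3x3 box blur on a grayscale image stored as a list of lists of intensities.
# Averaging each pixel with its neighbours acts as a simple low-pass filter,
# smoothing the hard block edges that pixelation introduces.
def box_blur(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average the pixel with its in-bounds neighbours.
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

# A pixelation-style edge: a hard step from 0 to 255 between blocks.
pixelated = [[0, 0, 255, 255] for _ in range(4)]
print(box_blur(pixelated)[1])  # -> [0.0, 85.0, 170.0, 255.0]: the step becomes a ramp
```

The sharp 0-to-255 step (a high spatial frequency edge) becomes a gradual ramp, which is the sense in which blurring discards the misleading block structure while preserving the coarse face information.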
Many of the strengths of the results of this chapter carry over from the challenging nature of the stimulus set created. To reiterate Chapter 1, the image pairs in the face-matching task consisted of true matches (two different images of the same celebrity) and false matches (one image of a celebrity and one of a lookalike for that celebrity). The stimuli used in this chapter are pixelated, making a task that addressed a difficult version of a practical problem. This design allowed us to test improvements to face recognition in a more challenging situation than ever before. In the past, ceiling effects had halted improvements. The more difficult task that I created allowed the effects of combinations of techniques to be measured: because base-level performance was much poorer than on previous tasks, the improvements made by techniques used alone and in combination could be assessed, as even the combined techniques did not bring performance to ceiling on the pixelated celebrity lookalike face-matching task.
	
I demonstrated that combining techniques improved results beyond any one technique used in isolation. However, due to the limited testing opportunity with the super-recognisers, I do not have results for the super-recognisers on the blurred pixelation task, and consequently could not perform crowd analysis on blurred pixelation data for super-recognisers. To understand the effect of combining the three improvement methods more fully, it would be interesting to test whether performance could be improved even more by applying all three techniques in all combinations, i.e. testing the SRs on the blurred version of the task and then, in addition, applying crowd analysis to the results.
	
Each of these techniques, and the combinations addressed in this chapter, could be used in applied scenarios to achieve more accurate face identity decisions. Blurring, crowd analysis and the performance of super-recognisers provide ways of improving performance where past methods of training have failed. These techniques are far easier

	

and less time consuming to implement than training; furthermore, crowd analysis has the unique advantage of being applicable to pre-existing data. These methods improve face-matching performance without relying on familiarity with the faces available for comparison, making them particularly useful in applied situations where familiarity with the faces concerned is not always a viable option.
	
Up until now, this thesis has focused on face recognition for challenging stimuli based on incidental differences in appearance across multiple images of a face, and incidental similarity in appearance between different-person trial images. This has shown that whilst it is possible to incidentally look like somebody else, it is also possible for images of the same person to incidentally look like different identities (e.g. Jenkins et al. 2011). All images investigated so far have been ambient in nature; however, there could be particular image changes which cause these situations to occur. In particular, the distance from which an image is taken may influence the appearance of a face, and in turn influence the perceived identity. Chapter 4 will explore this effect of changing camera-to-subject distance, looking both at the effect on the configural information in a face as portrayed in an image, and at whether any changes of configuration affect face-matching ability for unfamiliar and for familiar viewers.
	

	

	

Chapter	4	–	Changing	Camera-to-Subject	Distance	
4.1 Chapter	Summary	
The experiments in this chapter investigated effects of camera-to-subject distance on configural properties of the face image. Changes in camera-to-subject distance produce non-linear changes in face measurements across images. These changes reduced accuracy in an unfamiliar face-matching task by making same identity images look less like themselves. Identity matching performance was much poorer when unfamiliar viewers compared photographs taken from differing distances than when the comparison images were taken at the same distance. Familiar viewers were far less affected by this distance change and performed at very high accuracy levels in both conditions. Distance cues compensated for camera-to-subject distance change, suggesting the operation of perceptual constancy mechanisms in the high-level visual domain of face shape.
	

4.2 Introduction	
The earlier chapters of this thesis, in line with previous research, have shown that face-matching performance is generally poor for unfamiliar viewers. Challenging images in the foregoing experiments (Chapters 2 & 3) resulted in even poorer performance than previous experiments based on cooperative stimuli. So far in this thesis I have focussed on challenging image performance for ambient face images. These images help to capture natural variability in the appearance of any given face. For example, facial expression and pose may change across images, and the environment in which face photographs are taken can differ. The effects of such factors have previously been examined in isolation, and it is well reported that these superficial image changes can result in impaired performance on tasks involving identity judgment (Bruce, 1982, 1994; Johnston, Hill, & Carman, 1992; Troje & Bulthoff, 1996; O’Toole, Edelman & Bulthoff, 1998; Bruce et al. 1999). However, one interesting and potentially important factor that has received little attention is the effect of camera-to-subject distance on identity judgments.
	
	

A	 few	 isolated	 findings	 suggest	 that	 changes	 in	 camera-to-subject	 distance	 may	 well	
affect	 identification	 accuracy.	 In	 an	 interesting	 paper	 on	 optics,	 Harper	 &	 Latto	 (2001)	
photographed	five	models	(two	male,	three	female)	from	five	different	distances	(0.32m,	
0.45m,	 0.71m,	 1.32m	 &	 2.70m)	 and	 standardised	 the	 size	 of	 the	 resulting	 images	 (see	
Figure 4.1). Face images taken from further away were visibly flatter, giving the models a heavier appearance; the implication is that the faces appeared to have different shapes in the different distance conditions. Participants gave higher estimates of the
models’	weight	as	camera-to-subject	distance	increased.	In	a	later	study	(Bryan,	Perona	&	
Adolphs, 2012), ratings of trustworthiness, competence, and attractiveness were lower when the camera-to-subject distance was reduced (specifically, when the photographs were
taken	 within	 personal	 space).	 Taken	 together,	 these	 findings	 suggest	 that	 facial	
appearance	changes	as	a	result	of	camera-to-subject	distance,	yet	no	studies	have	looked	
at	how	these	changes	affect	performance	in	tasks	involving	identity	matching	judgments.	
This	 omission	 is	 perhaps	 surprising,	 given	 the	 emphasis	 on	 configuration	 in	 the	 face	
recognition	 literature.	 The	 configural	 processing	 account	 holds	 that	 each	 face	 has	 a	
unique	 configuration	 that	 is	 learnt,	 and	 that	 it	 is	 knowledge	 of	 this	 configuration	 that	
allows	us	to	tell	familiar	faces	apart.	This	position	seems	to	require	that	the	configuration	
of	a	particular	face	stays	constant	across	images,	but	the	findings	of	Harper	&	Latto	(2001)	
and	Bryan	et	al.	(2012)	suggest	that	it	does	not.	Given	that	camera-to-subject	distance	is	
rarely kept constant in practical applications of identity matching, effects of camera-to-subject distance on face identification would also be interesting from an applied
perspective.		
	
	
	
	
	
Figure 4.1 Changes in face shape resulting in differing weight judgments as photographs were taken from a far distance (left) to a near distance (right); example taken from Harper & Latto (2001).

	

	

	
	
In a compelling illustration of this, Burton et al. (2015) assessed the stability of facial
configurations	 by	 measuring	 distances	 between	 features	 in	 multiple	 images	 of	 three	
famous	 politicians	 (David	 Cameron,	 Barack	 Obama	 and	 Angela	 Merkel),	 captured	 from	
unknown,	 but	 presumably	 different	 distances	 (see	 Figure	 4.2	 for	 example).	 They	 found	
that	distances	between	features	varied	as	much	between	photos	of	the	same	person	as	
between	 photos	 of	 different	 people.	 This	 observation	 seems	 to	 challenge	 a	
straightforward	 configural	 processing	 account	 of	 face	 recognition	 and	 face	 learning.	
However	 as	 the	 camera-to-subject	 distance	 for	 these	 examples	 was	 unknown,	 a	 more	
formal	 assessment	 of	 its	 effects	 was	 not	 possible.	 Moreover,	 as	 camera-to-subject	
distance	was	not	the	exclusive	focus	of	that	study,	variation	in	other	factors	such	as	pose	
could	also	have	affected	the	configural	measurements.		
	
	
	
	
	
	
	
	
Figure	4.2	Example	of	measurement	figure	taken	from	Burton	et	al.	(2015).	Images	are	standardised	so	that	
interocular	distance	is	the	same.	Metric	distances	are	expressed	as	proportions	of	standardised	interocular	
distance.		

	

	

	

	
	
It	 is	 important	 to	 note	 that	 the	 politicians	 were	 easily	 identifiable	 to	 familiar	 viewers	 in	
any	of	the	face	images,	despite	the	fact	that	these	images	varied	in	configuration.	It	is	a	
well	 established	 finding	 that	 unfamiliar	 face	 recognition	 can	 be	 derailed	 by	 superficial	
image	changes,	whereas	familiar	face	recognition	is	robust	against	such	changes	(Burton	
et	al.	1999;	Jenkins	et	al.	2011).	This	contrast	has	been	linked	to	familiar	viewers	having	
more	 perceptual	 experience	 with	 the	 range	 of	 appearances	 that	 the	 face	 can	 take	
(Jenkins	et	al.	2011).	Assuming	that	familiarity	interacts	with	distance-related	changes	in	a	
similar	fashion,	it	seems	likely	that	these	too	will	impair	performance	more	for	unfamiliar	
viewers	than	familiar	viewers.		
	
Although	no	previous	studies	have	examined	effects	of	camera-to-subject	distance	in	an	
identity	matching	task,	a	study	by	Liu	(2003)	investigated	the	effect	of	camera-to-subject	
distance on face recognition accuracy in a memory task. Liu edited a single face image of each identity in Matlab to create two versions of each image: one reflecting the appearance the face would take at a far distance, and one reflecting its appearance from a near distance. Participants viewed face images in a learning phase and, at test, were shown either the identical image or the version altered to reflect the other distance. Liu (2003) showed that the same image of a face is harder to
recognise	 as	 having	 been	 seen	 before	 if	 the	 image	 is	 digitally	 altered	 to	 reflect	 the	
changes	 that	 would	 result	 from	 altering	 camera-to-subject	 distance.	 My	 question	 of	
interest	 is	 different	 to	 this	 in	 at	 least	 two	 important	 ways.	 First,	 manipulating	 a	 single	
image	of	a	face	is	not	the	same	as	presenting	two	different	images	of	a	face	(Bruce,	1982;	
Jenkins	et	al.	2011).	There	is	both	behavioural	(Bruce,	1982)	and	neural	evidence	for	this	
(Bindemann, Burton, Leuthold, & Schweinberger, 2008). Bindemann et al. (2008) reported that the N250r (an event-related brain response generated in the fusiform gyrus) differs depending on whether multiple images of the same face are presented or repetitions of the same face image are viewed. Perhaps more importantly with reference to Liu's (2003) study, the N250r response was the same for repetitions of the same face image as when a digitally altered version of this
	

	

	
	
image (stretched) was presented. In addition, fMRI adaptation studies have shown that the neural response in the Fusiform Face Area (FFA) to images of the same face does not vary with changes in the size, expression or pose of the face in the image (Grill-Spector et
al.	1999;	Andrews	&	Ewbank,	2004).	These	results	highlight	that	responses	to	stimuli	may	
differ	 when	 simply	 altering	 the	 same	 image	 of	 a	 face	 digitally,	 compared	 to	 presenting	
multiple images. Liu's stimuli in effect created an image recognition task, as the images were changed only by a simulated distance manipulation.
	
Second,	 I	 am	 interested	 in	 perceptual	 matching,	 as	 distinct	 from	 recognition	 memory.	
Perceptual	matching	is	interesting	in	its	own	right	because	it	allows	us	to	set	aside	aspects	
of	task	difficulty	that	arise	from	the	fallibility	of	memory,	and	to	focus	on	those	difficulties	
that	 remain	 at	 the	 perceptual	 level.	 Face	 matching	 tasks	 also	 model	 the	 task	 faced	 by	
forensic	and	security	officials	of	determining	identity	from	multiple	face	images.	
	
I	will	conduct	3	studies	to	investigate	the	effects	of	changing	camera-to-subject	distance	
on	facial	appearance	in	an	image,	and	the	consequent	effects	of	those	image	changes	on	
tasks	 involving	 identity	 judgment.	 I	 will	 first	 characterise	 configural	 changes	 across	
multiple	images	of	the	same	individuals	taken	at	known	distances.	I	will	then	investigate	
whether	 these	 changes	 translate	 into	 difficulties	 in	 face-matching	 accuracy	 for	 familiar	
and	unfamiliar	viewers.	Finally	I	will	evaluate	whether	the	visual	system	compensates	for	
distance	related	changes	in	the	face	image	when	distance	cues	are	available.	
	

4.3 Experiment	7:	Facial	Configuration	Measurements	
The	purpose	of	this	study	was	to	relate	camera-to-subject	distance	to	facial	configuration.		
The	apparent	size	of	an	object	changes	with	viewing	distance,	in	the	sense	that	the	size	of	
the	 retinal	 image	 changes.	 Linear	 changes	 in	 the	 size	 of	 a	 face	 image	 (e.g.	 rescaling	 a	
	

	

	
	
photograph)	 do	 not	 affect	 configural	 information	 because	 they	 do	 not	 alter	 the	 relative	
distances	 between	 features.	 Consistent	 with	 this	 observation,	 both	 behavioural	 and	
neuroimaging	 studies	 have	 found	 that	 face	 recognition	 is	 unaffected	 by	 linear	 rescaling	
(Grill-Spector	 et	 al.	 1999;	 Andrews	 &	 Ewbank,	 2004).	 For	 3D	 objects	 (e.g.	 live	 faces	 as	
opposed to face photographs), the optical situation is different. Changes in camera-to-subject distance generate non-linear changes in the image, such that different parts of
the	image	are	affected	to	differing	degrees	(Pirenne,	1970).	For	convex	objects,	including	
faces,	distant	viewing	leads	to	flatter	appearance,	whereas	closer	viewing	leads	to	more	
convex	 appearance	 (see	 Figure	 4.1).	 To	 relate	 this	 transformation	 to	 the	 notion	 of	
configuration	in	the	face	perception	literature,	I	measured	distances	between	key	facial	
features	 in	 photos	 that	 were	 taken	 at	 different	 viewing	 distances.	 My	 expectation	 was	
that,	as	a	reflection	of	the	flat-to-convex	variation,	the	change	in	viewing	distance	would	
affect	measures	near	the	edge	of	the	face	more	strongly	than	it	affects	measures	near	the	
centre	of	the	face.	
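The geometry behind this expectation can be sketched with a simple pinhole-camera model. This is an illustrative toy only: the landmark coordinates and nose depth below are invented, not measurements from this thesis.

```python
import math

def project(point, cam_dist, focal=1.0):
    """Pinhole projection of a 3D point (x, y, z); z is the depth of the
    point toward the camera, so nearer parts are magnified more."""
    x, y, z = point
    scale = focal / (cam_dist - z)
    return (x * scale, y * scale)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Toy landmarks in metres: eyes and mouth lie on the face plane (z = 0),
# while the nose tip protrudes 2 cm toward the camera (invented values).
FACE = {
    "eye_L": (-0.032, 0.0, 0.0),
    "eye_R": (0.032, 0.0, 0.0),
    "nose":  (0.0, -0.04, 0.02),
    "mouth": (0.0, -0.07, 0.0),
}

def normalised_nose_to_mouth(cam_dist):
    """Nose-to-mouth distance in units of interocular distance, as in the
    anthropometric analysis, for a given camera-to-subject distance."""
    img = {k: project(p, cam_dist) for k, p in FACE.items()}
    interocular = dist(img["eye_L"], img["eye_R"])
    return dist(img["nose"], img["mouth"]) / interocular

near = normalised_nose_to_mouth(0.32)  # Near distance from Harper & Latto
far = normalised_nose_to_mouth(2.70)   # Far distance
print(round(near, 3), round(far, 3))   # → 0.427 0.464
```

Because the protruding nose is magnified more at close range, the normalised nose-to-mouth distance shrinks in the Near condition and grows in the Far condition, whereas a measurement between coplanar features would be unchanged: the non-linear pattern anticipated above.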
		
Photographic	Procedure	
	
The	 images	 used	 for	 all	 of	 these	 studies	 were	 face	 photographs	 of	 18	 final	 year	
undergraduates	 at	 the	 University	 of	 York.	 To	 allow	 construction	 of	 face-matching	
experiments	 (Megreya	 &	 Burton,	 2008;	 Burton	 et	 al.	 2010)	 these	 models	 were	
photographed	 in	 2	 separate	 sessions,	 one	 week	 apart.	 In	 each	 session,	 each	 model	 was	
photographed at 2 distances—Near (camera-to-subject distance = .32m) and Far (camera-to-subject distance = 2.7m), following Harper & Latto (2001). This resulted in 4
photographs	 for	 each	 of	 the	 18	 models:	 Week	 1	 Near,	 Week	 1	 Far,	 Week	 2	 Near,	 and	
Week	2	Far	(72	photos	in	total).	All	models	were	photographed	with	a	neutral	expression	
using	an	Apple	iPhone	5	on	default	settings.	Photos	were	then	cropped	around	the	head	
to	remove	clothing	and	background.	For	anthropometric	analysis,	all	images	were	resized	
to	an	interocular	distance	of	150	pixels,	preserving	aspect	ratio	(see	Figure	4.3).		
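The standardisation step amounts to scaling both image axes by the factor that maps the measured interocular distance to 150 pixels. A minimal sketch; the image size and eye coordinates below are hypothetical.

```python
import math

def rescale_to_interocular(width, height, eye_l, eye_r, target=150):
    """Return the scale factor that maps the measured interocular distance
    to `target` pixels, plus the resized image dimensions. Scaling both
    axes by the same factor preserves the aspect ratio."""
    interocular = math.hypot(eye_r[0] - eye_l[0], eye_r[1] - eye_l[1])
    scale = target / interocular
    return scale, round(width * scale), round(height * scale)

# Hypothetical photo: 700 x 900 pixels with eye centres 200 px apart.
print(rescale_to_interocular(700, 900, (250, 400), (450, 400)))
# → (0.75, 525, 675)
```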
	
	

	

	

	
	
Anthropometric	Analysis	
For	 each	 model	 in	 each	 condition,	 I	 measured	 5	 feature-to-feature	 distances	 that	 have	
been	specified	in	the	configural	processing	literature:	corner	of	eye	to	edge	of	nose	(left	
and	right;	Leder	&	Carbon,	2006),	corner	of	nose	to	corner	of	mouth	(left	and	right;	Leder	
&	Bruce,	2000),	and	nose	to	mouth	(Leder	&	Carbon,	2006).	Precise	anatomical	definitions	
were	as	used	by	Burton	et	al.	(2015).	The	corner	of	the	eye	is	defined	as	the	centre	of	the	
canthus,	the	corner	of	the	nose	as	the	lateral	extent	of	the	nasal	flange,	and	the	corner	of	
the	 mouth	 as	 the	 lateral	 extent	 of	 the	 vermillion	 zone.	 Figure	 4.3	 shows	 these	
measurements	 for	 one	 model.	 All	 distances	 are	 expressed	 in	 units	 of	 standardized	
interocular	distance.	
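Given landmark coordinates, the five measurements can be computed as follows. The pixel coordinates are invented and the landmark keys are my own labels, not notation from the thesis; dividing by the interocular distance reproduces the standardised units used in the analysis.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def configural_measures(lm):
    """Five feature-to-feature distances, expressed in units of
    interocular distance, from a dict of (x, y) pixel landmarks."""
    iod = dist(lm["eye_L"], lm["eye_R"])
    return {
        "EN(L)": dist(lm["eye_L"], lm["nose_L"]) / iod,
        "EN(R)": dist(lm["eye_R"], lm["nose_R"]) / iod,
        "NM(L)": dist(lm["nose_L"], lm["mouth_L"]) / iod,
        "NM(R)": dist(lm["nose_R"], lm["mouth_R"]) / iod,
        "NM(C)": dist(lm["nose_C"], lm["mouth_C"]) / iod,
    }

landmarks = {  # invented coordinates on a 150-px-interocular image
    "eye_L": (100, 200), "eye_R": (250, 200),
    "nose_L": (140, 280), "nose_R": (210, 280), "nose_C": (175, 285),
    "mouth_L": (145, 330), "mouth_R": (205, 330), "mouth_C": (175, 318),
}
m = configural_measures(landmarks)
print(round(m["NM(C)"], 2))  # → 0.22
```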
	
	
	
	
	
	
Figure	4.3	Example	of	the	measurements	taken	for	two	of	the	photos	of	one	model.	Measurements	taken	
were	the	distances	between:	left	eye	to	nose,	right	eye	to	nose,	left	nose	to	mouth	corner,	right	nose	to	
mouth	corner	and	centre	of	nose	to	mouth.		

	
Results	and	Discussion	
For	each	of	the	5	feature-to-feature	metrics,	I	performed	a	separate	2x2	ANOVA	with	the	
within-subjects	 factors	 of	 Photographic	 Session	 (Week)	 (Week	 1	 versus	 Week	 2)	 and	
Camera-to-Subject	 Distance	 (Distance)	 (Near	 versus	 Far).	 Results	 of	 these	 analyses	 are	
summarized	in	Table	4.1.	
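For a two-level within-subject factor such as Distance, the main-effect F from such an ANOVA equals the square of a paired-samples t computed on per-model means collapsed over the other factor. A minimal sketch; the measurement values below are invented, not the thesis data.

```python
import math

def paired_t(x, y):
    """Paired-samples t statistic for two within-subject conditions.
    For a two-level factor, the ANOVA main-effect F equals t squared."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Invented nose-to-mouth measurements for 5 models, Far vs Near,
# each averaged over the two photographic sessions.
far_vals = [0.46, 0.44, 0.45, 0.47, 0.43]
near_vals = [0.38, 0.37, 0.38, 0.41, 0.36]
t = paired_t(far_vals, near_vals)
print(round(t ** 2, 1))  # F for the Distance main effect → 490.0
```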
	
	

	

	

	
	
	
	

	Week						Distance
	Avg1	Avg2	Diff	F	p	ES	AvgN	AvgF	Diff	F	p	ES
EN(L)	.57	.57	.00	.05	>.05	.00	.58	.56	-.02	1.67	>.05	.09
EN(R)	.58	.57	.01	.20	>.05	.01	.59	.57	-.02	3.53	>.05	.17
NM(L)	.42	.41	-.01	1.58	>.05	.09	.38	.45	.07	39.82	<.001	.70
NM(R)	.42	.41	-.01	1.23	>.05	.07	.38	.45	.07	42.97	<.001	.72
NM(C)	.22	.23	.01	1.10	>.05	.06	.20	.24	.04	17.26	<.001	.50

Table 4.1 Mean measurements for each photograph condition. EN stands for the eye-to-nose measurement and NM for the nose-to-mouth measurement; the letter in parentheses denotes the side of the image the measurement was taken from (L = left, R = right, C = centre). Average measurements are given for Week 1 (Avg1) and Week 2 (Avg2), and at the Near (AvgN) and Far (AvgF) distances, together with the F, p and effect size (ES) values for each factor.

	
Photographic	Session	(Week)	had	no	significant	effect	on	any	of	the	measurements	(p	>	.1	
for	all),	indicating	similar	viewpoint	and	expression	in	both	sessions.	More	importantly	for	
this	 study,	 Camera-to-Subject	 Distance	 (Distance)	 had	 a	 significant	 effect	 on	 some	
measures	 but	 not	 on	 others.	 Specifically,	 the	 more	 peripheral	 nose-to-mouth	
measurements	 were	 greater	 for	 Far	 images	 than	 for	 Near	 images,	 whereas	 the	 more	
central eye-to-nose measurements were statistically equivalent at the two camera distances I compared. This pattern in the anthropometric data corroborates the flatter
appearance	of	the	Far	images	and	the	more	convex	appearance	of	the	Near	images.	More	
generally,	 it	 confirms	 the	 non-linear	 effect	 of	 camera-to-subject	 distance	 on	 configural	
information	for	this	image	set:	some	measurements	change	more	than	others.	I	next	used	
a	 paired	 matching	 task	 to	 assess	 the	 implications	 of	 these	 configural	 changes	 for	
perception	of	facial	identity.		
	

4.4 Experiment	8:	Face-Matching	&	Camera-to-Subject	Distance		
In the Glasgow Face Matching Test (GFMT), a standard matching experiment, participants are presented with pairs of
face	photographs	that	were	taken	with	different	cameras	(Burton	et	al.	2010).	For	each	
pair,	 the	 participant’s	 task	 is	 to	 decide	 whether	 the	 2	 photos	 show	 the	 same	 person	
(Same	 trials;	 50%	 prevalence)	 or	 2	 different	 people	 (Different	 trials;	 50%	 prevalence).	
Despite	the	simplicity	of	this	task,	error	rates	are	high	when	the	faces	are	unfamiliar,	as	

	

	

	
	
the	 viewer	 has	 no	 way	 to	 distinguish	 image	 changes	 from	 identity	 changes.	 When	 the	
faces	are	familiar	errors	are	virtually	absent,	presumably	because	variation	in	appearance	
is	better	characterised	by	the	viewer	(Jenkins	&	Burton,	2011).	
	
Here	I	extend	the	standard	paired	matching	task	by	adding	camera-to-subject	distance	as	
an	 experimental	 factor.	 Because	 face	 images	 change	 with	 camera-to-subject	 distance,	
manipulating	 this	 distance	 allows	 very	 specific	 predictions	 to	 be	 made:	 for	 viewers	 who	
are	unfamiliar	with	the	faces	concerned,	a	change	in	camera-to-subject	distance	should	
impair	performance	on	Same	Identity	trials	(because	it	generates	dissimilar	images)	and	
should	improve	performance	on	Different	Identity	trials	(for	the	same	reason).	If	identity	
judgments	by	familiar	viewers	rely	on	facial	configurations	then	the	same	should	apply	to	
their	 performance.	 However,	 given	 that	 familiar	 viewers	 readily	 see	 through	 changes	 in	
viewpoint,	 lighting,	 facial	 expression,	 and	 other	 factors,	 it	 is	 anticipated	 that	 familiar	
viewers	 might	 similarly	 see	 through	 changes	 in	 camera-to-subject	 distance,	 such	 that	
their performance would be unaffected by this manipulation.
	
Method	
	
Participants		
45	 psychology	 undergraduates	 at	 the	 University	 of	 York	 participated	 in	 exchange	 for	
payment	or	course	credit.	23	of	these	participants	were	first-year	students	who	arrived	at	
the	University	of	York	after	our	photographic	models	had	left,	and	hence	had	never	seen	
the faces in the stimulus set (verified post-test; see the Procedure section below). I refer to these participants as Unfamiliar viewers (M = 4, mean age = 18.7). The remaining 22
participants	were	other	students	from	the	same	year	group	as	our	photographic	models,	
and	had	spent	over	two	years	studying	on	the	same	course	(M	=	3,	mean	age	=	22.14).	I	
refer	 to	 these	 participants	 as	 Familiar	 viewers	 because	 they	 had	 seen	 the	 faces	 in	 the	
stimulus	set	routinely	over	those	two	years.	
	
	
	

	

	

	
	
	
Stimuli	and	Design	
	
The same stimulus images were used as in Experiment 7. Images were cropped around
the	face	to	remove	extraneous	background.	The	face	images	measured	700	pixels	wide	by	
900	pixels	high.		
	
In	 order	 to	 create	 the	 face-matching	 task,	 images	 were	 paired	 according	 to	 Same	 and	
Different	 Identity	 trials.	 The	 different	 identity	 trials	 were	 created	 by	 pairing	 the	 most	
similar	face	images	from	those	available.	Image	pairs	were	also	constructed	according	to	
Same	 and	 Different	 Camera-to-Subject	 Distance.	 Crossing	 the	 within	 subject	 factors	
Identity	 and	 Distance	 resulted	 in	 four	 stimulus	 conditions:	 (i)	 Same	 Identity,	 Same	
Distance;	(ii)	Same	Identity,	Different	Distance;	(iii)	Different	Identity,	Same	Distance;	and	
(iv)	 Different	 Identity,	 Different	 Distance.	 Figure	 4.4	 shows	 example	 pairs	 from	 each	
condition.		
	
	
	
	
	
	
	
	
	
	
	
	
Figure	4.4	Example	of	one	identity	with	each	of	their	four	image	identity	pairings	shown.	The	first	column	
shows	image	pairs	of	the	same	identity	and	the	second	column	shows	different	identity	pairs.	The	first	row	
shows	same	camera-to-subject	distance	pairs	and	the	bottom	row	shows	pairs	where	the	images	are	of	
different	camera-to-subject	distance.	

	

	

	

	
	
For	each	pair,	the	participant’s	task	was	to	decide	whether	the	two	images	showed	the	
same	person	or	two	different	people.	This	allowed	a	percentage	accuracy	score	to	be	
calculated	for	each	participant	in	each	condition.	Both	Unfamiliar	and	Familiar	Viewer	
groups	carried	out	this	task.	
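The per-condition scoring can be sketched as a tally of correct responses per crossed condition. The trial records below are invented; a response is correct when it matches the true identity relation of the pair.

```python
from collections import defaultdict

def accuracy_by_condition(trials):
    """Each trial is (identity, distance, response): the true pairing
    ('same'/'different' identity), the distance condition, and the
    participant's circled answer. Returns % correct per condition."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for identity, distance, response in trials:
        cond = (identity, distance)
        total[cond] += 1
        correct[cond] += (response == identity)
    return {c: 100.0 * correct[c] / total[c] for c in total}

trials = [  # invented responses from one participant
    ("same", "same_dist", "same"),
    ("same", "same_dist", "different"),   # a miss
    ("different", "diff_dist", "different"),
    ("different", "diff_dist", "different"),
]
print(accuracy_by_condition(trials))
# → {('same', 'same_dist'): 50.0, ('different', 'diff_dist'): 100.0}
```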
	
Procedure	
	
Participants	were	tested	in	groups	and	worked	individually	in	silence.	Participants	viewed	
a	total	of	144	image	pairs	as	part	of	a	face-matching	task.	These	images	were	viewed	as	
part of a timed PowerPoint presentation in which each image pair was visible for 5 seconds, followed by a 3, 2, 1 countdown before the next image pair appeared. Within this
viewing	and	countdown	time	participants	were	required	to	decide	whether	the	images	in	
the	 pair	 they	 had	 just	 viewed	 showed	 the	 same	 person’s	 face	 or	 if	 the	 faces	 were	 two	
different	 people.	 Participants	 recorded	 their	 answers	 by	 circling	 ‘same’	 or	 ‘different’	 on	
an	answer	sheet	to	indicate	their	identity	judgement	for	each	image	pair.		
	
After	completing	the	face-matching	task,	participants	completed	a	familiarity	check.	For	
this,	participants	viewed	images	of	each	of	the	identities	that	were	stimuli	in	the	task	and	
simply	ticked	a	box	to	indicate	whether	they	were	familiar	with	each	face	or	left	the	box	
unmarked	if	the	faces	were	unknown	to	them.	This	allowed	me	to	exclude	the	data	from	
faces	that	were	unfamiliar	to	any	of	the	familiar	viewer	group,	or	familiar	to	any	of	the	
unfamiliar	group.		
	
Results	
	
A 2x2x2 mixed ANOVA with within-subject factors Identity and Distance and between-subject factor Viewer Familiarity confirmed that overall performance was significantly lower for Unfamiliar viewers (M = 85.71, SE = .80, CI = 84.11 – 87.31) than Familiar viewers (M = 97.89, SE = .81, CI = 96.25 – 99.53), F(1,43) = 114.8, p < .001, ηp² = .73. This finding was expected, given that familiar viewers have been found to be more robust against superficial image changes than unfamiliar viewers. There was a significant main effect of Identity, with performance

	

	

	
	
being higher for Different Identity (M = 93.36, SE = .91, CI = 91.53 – 95.20) than Same Identity trials (M = 90.25, SE = .87, CI = 88.49 – 92.00), F(1,43) = 5.17, p = .03, ηp² = .11. However, there was no interaction between Identity and Viewer Familiarity [F(1,43) = 1.53, p = .22, ηp² = .03].
	
There was also a significant main effect of Distance, with participants performing better on Same Distance trials (M = 94.92, SE = .55, CI = 93.82 – 96.03) than Different Distance trials (M = 88.69, SE = .85, CI = 86.97 – 90.38), F(1,43) = 51.90, p < .001, ηp² = .547. This demonstrates that changing camera-to-subject distance lowers face-matching accuracy. In addition, both the interaction between Distance and Viewer Familiarity [F(1,43) = 21.80, p < .001, ηp² = .34] and the interaction between Distance and Identity [F(1,43) = 133.64, p < .001, ηp² = .76] were significant. Finally, there was a significant three-way interaction between Identity, Distance and Viewer Familiarity, F(1,43) = 117.45, p < .001, ηp² = .73. To break down this three-way interaction, I next carried out separate 2x2 within-subjects ANOVAs for the Familiar and Unfamiliar groups.
	

	

	

	
	

	
Figure	4.5	Effect	of	changing	camera-to-subject	distance	on	performance	accuracy	in	the	face-matching	task	
for	familiar	(F)	and	unfamiliar	viewers	(U),	for	same	and	different	identity	trials,	at	both	same	and	different	
distances.	Error	bars	show	the	standard	error	of	the	mean.			

	
As expected, Familiar viewers performed with very high accuracy in both the Same Distance and Different Distance conditions. Accuracy was significantly higher for Same Distance image pairs (M = 98.84% correct, SD = 1.37) than for Different Distance image pairs (M = 96.70% correct, SD = 4.17) [F(1,21) = 5.49, p < .05, ηp² = .21], even though the magnitude of this effect was small. There was no significant effect of Identity [F(1,21) = 3.41, p = .08, ηp² = .14] and no interaction between these two factors (p > .05).
	
	
More importantly, Unfamiliar viewers performed significantly better when the paired images were taken at the Same Distance (M = 90.85% correct, SD = 4.97) than at Different Distances (M = 80.54, SD = 6.83) [F(1,22) = 51.20, p < .001, ηp² = .67]. There was no significant effect of Identity [F(1,22) = 3.48, p = .08, ηp² = .14], but critically there was a

	

	

	
	
significant Identity × Distance interaction [F(1,22) = 155.13, p < .001, ηp² = .88]. For Same Identity pairs, accuracy was higher for Same Distance images (M = 97.33, SD = 3.09) than Different Distance images (M = 69.29, SD = 13.41). For Different Identity pairs the opposite was true: accuracy was higher for Different Distance images (M = 91.86, SD = 6.29) than for Same Distance images (M = 84.37, SD = 10.64). Simple main effects revealed a significant effect of Identity for both Same Distance [F(1,44) = 19.34, p < .001, ηp² = .31] and Different Distance images [F(1,44) = 58.73, p < .001, ηp² = .57], and a significant effect of Distance for both Same Identity pairs [F(1,44) = 191.92, p < .001, ηp² = .81] and Different Identity pairs [F(1,44) = 13.69, p < .001, ηp² = .31].
	
	
Discussion	
	
Changing camera-to-subject distance impaired viewers' face-matching ability. Accurate face recognition remained easy for viewers who were familiar with the faces concerned, regardless of camera-to-subject distance (mean performance across the same distance and different distance conditions = 98% accuracy). Unfamiliar viewers performed much more poorly in the different distance condition (M = 81%) than in the same distance condition (M = 91%). There was a crucial crossover interaction in the unfamiliar group, with performance decreasing as a result of changes to camera-to-subject distance only for same identity trials.
	
	
This	pattern	of	results	for	unfamiliar	viewers	suggests	that	the	kinds	of	configural	change	
evident	 in	 the	 measurement	 study	 join	 the	 long	 list	 of	 image	 changes	 that	 can	 thwart	
identification.	This	finding	shows	that	the	effects	of	changing	camera-to-subject	distance	
identified	 in	 the	 past	 (Latto	 &	 Harper,	 2007)	 do	 indeed	 also	 impair	 face	 recognition.	
Distance-related image changes had previously been found to impair face memory for digitally edited images compared with identical images (Liu, 2003). My study demonstrates that distance manipulations also impair face-matching.

	

	

	
	
	
The finding that familiar viewers were impervious to these non-linear changes in facial configuration suggests that familiar face recognition is not strongly dependent on distances between facial features. It seems that when learning a face, people do not learn any specific configuration of that face: familiar viewers are able to see through changes in camera-to-subject distance, even though the configurations differ across the images.
	
	
The findings of these studies raise an interesting question: if faces look flatter or more convex at different viewing distances, why do we not notice these shape changes in daily life? The next experiment addresses this question.
	

4.5 Experiment	9	-	Perceptual	Constancy	for	Face	Shape	
In	 the	 measurement	 study	 I	 demonstrated	 that	 images	 undergo	 non-linear	 changes	 in	
configuration	 as	 a	 result	 of	 changed	 distance.	 Experiment	 8	 showed	 that	 these	 image	
changes	 can	 easily	 disrupt	 unfamiliar	 face	 matching:	 unfamiliar	 viewers	 were	 poorer	 at	
matching pairs of faces when the two images were taken from different camera-to-subject distances, compared with when the images were taken from the same distance.
In real life, however, we do not tend to notice changes in face shape. For example, the faces of people walking towards us do not appear more convex as they approach. There are in fact many examples of image changes going unnoticed for stimuli other than faces. These include changes in colour and brightness: clothes, for example, are perceived as the same colour even under different types of lighting, because colour is perceived as constant through calibration to white (Webster & Mollon, 1995). Shapes are also perceived as constant through the use of depth cues (Pizlo, 1994). Each of these scenarios is an example of perceptual constancy, where the visual system uses information from the environment to overcome image changes. Perceptual constancy has been studied intensively, but normally in the context of low-level visual features such as colour and shape.

	

	

	
	
It is evident that perceptual constancy can help viewers make sense of the colours or shapes that they see. Following this, one possibility is that perceptual constancy mechanisms compensate for changes in face shape caused by viewing distance. In the same way that viewers perceive the shape of an open door as rectangular, even though its retinal projection is trapezoidal, they may perceive a face as a constant shape despite distance-related distortions, by taking account of the viewing distance for the face image. It may be that when looking at a photograph on a screen, no information about the original camera-to-subject distance is available, removing the ability to compensate for distance; as a result, the face images look different.
	
In order to investigate whether perceptual constancy applies to face shape under these conditions, I will test whether a distance cue, indicating that one image was taken from further away and one close up, can overcome the effect that a change of camera-to-subject distance had on matching accuracy in Experiment 8. My approach to this will be
to	 manipulate	 congruency	 between	 the	 actual	 camera-to-subject	 distance	 and	 the	
distance	implied	by	cues	in	the	display.	Specifically,	I	will	present	the	two	face	images	in	
each	 pair	 at	 two	 different	 sizes	 –	 a	 small	 image	 implying	 a	 long	 viewing	 distance	 and	 a	
large image implying a short viewing distance. In congruent trials, the size cues will be consistent with the images, so that the near image is larger and the far image is smaller. In
incongruent	 trials,	 the	 size	 cues	 will	 conflict	 with	 the	 images,	 so	 that	 the	 near	 image	 is	
small	 and	 the	 far	 image	 is	 large.	 If	 a	 perceptual	 constancy	 mechanism	 compensates	 for	
distance-related	 distortions,	 then	 participants	 should	 perform	 more	 accurately	 on	
congruent	trials	than	on	incongruent	trials,	when	the	two	images	show	the	same	person.	
This	is	because	the	valid	distance	cues	will	allow	constancy	to	correctly	‘undo’	the	optical	
distortion,	 making	 the	 images	 look	 more	 similar.	 At	 the	 same	 time,	 accuracy	 should	 be	
lower	 for	 congruent	 trials	 than	 for	 incongruent	 trials,	 when	 the	 two	 images	 show	
different	people.	This	is	because	with	invalid	distance	cues,	the	constancy	mechanism	will	
compensate	 in	 the	 wrong	 direction,	 exaggerating	 differences	 between	 the	 images,	
making	them	look	less	similar.		
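These opposing predictions follow from 'undoing' the projection geometry with the cued rather than the actual distance. A toy sketch under a pinhole model: the nose depth and offset are invented, and for simplicity the mechanism is assumed to know the typical depth of the feature.

```python
def project_y(y, depth, cam_dist):
    """Vertical image coordinate of a point at height y and depth `depth`
    (toward the camera), photographed from cam_dist (unit focal length)."""
    return y / (cam_dist - depth)

def undo_y(y_img, depth, cued_dist):
    """Constancy mechanism: invert the projection using the CUED distance."""
    return y_img * (cued_dist - depth)

NOSE_DEPTH = 0.02   # nose tip protrudes 2 cm toward the camera (invented)
NOSE_Y = -0.04      # true vertical offset of the nose tip (invented)
NEAR, FAR = 0.32, 2.70

near_img = project_y(NOSE_Y, NOSE_DEPTH, NEAR)
far_img = project_y(NOSE_Y, NOSE_DEPTH, FAR)

# Congruent cues: each image is undone with its true distance, so both
# shots of the same face recover the same underlying measurement.
congruent = (undo_y(near_img, NOSE_DEPTH, NEAR),
             undo_y(far_img, NOSE_DEPTH, FAR))

# Incongruent cues: the distances are swapped, so the recovered
# measurements diverge, making the same face look more different.
incongruent = (undo_y(near_img, NOSE_DEPTH, FAR),
               undo_y(far_img, NOSE_DEPTH, NEAR))

print(tuple(round(v, 4) for v in congruent))    # → (-0.04, -0.04)
print(tuple(round(v, 4) for v in incongruent))  # recovered values diverge
```

With valid cues, both images recover the same underlying measurement, so same-identity pairs should look more alike; with swapped cues the recovered values diverge, which helps on different-identity trials but hurts on same-identity trials.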

	

	

	
	
	
Method	
Participants	
30 unfamiliar participants (male = 6, age = 19) from the University of York took part in this study.
Design	
Experiment 9 was largely based on the design of Experiment 8. However, in this experiment the image pairs featured a distance cue. In half of the trials this distance cue was Congruent with the true distances at which the photos were taken, e.g. the near photo was shown as a bigger image on screen and the far photo as a smaller one. In the remaining trials I swapped the image sizes to create Incongruent image pairs, e.g. the near image was made smaller to appear far away, and the far image was enlarged to appear near.
	
Congruent image pairs were created by keeping both the near photograph and the far photograph at their natural sizes (the original sizes of the saved photographs, not resized in any way). This resulted in a larger 'near' photograph and a smaller 'far' photograph. Incongruent pairs were created by resizing the far image to the natural size of the near image, and the near image to the natural size of the far image. This created a small 'near' image and a large 'far' image, incongruent with the natural display format these images would take. In addition, perspective lines were
added	to	the	images	to	provide	an	additional	depth	cue.	The	perspective	lines	supported	
the	interpretation	of	distance	in	the	displays.	See	Figure	4.6	for	an	example	of	congruent	
and	incongruent	same	and	different	identity	pairs.		
	

	

	

	
	

	
Figure	4.6	Example	of	congruent	and	incongruent	face	image	pairs	(with	distance	cues)	for	same	and	
different	identities.		

	
Procedure	
The task was a face-matching task as in Experiment 8; however, the stimuli now featured
distance	cues.	Participants	were	tasked	with	making	same	or	different	identity	judgments	
for	each	face	pair	that	they	saw.	144	pairs	were	viewed	in	total.		
	
	
	
	
	
	
	
	
	

	

	

	
	
Results	&	Discussion	

	

Figure	4.7	Graph	showing	the	percentage	of	correct	responses	for	both	congruent	and	incongruent	image	
pairs	broken	down	by	same	and	different	identity	trials.	Error	bars	show	the	standard	error	of	the	mean.			

A 2x2 ANOVA was conducted to compare the effects of Distance Cue (Congruent & Incongruent) and Identity (Same & Different) on face-matching accuracy (see Figure 4.7). The analysis revealed no significant main effect of Distance Cue, comparing accuracy on Congruent (M = 80.82) with Incongruent (M = 78.49) trials, F(1,27) = 3.29, p = .08, ηp² = .11. However, there was a significant main effect of Identity, F(1,27) = 8.13, p = .008, ηp² = .23, with higher accuracy for Different (M = 85.35) than Same (M = 73.97) identity trials. More importantly, there was also a significant interaction between Congruence and Identity, F(1,27) = 23.22, p < .001, ηp² = .46, confirming that the congruency manipulation had opposite effects for Same Identity and Different Identity trials.

	

Simple main effects revealed that, as anticipated, participants were significantly more accurate on Same identity trials for Congruent image pairs (M = 78.38, SE = 3.3) than Incongruent image pairs (M = 69.57, SE = 3.6), F(1,54) = 22.44, p < .001, ηp² = .29.
	
My second prediction was also confirmed: for different identity trials, viewers were more accurate for incongruent image pairs (M = 87.42, SE = 1.8) than for congruent image pairs (M = 83.27, SE = 2.18), F(1,54) = 4.97, p < .01, ηp² = .08.
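Because each factor here has only two levels, the interaction carries a single degree of freedom and can equivalently be tested as a one-sample t-test on each participant's difference-of-differences score, with F = t². The sketch below illustrates this equivalence; the accuracy values are synthetic, chosen only to mimic the reported pattern, and are not the thesis data.

```python
# Illustrative sketch with synthetic data (not the thesis data): in a 2x2
# within-subjects design the interaction has one degree of freedom, so it
# equals a one-sample t-test on per-participant difference-of-differences
# scores, with F = t**2 on (1, n-1) degrees of freedom.
import math
import random

random.seed(1)
n = 28  # participants, matching the reported df of F(1, 27)

scores = []
for _ in range(n):
    # Synthetic percent-correct values mimicking the reported pattern:
    # congruent cues help Same trials and hurt Different trials.
    same_congruent = random.gauss(78, 10)
    same_incongruent = random.gauss(70, 10)
    diff_congruent = random.gauss(83, 8)
    diff_incongruent = random.gauss(87, 8)
    # Interaction contrast: congruency effect on Same minus on Different.
    scores.append((same_congruent - same_incongruent)
                  - (diff_congruent - diff_incongruent))

mean = sum(scores) / n
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
t = mean / (sd / math.sqrt(n))
F = t ** 2  # interaction F(1, n-1)
print(round(F, 2))
```

With real data the same contrast scores would feed a standard repeated-measures ANOVA; the contrast form simply makes explicit what the 1-df interaction is testing.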
	
The results from Experiment 9 suggest the operation of a perceptual constancy mechanism at the level of face shape. For unfamiliar faces, participants performed better on same identity trials when provided with a congruent distance cue. In this situation the constancy mechanism provides the information necessary to make the images look more similar, i.e. compensating in the correct direction for the differences between images that result from distance.
	
Further evidence for the constancy mechanism comes from the result found for different identity trials. Following the logic applied above, for incongruent different identity trials the incongruent distance cues would lead to compensation in the ‘wrong’ direction, making the images look even more different than they would otherwise. Hence correct different judgments were more frequent when the differences were amplified by incongruence than when the pairs were viewed in the congruent condition. If no perceptual constancy mechanism were being applied to face perception, no differences in performance would have been observed across conditions.
	
	
	

	


4.6 General	Discussion	
In this chapter I have shown that changes in camera-to-subject distance lead to non-linear changes in face measurements across images. These changes had a detrimental effect on the face identification efforts of unfamiliar viewers – identity matching performance was much poorer when unfamiliar viewers compared photographs taken from differing distances than when the comparison images were taken at the same distance. This effect was driven by difficulty on same identity trials. Familiar viewers were far less affected by this distance change and performed at very high accuracy levels in both conditions. Furthermore, I provide evidence of perceptual constancy effects at the level of face shape. Congruent distance cues aided recognition of same identity faces, with distance cues compensating for the apparent differences in faces due to camera-to-subject distance. In line with this, an incongruent distance cue made the images look even more different from each other, and hence increased accuracy on correct different person judgements. These findings suggest that perceptual constancy can account for the apparent continuity of facial appearance at different viewing distances, and that valid distance cues are required for the smooth operation of this mechanism.
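The non-linearity of these configural changes follows from simple perspective geometry. The sketch below uses a pinhole-camera model with illustrative feature coordinates (my own hypothetical values, not measurements from the thesis stimuli) to show that the ratio between two inter-feature distances is not constant across camera-to-subject distances.

```python
# Pinhole-camera sketch with illustrative face geometry (hypothetical
# coordinates, not the thesis stimuli): features at different depths are
# scaled by different factors, so configural ratios change with distance.

def project(x, z, camera_distance, focal_length=1.0):
    """Project a point x metres off-axis and z metres behind the face's
    frontmost plane onto the image plane."""
    return focal_length * x / (camera_distance + z)

def eye_to_nose_ratio(d):
    # Eyes sit ~0.02 m behind the nose tip in this toy geometry.
    eye_sep = project(0.03, 0.02, d) - project(-0.03, 0.02, d)
    nose_w = project(0.015, 0.0, d) - project(-0.015, 0.0, d)
    return eye_sep / nose_w

near = eye_to_nose_ratio(0.3)  # portrait taken close up
far = eye_to_nose_ratio(3.0)   # portrait taken from afar
# The close-up compresses eye separation relative to nose width, so the
# measured configuration differs between the two photographs.
print(round(near, 3), round(far, 3))  # → 1.875 1.987
```

Because the compensation a viewer must apply depends on camera distance, supplying a valid distance cue is exactly the information a constancy mechanism would need.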
	
Previous research had shown that perceptions gathered from a face image, including weight estimates and social inferences, differed as a result of changed camera-to-subject distance (Harper & Latto, 2001; Bryan et al. 2012). My analysis showed that non-linear changes in metric distances between facial features can account for such perceptions. Most importantly, in addition to the social inference effects reported previously, my study demonstrated that these configural changes affect accuracy in identity judgment. The camera-to-subject manipulation greatly impaired unfamiliar viewers, whereas familiar viewers were barely affected at all. The level of familiarity with each face was not recorded – participants rated faces as either familiar or unfamiliar. It could be the case that some of the familiar viewers were not all that familiar with the faces involved, which could explain the small difference found for familiar viewers over distance change. My findings are very much in line with past research demonstrating superior performance of
familiar viewers in face recognition tasks over unfamiliar viewers (e.g. Burton et al. 1999, Jenkins et al. 2011). In addition, this research is the first to show that face-matching performance, in addition to face memory (Liu, 2003), is impaired by camera-to-subject distance change across images.
	
Moreover, the findings of this chapter suggest that learning configural information – at least in the conventional sense of distances between features – is not the key to learning a face. Experiment 7 highlighted that the configuration of a face does not remain constant across multiple images if these images have been taken from different camera-to-subject distances. Yet it is known that people can accurately identify celebrities, who have become familiar faces through exposure to images and video footage that capture the celebrity from many different distances. It seems more likely that when people become familiar with a face they are gaining experience under a variety of conditions; this would include learning how the face looks from a range of different distances (e.g. Jenkins, 2011). This fits with the conception of face space advanced throughout this thesis. On this account, familiar viewers have a range of experience with a face, and hence experience of seeing the face over several different distance configurations. Unfamiliar viewers do not have this refined face space and must instead make identity judgments using only the information that is in the images in front of them. This lack of perceptual constancy interacts with configural change in the following ways. First, it makes it harder to ‘tell together’ same identity images with different configurations, as the viewer does not have experience of the variability in appearance for that identity. Second, it supports viewers’ performance on different identity trials, as the different camera-to-subject distances tend to exaggerate natural differences between the faces, such that the faces are more likely to be categorised as belonging to different identities in face space. Notably, the distance change only makes different people look more different because they look similar to begin with; it would be possible for different people to look less different as a result of distance change if, for example, one face was fat and the other thin.
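This face-space account can be made concrete with a toy nearest-exemplar model (my own construction, purely illustrative and not a model from the thesis): storing several distance-varied exemplars of a familiar face lets a new configural variant fall within that face's stored range, whereas a single-exemplar representation, as for an unfamiliar viewer, cannot absorb the variation.

```python
# Toy nearest-exemplar illustration of the face-space account (my own
# construction, not a model from the thesis): points are positions in a
# 2-D face space; configural change moves a face along this space.

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# Identity A stored from several camera distances (familiar viewer);
# identity B stored from a single image (unfamiliar viewer).
familiar_A = [(0.0, 0.0), (0.4, 0.1), (0.8, 0.2)]
unfamiliar_B = [(1.5, 1.5)]

probe = (0.9, 0.25)  # identity A photographed from a new distance

closest_A = min(dist(probe, e) for e in familiar_A)
closest_B = min(dist(probe, e) for e in unfamiliar_B)
# The varied exemplars place the probe within A's stored range, so it is
# 'told together' with A rather than treated as a new identity.
print(closest_A < closest_B)  # → True
```

The coordinates are arbitrary; the point is only that exemplar coverage of distance-induced variation, not any single stored configuration, supports telling images together.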
	

	

These experiments also advance the theoretical understanding of unfamiliar face recognition; in particular, I have proposed that a perceptual constancy mechanism exists whereby distance cues help an individual make sense of a viewed face image. Whereas past studies have shown perceptual constancy for colour and basic geometric shapes, and even for the size and shape of humans, to my knowledge this is the first study to propose a constancy mechanism for faces.
	
Several practical implications follow from these findings. Firstly, according to the results of Experiment 7, anthropometry (in which the images used in an investigation are likely to have been taken from different, unknown distances) is not a reliable method of facial identification. This is because images of faces do not hold constant configurations – measurements between features of a face change across images when these images have been taken from different distances (Kleinberg, Vanezis & Burton, 2007). Additionally, the results of Experiment 8 show that changing the distance from which a subject is photographed reduces accuracy when identity judgments are based on face-matching. It would therefore be advisable that, wherever possible, photographs in forensic or security situations are taken from a consistent distance. For example, better identification rates may be achieved if security officials took photographs of a suspect from a standard distance, which could be compared with photographs taken of the same suspect, from this same standard distance, following a separate incident, no matter where the incident occurred. If this is not possible, caution is urged when comparing images that were taken at different or unknown distances, because there are systematic differences in how they will appear. Familiar viewers are largely exempt from this caution because they can see through configural changes easily. Where familiarity is not possible, providing accurate distance cues could improve performance for unfamiliar viewers.
	
In conclusion, face configuration in an image changes as a result of changes in camera-to-subject distance. These changes affect performance in face-matching tasks, with unfamiliar viewers being very strongly affected. The finding that accurate distance cues help viewers to see through these changes in configuration suggests the operation of a
perceptual	constancy	mechanism.			
	
Up until now this thesis has addressed incidentally created, or naturally occurring, challenging stimuli. The next chapters of this thesis will move on to explore a second type of challenging stimulus, where deliberate attempts are made to change appearance through disguise.
	

	

	


Chapter	5	–	Matching	Disguised	Faces	
5.1 Chapter	Summary	
I created a sophisticated disguise face database, which is the first to include both evasion and impersonation disguise items. Models were unguided in their disguise efforts. Any props that they requested to aid their disguises were provided, and all models were informed of the effect on face shape of changing camera-to-subject distance (Chapter 4).
	
Disguise	 impaired	 face-matching	 performance.	 Performance	 for	 unfamiliar	 viewers	 was	
poor,	and	knowing	that	disguises	may	be	present	did	not	improve	this.	Evasion	disguise	
caused	 most	 difficulties	 for	 face-matching,	 followed	 by	 ‘impersonation	 similar’	 then	
‘impersonation	random’	disguises.	Familiar	viewers	were	much	better	at	seeing	through	
disguise	than	unfamiliar	viewers	but	even	familiarity	did	not	help	viewers	completely	see	
through	evasion	disguise.	Links	to	theories	of	face	space	and	familiarity	are	discussed.		
	

5.2 Introduction	
This chapter will explore face-matching performance for disguised faces. Previous research has shown that unfamiliar face-matching is poor, with people making between 10% and 20% errors on the GFMT (long and short versions respectively), a standardised test of face image matching, and also in matching tasks involving an ID card image and a physically present person (Burton et al. 2010; Kemp et al., 1997; White et al., 2014). However, these results come from tasks that used cooperative stimuli (when same identity trials were created there was no deliberate attempt to make an individual look different across multiple images), and mismatch trials were created from the most similar match taken from a limited sample of face images.
	

	

I found that performance was even worse when using more difficult stimulus images involving incidental disguise (Chapters 2 & 4) than had been found previously for cooperative stimuli. These include the case of ambient same person images and extremely similar imposter face images (Chapter 2), and images where face shape is changed as a result of camera-to-subject distance (Chapter 4). These studies have helped in gaining a better understanding of face recognition in challenging circumstances, and my findings have also demonstrated the power of familiarity – so far familiarity has led to large increases in performance on the difficult tasks that I have developed.
	
With such promising results, I now invest in creating a disguise face database that allows face-matching accuracy to be tested in what may be the most challenging case yet – when a person is deliberately trying to evade their own identity or impersonate someone else. As I am creating the database from scratch, it will include all aspects of disguise that I am interested in. Past studies have relied on incidentally occurring similarities between faces to provide different person trials in face-matching tasks, and have used two images of the same person that may differ in naturalistic incidental ways. Now, in addition to creating no disguise image pairs similar to those used in past matching tasks, I attempt to create disguise image pairs. These will include creating deliberate similarities between different identities (impersonation disguise) as well as creating same person trials where there is a deliberate attempt to alter appearance across images (evasion disguise). These images will allow face-matching performance for disguised faces to be directly compared to performance for undisguised faces for the first time.
	
Disguise	in	Face	Recognition	Memory	
Only	a	few	past	studies	have	attempted	to	approach	questions	related	to	face	recognition	
performance	 and	 disguise.	 Patterson	 and	 Baddeley	 (1977)	 were	 the	 first	 to	 do	 so,	
publishing	 a	 paper	 that	 consisted	 of	 two	 disguise	 face	 memory	 experiments.	 Their	 first	
experiment	 tested	 face	 recognition	 memory	 performance	 for	 identical	 face	 images	
compared with face images changed in pose or expression, and also faces in disguise. The disguise stimuli were images of actors, with the same actors photographed in two
different	 character	 roles.	 This	 allowed	 one	 of	 these	 images	 to	 be	 presented	 during	 the	
learning	 phase	 and	 the	 other	 at	 test.	 Participants	 were	 divided	 into	 two	 groups	 for	 the	
learning	 phase	 in	 the	 experiment.	 Both	 groups	 were	 required	 to	 view	 face	 images	 with	
the intent of memorising them. However, one group was instructed to focus on making personality judgments for the faces they saw and the other was instructed to focus on the features of the faces (discussed further in Chapter 5). Participants were aware that the
appearance	of	the	individuals	might	change	when	they	saw	the	face	again	at	test.	At	test	
participants	again	viewed	face	images,	and	were	this	time	tasked	with	deciding	whether	
or	 not	 the	 identities	 presented	 at	 test	 had	 featured	 in	 the	 earlier	 memory	 task.	
Recognition performance was poorer for faces presented in disguise at test than for faces presented unchanged (the identical image presented again at test) or changed only in pose or expression. Indeed, recognition performance was near chance for faces presented in disguise at test.
	
The disguise stimuli used in Patterson & Baddeley’s (1977) first experiment, discussed above, were multiple images of the same actors, whose appearances differed across images as a result of matching different character roles for different acting jobs. This stimulus acquisition method meant that the exact changes made to appearance by the actors across images were unknown to the experimenters, and hence the effect of each change could not easily be investigated. A more controlled manipulation of disguise was
applied	 in	 Experiment	 2.	 Here	 Patterson	 &	 Baddeley	 created	 the	 disguise	 stimuli	
themselves	 by	 asking	 volunteer	 male	 models	 to	 create	 disguises	 using	 props.	 Disguise	
purely through the addition of props is a common approach undertaken in past work. I will from here on refer to disguise by props alone as simple disguise. Models in
Patterson	 &	 Baddeley’s	 study	 were	 instructed	 to	 add	 wigs,	 add	 or	 remove	 beards,	 and	
add	or	remove	glasses	across	a	series	of	photographs.	I	will	refer	to	disguise	based	on	the	
experimenter’s	 instruction	 as	 prescribed	 disguise	 and	 situations	 where	 models	 disguise	
themselves	as	they	wish	as	free	disguise.	Patterson	&	Baddeley’s	experiment	thus	fits	the	
category	 of	 prescribed	 simple	 disguise.	 Participants	 were	 tested	 on	 their	 recognition	
performance for disguise compared to no disguise images in a similar manner to Patterson & Baddeley’s Experiment 1. As with recognising the disguised actor
images, participants in Patterson & Baddeley’s (1977) second experiment were worse at recognising a face identity when it was presented at test in a different disguise from learning than at recognising an exact image that they had viewed before. Specific effects of
each	disguise	manipulation	investigated	in	this	experiment	are	discussed	in	Chapter	5.	
	
Terry (1994) also explored the effect of disguise on memory for computer-generated faces. Experiments included the addition or removal of glasses, or the addition or removal of a beard, on computer-generated face images. Removing glasses reduced recognition accuracy, as did the addition of a beard, and also the removal of a beard, though to a lesser extent. However, adding glasses did not affect recognition accuracy. Righi, Peissig & Tarr (2012) replicated this result with real face images from the TarrLab database, showing that the addition of glasses did not affect face recognition memory performance as much as the addition of a wig or the removal of glasses. These findings show that a change in appearance in the form of a disguise is not always clear cut in terms of the effect it will have on face recognition – some manipulations impair performance whereas others do not.
	
Both of these studies (Patterson & Baddeley, 1977; Terry, 1994) report overall impairments to face recognition memory when a face is presented in a changed form (in a disguise) at test compared to at presentation, despite significant differences in the disguise stimuli used in each experiment. Patterson & Baddeley’s (1977) Experiment 2 and Righi et al. (2012) used prescribed disguise stimuli, whereas Terry (1994) used computer-generated stimulus images. This prominent effect of disguise on face memory makes it seem likely that face-matching tasks will also be affected by disguise manipulations, as factors such as changes in pose and expression, which have influenced face recognition memory, have also reduced performance for face-matching in past studies (Bruce, 1982; Hancock et al. 2000).
	

	

Disguise	in	Face-matching	
Dhamecha, Singh, Vatsa & Kumar (2014) are the only researchers to have tested human face-matching accuracy for disguised faces. Their study compared human face recognition performance to machine algorithm performance. The main reason for investigating human performance was to learn from the strategies used by humans in order to enhance the algorithms.

Figure 5.1 Images from the IDV1 database. Props include glasses, fake beards and moustaches, medical masks, hats and turbans.
	

Dhamecha et al. (2014) created the IIIT-Delhi Disguise Version 1 face database (IDV1) for their study (see Figure 5.1). This consisted of images of 75 models, each of whom had been given access to accessories that they could use to disguise themselves. Models were photographed in 5 disguises and in 1 no disguise (neutral) condition. All photographs were taken under nearly identical lighting conditions, with a frontal pose and neutral expression, thereby limiting the options for creating disguise. Accessories for disguise were wigs, fake beards and moustaches, sunglasses and glasses, hats, turbans, veils, and medical masks. Models could wear just one, or multiple, accessories for their disguises but were simply told ‘to use the accessories at their will in order to get disguised’; therefore
the models conducted simple free disguise. The props, and the study itself, focus on occluding specific areas of the face (e.g. eyes hidden by glasses, medical masks hiding the mouth). Occlusion of features is a typical result of many simple disguise manipulations.
The	authors	point	out	that	the	photographs	used	as	stimuli	included	all	parts	of	the	face	
being	 hidden	 at	 least	 once.	 Although	 occlusion	 of	 features	 may	 be	 effective	 in	 hiding	
identity,	it	is	important	to	remember	that	concealing	parts	of	a	face	would	often	not	be	
an	effective	disguise	in	the	context	of	identity	fraud	as	these	disguise	props	would	have	to	
be removed for facial comparison inspections. Concealment of features through use of props is a different type of disguise from manipulating the way features themselves look, and
may	lead	to	very	different	results.	
	
The	 study	 reported	 that	 human	 performance	 accuracy	 for	 matching	 the	 face	 pairs	
differed	 as	 a	 result	 of	 ethnicity	 of	 the	 faces	 viewed	 and	 also	 familiarity	 with	 the	 faces.	
Performance	 accuracy	 was	 lowest	 for	 matching	 unfamiliar	 faces	 of	 different	 ethnicity	
(66%	accuracy),	followed	by	unfamiliar	faces	of	the	same	ethnicity	(69.5%)	and	highest	for	
the	 familiar	 faces,	 all	 of	 which	 were	 the	 same	 ethnicity	 (75%	 accuracy).	 Performance	 of	
the	 algorithm	 proposed	 in	 the	 paper	 was	 comparable	 with	 the	 unfamiliar,	 different	
ethnicity	human	viewers	(Dhamecha	et	al.,	2014).			
	
Dhamecha	 et	 al.	 (2014)	 provide	 the	 first	 assessment	 of	 disguise	 face-matching	 accuracy	
by	 human	 viewers.	 In	 particular,	 the	 free	 disguise	 aspect	 used	 by	 Dhamecha	 and	
colleagues was interesting in that it produced a stimulus set of faces that differed from
also	had	limitations.	One	specific	area	of	concern	is	that	there	were	inconsistencies	in	the	
matching	 task.	 Participants	 in	 the	 study	 made	 same	 or	 different	 identity	 judgments	 to	
pairs	of	face	images.	Sometimes	this	involved	matching	a	neutral	face	to	a	disguise	face,	
but	more	often	participants	were	matching	pairs	of	disguised	images.	Nearly	all	the	faces	
had	some	type	of	disguise	present,	therefore	the	study	compared	performance	between	
familiarity	 and	 ethnic	 groups	 on	 a	 task	 of	 matching	 disguised	 faces	 rather	 than	
investigating	 whether	 disguise	 impaired	 performance	 more	 than	 the	 matching	 of	
undisguised	face	pairs.	This	design	meant	that	a	direct	comparison	between	performance	
accuracy	for	disguised	and	undisguised	faces	could	not	be	made.	An	additional	limitation	
of the design is that images were cropped so that only internal cues could be used. This disregards any influence of external cues, which may have helped, or alternatively hindered, human participants’ face-matching performance. This would have been an interesting comparison to have, particularly as previous work suggests that unfamiliar viewers rely heavily on external features when
making facial identity judgments (see Chapter 6 for further discussion of features and identity judgments) (Bruce et al., 1999, 2001; Bonner, Burton & Bruce, 2003; Megreya &
Bindemann,	2009).		
	
Limitations	of	Previous	Disguise	Research	
There are four main limitations of the previous research conducted on facial identity performance for disguised faces. These are i) that past tasks compare recognition of disguised faces with identical image comparisons, ii) a focus on memory tasks, iii) investigation of only evasion disguise, ignoring impersonation, and iv) the use of simple, often prescribed, disguises as the stimuli. The reasons why each of these points limits disguise research are discussed below.
	
Firstly, performance for disguised face identity judgments has most often been compared with performance for remembering exact images. Patterson & Baddeley (1977) forced comparisons of image recognition (remembering the exact image) with a difficult case of face recognition (different images of the same face), rather than testing whether disguised face recognition is more difficult than undisguised face recognition (across different undisguised images of the same identity), which would be a more meaningful comparison. Some of the studies have actually made alterations to the learning image itself to create a disguise condition, which recreated a scenario more similar to image matching than purely investigating disguise (e.g. Terry 1994).
	
Recognition of identical images is an easy task for humans (Bruce, 1982). Facial appearance changes incidentally across multiple different images. In real life face-matching scenarios, it is performance across these types of changes that is of interest, as it is not possible to capture the exact same image of a person across time. This has now been recognised in the study of face recognition. Ambient face images, which include naturalistic and incidental changes, such as differences in expression, are now typically the image type used in tasks that try to capture facial identity performance accuracy.

	

Comparisons with image matching performance do not map onto the real world problem, where naturalistic differences exist between multiple images of a face. In the investigation of disguise, it is thus important that performance for disguised faces is compared against performance for faces that include this incidental change in appearance.
	
Further to the focus on image matching, past research is mostly confined to the study of face recognition memory for disguised faces (Patterson & Baddeley 1977; Terry, 1994; Righi et al. 2012). This thesis focuses on face-matching accuracy, as this is a frequently used identification check that does not rely on any memory component. The only task to date which has looked at face-matching accuracy for disguised faces (Dhamecha et al. 2014) investigated the performance of computer algorithms on the task compared to human performance, but did not provide a control of performance for faces in no disguise. Previous research has found that unfamiliar face-matching is poor in cooperative facial image comparison tasks, but I believe it is interesting to know whether disguise impairs face recognition performance further. In criminal or undercover police situations there can be very strong incentives to carry out a realistic and successful disguise, but it is not yet confirmed that disguises impair face-matching performance.
	
Previous	research	on	disguise	has	focused	exclusively	on	evasion	–	changes	to	appearance	
that	 make	 the	 model’s	 own	 identity	 difficult	 to	 determine.	 Disguise	 could	 also	 involve	
impersonation,	changing	appearance	to	look	like	a	specific	other	person,	but	this	has	not	
been addressed by disguise research to date. Most stimulus databases include exclusively male faces, and some studies have relied on computer-generated stimuli rather than real human face images to explore evasion disguise. A database of male and female human
faces	 in	 both	 evasion	 and	 impersonation	 disguise	 is	 necessary	 to	 better	 understand	
disguise.		
	

	

A final limitation of the previous research on disguise is that the stimulus images used have been unsophisticated. Simple disguises have dominated, involving the addition or removal of props to a face from a limited supply provided by the experimenters. This limits the ways in which individuals can disguise themselves and may not reflect the props that a person would choose to disguise themselves naturally. Simple disguise has often led to occlusion of features; this disguise technique would create ineffective disguises in many contexts of identity fraud. Most disguises have been prescribed by the experimenters, meaning that no information has been gathered on the changes that people naturally make to their appearance to create disguises. Patterson & Baddeley (Experiment 1) were the only researchers to test more naturalistic disguises, in terms of the actor photographs, but the intent behind these appearance alterations at the time of photographing was not specifically disguise. Examples of existing disguise face databases are shown below; the figures make the limitations of the stimuli involved evident.
	

5.3 Existing	Disguise	Face	Databases	
The	AR	&	IDV1	Databases	
The AR database (Martinez & Benavente, 1998) and the IDV1 database (Dhamecha et al., 2014) are the only existing disguise face databases to contain images of real people under a variety of disguises. In the case of the AR database, disguises include changes in lighting and deliberate changes in expression, which led to incidental changes in the appearance of the identity, and also more deliberate changes to appearance through the addition of props – sunglasses and a scarf to hide features of the face (Figure 5.2). This database consists only of changes that constitute evasion disguise and incidental appearance change. The IDV1 database (images shown previously in Figure 5.1) kept expression and lighting constant; therefore disguise was created purely through the addition of props. Both the AR database and the IDV1 database show rather unconvincing disguises, in that it is obvious that people are trying to hide their appearance through props.

	

	
	
		
	
	
	
	
Figure	5.2	Images	from	the	AR	database.	Disguise	manipulations	are	limited	to	a	change	of	expression	or	
the	addition	of	sunglasses	or	a	scarf.			
	
	The	National	Geographic	Database	

The National Geographic Database (Ramanathan, Chowdhury & Chellappa, 2004) also includes real face images with a disguise component, but consists of 46 images of just one identity. It is questionable whether this database should really be considered a disguise database at all, as many of the photographs (e.g. the top row in Figure 5.3) fit the category of incidental change: they show natural variation between multiple images of a face rather than a deliberate attempt to change appearance. The bottom row (Figure 5.3) does constitute disguise images, but these disguises are very obviously fake. It appears that the disguises have been applied to the images themselves (e.g. addition of a beard or moustache) rather than to the model prior to the photograph being taken.
	
	
	
	

	

Figure 5.3 A sample of images from the National Geographic Database (Ramanathan et al., 2004).
	

TarrLab	Face	Database		
A final example of human disguise face images used in previous research is the TarrLab face database. This includes images of faces photographed naturally and the same identities photographed wearing a wig and glasses (Figure 5.4). Sometimes these glasses are reading glasses and at other times they are sunglasses, which occlude the model's eyes. An advantage of the TarrLab database is that it includes both male and female faces, whereas all other databases discussed consist of male faces only. Additionally, the fact that models are photographed both with and without disguise allows direct comparisons of disguise to be made. The database is, however, limited to prescribed simple disguise, and many of the disguises include occlusion of features. As discussed previously, items that occlude features may have to be removed in security screening scenarios.
	
	
	
	
	
	

	

Figure	5.4	Example	images	taken	from	the	TarrLab	face	database.		
	

Synthetic	Face	Database	
Disguise face databases made up of images of human faces have been rather limited to date. Singh, Vatsa & Noore (2009) acknowledge and address this by creating a new disguise face database which encompasses a greater range of disguises, but does not contain images of real human faces (Figure 5.5). Singh et al. (2009) used Faces software, a programme used to make face images based on facial descriptions in police investigations, to build synthetic faces in different disguise conditions. Computer algorithm performance was then tested on this database, with the best-performing algorithm achieving 71% accuracy when dealing with images that had multiple components of disguise. The study looked exclusively at algorithm performance on the disguise face database, with the aim of identifying which algorithm performed with highest accuracy on this image set. There was no control condition with which to compare performance for undisguised face images. This study is therefore limited to evasion disguise performance for computer algorithms on a set of computer-generated disguised face images. Moreover, the disguises are not realistic; for example, a pirate-style eye patch is unlikely to be worn by a disguise perpetrator.

	

Figure 5.5 Examples from the synthetic face disguise database (Singh et al., 2009).

	

	

5.4 FAÇADE	Database	
Shortcomings	of	Existing	Databases	
As seen from previous research, existing databases are limited: they are unrealistic or involve occlusion of features, and include only evasion disguise images. It is also extremely important to investigate impersonation disguise, as in some instances people may have very strong incentives to pass themselves off as a specific other person. Impersonation could include the case of illegally entering a country on a stolen passport.
	
Furthermore, past databases are rather unsophisticated in the disguise manipulations applied. All of the disguise databases rely on simple disguise. Many use prescribed disguises (whereby the experimenters have told the models exactly how to apply the props), but in cases where models have been free to disguise themselves as they wished, the disguises were severely restricted to the addition of a limited selection of props and combinations of these. As models were never given the opportunity to disguise themselves with free rein over changes to their appearance, very little is known about the disguise process itself, let alone what works.
	
With	 these	 shortcomings	 in	 mind,	 I	 created	 my	 own	 database	 of	 disguised	 and	
undisguised	 faces,	 named	 the	 FAÇADE	 image	 database,	 which	 I	 will	 use	 to	 investigate	
deliberate	disguise.		
	
Goals	for	the	FAÇADE	Database	
I	created	the	FAÇADE	image	database	to	allow	face	recognition	to	be	explored	more	fully.	
Creating	 a	 disguise	 database	 from	 scratch	 allowed	 the	 effect	 of	 disguise	 manipulations	
and	the	process	in	creating	disguises	to	be	investigated	in	more	detail	than	ever	before.	A	
major	 goal	 of	 this	 database	 was	 to	 allow	 research	 to	 be	 conducted	 to	 directly	 compare	
face-matching	 performance	 for	 disguised	 against	 undisguised	 faces.	 In	 addition	 to	 this	 I	
wanted	 to	 break	 down	 disguise	 further	 –	 to	 explore	 evasion	 disguise	 and	 compare	 and	
contrast	this	to	impersonation	disguise	for	the	first	time.		
	
Unlike	 past	 disguise	 databases,	 which	 were	 made	 up	 of	 images	 of	 models	 who	 had	
applied	prescribed	simple	disguises,	I	wanted	to	be	able	to	explore	how	people	disguised	
themselves	when	they	were	given	the	opportunity	to	create	their	own	disguises.	Use	of	
free	disguise	will	allow	insight	into	what	measures	people	would	most	naturally	take	to	
disguise	themselves	and	also	allow	me	to	explore	what	disguises	in	the	database	worked	
best	based	on	results	from	the	matching	task.	This	exploratory	analysis	will	be	conducted	
in	Chapter	6.	
	
My	 initial	 goal	 and	 focus,	 following	 the	 theme	 of	 face-matching	 performance	 in	
challenging	 situations	 investigated	 throughout	 the	 rest	 of	 this	 thesis,	 was	 to	 assess	
whether	face-matching	performance	is	influenced	by	the	presence	of	disguise.	

	

Approach	to	the	Image	Acquisition	
As	 mentioned	 above,	 I	 distinguish	 between	 two	 different	 types	 of	 disguise	 within	 my	
disguise	 stimuli—Evasion	 (trying	 not	 to	 look	 like	 oneself)	 and	 Impersonation	 (trying	 to	
look	like	a	specific	target	person).	To	capture	this	distinction,	I	gave	26	volunteer	models	
(i)	a	photo	of	themselves,	and	asked	them	to	make	themselves	look	unlike	that	reference	
photo	for	a	subsequent	shoot	(Evasion	condition),	and	(ii)	a	photo	of	someone	else,	and	
asked	 them	 to	 make	 themselves	 look	 like	 that	 person	 (Impersonation	 condition).	 It	 is	
possible that the similarity of the impersonated face may influence disguise effectiveness. To investigate this, two impersonation conditions were included for each identity: impersonating a similar face (rated to be the most similar of 33 match faces by a group of 3 viewers) and impersonating a face selected at random. A no-disguise photograph was also taken of each model.
	
Models were instructed that the task was to look as much like, or unlike, the identity of the target profile photograph, rather than the image itself; that is, I was interested in an identity matching task rather than image matching, a distinction already identified as a shortcoming of previous research.
	
One of my goals for the FAÇADE database was to create a database that would not only consist of realistic stimuli for undisguised, evasion disguise and impersonation disguise faces, but whose acquisition could also further the understanding of what people do to create each of these disguises. Models were therefore unguided in their disguise efforts, as it was of interest to us to find out what people did to disguise themselves. To allow this exploration of the changes made, models wrote down all changes they made to their appearance, and the intent of each change, as they carried out their disguise transformations.
	

	

I provided models with any props that they wanted to aid their disguises. Props were ordered on request to match the models' wishes, and included clothing, wigs, hair accessories, makeup, glasses and jewellery (see Figure 5.6 for a selection of props used). Hats, sunglasses and any other occluding items were not allowed, as such items would have to be removed if someone were physically present at passport security. Models were instructed that their resulting appearance must not look out of place as a real person's work identity badge image, rather than fancy dress.
	

Figure	5.6	Sample	of	props	used	to	create	the	disguise	face	database.		

	
Photographs were taken at a time that suited the model, with the photographs (evasion, impersonation similar, impersonation random) not necessarily captured on the same day. This was left to the personal choice of each model, and gave our models time to make changes to their appearance between photography sessions, such as waiting for hair or beard growth.
	
	
Motivation	
It was important that our models were highly motivated and dedicated to creating convincing disguises. This was a very time-consuming task for our models and required much thought, and some made large changes to their appearance, including having their hair cut or dyed, or their beard shaved or grown. It was up to the models which changes they made, so they could decide how far they wanted to go with the disguises. Models were motivated by a cash reward, with the best disguise (based on performance in Experiment 1) in each condition (evasion, impersonation similar, impersonation random) receiving a cash reward of £50. The models were extremely dedicated to the task and became competitive with one another to create the best disguise.
	
Selecting	the	best	disguise	images	
Nine images of each model were taken in each disguise condition, with models varying aspects such as facial expression, pose and lighting to try to create the best image for each disguise condition (see Figure 5.7). A group of four unfamiliar viewers (the stimuli selection group) worked together to decide upon the most convincing match or mismatch images for each model in each condition. The stimuli selection group viewed the Impersonation (similar and random) images and the Evasion images for each of the 26 models alongside the corresponding reference image of the target face (the model's own face in the evasion condition, and the faces of the people the models were trying to impersonate in the impersonation conditions). The group knew that they were dealing with images of people in disguise, and were informed of the true identity classification (same or different) in each matching situation to aid their decisions.
Figure 5.7 Image taken during the stimuli selection process.

	
In order to better understand what was driving the decisions made by the stimuli selection group, I asked them the following questions for each identity in each disguise condition: (1) Which image provides the best match? (2) How good is the disguise (rated on a scale of 1-7, with 1 representing 'not a good disguise at all' and 7 indicating an extremely good disguise)? (3) What is it about the chosen image that makes it an effective disguise? The results of each of these are discussed in Chapter 6, Understanding Disguise, but for now I will continue to focus on the database itself and the image pairs that will be used to test face-matching performance.
	

Face	Image	Pairs	
The images chosen by the stimuli selection group provided the final disguise image for each of the models in each of the disguise conditions. All other images were disregarded, with only the most convincing disguises for each model kept in the FAÇADE database to satisfy the disguise face image conditions. I was then able to create face image pairs for both disguise and no-disguise conditions using the database images. The face-matching task included the following face image pairing conditions: same identity no disguise, same identity evasion disguise, different identity similar no disguise, different identity similar impersonation disguise, different identity random no disguise and different identity random impersonation disguise.
	

Figure	5.8	Example	pairs	for	each	condition.	Top	row	shows	same	identity	pairs,	the	lower	two	rows	show	
different	 identity	 pairs.	 Pairs	 in	 the	 first	 column	 are	 in	 no	 disguise.	 Pairs	 in	 the	 second	 column	 are	 in	
disguise.	All	26	models	were	photographed	in	each	of	the	conditions.		

	
Examples of the stimuli pairs are shown in Figure 5.8. The image pairs on the left show no-disguise pairings. The top two rows in this column consist of pair types that are very similar to those seen in other standardised face-matching tasks such as the GFMT: pairs of images can show either the same person or two different people. I have extended the design by introducing a third row, to include two types of different-person trial. Similar pairing means that the foil is the most similar-looking person from the pool of face images. Random pairing means that the foil is drawn at random from images of models of the same sex. Therefore, random pairings can be rather dissimilar in appearance, showing two faces side by side for which one would not normally expect the identities to be confused.
	
This	led	to	a	very	convincing	(and	rather	confusing)	series	of	images	within	the	FAÇADE	
database	(see	Figure	5.9).	
	
	
	
	
	
	
	
	
	
	
Figure 5.9 Selection of images taken from the disguise face database to create a wheel of disguise. Images with the same colour frame show the same identity. Images with different colour frames are of different identities.
	

Familiarity		
Familiarity has been an overarching theme throughout this thesis. It has been repeatedly shown that people who are familiar with a face are better at matching images of that face than people who are unfamiliar with it. Now the familiarity advantage is pushed even further than before, as I test whether familiarity with a face can help in the case of deliberate disguise. In theoretical terms, as a face becomes familiar, the normally accepted range of variability for that face is learnt. Effective disguise involves taking a face outwith that normal range of accepted appearance. It is thus possible that in the case of disguise, familiarity with the face or faces involved may not help much, precisely because disguising a face takes it out of its accepted range of appearances and hence out of the familiar viewer's area of expertise.
	
Dhamecha et al. (2014) argue that familiarity with a face does help with face-matching performance for disguised faces. In their experiment, Dhamecha and colleagues (2014) defined familiar viewers as participants who worked in the same department as the models whose images were used in the matching task, and unfamiliar participants as people who had not previously encountered the models. Although there is a significant effect of familiarity when the results of trials are pooled, the effect is not significant for both trial types considered in isolation. For same-person trials, familiar viewers were more accurate at the task than unfamiliar viewers. However, for different-person trials, incorrect matches were made equally often by familiar and unfamiliar viewers. In both match and mismatch pairs, both items were most often disguised. Based on these findings, the familiarity advantage seems uncertain for the case of disguise.
	
I expect that unfamiliar viewers will make more mistakes on the disguised face trials than familiar viewers, as their representations of these faces are less finely tuned than those of familiar viewers. Unfamiliar viewers frequently accept imposter faces as a target face. This is demonstrated both by earlier work in this thesis (e.g. Chapter 2) and by the results of face-matching tasks such as the GFMT, which show around 80-90% accuracy for unfamiliar viewers, confirming that people make errors in identity judgment.
	

	

There are, however, instances of extremely familiar viewers being tricked by imposter scenarios. Thompson (1986) demonstrated that it is possible to mistakenly reject a family member if context interferes with the identity judgment. Thompson (1986) constructed a scenario in which a daughter walked past her parents, without acknowledging them, while they were on holiday. The parents believed that their daughter was at home, in a different location, and so concluded that the identity was not that of their daughter, based on the incorrect contextual information that they held. This example demonstrates that familiarity does not guarantee correct identity judgment. It is possible that disguising a face may have a similar effect, as some context may be lost when a face is disguised with the aim of taking it outwith its accepted range of appearances. For example, a person may normally be seen in work clothes with a professional and approachable appearance, but in disguise their displayed persona may be different.
	
The	Study	–	Research	Questions	&	Predictions	
This	study	will	use	image	pairs	from	the	FAÇADE	database	to	answer	four	main	research	
questions.	 Firstly,	 I	 will	 address	 whether	 overall	 face-matching	 performance	 accuracy	 is	
worse	 for	 disguised	 than	 undisguised	 face	 pairs.	 Presumably	 it	 will	 be	 more	 difficult	 to	
match	disguised	faces	than	images	of	cooperative	stimuli.		
	
The	 second	 question	 of	 interest	 is	 whether	 all	 disguise	 manipulations	 cause	 equal	
impairment	 to	 face-matching	 performance	 or	 if	 disguise	 type	 has	 an	 effect.	 Results	 of	
past	experiments	show	that	more	errors	are	made	for	same	person	trials	than	different	
person	trials	in	face-matching	tasks.	This	demonstrates	difficulties	in	integrating	multiple	
images	 of	 the	 same	 identity	 (e.g.	 Jenkins	 et	 al.	 2011).	 Thus,	 a	 deliberate	 attempt	 to	
frustrate	 the	 integration	 of	 identity	 (the	 evasion	 condition)	 will	 likely	 make	 the	 task	 of	
‘telling	 the	 faces	 together’	 harder	 still.	 Therefore	 I	 predict	 that	 trials	 involving	 evasion	
disguise	will	be	more	error	prone	than	impersonation	disguise	trials.		
	

	

I	will	also	investigate	whether	there	is	an	effect	of	impersonation	type,	i.e.	is	it	easier	for	
someone	 to	 pass	 themselves	 off	 as	 being	 a	 person	 who	 looks	 naturally	 similar	 in	
appearance	 to	 them	 or	 are	 they	 equally	 effective	 at	 looking	 like	 somebody	 who	 is	
selected	at	random?	Past	research	suggests	that	impersonation	similar	disguise	will	cause	
more	errors	than	impersonation	random	disguise.	Light,	Kayra-Stuart	&	Hollander	(1979)	
found	that	people	were	worse	at	remembering	faces	that	looked	similar	to	a	prototype	
face than those that looked unusual in appearance, suggesting a role of distinctiveness. I therefore expect that impersonation random disguises will lead to fewer errors in the face-matching task than impersonation similar disguises.
	
Each of these questions will also be addressed with regard to familiarity with the faces viewed. As discussed in the familiarity section, based on theories of face learning, familiarity is not guaranteed to help in the case of disguise. Familiar viewers will, however, be used to viewing the models' faces over a wider range of appearances than our unfamiliar viewers. Therefore familiar viewers are predicted to be better than unfamiliar viewers at the face-matching task, although familiar viewers may also be somewhat affected by the disguise manipulations. In total I will conduct three disguise face-matching experiments, each of which manipulates viewer group but also addresses each of the research questions outlined here. Experiment 10 will test face-matching performance of unfamiliar viewers, Experiment 11 will test performance of unfamiliar viewers who are informed of the disguise manipulation, and finally Experiment 12 will assess face-matching performance for familiar viewers.
	

5.5 Experiment	10:	Unfamiliar	Viewers		
Experiment	 10	 examines	 face-matching	 performance	 for	 unfamiliar	 viewers	 on	 a	 task	
involving	disguised	and	undisguised	image	pairs.	The	experiment	will	use	the	images	from	
the	 disguise	 face	 database,	 therefore	 allowing	 an	 investigation	 of	 face-matching	
performance	 accuracy	 for	 evasion,	 impersonation	 similar	 and	 impersonation	 random	
disguise.		

	

I predict that disguise will impair face-matching performance, with evasion disguise making face pairs most difficult to identify correctly, and impersonation disguise also causing difficulties, but to a lesser degree. I believe impersonation similar disguises will be easier to execute than impersonation random disguises, hence more errors will be made when matching impersonation similar than impersonation random faces.
	
Method	
Participants	
Thirty undergraduate students (M = 8, mean age = 23) from the University of York volunteered as participants in this study in return for payment or course credits.
	
Design	&	Stimuli	
The experiment took the form of a 2×3 within-subjects design, with factors Disguise Condition (levels: Disguise and No Disguise) and Pair Type (levels: Same, Different Similar and Different Random). The dependent variable was performance accuracy in the face-matching task for each of the independent variables listed above.
	
Stimuli	were	the	face	image	pairs	from	the	FAÇADE	database.	Stimuli	were	kept	constant	
across	all	experiments	in	this	chapter.		
	
Procedure	
Participants viewed the image pairs from the FAÇADE face database on a computer screen, at a viewing distance of 50 cm. Image pairs were presented one at a time as part of a self-paced face-matching task. The participants' task was to decide, for each image pair, whether the two images were of the same person's face or were images of two different people, pressing 'z' on the keyboard for same pairs and 'm' for different. All image pairs were presented in a randomized order, and over the whole experiment the participants saw each of the models in each of the disguise conditions. In total, participants viewed 156 image pairs.
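The trial structure above follows directly from the design: 26 models, each appearing in six pairing conditions, gives 156 pairs presented in randomized order. The following is a minimal sketch (not the original experiment code; the function and variable names are my own illustration) of how such a trial list could be enumerated:

```python
# Sketch of the trial enumeration: 26 models x 6 pairing conditions = 156
# trials, shuffled into a randomized presentation order. Condition names
# follow the text; everything else here is illustrative.
import random

MODELS = range(1, 27)  # the 26 volunteer models
CONDITIONS = [
    "same identity no disguise",
    "same identity evasion disguise",
    "different identity similar no disguise",
    "different identity similar impersonation disguise",
    "different identity random no disguise",
    "different identity random impersonation disguise",
]

def build_trials(seed=0):
    """Return the full crossing of models and conditions in random order."""
    trials = [(model, condition) for model in MODELS for condition in CONDITIONS]
    random.Random(seed).shuffle(trials)  # randomized presentation order
    return trials

trials = build_trials()
print(len(trials))  # → 156
```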
	
It	was	important	to	check	that	all	items	in	the	task	were	unfamiliar	to	the	participants	in	
this	experiment.		Following	the	face-matching	task,	participants	were	given	the	reference	
images	which	they	had	viewed	in	the	task,	and	asked	to	indicate	how	familiar	they	were	
with	 the	 person’s	 face	 before	 the	 experiment.	 Familiarity	 was	 measured	 using	 the	
familiarity	 scale	 that	 I	 designed	 in	 Chapter	 1	 of	 this	 thesis.	 Participants	 placed	 the	 face	
cards	on	the	familiarity	scale	that	ran	across	the	desk,	indicating	familiarity	on	the	scale	
from	0	(completely	unfamiliar)	to	100	(extremely	familiar).		
	
Results
Participants’	responses	from	the	face-matching	task	were	broken	down	into	each	of	the	
Disguise	 Conditions	 and	 Pair	 Types,	 with	 a	 mean	 score	 of	 performance	 accuracy	
calculated	 for	 each.	 These	 means	 were	 then	 combined	 across	 all	 participants	 and	
averaged to calculate the overall mean for each condition. These breakdowns of scores by Disguise Condition and Pair Type allowed the research questions, of whether there was an overall effect of disguise on face recognition accuracy and whether disguise type influenced accuracy, to be answered.
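The aggregation described here, scoring each trial, averaging within each participant for each cell of the design, then averaging across participants, can be sketched as follows. This is not the original analysis code; the data layout and field names are assumptions for illustration only.

```python
# Sketch of the two-step averaging: trial-level correctness is first
# reduced to a per-participant mean for each (disguise, pair type) cell,
# and those per-participant means are then averaged across participants.
from collections import defaultdict

def condition_means(responses):
    """Mean accuracy (%) per (disguise, pair_type) cell, averaged first
    within participants and then across them."""
    per_participant = defaultdict(list)  # (pid, cell) -> list of 0/1 scores
    for r in responses:
        cell = (r["disguise"], r["pair_type"])
        per_participant[(r["pid"], cell)].append(r["correct"])

    cell_scores = defaultdict(list)      # cell -> list of participant means
    for (pid, cell), outcomes in per_participant.items():
        cell_scores[cell].append(100 * sum(outcomes) / len(outcomes))

    return {cell: sum(scores) / len(scores) for cell, scores in cell_scores.items()}

# Hypothetical trial records from two participants:
responses = [
    {"pid": 1, "disguise": "none",    "pair_type": "same", "correct": 1},
    {"pid": 1, "disguise": "none",    "pair_type": "same", "correct": 1},
    {"pid": 1, "disguise": "evasion", "pair_type": "same", "correct": 0},
    {"pid": 2, "disguise": "evasion", "pair_type": "same", "correct": 1},
]
print(condition_means(responses))
```

Averaging within participants first ensures that each participant contributes equally to the cell mean, which matches a within-subjects analysis where the participant is the unit of observation.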
	

	

[Figure 5.10: Performance Accuracy for Unfamiliar Viewers. Bar chart of Accuracy (%) against Pair Type (Same, Different Similar, Different Random), for No Disguise and Disguise pairs.]
	
Figure	 5.10	 Performance	 accuracy	 for	 unfamiliar	 viewers	 for	 evasion,	 impersonation	 similar	 and	
impersonation	random	pairs	when	the	images	consisted	of	no	disguise	or	disguise	pairs.	Error	bars	show	the	
standard	error	of	the	mean.			

	
A within-subjects ANOVA with factors of Disguise Condition and Pair Type was conducted to find out whether disguise presence affected face-matching performance and whether Pair Type affected matching accuracy (see Figure 5.10). As predicted, there was a significant main effect of Disguise Condition. Participants performed more poorly for Disguise (M = 77.39% accuracy, SD = 21.27) than No Disguise face pairs (M = 95.26% accuracy, SD = 5.97) [F(1, 29) = 75.88, p < .001, ηp² = .72]. There was also a significant main effect of Pair Type [F(2, 29) = 22.87, p < .001, ηp² = .44] and a significant interaction between Disguise Condition and Pair Type [F(2, 29) = 37.95, p < .001, ηp² = .57]. This showed
that	 both	 the	 presence	 of	 disguise	 and	 also	 the	 type	 of	 trial	 affected	 performance	
accuracy	 rates,	 but	 accuracy	 was	 not	 influenced	 in	 the	 same	 way	 by	 Pair	 Type	 (Same,	
Different	similar,	or	Different	random)	in	each	of	the	disguise	conditions.	
	

Simple main effects were conducted to find out where the significant differences lay. Pair Type was significant for Disguise [F(2, 116) = 53.79, p < .001, ηp² = .48] but not for No Disguise pairs [F(2, 116) = 2.03, p > .05, ηp² = .03], demonstrating that participants' matching accuracy differed across pair types only for the disguised faces. Disguise presence significantly affected performance for each of the Pair Types: Same identity pairs [F(1, 87) = 151.5, p < .001, ηp² = .64], Different (Similar appearance) identity pairs [F(1, 87) = 12.78, p < .005, ηp² = .13] and Different (Randomly matched) pairs [F(1, 87) = 9.19, p < .005, ηp² = .1].
	
Tukey	 tests	 revealed	 that	 there	 were	 significant	 differences	 between	 each	 level	 of	 trial	
type	 for	 the	 disguise	 present	 faces.	 Same	 identity	 pairs	 (represented	 on	 the	 graph	 as	
Same,	 condition	 Disguise)	 caused	 most	 errors	 (M	 =	 60.38%	 accuracy,	 SD	 =	 25.69),	 then	
Different	Similar	identity	pairings	(those	on	the	graph	shown	as	Different,	and	condition	
Disguise)	(M	=	82.18%	accuracy,	SD	=	12.8),	and	finally	Different	Random	identity	pairings	
(M	=	89.62%	accuracy,	SD	=	9.32).	
	
Discussion		
This	 experiment	 found	 that	 disguise	 did	 significantly	 impair	 face-matching	 performance	
for	 unfamiliar	 viewers.	 Therefore,	 in	 relation	 to	 my	 first	 research	 question,	 and	 in	
accordance	with	the	hypothesis,	matching	disguised	faces	is	more	difficult	than	matching	
faces that are not in disguise. My second research question, which asked whether all disguise manipulations cause equal impairments to face recognition, is also addressed by this experiment. The results were again as predicted, with evasion disguise affecting face-matching accuracy more than impersonation disguise. With regard to research question three, there was also a significant effect within impersonation itself. Unfamiliar viewers made more matching errors when the impersonations involved someone of a similar appearance to the model than when the impersonation pairings were random.
	

The	 results	 from	 this	 experiment	 will	 later	 help	 to	 answer	 research	 question	 four,	
regarding	 matching	 accuracy	 and	 familiarity,	 providing	 data	 from	 unfamiliar	 viewers	
which	can	be	compared	with	that	of	familiar	viewers	when	collected.		
	
These findings using the FAÇADE database address the issue of disguise face-matching in more detail than any previous study. However, I want to understand disguise further still. I found that face-matching performance was poor for disguised faces compared to faces that were not in disguise, and, although direct comparisons with other studies are difficult, performance for the disguised faces in my task was poorer than performance on the full version of the GFMT: mean accuracy was 77% for the disguised face pairs, whereas performance on the full version of the GFMT is 87%. Performance may have been particularly poor in my experiment because viewers were unaware that the faces they viewed may have been disguised. It is possible that knowledge of the possibility of disguise could improve face-matching performance.
	

5.6 Experiment	11:	Unfamiliar	(Informed)	Viewers	
In	 Experiment	 10	 unfamiliar	 participants’	 face-matching	 performance	 was	 impaired	 by	
disguise	presence.	Making	accurate	identity	matching	judgments	for	the	disguised	faces	
was	a	difficult	task,	but	it	is	possible	that	this	task	was	difficult	because	participants	were	
not expecting to see faces that were disguised. Expectations have been reported to influence performance accuracy in visual search and other cognitive tasks (e.g. Alain & Proteau, 1980; Kastner, Pinsk, De Weerd, Desimone & Ungerleider, 1999). If participants are told about the disguise aspect of the study then performance may improve. This will now be tested in Experiment 11.
	
If poor performance in Experiment 10 arose because participants did not know to look for disguises, their performance accuracy should be far better in Experiment 11 than it was in Experiment 10. If performance accuracy for disguised faces is no better when participants have this knowledge, then being naïve to the disguise component is not what makes the task difficult.
	
The	research	questions	remain	the	same	as	in	Experiment	10,	but	this	time	I	investigate	
whether	the	results	of	Experiment	10	remain	when	participants	are	informed	of	disguise.		
	
Method	
Participants	
Thirty undergraduate students from the University of York, who had not taken part in previous experiments involving the FAÇADE database, volunteered as participants for this experiment (M = 11, mean age = 21).
	
Design	&	Stimuli	
The experiment took the form of a 2×3 within-subjects design, with factors Disguise Condition (levels: Disguise and No Disguise) and Pair Type (levels: Same, Different Similar and Different Random). The dependent variable was performance accuracy in the face-matching task for each of the independent variables listed above. This will enable the research questions outlined in the introduction to be addressed.
	
To test whether performance for disguised faces differs between Experiment 10 and Experiment 11, a between-subjects design will be used.
	
Stimuli were the face image pairs from the FAÇADE database and were kept constant across all experiments in this chapter.
	

	

Procedure	
Experiment 11 was carried out exactly like Experiment 10, the only difference being that this time participants were made aware, before beginning the experiment, that some of the face images they viewed might be disguised, either to look unlike the model's own identity or to look like another person. It was clearly stated that the face-matching decision always concerned an identity judgment based on the true identity of the face. Participants were to decide whether the images they saw were really images of the same person, or images of two different people.
	
Results	
Results were calculated exactly as in Experiment 10, with mean performance accuracy broken down by Disguise Condition and Pair Type. In order to answer the research questions regarding an overall effect of disguise (Disguise Condition) and possible differences between Pair Types, a within-subjects ANOVA was conducted.
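The first step of this analysis — breaking mean percentage accuracy down by each Disguise Condition x Pair Type cell — can be sketched in a few lines of Python. The trial records below are invented for illustration only; they are not the FAÇADE data, and the field names are assumptions:

```python
from collections import defaultdict

# Hypothetical trial records: (disguise_condition, pair_type, correct)
# These example data are invented purely for illustration.
trials = [
    ("Disguise", "Same", 1), ("Disguise", "Same", 0),
    ("Disguise", "Different Similar", 1), ("Disguise", "Different Random", 1),
    ("No Disguise", "Same", 1), ("No Disguise", "Different Similar", 1),
    ("No Disguise", "Different Random", 1), ("No Disguise", "Same", 1),
]

def accuracy_by_condition(trials):
    """Mean percentage accuracy for each Disguise Condition x Pair Type cell."""
    cells = defaultdict(list)
    for condition, pair_type, correct in trials:
        cells[(condition, pair_type)].append(correct)
    return {cell: 100 * sum(v) / len(v) for cell, v in cells.items()}

for (condition, pair_type), acc in sorted(accuracy_by_condition(trials).items()):
    print(f"{condition:12s} {pair_type:18s} {acc:5.1f}%")
```

The per-cell means produced this way are the values that would then enter the within-subjects ANOVA.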
Figure	5.11	Performance	accuracy	of	unfamiliar	viewers	who	were	aware	of	the	disguise	component	of	the	
face-matching	task.	Error	bars	show	the	standard	error	of	the	mean.			

	
A significant main effect was found for Disguise Condition: participants were worse at matching Disguised faces (M = 80.51% accuracy, SD = 18.68) than faces with No Disguise (M = 95.30% accuracy, SD = 6.62) [F(1,29) = 44.25, p < .001, ηp² = .60]. There was also a significant main effect of Pair Type [F(2,29) = 21.39, p < .001, ηp² = .42] and a significant interaction between Disguise Condition and Pair Type [F(2,29) = 55.52, p < .001, ηp² = .66] (see Figure 5.11).
	
Simple main effects highlighted that there was a significant effect of Disguise Condition on performance for each of the Pair Types (Same [F(1,87) = 119.56, p < .001, ηp² = .58], Different Similar [F(1,87) = 9.45, p < .005, ηp² = .10] and Different Random [F(1,87) = 11.74, p < .001, ηp² = .12]). There was also a significant main effect of Pair Type for both Disguised faces [F(2,116) = 56.21, p < .001, ηp² = .49] and No Disguise faces [F(2,116) = 3.64, p < .01, ηp² = .06].
	
Tukey tests revealed that there was a significant difference in performance between each level of Pair Type for the Disguised faces. Same identity pairs (represented on the graph as Same, condition Disguise) caused most errors (M = 67.60% accuracy, SD = 20.01), followed by Different Similar identity pairings (shown on the graph as Different, condition Disguise) (M = 84.49% accuracy, SD = 15.73), and finally Different Random identity pairings (M = 89.36% accuracy, SD = 10.09). For No Disguise faces the only significant difference lay between performance accuracy for Different Similar (M = 92.31% accuracy, SD = 8.86) and Different Random face pairs (M = 98.08% accuracy, SD = 3.46).
	
In order to test whether performance is better when participants know of the possibility of disguise manipulations than when they are naïve to the disguise component, I carried out a between-subjects ANOVA comparing performance accuracy for disguised faces in Experiment 10 with that in Experiment 11. Interestingly, participants in Experiment 11, who knew that some of the faces in the experiment were disguised to not look like themselves or to look like a specific other person, were no better at the face-matching task with this knowledge of the disguises than participants in Experiment 10, who were not informed of the disguise manipulations before the experiment [F(1,58) = 0.625, p = .43, ηp² = .01]. Thus, knowing to look out for disguise does not help to see through disguise.
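The between-subjects comparison above can be sketched as a one-way ANOVA computed from first principles. The per-participant accuracy scores below are invented for illustration; the actual analysis was of course run on the full Experiment 10 and Experiment 11 data:

```python
def one_way_anova_f(*groups):
    """One-way between-subjects ANOVA:
    F = (SS_between / df_between) / (SS_within / df_within).
    Returns (F, df_between, df_within)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: weighted squared deviation of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Invented accuracy scores (%) for an uninformed and an informed group.
uninformed = [70, 75, 80, 72, 78, 74]
informed = [73, 77, 79, 75, 80, 76]
f_stat, df1, df2 = one_way_anova_f(uninformed, informed)
print(f"F({df1},{df2}) = {f_stat:.3f}")
```

With two groups this is equivalent to an independent-samples t-test (F = t²); the resulting F would then be compared against the F distribution to obtain the p value.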
	
Discussion		
The same pattern of results was seen for each of the research questions in Experiment 11 (when participants were aware of the disguise element) as in Experiment 10 (when participants were not aware that faces could be in disguise). Experiment 11 found that informed viewers were significantly worse at matching disguised face pairs than faces that were not in disguise, that evasion caused more difficulty than impersonation disguise, and that impersonation similar trials were harder than impersonation random pairs.
	
The unique research question for Experiment 11 was whether performance accuracy for disguised faces would be better than it was in Experiment 10. No significant difference was found in performance for matching disguised face pairs between these experiments; therefore, knowledge of disguise does not make the task easier. Alternatively, it is possible that participants in Experiment 10 may have figured out the disguise manipulation for themselves whilst doing the task. Either way, performance remains poor for disguised faces in both experiments, suggesting that matching disguised faces is an extremely challenging task regardless of whether viewers know that disguises may be present amongst the image pairs.
	

5.7 Experiment	12:	Familiar	Viewers	
There is a huge distinction in the face recognition literature between face-matching performance for familiar compared with unfamiliar viewers. Familiar viewers perform with impressive levels of accuracy in matching tasks even when the images available for comparison are of poor quality (Burton et al. 1999). Chapters 2 and 4 show further situations where familiarity improved performance in challenging face-matching tasks. I now test familiarity on perhaps our most challenging task to date – the situation of deliberately disguised faces. Can familiarity with a face help us see through deliberate disguise and hence make the viewer immune to deception by disguise?
	
Jenkins et al. (2011) highlight that familiar viewers are able to easily see through natural changes in a face. The distinction between familiar and unfamiliar face recognition performance may be due to experience of variability in photos of the same face. They argue that when someone is becoming familiar with a face, they are learning all of the different appearances that that face can take. Any face that falls within the accepted range of appearances for a specific person will be considered a match for that identity; any image which does not fall within the accepted range of appearances for a face will not be considered a match. The range of accepted appearances for a given face is thought to be smaller for unfamiliar viewers and larger for familiar viewers, who have had more experience with the face. The idea, therefore, is that the acceptable range of facial appearances for a given person becomes more finely tuned as their face is learned. This explains why familiar viewers can see through changes in viewpoint, expression and lighting, whereas unfamiliar viewers can be impaired by trivial image changes: they do not have enough experience with the face to accurately know what appearance the face would take under these changes (Bruce, 1982; Adini, Moses & Ullman, 1997). However, this theory does not necessarily cover the situation of disguise.
	
Presuming that the viewer has not had prior exposure to an individual in their disguised form, it is unknown whether the previous exposure they have had to the face would be sufficient to help the viewer see through a disguised version of the face. Essentially this is the challenge for creating a convincing disguise. For an evasion disguise to fool a familiar viewer, it must change the appearance of the model so that the new appearance falls outwith the accepted range of appearances held by the viewer for that model. Impersonation disguises must move outwith the accepted range of the model and into the accepted range of the reference photograph in order to cause a familiar viewer to make face-matching errors. If the models have been effective in doing this, familiarity with a face may not help a great deal, as the face would be taken outwith the familiar viewer's area of expertise.
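The logic of this account can be illustrated with a toy one-dimensional "face space", in which each identity has an accepted range of appearances that is wider for familiar viewers. The numbers and function below are purely illustrative assumptions, not a model from this thesis:

```python
def is_accepted(appearance, centre, half_range):
    """An appearance is matched to an identity only if it falls within
    that identity's accepted range (centre +/- half_range)."""
    return abs(appearance - centre) <= half_range

# Illustrative values: a familiar viewer tolerates a wider range of
# variation for the same identity than an unfamiliar viewer does.
model_centre = 0.0
unfamiliar_half_range = 1.0
familiar_half_range = 3.0

natural_variation = 2.0  # e.g. a change of hairstyle or lighting
evasion_disguise = 4.0   # deliberately pushed outwith even the familiar range

# Natural variation escapes the unfamiliar viewer's narrow range...
print(is_accepted(natural_variation, model_centre, unfamiliar_half_range))
# ...but still falls inside the familiar viewer's wider range.
print(is_accepted(natural_variation, model_centre, familiar_half_range))
# A successful evasion disguise escapes both viewers' accepted ranges.
print(is_accepted(evasion_disguise, model_centre, familiar_half_range))
```

On this sketch, an effective impersonation must additionally land inside the accepted range of the target identity, which is why impersonation is the harder disguise to achieve.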
	
There are essentially two strands of research questions regarding familiarity. The first includes the familiar participants alone, and asks whether familiar viewers are able to see through disguise manipulations. This relates to the first three research questions laid out in the introduction. If familiarity allows participants to see through disguise, then there would be no difference in performance accuracy for disguised and no disguise face pairs. If, however, familiar viewers are worse at matching disguised face pairs than undisguised face pairs, then even familiar viewers are affected by an overall disguise manipulation. It is again possible that some types of disguise may affect performance more than others. For familiar viewers I predict that impersonation disguise will not be as effective as evasion disguise. To identify faces in impersonation disguises, participants can use both of the faces in the pair to look for identity cues, as they are familiar with both of the faces. A model's task is therefore harder, as they have to look both unlike themselves and convincingly like the person they are trying to impersonate, whereas evasion involves only moving out of the accepted range for a person's own appearance, which can be achieved in many different ways.
	
The second strand will answer research question number four: whether familiarity with a face aids disguised face-matching. This will involve a comparison of disguised face-matching performance across all of the experiments conducted in this chapter, thus comparing familiar viewers with unfamiliar viewers. As familiar viewers have more experience than unfamiliar viewers with the faces in the FAÇADE database, I predict that familiar participants will be significantly better than the unfamiliar viewers at matching disguised faces, simply because they have seen the faces in a wider range of appearances, and therefore may be better able to see through some aspects of the disguises.
	
Method
Participants	
The participants in this study were 30 individuals who were colleagues and friends of the disguise models (mean age = 27; 14 male). Participants saw the models on almost a daily basis; they were therefore highly familiar with the models' faces.
	
Design	&	Stimuli		
The	 experiment	 took	 the	 form	 of	 a	 2x3	 within	 subjects	 design,	 with	 factors	 Disguise	
Condition	(levels:	Disguise	and	No	Disguise)	and	Pair	Type	(levels:	Same,	Different	Similar	
and Different Random). The dependent variable was performance accuracy in the face-matching task for each of the independent variables listed above. This will enable the research questions outlined in the introduction to be addressed.
	
To test whether face-matching performance for disguised faces differs as a result of familiarity, a between-experiment analysis will be conducted, comparing the results of Experiments 10 and 11 with those of Experiment 12.
	
Procedure		
Procedure	was	as	in	Experiment	10.		
	
Results	
A	2x3	ANOVA	was	conducted	to	investigate	the	effect	of	Disguise	Condition	and	Pair	Type	
on	familiar	participants’	face-matching	performance.		
Figure	5.12	Performance	accuracy	in	the	face-matching	task	for	viewers	who	were	familiar	with	the	models	
whose	images	featured	in	the	task.	Error	bars	show	the	standard	error	of	the	mean.			
	

Significant main effects of both Disguise Condition (No Disguise or Disguise) [F(1,29) = 24.99, p < .001, ηp² = .46] and Pair Type (Same, Different Similar and Different Random) [F(2,58) = 20.01, p < .001, ηp² = .41] were observed. This shows that familiar viewers were worse at matching disguised faces than faces that were not in disguise, and that the type of disguise made a difference to performance accuracy. There was also a significant interaction between Disguise Condition and Pair Type [F(2,58) = 22.23, p < .001, ηp² = .43], meaning that participants were not equally affected by the presence of disguise for all disguise pair types (see Figure 5.12).
	
To understand where disguise caused impairments to face-matching performance, simple main effects were calculated. These revealed that the only significant differences for Pair Type were for the disguised faces [F(2,116) = 40.97, p < .001, ηp² = .41], and not for No Disguise faces, p > .05. Specifically, Tukey tests revealed that familiar viewers were significantly worse at matching disguised evasion (Same identity) faces (M = 86.67% accuracy, SD = 13.54) than impersonation (Different identity) pairs, with no significant difference in performance levels for matching Impersonation Similar (M = 97.69% accuracy, SD = 5.59) compared with Impersonation Random pairs (M = 97.82% accuracy, SD = 4.00).

Figure 5.13 Graph showing performance accuracy for Disguise face pairs for each of the three experiments: Unfamiliar uninformed (Experiment 10), Unfamiliar informed (Experiment 11), Familiar (Experiment 12).

	
The effect of familiarity on performance accuracy for disguised faces can be examined by comparing results across the three experiments (Experiment 10 - Unfamiliar, Experiment 11 - Unfamiliar Informed and Experiment 12 - Familiar) (see Figure 5.13). Comparing performance across all experiments on disguised face pair types showed a strong significant main effect of Experiment, F(2,87) = 19.15, p < .001, ηp² = .31.
	
Pairwise comparisons revealed that Familiar viewers were significantly better at matching disguised faces than both Unfamiliar viewers (Experiment 10) and Unfamiliar Informed viewers (Experiment 11), and reiterated that there was no difference in performance accuracy in matching disguised faces between Unfamiliar and Unfamiliar Informed viewers. The mean difference in performance accuracy between Familiar viewers and Unfamiliar viewers (Experiment 10) was 17%, CI = 10.8 – 22.53, p < .001; between Familiar viewers and Unfamiliar Informed viewers (Experiment 11) it was 13.5%, CI = 7.68 – 19.41, p < .001. The mean difference in performance between the two unfamiliar groups on the disguise-present matching trials was 3%; this was not a significant difference, p > .05.
	
Familiarity	Score	Comparisons	
In each of the experiments in this chapter, participants indicated their familiarity with each of the faces included in the face-matching task after they had completed the face-matching task. Scores were based on familiarity prior to the experiment. I compared mean familiarity score with the face items, for each participant, across the three experiments using a one-way between-subjects ANOVA. The ANOVA showed a significant difference in familiarity between the groups [F(2,87) = 314.43, p < .001, CI = 22.06 – 37.08]. Post hoc tests revealed that the familiar viewers were significantly more familiar with the faces (M = 75.56 familiarity rating) than both the unfamiliar viewers (M = 6.81 familiarity rating, mean difference = 68.75, SE = 3.19, CI = 60.94 – 76.55, p < .001) and the unfamiliar informed viewers (M = 4.69 familiarity rating, mean difference = 70.87, SE = 3.25, CI = 62.92 – 78.81, p < .001). There was no significant difference in the familiarity ratings for the two unfamiliar groups (mean difference = 2.12, SE = 3.25, CI = -5.82 – 10.06, p > .05).
	
Discussion		
With reference to research question 1, which asked whether disguise impairs face-matching performance, familiar viewers were significantly worse at matching disguised face pairs than faces that were not in disguise. Whereas previous face-matching tasks find ceiling performance levels for familiar viewers, in the case of disguised faces familiarity cannot completely overturn the effect of disguise. Research question 2, whether all disguise types are equally challenging, is particularly relevant in the case of familiar viewers. The familiar viewers were worse for disguised faces only when these were evasion faces, with no significant difference in performance between disguise and no disguise recognition for impersonation faces. Question 3 considers differences between types of impersonation disguise. Unlike in the previous experiments, impersonation pair type did not matter – familiar viewers could see through both types of impersonation disguise equally well.
	
Research question 4, searching for an effect of familiarity, was also answered by this experiment in the between-experiment analysis. Familiarity improved face-matching performance for disguised faces compared with that of unfamiliar viewers. Familiar participants (Experiment 12) were 15% better than unfamiliar participants (the averaged result of Experiments 10 and 11) at correctly identifying same person and different person disguise image pairs. Familiar viewers were 11% better than the average result of the two unfamiliar viewer experiments at matching impersonation faces, and 23% better at matching evasion faces. These results suggest that when identity judgments involving disguise need to be made, recruiting familiar viewers to make the identity judgment would likely lead to a more accurate judgment of identity.
	
These findings fit with my theoretical interpretations of both familiarity and effective disguise. It has been argued that when people become familiar with a face, what they are learning is all the different ways that that face can look (Jenkins et al., 2007). Familiar viewers will therefore know a far greater range of appearances that any one of our models' faces can take than the unfamiliar viewer group will. This may mean that some of the changes to appearance, perhaps expression and pose changes for instance, are learnt appearances of the face in question for the familiar viewers, and hence would not disguise the face as they may for unfamiliar viewers. Previous exposure of familiar viewers to some of the facial changes used by the models to achieve their disguises likely explains some of the difference in performance between the familiar and unfamiliar viewer groups. Any experience that a familiar viewer had with any of these models' appearance changes would already have been stored within the accepted range of appearances for that model's face. Thus, any variation to which a familiar viewer had previously been exposed would not act as an effective disguise manipulation for a familiar viewer, but it would for an unfamiliar viewer.
	
Although familiar viewers were somewhat impaired by evasion disguise, they could easily see through imposter disguise. For an imposter disguise to be effective, the model must have moved both outwith their own face space and into the face space of the person they are trying to impersonate. The direction of disguise is limited to moving only towards (and into) the face space of the other person, so disguise is automatically more difficult than in the case of evasion, where appearance can be changed in any direction that takes the face image outwith its normally accepted face space. The familiar viewers were familiar with both the imposter and the target person featured in each trial. This allowed viewers to approach the task from two angles – the imposter face could be recognised as the real person behind the imposter, or it could be judged not a good enough match to the target face to fall convincingly within their face space. Unfamiliar viewers were limited to using only the second of these strategies. These factors help to explain why familiar viewers perform with higher accuracy than unfamiliar viewers for both evasion and impersonation disguise, and also highlight why evasion disguise caused familiar viewers difficulty whereas impersonation disguise did not.
	

5.8 General	Discussion	
This chapter saw the creation of the FAÇADE database, which consists of images of male and female human faces in disguise (evasion and impersonation) and no disguise image conditions. Disguises were all model-led, meaning that the models were free to disguise themselves as they wished and disguises were not limited to the use of props. Image pairs
from	the	FAÇADE	database	were	then	used	to	test	face-matching	performance	accuracy	
for	both	familiar	and	unfamiliar	viewers.	In	Experiments	10,	11	and	12	deliberate	disguise	
had	an	effect	on	overall	face-matching	performance	accuracy	–	participants	made	more	
identity	judgment	errors	on	disguise	pairs	than	no	disguise	image	pairs.	Disguise	pair	type	
also	 affected	 performance	 accuracy.	 Evasion	 disguises	 resulted	 in	 more	 errors	 than	
impersonation	 disguise,	 and	 for	 unfamiliar	 viewers	 only	 (Experiments	 10	 &	 11),	
impersonation	 similar	 trials	 were	 more	 error	 prone	 than	 impersonation	 random	 trials.	
These	 results	 show	 that	 disguise	 is	 more	 convincing	 when	 looking	 unlike	 yourself	 than	
when	 looking	 like	 a	 specific	 other	 person.	 For	 the	 case	 of	 unfamiliar	 viewers,	
impersonation	disguises	are	more	effective	if	the	target	looks	similar	to	the	model	prior	
to disguise. In Experiment 11, I found that unfamiliar viewers were no better at matching disguised faces when they had been informed before completing the face-matching task that some of the images would involve disguise than they were when naïve to the disguise element of the study (Experiment 10). In Experiment 12, a familiarity
advantage	 was	 found.	 Familiar	 viewers	 were	 better	 at	 correctly	 identifying	 match	 or	
mismatch	 disguise	 face	 pairs	 than	 unfamiliar	 viewers	 (Experiments	 10	 &	 11).	 Familiar	
viewers	 easily	 saw	 through	 impersonation	 disguises,	 but	 even	 for	 familiar	 viewers,	
evasion	disguise	caused	significant	impairment	to	face-matching	performance.			
	
Face-matching performance for disguised face images was much lower than performance on the GFMT, a standardised test of face-matching performance. Viewers performed with around 89% accuracy on the long version of the GFMT and 83% on the short version, whereas here mean performance for all disguised faces was 77% for unfamiliar (uninformed) and 80% for unfamiliar informed viewers. Furthermore, familiarity has led to ceiling levels of performance in past face-matching tasks (e.g. Burton et al. 1999); however, performance of familiar viewers for evasion disguise was even lower than that of unfamiliar viewers for cooperative stimuli in the GFMT. These numerical comparisons highlight how severely disguise impairs face-matching performance, with evasion disguise impairing even familiar viewers.
	
This study overcame the main limitations of past disguise research through the creation and use of the FAÇADE database, and through the experimental design. For example, Patterson and Baddeley's (1977) Experiment 1 tested face memory for disguised images against exact image memory performance. I test performance using a disguise versus no disguise face-matching accuracy comparison, and the images included in the no disguise condition were never the same image of a face. Additionally, the images used in Patterson and Baddeley's Experiment 1 disguise condition could have shown two different disguises for the same identity – one at learning and a different disguise at test. I always compare a disguise image to a no disguise image rather than introducing two different disguises. I believe having only one of the comparison images in disguise is better suited to real world application.
	
The	 overarching	 result	 of	 this	 chapter	 is	 that	 disguise	 presence	 impaired	 face-matching	
performance	 accuracy.	 This	 finding	 is	 in	 line	 with	 all	 previous	 research	 on	 disguise	 that	
was	discussed	in	the	introduction	of	this	chapter.	Disguise	had	previously	been	shown	to	
impair	face	memory	performance	(Patterson	&	Baddeley,	1977;	Terry,	1994;	Righi	et	al.	
2012).	 Dhamecha	 et	 al.	 (2014)	 also	 reported	 that	 face-matching	 accuracy	 for	 disguised	
faces	was	lower	than	accuracy	levels	reported	for	the	GFMT,	which	is	a	standard	test	of	
cooperative face-matching. Dhamecha et al. (2014) tested matching for image pairs where both of the images in the pair were generally disguised. My studies show an effect of disguise on face-matching when just one of the images in the pair was in disguise, showing how difficult disguise can make the task. Additionally, the stimuli used by Dhamecha and colleagues (2014) included occlusion of features. Occlusion has been a common feature of past disguise databases, and it seems obvious that face-matching performance would be hampered when parts of the face are obscured. Testing face-matching performance using the FAÇADE database showed that even when disguises do not occlude facial features, disguise makes matching faces a more difficult task than when the same identities are presented undisguised.
	
This	 chapter	 advances	 upon	 previous	 disguise	 research	 through	 the	 investigation	 of	
impersonation	as	well	as	evasion	disguise.	All	past	research	discussed	has	looked	at	only	
the	case	of	evasion	disguise	(Patterson	&	Baddeley,	1977;	Terry,	1994;	Righi	et	al.	2012;	
Dhamecha	 et	 al.	 2014).	 I	 show	 that	 evasion	 and	 impersonation	 disguise	 both	 cause	
matching	 difficulties,	 but	 evasion	 and	 impersonation	 do	 not	 cause	 equal	 levels	 of	
difficulty.	I	was	able	to	break	these	results	down	further	as	my	design	incorporated	two	
types	of	impersonation	–	similar	and	random.	Unfamiliar	viewers	were	better	at	matching	
random	 disguise	 pairings	 than	 pairings	 where	 the	 model	 looked	 naturally	 similar	 to	 the	
target	 they	 impersonated.	 Familiar	 viewers	 could	 see	 through	 both	 types	 of	
impersonation. These findings highlight the importance of investigating impersonation disguise as well as evasion disguise; the different results for each confirm that disguise cannot be treated as a single phenomenon.
	
The	findings	from	this	chapter	are	also	in	agreement	with	previous	face	variability	studies.	
Past	 research	 has	 shown	 that	 integrating	 multiple	 images	 of	 the	 same	 identity	 (‘telling	
people	together’)	is	a	difficult	task	(Jenkins	et	al.	2011).	I	have	demonstrated	that	people	
are worse at matching evasion face images (telling people together) than impersonation faces (telling people apart). As people are poor at integrating multiple images of a face
when	 there	 is	 not	 a	 deliberate	 attempt	 to	 make	 the	 identity	 look	 different	 across	
photographs,	 it	 is	 not	 surprising	 then	 that	 performance	 is	 even	 poorer	 for	 same	 person	
trials	when	there	is	a	deliberate	attempt	to	evade	identity.		
	
A difference, however, between the previous findings on face variability and the findings of this chapter relates to the effect of familiarity. Jenkins and colleagues' (2011) familiar viewers could easily group together multiple images of the same identity. In my face-matching task, familiarity did not completely overcome the difficulty of matching disguised faces. Participants were significantly worse at matching face pairs where the model evaded their own identity than they were at matching faces where there was no
model	evaded	their	own	identity	than	they	were	at	matching	faces	where	there	was	no	
disguise.	Levels	of	performance	for	evasion	disguise	face	pairs	were	also	far	lower	than	in	
previous	 studies	 that	 have	 tested	 cooperative	 face-matching	 performance	 of	 familiar	
viewers (Burton et al. 1999). As familiarity did not eliminate errors for evasion in my face-matching task, this finding suggests that disguise took the faces outwith the area of expertise that familiar viewers held for the faces involved.
	
It would have been interesting to test for graded effects of familiarity in this Chapter, much like in Chapter 1. However, due to the participant samples that we used, familiarity was skewed towards the extreme ends of our familiarity spectrums for each of our experiments. Over 80% of the unfamiliar viewers rated the face items as completely unfamiliar, whereas over 80% of our familiar viewers were very or extremely familiar with the faces in the task. In addition, the scales were likely not directly comparable, as familiar viewers were more conservative in their familiarity ratings than unfamiliar viewers due to their baseline level of familiarity with the faces. All familiar viewers were at least somewhat familiar with all of the images, as the models were their colleagues. This meant that the faces that qualified as extremely familiar were usually those with whom the viewers had an exceptionally high level of interaction (e.g. office mates or close friends), whereas people ranked lower down the scale would also be very well known, and would likely be classified as extremely familiar in many scenarios, but were less familiar relative to the other faces in the task. It is therefore possible that even if a graded analysis were attempted on the small numbers in the lower familiarity bands, the graded result might not be found in this experiment, given that all familiar viewers had a high baseline familiarity with the faces concerned.
	
This thesis has focused exclusively on studies of face-matching performance in various challenging situations. Face-matching is a current and common security scenario, thus warranting thorough investigation; however, it would also be interesting to assess face memory for disguised and undisguised faces. Past studies attempted to address this question, but used image-matching comparison scenarios and questionable disguise stimuli. It would be interesting to find out whether face memory is worse for disguised faces in our database. The case of impersonation disguise would be particularly thought provoking, as it has not received any past investigation. It is unclear whether participants seeing the target face or model in undisguised form would be able to make links to the impersonator from the impersonation disguise image, or whether they would incorrectly identify the target face as the face seen before.
	
Future research could test human versus machine performance on the FAÇADE database pairs. Previous research has tested machine performance for recognising disguised faces, generally testing the performance of several different algorithms against each other (e.g. Singh et al., 2009). Dhamecha and colleagues (2014) showed that machines performed at a level similar to unfamiliar human viewers for face pairs that included faces in disguise, but they examined only the case of evasion disguise, and with an unrealistic set of disguised faces. My database would allow a direct comparison of unfamiliar and familiar human face-matching performance with the performance of computer algorithms for evasion, impersonation similar and impersonation random faces. Human performance has been successfully fused with algorithms in past studies (O'Toole et al., 2007); therefore, if machine performance were tested on the FAÇADE database, fusing human and machine performance might also improve disguised face recognition performance.
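Human-machine fusion of this kind is often implemented as simple score-level fusion. The sketch below, with invented similarity scores, standardises human and algorithm scores within each source and averages them; this is only an illustration of the general idea, not the specific scheme used by O'Toole et al. (2007).

```python
from statistics import mean, stdev

def z_scores(scores):
    """Standardise a list of similarity scores to mean 0, s.d. 1."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Invented similarity ratings for five face pairs (higher = more likely
# to be the same identity), one list per source.
human_scores = [6.0, 2.0, 5.5, 1.5, 4.0]           # e.g. a 1-7 rating scale
algorithm_scores = [0.80, 0.30, 0.60, 0.20, 0.75]  # e.g. a 0-1 match score

# Score-level fusion: average the standardised scores for each pair.
fused = [mean(pair) for pair in zip(z_scores(human_scores),
                                    z_scores(algorithm_scores))]

# A pair is accepted as a match if its fused score exceeds a threshold.
decisions = [score > 0.0 for score in fused]
```

Because each source is standardised before averaging, neither the 1-7 rating scale nor the 0-1 algorithm score dominates the fused decision.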
	
Implications of this research are that care needs to be taken with disguised faces, and methods need to be explored to improve disguised face recognition performance. It would be interesting to test the performance of super-recognisers on the task, as super-recognisers have been found to outperform controls on other difficult face-matching tasks (Robertson, Noyes, Jenkins & Burton, 2016). Disguised face recognition performance is poorer than performance for undisguised faces; hence more errors are likely in real-life deliberate disguise fraud than when someone tries to pass as someone else without making any deliberate attempt to look more alike. Poor performance for evasion disguise faces does, however, provide hope for undercover police investigations. Police Scotland had raised concerns about their undercover officers' identities
being recognised, but our findings suggest that evasion disguise is effective, especially for unfamiliar viewers. However, the converse of this finding is that criminals who evade their own identity present a difficult face recognition scenario for the police.
	
I created a disguise face database, which is the first to include free evasion and impersonation disguises on real female and male faces. This has allowed, and will continue to allow, disguised face recognition to be explored in more detail than ever before. Disguise impaired face recognition performance, with the type of disguise (evasion, impersonation similar or impersonation random) affecting face-matching accuracy. Familiar viewers outperformed unfamiliar viewers at the disguise matching task; however, even familiar viewers were significantly affected by the presence of evasion disguise. An implication of these findings is that familiar viewers will likely be more accurate than unfamiliar viewers at tasks involving identity judgement. Poor disguised face-matching is advantageous in terms of undercover policing, but it also poses serious security threats if either evasion or impersonation disguises are used in criminal situations.
	
Performance accuracy on the face-matching task has now been established, but it is not yet known how people disguise themselves and which disguise manipulations work. Disguised face images in the FAÇADE database were created by photographing models posed in evasion, impersonation similar and impersonation random image conditions. Importantly for these new research questions, models were free in their disguise. This means that the models themselves decided how to create each disguise; props were provided and arranged at the models' request rather than at the experimenter's direction. This free disguise element is unique to my database and allows an investigation of what changes people naturally make to disguise themselves. Additionally, to uncover the steps necessary to create an effective disguise, disguise face-matching data from this chapter can be combined with written reports from both the models and viewers regarding disguise manipulations and effectiveness. These questions will be explored in Chapter 6.
	
Chapter	6	–	Understanding	Disguise	
6.1 Chapter	Summary	
In order to further explore disguise - specifically how people disguised themselves, whether effective disguises could be identified, and which disguises worked - data from the matching task (Chapter 5) were analysed along with models' records of how they created their disguises and effectiveness ratings from an unfamiliar viewer group.
	
This	 chapter	 points	 to	 clear	 distinctions	 between	 evasion	 and	 impersonation	 disguise	 –	
both	 in	 terms	 of	 what	 changes	 people	 make,	 and	 also	 what	 makes	 a	 disguise	 effective.	
Unfamiliar	viewers	could	accurately	predict	disguise	effectiveness	from	viewing	the	target	
and	reference	image	side	by	side.	
	
Previous studies on disguise investigated the effect of adding props to a face, which often occluded facial features. I found that when models were free to create their own disguises they used many methods beyond simply adding props. Evasion disguise revolved around creating differences from the target face in terms of internal and external features, through the use of makeup, clothing and hairstyle change (sometimes through wigs), and also through techniques such as expression and lighting change. Successful evasion disguise was also linked to creating differences in social inferences. Impersonation disguise, on the other hand, involved creating and focusing on similarities with a target face, but these similarities related to physical changes to the face, for example internal and external features. Social inferences did not change to match those of the person being impersonated. It is evident that free disguise creation, especially in the case of effective disguise, is far more complex than purely the addition of props. It is important that disguise research acknowledges distinctions between evasion and impersonation disguise.

6.2 Introduction	
As explained and demonstrated in Chapter 5, I have created a disguise face database consisting of 26 models, disguised to evade their own identity and to impersonate the identity of two reference individuals - one of naturally similar appearance to the model and the other selected at random from images of other models of the same gender. These models were highly motivated, aided by the incentive of a performance-based cash reward, to create extremely convincing disguises. Models were free to disguise themselves as they wished and could request props to assist their efforts. I provided no guidance on how the models should disguise themselves, although I did alert them to the different manipulations they could make using camera angle and distance (e.g. findings of Chapter 4). The only limitations on disguise were that the end result must look like a realistic I.D. photograph rather than a person in a fancy dress costume, and that any props that would have to be removed in a passport security check were disallowed (e.g. hats and sunglasses). As demonstrated in Chapter 5, this led to the creation of a much more sophisticated disguise database than previous studies have used. It is of interest to better understand disguise - what the models did to disguise themselves in each condition, which disguises were believed to be the best, and what actually makes for the best disguise (with reference to items that caused most difficulty in the matching task). I have matching performance accuracy rates for the disguise face database (Chapter 5). I also have information on what the models did to create their disguises, and ratings from an independent viewer group for the effectiveness of each disguise. These data can be used to gain the desired better understanding of disguise.
	
As reviewed in Chapter 5, past research on disguise is limited in that it has not looked at impersonation disguise, and the models whose images provided the disguise stimuli used simple disguises, adding props to occlude features. Perhaps the closest background literature, and a starting point for predicting what may make the best disguise, is work on which image changes are already known to affect face recognition performance. Previous memory studies focus on the case of same-person face recognition, and therefore apply more
to evasion disguise than to impersonation. Most previous studies have focused on the effect of changing expression, viewpoint or lighting on face recognition performance.
	
Factors	that	Increase	the	Difficulty	of	Face	Recognition		
The literature paints a clear picture that face recognition accuracy is worse when viewpoint or expression is changed between presentation and test. Bruce (1982) found that participants were 90% accurate at identifying an identical picture at test as one that they had viewed 15 minutes earlier. If at test participants saw an image of the same identity that they had viewed before, but with head angle or pose changed, recognition accuracy dropped to 76%. Changes to both head angle and expression, rather than to just one or the other, led to further impairment, with performance falling to 61% accuracy. The effect of image change between exposure and test is so strong that even familiar face recognition slows when viewpoint is changed (Bruce, 1982).
	
O’Toole, Edelman & Bulthoff (1998) report a similar effect. They demonstrated that participants were less likely to recognise an image viewed at test as a previously seen identity if the image view (full, 3/4 or profile) had changed between the learning and test phase. When faces were learnt in full view, recognition
performance	 was	 highest	 for	 test	 items	 shown	 in	 full	 view.	 However,	 when	 faces	 were	
presented	in	full	view	during	the	learning	stage,	there	was	an	advantage	for	recognition	
from	3/4	views	over	profile	at	test.	This	suggests	that	certain	changes	in	viewpoint	might	
be	more	detrimental	to	recognition	performance	than	others.		
	
The	 studies	 mentioned	 above	 highlight	 that	 changes	 in	 viewpoint	 and	 expression	
influence	 face	 recognition	 performance.	 Other	 studies	 have	 shown	 that	 these	 image	
manipulations	 also	 affect	 unfamiliar	 face-matching	 accuracy	 (Bruce,	 Henderson,	
Greenwood	 et	 al.	 1999).	 Bruce	 and	 colleagues	 (1999)	 found	 that	 face-matching	
performance	was	negatively	affected	by	any	change	between	the	target	and	comparison	
face	 images	 -	 this	 could	 include	 a	 change	 in	 viewpoint	 or	 change	 in	 expression.	 The	
procedure took the form of a line-up scenario. Participants compared a still image, taken from video footage, to 10 face images shown in a line-up array below the target video image. The task was to determine whether the target identity was present in the line-up array and, if so, to identify the correct match. Images in the line-up array were always full face and generally neutral in expression. The target video stills showed either the same or a different expression type (neutral or smiling), and either the same or a different viewpoint (full or 30-degree view), relative to the array images. Accuracy was worse for trials where viewpoint or expression differed between the target and the comparison array images. Viewpoint change reduced performance accuracy more than a change in expression.
	
Bruce et al. (1999) found that viewpoint could also influence performance on a paired matching task. Participants made more errors when comparing a target face to a similar distractor face if the viewpoint of the two images was different than when the viewpoints of the target and distractor were the same. Hill and Bruce (1996) note that when faces are matched across different viewpoints, accuracy is higher if the faces are lit from the same direction, specifically if both images are lit from above. This is consistent with the work of Johnston et al. (1992), which reported that lighting from below made it harder to recognise familiar faces. Hill and Bruce (1996) reported that changes in lighting alone reduced matching performance, as did changes in viewpoint. Together, these findings can be taken as evidence that a change in viewpoint, and also in expression or lighting, between target and comparison faces increases the difficulty of unfamiliar face-matching. Any consistencies held across the images can aid the matching effort (e.g. Hill & Bruce, 1996).
	
Changes	to	Internal	Versus	Changes	to	External	Features		
A second question of interest on the topic of which changes matter is whether people pay different amounts of attention to internal and external features when making identity judgments. If they do, this may give some clues as to whether internal or external changes matter more for effective disguise. The answer appears to depend on whether the viewer is familiar or unfamiliar. Unfamiliar viewers treat internal and external features as equally important when matching a face (Ellis et al., 1979; Young et al., 1985). Familiar viewers, on the other hand, give more weight to internal features. Notably these are the less changeable features of a face (Ellis et al., 1979; Young et al., 1985; Tanaka & Farah, 1993; Toseeb, Keeble & Bryant, 2012). While hairstyle can change frequently,
specific	features	of	a	face	are	generally	more	constant.	Young	et	al.	(1985)	found	these	
results	 by	 testing	 how	 quickly	 participants	 made	 correct	 matches	 for	 a	 full	 face	 image	
alongside	 only	 the	 internal	 features,	 compared	 to	 matching	 a	 full	 face	 image	 with	 an	
image	 cropped	 to	 contain	 only	 external	 features.	 Features	 themselves	 (internal	 and	
external)	may	be	easier	to	compare	across	images	taken	from	the	same	viewpoint	(Bruce	
et	al.	1999).		
	
O’Donnell	&	Bruce	(2001)	report	a	slightly	different	but	interesting	finding	for	familiarity	
and	attention	to	internal	and	external	features.	They	artificially	edited	images	of	faces	so	
that	 participants	 would	 see	 either	 two	 identical	 images,	 or	 the	 original	 image	 and	 an	
altered	image	of	that	same	face	side	by	side.	Changes	could	be	either	to	the	hair	or	chin	
(external	features)	or	the	eyes	or	mouth	(internal	features).	Participants	had	to	work	out	
whether	 the	 images	 in	 the	 pair	 were	 identical	 or	 differed	 physically	 in	 some	 way.	
Unfamiliar viewers (untrained) detected changes to the hair with highest accuracy (in line with the findings of Bruce et al., 1999). Familiar viewers (trained) were very good at detecting changes to the hair, but were also highly attuned to changes made to the eyes. Because this experiment made artificial changes to images, it was an image-matching task rather than face-matching using disguise manipulations. In conclusion, familiar viewers consistently rely on internal features of a face to aid their identity decisions. It is uncertain whether unfamiliar viewers are using internal and external
features equally (Ellis et al., 1979; Young et al., 1985), or relying more on external features (Bruce et al., 1999; O’Donnell & Bruce, 2001). Evidence can be found for both accounts in the literature.
	
Past	Research	on	What	Makes	an	Effective	Disguise		
Patterson & Baddeley (1977) begin to answer the exact question of interest: which disguises work. As is the case for all previous disguise research, they investigated evasion disguise only. Unlike the findings above, they reported that recognition performance was not greatly affected by small changes in appearance. They specifically examined changes to pose and expression between learning and test items. Although participants showed a lower overall hit rate when the test items were of changed view and expression than of identical view and expression, their false positive rate was also lower, negating any overall effect of changing viewpoint and expression. The study also tested recognition accuracy for disguised faces. Many recognition errors were made for the disguise stimuli - the hit rate dropped from .98 for no-change photographs to .45 (around chance level) for the disguise images. Disguise images in Patterson & Baddeley’s study were taken from actor photographs that showed actors in a range of different appearances depending on the roles they had taken on. Evasion or impersonation disguise itself was not directly the aim of the change in appearance; the actors had changed their appearance to suit a job role. I return to the issue of changing character later in this chapter. Active disguise manipulations, whereby someone is specifically trying to evade or impersonate an identity, may be more challenging for face recognition.
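The trade-off described above - a lower hit rate offset by a lower false positive rate - can be made concrete with a signal detection sketch. The rates below are purely illustrative, not Patterson & Baddeley's data; the point is that sensitivity (d') can remain roughly unchanged when both rates fall together.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d': z(hit rate) minus z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Illustrative rates: the changed-image condition lowers hits but also
# lowers false alarms, so overall sensitivity is barely affected.
same_image = d_prime(0.90, 0.30)     # ≈ 1.81
changed_image = d_prime(0.75, 0.15)  # ≈ 1.71
```

Comparing only hit rates would suggest a large impairment in the changed-image condition, whereas d' shows the two conditions to be nearly equivalent in sensitivity.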
	
Due to the type of disguise stimuli used in Patterson and Baddeley’s Experiment 1, they were unable to categorise the exact changes made to appearance across images. Like me, Patterson & Baddeley wanted a better understanding of which appearance changes made for an effective disguise. This was addressed in Experiment 2 of the paper, in which the stimuli were amateur dramatic students who modelled for disguise and no-disguise photographs. However, unlike in my FAÇADE database, their disguises were standardised,
and were limited to the addition or removal of wigs, beards and glasses. All models were male. Participants had been informed that the appearance of the models might have been changed (through disguise) in the test images, although in Experiment 11 I found that knowing to look for disguise did not improve performance compared to participants who were not informed of the disguise manipulation. The effect of each disguise manipulation was analysed, with the following findings: matching performance was greater when hairstyle remained the same than when it was changed; performance was better when a face was unchanged with regard to the presence or absence of a beard; and performance was poorest when multiple changes to appearance were made. The presence/absence of glasses interacted with changes in hair. The general finding was that disguise made it harder to recognise faces, and multiple disguise manipulations (more props) increased the difficulty. It is a recurrent theme that multiple alterations to a face between learning and test images affect performance accuracy more than one change, or fewer changes (Patterson & Baddeley, 1977; Bruce, 1982; Bruce et al., 1999).
	
Dhamecha et al. (2014) were interested in the specific elements of disguise that cause problems for face-matching performance. As explained in the introduction to Chapter 5, Dhamecha and colleagues looked at the effect of occluding facial features through the addition of props, and then cropped the images so that the focus was exclusively on the internal features of a face. Following collection of results from their matching task, Dhamecha and colleagues (2014) analysed performance accuracy for each segment or segments of the face covered. This was achieved by dividing the face into four regions: lips, nose, eyes and forehead. Occlusion (the presence of a prop) of any one of these four regions was linked to high error rates in the face-matching task. However, it is important to remember that, due to the design, a one-region disguise pair could consist of one face where the eyes are not visible due to sunglasses next to a face where the lips are not visible because they are covered by a medical mask (see Figure 6.1). This would leave very little of the face available for comparison across images, so it is little wonder that this is a highly error-prone task. The presence of multiple props (covered areas) led to some further reduction in
performance	 came	 from	 the	 introduction	 of	 any	 single	 occlusion	 (i.e.	 from	 zero	 to	 one	
occluding	disguise	element).			
	
Figure 6.1 Two different disguise images of the same identity. The image on the left occludes the top part of the face, and the image on the right occludes the bottom part of the face.

	
All of this previous work has covered only situations of evasion disguise - manipulations that make a person look less like themselves. The situation of impersonation disguise - taking on the identity of a specific other person - has not been explored previously, thus I will be examining what makes for an effective impersonation disguise for the first time. Previous work has reported that distinguishing features are important for face recognition, with identities in a line-up task more likely to be mistaken for the true culprit if they share a distinguishing feature (Wells et al., 1993; Flowe & Ebbesen, 2007). This might provide a clue for creating a convincing impersonation. Beyond this, the only prediction for impersonation disguise is that effective impersonation disguises will be those that make a face look most similar to the target face.
	
In summary, it has been established so far that any change between images presented at learning and test causes difficulties for recognition performance, with disguise manipulations (here, adding or removing props such as wigs, beards or glasses) causing more difficulty than more subtle changes such as expression and pose (Patterson & Baddeley, 1977; Bruce, 1982; Bruce et al., 1999; O’Toole et al., 1998). Expression and pose
differences between images have been found to impair matching performance (Bruce et al., 1999), suggesting that these changes could assist the success of a disguise. Disguise, in terms of occluding facial features (Dhamecha et al., 2014), and also free disguise (Chapter 5), leads to severe difficulties in face-matching.
	
In Chapter 5, I applied these theories to both evasion and impersonation disguise. For an evasion disguise to be successful, an individual must change their appearance so that it falls outwith the accepted range of appearances for that face. In the case of impersonation disguise, the challenge is to change appearance to get inside a specific other person’s accepted range of appearances. It thus seems likely that the approaches taken to create these two types of disguise (evasion and impersonation) may differ. It is less clear whether differences will emerge between the approaches to creating the different types of impersonation disguise - impersonating someone of similar appearance compared to impersonating an identity selected at random, but of the same sex. It is possible that natural similarities and differences between specific faces may guide each individual’s approach, leading to a very individual method of disguise; alternatively, there may be a common approach taken by the disguise models to create impersonation disguises, for example consistently focusing on hair or facial expression. I will approach two key questions - how people disguise themselves and which approaches are effective.
	

6.3 How	do	People	Disguise	Themselves?	
Method	
Participants	
Three independent raters were instructed in the coding system, and any coding discrepancies were discussed until agreement was reached.
	
Design	
Word	 clouds	 were	 created	 for	 the	 three	 different	 disguise	 conditions	 –	 Evasion,	
Impersonation	Similar	and	Impersonation	Random.	
	
Procedure	
The	26	models	recorded	the	changes	that	they	made	to	their	appearance	by	writing	down	
on	a	piece	of	paper	what	they	did	to	disguise	themselves	in	each	condition	and	what	they	
hoped	 each	 manipulation	 would	 achieve.	 This	 was	 recorded	 whilst	 the	 models	 created	
their	disguises.		
	
The disguise manipulations and their stated purposes were typed up and coded into categories, so that the same changes, described by different words, were captured under the same word, allowing an accurate word analysis. For example, if a participant dyed their hair, this was categorised as a change of hair-colour. Categorisation was conducted by three raters who had been instructed in the coding system. Any discrepancies were discussed amongst the raters until agreement was reached.
	
Hyphenations	were	used	to	make	sure	that	meaningful	phrases,	such	as	hair	colour,	were	
kept	as	one	word.	Otherwise	the	word	hair	for	example,	would	have	been	counted	across	
several	 different	 hair	 changes	 -	 such	 as	 hair	 colour	 and	 hair	 length.	 Hyphenating	 words	
that	contributed	to	a	phrase	ensured	accurate	word	categorisation.		
	
The categorised body of text was entered into Wordle (http://www.wordle.net) - software that creates word clouds based on word frequency. Words are given prominence within their cloud based on the frequency of their occurrence: the more often a word occurs in the input text, the larger that word appears in the word
cloud. These word clouds may help to characterise the manipulations made by the models in each of the disguise conditions.
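The frequency counting that drives these clouds can be sketched as follows. The coded reports here are invented for illustration; note how the hyphenation step keeps phrases such as "hair-colour" as single tokens.

```python
from collections import Counter

# Invented examples of categorised model reports. Hyphenated phrases
# such as "hair-colour" stay as single tokens, so "hair colour" and
# "hair length" are not both counted simply as "hair".
coded_reports = [
    "wig glasses hair-colour expression",
    "makeup skin-tone hair-colour clothes",
    "wig expression camera-angle",
]

# Count token frequencies across all reports; word-cloud software
# scales each word's display size by this count.
counts = Counter(tok for report in coded_reports for tok in report.split())

for word, freq in counts.most_common(3):
    print(word, freq)
```

Without the hyphenation step, splitting on whitespace would inflate the count for "hair" and lose the distinction between the different hair manipulations.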
	
Results	and	Discussion	
Evasion		

Figure	6.2	Word	cloud	showing	the	most	frequently	stated	words	for	creating	an	Evasion	disguise.		

	
All changes in the Evasion condition related to creating differences in appearance compared to the model’s own reference photograph. Models frequently changed the appearance of their own features, skin tone, hairstyle or hair colour, and clothes, in order to make themselves look physically different from their normal (own reference photograph) appearance (see Figure 6.2). The word cloud includes the ways in which participants tried to change their identity and shows that this was largely done through props, reflected in the prominence of the words ‘wig’, ‘glasses’ and ‘clothes’. Props were, however, not the only method used to create facial change. Other techniques included the use of makeup to change features and skin tone, and changes in camera angle to change face shape.
	
Impersonation	Similar	

	
Figure	 6.3	 Word	 cloud	 showing	 the	 most	 frequently	 stated	 words	 for	 creating	 an	 Impersonation	 Similar	
disguise.	All	words	represent	similarities	with	the	target	face	except	where	specified	as	differences.		

	
To create Impersonation Similar disguises, models focused on looking similar to, or even matching, the appearance of the reference photograph. This was attempted in a variety of ways, including creating similarities in the appearance of specific features, especially hairstyle but also eyes and lips (see Figure 6.3). Beards were grown or shaved where applicable to match the condition of the reference photo (see Figure 6.4). Copying the eyebrows of a face was a particularly common impersonation similar disguise technique (see Figure 6.5). This was done by changing eyebrow colour, covering up eyebrows with foundation, or drawing on eyebrows in a different style or shape from the model’s own, to create a better eyebrow match with the reference photo. Copying the facial expression shown in the reference photo was also a very popular impersonation technique (as in both Figures 6.4 and 6.5).
	
Figure	6.4	The	model	(shown	right	in	this	image	pair)	shaved	his	beard	to	better	match	the	appearance	of	
his	target	(left).		

Figure	 6.5	 The	 model	 (right)	 has	 copied	 the	 eyebrows	 of	 the	 target	 (left)	 using	 makeup	 to	 alter	 eyebrow	
shape.	

Impersonation	Random		

	
Figure	 6.6	 Word	 cloud	 showing	 the	 most	 frequently	 stated	 words	 for	 creating	 an	 Impersonation	 Random	
disguise.	All	words	represent	similarities	with	the	target	face	except	where	specified	as	differences.	

	
For the Impersonation Random condition, models took a roughly similar approach to their disguise as they did in the Impersonation Similar condition. Models again focused on creating similarities with the reference photo and matching on a range of features. It is interesting to note that hair, an external feature, became the predominant focus of change, with slightly less focus on internal features (see Figure 6.6); changes to internal and external features were somewhat more balanced in the case of impersonation similar disguises. Perhaps models found it harder to manipulate internal features when there were fewer natural similarities; additionally, there may have been larger hairstyle differences to account for in the random pairings. There was also a little less focus on copying facial expression than in the impersonation similar condition. This may be because expression matching produces a picture-matching type effect, whereby impersonation random models could be revealed as impersonators if viewers spotted slight differences in the copied pose.
	
The	 most	 common	 approach	 to	 impersonation	 disguise	 fits	 with	 the	 theoretical	
framework	adopted	in	this	thesis.	Impersonation	disguises	rely	on	successfully	creating	a	
face	 image	 that	 would	 be	 accepted	 as	 falling	 within	 the	 other	 person’s	 face	 space.	 The	
models made a clear effort to do this, focusing on creating similarities with the reference photograph. It is unknown at this stage whether the changes that were most common were actually those that caused most difficulty for matching. This will be examined in Section 6.5. First, I will discuss whether viewers can tell by eye which disguises will be effective.
	

6.4 Can	Viewers	Predict	by	Eye	which	Disguises	will	be	Effective?	
The next question of interest is whether the images rated as the best disguises were those that caused most difficulty in the matching task. This investigation will help to establish whether viewers can determine disguise effectiveness simply through a side-by-side comparison of the target and disguise face images (evasion, impersonation similar, impersonation random).
	
Method	
Participants	
The	independent	viewer	group	(first	mentioned	in	Chapter	5),	who	chose	the	final	stimuli	
for	the	FAÇADE	database,	also	provided	the	participant	group	for	this	study.	
	
Procedure	
The independent viewer group provided disguise effectiveness ratings for each of the selected database disguise images on a scale from 1 to 7, with 1 being a very poor disguise and 7 an extremely convincing disguise.
	
	
	

	

Design	
To	 find	 out	 whether	 the	 viewer	 group	 could	 accurately	 predict	 disguise	 effectiveness,	 I	
compared	the	effectiveness	ratings	for	each	of	the	disguise	images	as	scored	by	the	group	
of four unfamiliar viewers, to performance accuracy for each of the items on the face-matching task. This was conducted for each of the three disguise conditions – Evasion,
Impersonation	Similar	and	Impersonation	Random.			
	
Results		
Pearson correlation coefficients between disguise effectiveness ratings of the face images and performance accuracy for the disguised images on the face-matching task were calculated to find out whether effectiveness ratings predicted matching accuracy.
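As an illustration of this item-level analysis, a Pearson coefficient can be computed directly; the ratings and error percentages below are hypothetical values for illustration, not the thesis data.

```python
import math

def pearson_r(xs, ys):
    # Pearson's r: covariance of the two variables divided by the
    # product of their standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-item data: mean effectiveness rating (1-7) and the
# percentage of face-matching errors for that item (not the thesis data).
ratings = [2.0, 3.5, 4.0, 5.5, 6.0, 6.5]
errors = [5.0, 12.0, 10.0, 30.0, 42.0, 55.0]
print(round(pearson_r(ratings, errors), 2))
```

A positive r of this kind corresponds to the pattern reported below: the higher the effectiveness rating, the more matching errors.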

Figure	6.7	Graph	showing	correlation	between	effectiveness	rating	and	percentage	of	errors	made	for	each	
Evasion	disguise	item.	Data	points	are	spread	horizontally	if	they	would	otherwise	overlap.	
	

	

Figure	6.8	Graph	showing	correlation	between	effectiveness	rating	and	percentage	of	errors	made	for	each	
Impersonation	Similar	disguise	Item.	Data	points	are	spread	horizontally	if	they	would	otherwise	overlap.	
	
Figure	6.9	Graph	showing	correlation	between	effectiveness	rating	and	percentage	of	errors	made	for	each	
Impersonation	Random	disguise	item.	Data	points	are	spread	horizontally	if	they	would	otherwise	overlap.		

	

	
Disguise ratings were strongly correlated with performance on the face-matching task for both Evasion disguises, r = .66, p < .01 (see Figure 6.7), and Impersonation Random disguises, r = .65, p < .01 (see Figure 6.9). For these conditions, the higher the effectiveness rating of the disguise, the more face-matching errors were made. The correlation fell just short of significance in the Impersonation Similar condition, r = .39, p = .05 (see Figure 6.8). The Impersonation Similar condition is naturally more difficult for raters to judge, given the existing similarity between the target and reference photographs. There was, however, a highly significant correlation for impersonation when collapsing across impersonation type, r = .54, p < .001.
	
These	results	suggest	that	the	effectiveness	of	a	disguise	can	be	judged	quite	accurately	
by	simply	showing	a	group	of	viewers	both	the	reference	photograph	and	disguised	image	
side	by	side	and	asking	them	to	rate	how	good	the	disguise	is.		
	

6.5 What	do	Viewers	Believe	Makes	for	an	Effective	Disguise?	
Now that I have shown that viewers can accurately predict which disguises will be effective, it is interesting to explore what it was about a disguise that the viewers thought made it effective. Given that the viewers were accurate in their disguise effectiveness decisions, their insight into what made an effective disguise should also be useful. Understanding what makes an effective disguise may make it easier to create successful disguises in the future; this could have applied value, such as aiding the disguise of undercover police officers.
	
	
	
	
	
	

Method	
Participants	
The participants were again the four raters who made the stimulus selection decisions discussed in Chapter 5, and who also gave the disguise effectiveness estimates used in the study above.
		
Design	
Effective	 disguise	 changes	 were	 categorised	 and	 counted	 for	 each	 of	 the	 three	 disguise	
conditions.		
	
Procedure	
As part of the stimulus selection phase, four unfamiliar viewers worked collaboratively to decide upon the best match or mismatch image from a range of options for each model for each disguise condition. During this process I asked the viewers to rate how good they thought each of the disguises was, for each of the selected images, on a scale from 1-7. A score of 1 indicated that the disguise was very poor, whereas a score of 7 meant that the disguise was extremely effective. These were the ratings used for the correlations in the study above. In addition to these ratings, the viewers provided comments on what they thought made the chosen disguise images effective in each of the conditions.
	
The	 comments	 made	 by	 the	 viewers	 were	 coded	 into	 seven	 categories	 of	 change	 –	
internal	 features,	 external	 features,	 expression,	 skin-tone,	 social	 inferences,	 face	 shape	
and	other.		
	
	
	
	

Results	&	Discussion	

	
Figure 6.10 Bar graph showing the most frequent forms of disguise change for Evasion (blue), Impersonation Similar (red) and Impersonation Random (grey). Evasion changes capture differences in appearance from the reference photograph, whereas Impersonation changes represent similarities.

	
Figure	 6.10	 shows	 the	 frequency	 of	 effective	 changes	 by	 category	 for	 each	 of	 the	 three	
disguise	types.	From	this	figure	it	is	evident	that	effective	Evasion	disguise	was	created	by	
many different types of changes. Internal changes (to specific features of the face, particularly nose shape and eyes) and External changes (to the hair, plus any addition of props, including clothing) were made, but changes in Expression, Skin-tone, Personality, Face-shape and Other changes (e.g. changes in lighting) also contributed greatly to effective Evasion disguise. Whereas the models tended to focus mostly on changing hair and makeup when creating their disguise (Figure 6.2), the viewer group, interestingly, also picked up on the personality differences and changes to face-shape that these manipulations evoked, and found these to be effective means of creating evasion disguise. For example, viewers described a disguise as effective if they thought that the images represented different characters, such as one image showing a quiet and studious person and another showing a more outgoing and party-loving person (see Figure 6.11).
	
Figure	6.11	Image	example	where	social	inferences	were	reported	to	differ	between	the	reference	model	
image	(left)	and	the	model	in	evasion	disguise	(right).		

Using the frequency counts underlying the bar chart above, a binomial test was conducted to find out whether there was a significant difference in the proportion of Feature to Non-Feature changes for Evasion disguise compared to Impersonation disguise. The test proportion was calculated from the proportion of Feature changes (represented as internal features and external on the graph) to Non-Feature changes (all other categories of change from the graph) made for Evasion disguise, and this was compared with the same proportions for Impersonation disguises. The binomial test indicated that for Impersonation Similar disguises the observed proportion of .69 feature responses was higher than the expected .34, p < .001. The same pattern of results was found for Impersonation Random disguises, with the proportion of .67 feature responses being higher than the expected .34, p < .001.
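A test of this form can be sketched with an exact binomial upper tail; the raw counts below are hypothetical (the text reports only proportions), with the Evasion feature proportion of .34 as the baseline.

```python
from math import comb

def binom_sf(k, n, p):
    """Exact upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical counts: 69 Feature comments out of 100 coded comments for
# Impersonation Similar, tested against the Evasion baseline of .34.
p_value = binom_sf(69, 100, 0.34)
print(p_value < 0.001)  # → True: the observed proportion far exceeds the baseline
```

With a baseline of .34, an observed proportion of .69 lies many standard deviations above expectation, which is why the reported p-values are so small.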
	
These binomial results highlight that whereas effective Evasion disguise encompassed many different changes across many different categories, with other changes outweighing internal and external changes, effective Impersonation Similar and Impersonation Random disguises were dominated by the internal and external feature
changes. The bar graph shows that internal featural similarities were the most common effective change for impersonation similar disguises (similarities between the nose, ears, eyes, forehead, eyebrows and mouth). Changes to external features followed; these were generally related to hair. Personality and expression were noted much less often as creating an effective impersonation than they were for evasion. These results suggest that to create an effective impersonation similar disguise the focus should be on creating similarities between the internal features of a face. The effective Impersonation Similar manipulations were very similar to the effective Impersonation Random manipulations. Internal features again came out as most important, followed by hairstyle changes (External). For the impersonation conditions, models sometimes tried to copy any distinguishing feature of the person they were trying to impersonate. This made the disguise more effective and was captured under the category ‘Other’ (see Figure 6.12 for an example). The presence of distinguishing features has been found to affect face-matching accuracy in past studies of facial recognition (Wells et al., 1998).
	
	
	
	
	
	
	
Figure 6.12 The model (left) used makeup to copy the distinguishing feature (a mole on the left side of the image, under the mouth) of the target (right).

	

	

More changes, and more categories of change, contributed towards effective Evasion disguise than towards Impersonation disguise. Effective Impersonation disguises generally revolved around changes to internal and external features of a face, whereas Evasion disguise also included changes in perceived face-shape and personality, as well as ‘other’ manipulations, which were those not captured by the specifically labelled categories. It is important to remember that previous research has been limited to effective Evasion disguise only, and even within that, Evasion disguise was investigated in relation to the effect of specific props and occlusions of certain areas of the face. No features were occluded in my disguise database. Previous studies have missed most of the action by focusing only on the category that I named External. My results show that far more goes on in the creation of free disguise, for both Evasion and Impersonation, than simply the addition of props, such as wigs and glasses.
	

6.6 Experiment	13	-	Do	Social	Inferences	Change	for	Disguise?	
The graph and binomial analysis above demonstrated that the viewers’ disguise effectiveness verdicts were influenced not only by the featural types of change explored in previous research, but also by aspects relating to social inferences. This was especially true in the case of evasion disguise. This is an interesting topic to investigate further: most research on disguise has used props to change appearance, and disguise has never before been studied in the literature with the intention of investigating disguise-related changes in perceived personality traits.
	
Patterson	 &	 Baddeley	 (1977)	 showed	 that	 encoding	 faces	 in	 terms	 of	 social	 inferences	
was	a	powerful	tool	to	aid	facial	recognition.	In	their	experiment	participants	viewed	face	
images	 during	 a	 learning	 phase	 and	 were	 instructed	 to	 make	 judgments	 on	 either	 the	
features	or	personality	of	the	person	in	the	image.	Participants	who	were	assigned	to	the	
feature	 condition	 made	 ratings	 of	 nose	 size	 (small	 –	 large),	 lip	 size	 (thin	 –	 full),	 width	
between	eyes	(close	together	–	far	apart),	face	shape	(round	–	long).	Those	instructed	to	
make personality judgments rated niceness (nice – nasty), reliability (reliable – unreliable), intelligence (intelligent – dull) and liveliness (lively – stolid) of the individuals
shown	 in	 the	 images.	 In	 both	 of	 the	 conditions	 participants	 were	 informed	 that	 making	
these	judgments	of	the	face	might	help	them	to	remember	the	faces.	It	was	found	that	
those participants who made personality trait inferences for the faces were better (though not significantly so) at recognising faces at test than those who made feature-related judgments. This trend provides an interesting hint that personality perception may influence the encoding of faces, but it cannot be taken as evidence, as the result was not significant. The sample size was fairly small, at 18 participants per condition; a larger sample may have produced a significant result. Combined with the comments
received	from	my	viewer	group	regarding	what	makes	a	good	disguise,	it	is	possible	that	
changes	in	social	inferences	may	have	influenced	performance	on	the	face-matching	task	
of	Chapter	5.			
	
Trait	 perception	 research	 has	 generally	 considered	 whether	 accurate	 social	 inferences	
about	 a	 person	 can	 or	 cannot	 be	 acquired	 from	 viewing	 a	 photograph	 of	 that	 person’s	
face. Those studies have typically considered natural images, meaning that the photographed person has not been asked to express any particular trait, but instead may or may not unintentionally reflect aspects of their true personality in the photograph. To find out whether viewers’ judgments of personality are accurate, scores from personality questionnaires taken by the models are generally compared with viewers’ trait ratings of the photograph. Several studies have found that above-chance judgments can be made in this way. For example, extraversion was accurately judged from face images viewed for as little as 50 ms (Borkenau, Brecke, Möttig & Paelecke, 2009). This social
judgment was linked to the display of joyful facial expressions such as smiling (Kenny, Horner, Kashy & Chu, 1992; Borkenau et al., 2009). Personality judgments recorded by Naumann, Vazire, Rentfrow & Gosling (2009) were accurate in both an uninstructed natural photograph condition and a neutral pose photograph condition, with the authors reporting that expression was not the only factor to aid social inferences from a photograph. Factors such as clothing were reported to also guide decisions, as well as general
appearance and the way people held themselves. Rule and Ambady (2011) even suggest that early-life photographs can predict success in later life, with power inferences from college yearbooks predicting later leadership success. These findings portray social inferences as a stable concept attached to an individual, which holds across photographs and time. If this is the case, then social inferences could actually help in identity judgment tasks.
	
Whereas most past research looked at the ability to draw accurate social inferences from photographs, treating those inferences as a stable judge of character, in some situations people may want others to draw inaccurate social inferences from their photographs – for example, to appear a better candidate for a job, or to enhance likeability on a social media account or online dating profile. Leikas, Verkasalo & Lönnqvist (2012) found that it was possible for the same person to have different social inferences made about them across different photographs. Models were asked to change their appearance to match the high and low ends of each of the Big 5 Personality Traits, and were photographed in each of these conditions – the study therefore focused on deliberate changes in appearance rather than incidental change between different photographs of the same face. Each of the Big 5 Personality traits was described by two
adjectives	 that	 the	 models	 could	 use	 as	 a	 guide	 for	 creating	 their	 appearance	 in	 the	
corresponding	 photograph.	 For	 example	 ‘anxious	 and	 stressed’	 described	 the	 trait	
neurotic,	 whereas	 in	 the	 stable	 condition	 models	 were	 asked	 to	 make	 themselves	 look	
‘stable	 and	 quiet’.	 Models	 were	 limited	 in	 the	 ways	 that	 they	 could	 achieve	 these	
personas,	with	the	paper	stating	that	‘targets	were	not	allowed	to	add,	change	or	remove	
clothing,	hairbands	or	decorative	items,	to	remove	or	add	makeup,	or	to	groom	their	hair	
between conditions’ (Leikas et al., 2012). Extraversion could be convincingly changed across photographs, and neuroticism and conscientiousness to a lesser degree. Agreeableness was not successfully changed across posed photographs. These findings suggest that it is
possible	to	deliberately	control	social	inferences	in	a	photograph.	I	may	see	even	greater	
changes	in	the	disguise	experiment	as	my	disguise	models	were	given	far	greater	freedom	
regarding	what	they	could	do	to	change	appearance,	and	as	shown	in	the	section	above,	
the	 models	 used	 a	 great	 range	 of	 strategies	 to	 change	 their	 appearance	 in	 addition	 to	
purely	pose	and	expression.		

	

	
It is possible, and seems plausible based on the findings of section 6.5, that social inferences may change as a result of disguise manipulations. Some incidental change is expected across multiple undisguised images of the same face. In this next experiment, I test whether social inference judgments differ more as a result of disguise change than as a result of incidental change. The traits explored are Trustworthiness, Attractiveness and Dominance. These traits were chosen because they are the three traits on which people spontaneously characterise faces, according to PCA analyses (Todorov, Said, Engell & Oosterhof, 2008; Sutherland et al., 2013).
	
For the case of evasion, it is predicted that there will be a greater difference in social inferences for disguise-related change (the difference in social inference judgments between the reference photograph and the disguise photograph) than for incidental change (the difference in social inferences between two undisguised images of the model). As models creating evasion disguise are trying not to look like themselves in their evasion disguise image, creating a situation where the social inferences made about them differ from those made for an undisguised reference image may help to hide their true identity. For impersonation disguise, models tried to look more similar to the target, and this may be reflected in similar social inferences. For the case of impersonation disguise, it is therefore predicted that differences in social traits will be larger for incidental change (between the target photograph and the undisguised model image) than between the impersonation disguise image and the image of the target person.
	
Methods	
Participants	
30 undergraduate students (12 male, mean age = 20.4 years) from the University of York, who had not taken part in any previous experiments involving the FAÇADE data set, volunteered as participants in this experiment in exchange for a small cash reward or course credit.
Stimuli	&	Design	
The	 stimuli	 were	 all	 the	 images	 from	 the	 FAÇADE	 image	 database,	 presented	 one	 at	 a	
time.	For	each	model	the	following	images	were	shown:	reference,	no	disguise,	evasion	
disguise,	impersonation	similar	disguise,	target	similar	person’s	reference,	impersonation	
random	 disguise	 and	 the	 target	 random	 person’s	 disguise	 reference	 image.	 All	 images	
were	presented	in	a	random	order.		
	
This was a within-subjects design whereby each participant rated each face image on each of the three traits (Trustworthiness, Attractiveness and Dominance). The traits were rated in a random order for each face image.
	
Procedure	
Participants viewed the stimuli (one image at a time) on a computer screen. Underneath each image appeared one of the questions ‘How trustworthy is this person?’, ‘How dominant is this person?’ or ‘How attractive is this person?’ The image remained on screen until all three of these questions had been answered for the face; just one question was shown at a time, and the questions were always asked in a random order. Participants indicated their response to each question by using the computer mouse to select an answer on a rating scale from 1 (low score on the trait) to 7 (high score on the trait).
		
Results	and	Discussion		
Social	 inference	 distance	 change	 was	 always	 measured	 according	 to	 the	 square	 root	
distance	 between	 face	 images	 rather	 than	 the	 distance	 between	 means.	 Square	 root	
distance	took	into	account	the	fact	that	some	people	naturally	looked	more	similar	than	
others did to the person they were trying to impersonate on certain traits. The square-root distance was calculated for disguise-to-target face images, and compared to the
incidental	change	for	the	corresponding	no	disguise	images.		
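On one reading of this measure (my assumption; the text does not spell out the formula), the square-root distance is the Euclidean distance between two images' mean ratings across the three traits:

```python
import math

def trait_distance(img_a, img_b):
    """Square-root (Euclidean) distance between two images' mean ratings
    on (Trustworthiness, Attractiveness, Dominance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_a, img_b)))

# Hypothetical mean ratings on the 1-7 scale, for illustration only.
reference = (4.0, 3.5, 4.0)
evasion = (2.5, 5.0, 3.0)
print(round(trait_distance(reference, evasion), 2))  # → 2.35
```

Unlike a difference of condition means, this per-pair distance preserves how far each individual image pair sits apart before averaging, which is the property the text describes.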
	

	
Figure	 6.13	 Example	 illustration	 of	 the	 distance	 calculations	 made	 for	 the	 Evasion	 disguise	 condition.	
Distance	moved	for	incidental	change	was	compared	with	distance	moved	for	disguise	change	for	each	of	
the	3	disguise	conditions	(Evasion,	Impersonation	Similar,	Impersonation	Random).	
	

In cases of evasion, people moved significantly further from their perceived personality traits (reference image) than the change in trait perception for another no-disguise image of that person, t(25) = 7.71, p < .001, CI = 1.19 to 2.06. The mean distance moved in trait perception between the reference photo and the no-disguise image was 0.63 (SD = 0.38, SE = 0.08); the mean distance moved between the reference photo and the evasion photo was 2.25 (SD = 0.92, SE = 0.18).
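A paired comparison of this kind can be sketched from per-model difference scores (disguise-change distance minus incidental-change distance); the four difference values below are hypothetical, not the thesis data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(diffs):
    """t statistic for paired differences: mean difference divided by its
    standard error (sample sd / sqrt(n)); degrees of freedom = n - 1."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Hypothetical per-model difference scores.
diffs = [1.5, 2.0, 1.0, 2.5]
print(round(paired_t(diffs), 2))  # → 5.42
```

With n models, the resulting statistic is compared against a t distribution with n - 1 degrees of freedom, as in the t(25) values reported here.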
	
To	investigate	where	these	differences	in	traits	lay,	results	were	analysed	as	above,	but	
this	 time	 for	 each	 social	 inference	 (Trustworthiness,	 Dominance	 and	 Attractiveness)	 in	
turn.	There	was	a	significant	difference	in	perceived	trustworthiness	scores	between	the	
reference	and	no	disguise	self	images	M	=	.34,	compared	with	the	difference	between	the	
reference	 and	 evasion	 images	 M	 =	 1.43,	 t(25)	 =	 6.54,	 SE	 =.16,	 CI	 =	 0.74-1.43,	 p<.001.	
	

There were also significant differences in perceived dominance scores between the reference and no-disguise self images, M = .24, compared with the difference in perceived personality traits for the reference and evasion images, M = .60, t(25) = 3.55, SE = .10, CI = 0.14-0.56, p < .005. Finally, there was a significant difference in perceived attractiveness scores between the reference and no-disguise self images, M = 0.36, compared with the difference in perceived personality traits for the reference and evasion images, M = 1.42, t(25) = 5.63, SE = .19, CI = 0.67-1.44, p < .001. These results show that there were significant differences in social inferences for each inference in turn, as well as an overall effect. This was as expected based on the results of the effective disguise section earlier, where personality change was listed as an effective form of disguise.
	
Impersonation	Similar	
For impersonation similar faces there was not a significant difference between perceived social inferences, rated as a whole, for the reference and impersonation images (mean distance = 1.19), compared with the difference in perceived personality traits for the reference and no-disguise self images (mean distance = 1.24), t(25) = .38, CI = -0.31 to 0.21, SE = .13, p > .05. This was the case for each inference individually as well as overall (see Table 6.1).
	
	

                      Distance between target     Distance between target
                      reference image &           reference image & model's    Significance
                      impersonation image         reference image              level
All traits combined   1.19                        1.24                         p>.05
Attractiveness        .73                         .82                          p>.05
Dominance             .38                         .42                          p>.05
Trustworthiness       .67                         .60                          p>.05

Table 6.1 Social inference comparisons for impersonation similar images.

	

	
Impersonation	Random		
The pattern of results for impersonation random items was similar to that for impersonation similar items – no comparison was significant, either at the overall level or for any individual trait (see Table 6.2).
	
	

                      Distance between target     Distance between target
                      reference image &           reference image & model's    Significance
                      impersonation image         reference image              level
All traits combined   1.31                        1.24                         p>.05
Attractiveness        .75                         .97                          p>.05
Dominance             .51                         .55                          p>.05
Trustworthiness       .75                         .66                          p>.05

Table 6.2 Impersonation random disguises: mean distances for each analysis.

	
The results for impersonation disguises were not as hypothesised – impersonation disguises did not move the model’s face significantly closer to the trait perceptions of the target. This is perhaps not entirely surprising, as similar personality was not picked out as particularly important in the effective disguise section above. It seems that in the case of impersonation, other factors, such as featural change, may be more important than changes of social inference.
	
Discussion	
Social inferences differed significantly in the case of evasion disguise, suggesting that a change in these perceptions occurs when people are trying not to look like themselves. Social inference must be based on physical appearance, and the physical appearance changes made to create Evasion disguise influenced social inferences. Judgment of character is generally believed to be accurate, but in the case of evasion disguise people are fooled by social inference change when making identity judgments.
	
Impersonation	disguise	on	the	other	hand	does	not	rely	on	the	match	of	social	inferences	
between	a	model	in	impersonation	disguise	and	an	image	of	the	target	person	that	the	
model	was	trying	to	impersonate.	In	this	scenario	other	factors	of	similarity	seem	to	be	
more	important	e.g.	specific	and	distinguishing	feature	match.	Instead	of	trying	to	mimic	
the	 target	 image	 itself,	 e.g.	 the	 exact	 pose	 and	 expression,	 models	 may	 instead	 be	
attempting	to	portray	another	way	that	the	target	identity	could	appear,	e.g.	a	different	
pose	and	expression.	It	has	already	been	established	that	it	is	harder	to	match	identities	
over changes in expression than when expression is unchanged (Bruce et al., 1982). This may
work	to	an	impersonator’s	advantage	in	the	case	of	disguise,	and	also	explain	why	social	
inferences	 are	 not	 matched	 for	 impersonation	 similar	 disguise.	 If	 for	 example,	 a	 model	
chooses	to	try	and	look	like	the	target	person	when	they	are	in	a	happier	or	angrier	mood,	
the	 social	 inferences	 drawn	 from	 the	 face	 may	 differ	 even	 if	 changing	 the	 expression	
makes	the	images	look	more	similar	in	identity	overall.		
	

6.7 General	Discussion	
In	summary,	the	disguise	models	applied	many	different	manipulations	to	create	each	of	
their	disguises.	Viewers	were	able	to	predict,	by	looking	at	a	target	image	and	disguised	
comparison image, whether or not the disguise was effective (in terms of the percentage of errors made for that face in the face-matching task of Experiment 10).
Furthermore,	manipulations	that	make	an	effective	evasion	disguise	are	different	to	those	
that	 make	 an	 effective	 impersonation	 disguise.	 Whilst	 all	 disguises	 involve	 either	 the	
similarity	 or	 difference	 in	 internal	 and	 external	 features,	 evasion	 disguise	 also	
encompassed	 more	 non-feature	 factors	 including	 changes	 in	 expression	 and	 perceived	
personality.	Finally,	Experiment	13	confirmed	that	social	inference	change	occurred	only	
in	the	case	of	evasion	disguise.		

	

	
Evasion disguises involved both internal and external changes, with internal changes often being achieved through the use of makeup and also changes in expression. Impersonation similar changes involved creating similarities with the target face, including similarities in hairstyle, internal features, clothes and expression.
	
Impersonation	 random	 disguises	 focussed	 mostly	 on	 creating	 similarities	 in	 external	
features.	 The	 most	 important	 message	 here,	 with	 relation	 to	 previous	 work,	 is	 that	
models	 did	 far	 more	 to	 create	 their	 disguises	 than	 simply	 the	 addition	 of	 props.	 Prop	
additions	and	occlusion	of	features	with	props	have	been	the	most	common	method	of	
creating	 disguise	 stimuli	 for	 disguise	 investigation	 in	 previous	 research.	 My	 research	
shows that when given the freedom to create their own disguises, models use many disguise techniques beyond simply the addition of props. If disguise research is to have
real	 world	 relevance,	 then	 free	 disguise	 manipulations	 should	 be	 allowed	 when	
constructing	model	stimuli.		
	
Additionally,	this	chapter	reiterates	the	differences	between	Evasion	and	Impersonation	
disguise.	 Chapter	 5	 highlighted	 that	 Evasion	 caused	 more	 matching	 difficulties	 than	
Impersonation.	 This	 chapter	 goes	 further	 by	 showing	 that	 the	 approach	 taken	 to	 these	
disguises,	 and	 the	 factors	 which	 make	 them	 effective,	 differ	 for	 evasion	 and	
impersonation.	Whereas	both	disguise	types	include	featural	manipulations	made	by	the	
models,	 the	 independent	 viewer	 group	 picked	 out	 more	 changes	 in	 personality	 and	
expression	 as	 factors	 that	 made	 a	 disguise	 effective	 for	 Evasion	 disguise	 than	 in	 either	
case	of	Impersonation.	In	line	with	this	finding,	social	inference	change	occurred	only	for	
Evasion	disguise.	These	results	further	highlight	the	importance	of	a	distinction	between	
evasion	and	impersonation	when	investigating	disguise.		
	

	

One interpretation of these findings could be that large differences in appearance tend both to make ‘same’ responses unlikely and to be accompanied by changes in social inferences.
Because	 Evasion	 disguises	 are	 less	 constrained	 than	 Impersonation	 disguises,	 they	 can	
result	in	major	image	differences.	That	leads	to	higher	error	rates	(as	seen	in	Chapter	5)	
and	distinct	social	attributions.		
	
It	seems	that	effective	disguise	does	not	revolve	around	a	simple	recipe	that	would	work	
for	 all	 faces.	 Lots	 of	 different	 changes	 are	 taking	 place,	 and	 some	 of	 these	 changes	 will	
work	better	for	some	individuals	than	others	depending	on	the	natural	appearance	of	a	
face.	What	is	evident	is	that	there	is	more	to	disguise	than	simply	the	addition	of	props	
and	furthermore	Evasion	and	Impersonation	disguise	are	achieved	in	different	ways.		
	
	

	

	

Chapter	7	–	General	Discussion	
7.1 Overview	of	Findings	
The	research	reported	in	this	thesis	investigated	face	recognition	in	challenging	situations.	
The	 introductory	 chapter	 outlined	 that	 the	 critical	 task	 of	 face-matching	 is	 difficult	 for	
unfamiliar	viewers,	yet	trivial	for	familiar	viewers.	These	findings	came	from	past	studies	
that	 used	 cooperative	 stimuli,	 meaning	 that	 for	 same	 face	 pairs,	 the	 person	 who	 was	
photographed	 made	 no	 deliberate	 attempt	 to	 change	 their	 own	 appearance	 across	
multiple	images,	and	images	were	taken	under	constant	conditions	with	extremely	short	
time	 intervals	 between	 photographs.	 In	 cooperative	 stimuli	 tasks,	 different	 face	 trials	
paired the most similar faces from a small pool of available images (Burton et al., 1999; Bruce et al., 1999). In this thesis I argued that performance is likely even worse for images
of	 a	 challenging	 nature	 –	 images	 that	 include	 incidental	 or	 deliberate	 face	 variation	 or	
across	 identity	 similarities	 and	 reduced	 image	 quality.	 I	 explain	 that	 there	 may	 even	 be	
limits	to	the	familiarity	advantage	(times	when	familiarity	can	not	completely	compensate	
for	 poor	 performance).	 I	 investigated	 face-matching	 situations	 where	 the	 same	 face	
image	 pairs	 incidentally	 looked	 different	 due	 to	 within	 person	 variation	 across	 ambient	
images,	and	also	where	different	identity	pairs	were	of	extremely	similar	appearance	due	
to	 natural	 facial	 similarities	 between	 celebrity	 and	 lookalike	 faces	 (Chapter	 2).	
Investigation continued with unintentional appearance change due to change of camera-to-subject distance (Chapter 4). Finally, I examined matching performance for
deliberate	 disguise,	 where	 an	 individual	 deliberately	 made	 changes	 to	 evade	 their	 own	
identity	or	to	impersonate	someone	else	(Chapter	5).	The	manipulations	made	to	create	
the	 disguise	 face-matching	 stimuli	 were	 explored	 (Chapter	 6).	 I	 also	 tested	 ways	 of	
improving	performance	for	challenging	face	images	(Chapter	3).	In	total,	16	studies	were	
conducted	 with	 the	 aim	 of	 furthering	 understanding	 of	 face	 recognition	 in	 challenging	
situations.		
	
The	 investigation	 began	 by	 testing	 face-matching	 performance	 for	 images	 in	 which	
different identity pairs consisted of faces that were extremely similar to each other and
same	 identity	 pairs	 included	 naturally	 captured	 variation	 within	 a	 face	 (Chapter	 2).	
Celebrity	 lookalike	 images	 were	 used	 as	 naturally	 occurring	 imposter	 faces,	 and	 these	
images	 allowed	 the	 creation	 of	 difficult	 different	 face	 pairs	 (one	 image	 of	 the	 celebrity	
and	one	image	of	a	lookalike	for	that	celebrity)	which	appeared	in	a	face-matching	task	
along	with	same	identity	pairings	(two	different	[ambient]	images	of	the	same	celebrity’s	
face).	Participants	were	instructed	to	make	same	or	different	identity	judgments	for	each	
of	 the	 image	 pairs.	 Experiment	 1	 demonstrated	 a	 graded	 effect	 of	 familiarity	 for	 the	
lookalike	task	-	unfamiliar	viewers	made	many	identity	matching	errors,	whereas	familiar	
viewers performed with near perfect accuracy. Accuracy on my celebrity lookalike task (mean accuracy = 72%) was lower than unfamiliar viewers' performance on the GFMT, a standardised face-matching task containing cooperative stimuli (mean accuracy: long version = 89.9%; short version [hardest 40 items] = 81.2%). To model the applied problem of varied image quality, Experiments 2 and 3 made the task harder still by reducing image quality through pixelation. The degraded images were challenging but also representative of the image type often acquired from zoomed-in digital photographs. A graded familiarity advantage survived in Experiment 2; however, at the highest pixelation level (Experiment 3) performance was
around	 chance	 for	 all	 but	 the	 extremely	 familiar	 viewers.	 These	 findings	 highlight	 firstly	
that	 performance	 for	 challenging	 stimuli	 image	 matching	 is	 even	 worse	 than	 the	 poor	
performance	 already	 established	 for	 cooperative	 face	 image	 matching.	 Additionally,	 the	
findings	 highlight	 familiarity	 as	 a	 graded	 concept.	 Finally	 my	 findings	 suggest	 that	 the	
familiarity	 advantage	 has	 limits	 –	 only	 extreme	 familiarity	 with	 a	 face	 could	 help	 in	 the	
case	 of	 coarse	 images,	 and	 this	 advantage	 remained	 only	 for	 same	 identity	 trials	
(Experiment	3).	
	
I	found	that	image	quality	affected	overall	face-matching	performance,	and	the	influence	
that	 familiarity	 could	 have	 on	 performance.	 Matching	 accuracy	 was	 lower	 for	 degraded	
(mid-pixelated	 and	 coarsely	 pixelated)	 versions	 of	 the	 celebrity	 and	 lookalike	 images	
(Experiments	 2	 &	 3)	 than	 for	 the	 un-manipulated	 (fine	 quality)	 versions	 of	 the	 images	
(Experiment 1). In forensic investigations, the images that are available for comparison
often	have	this	pixelated	appearance.	It	was	therefore	important	to	investigate	ways	of	
improving	 identification	 accuracy	 for	 these	 poor	 quality	 images	 (Chapter	 3).	 Chapter	 3	
explored whether techniques that have been successful in improving cooperative face-matching accuracy could also improve accuracy for challenging images. Blurring the
pixelated	 images	 (Experiment	 4),	 pooling	 judgments	 via	 crowd	 analysis	 (Experiment	 5)	
and the superior ability of super-recognisers (Experiment 6) all improved face-matching
performance	for	challenging	images.	I	also	found	that	some	of	these	techniques	could	be	
combined	for	additional	benefits.		
	
I	next	looked	at	a	naturally	occurring	image	manipulation	–	change	of	camera-to-subject	
distance	–	to	investigate	how	this	change	affected	facial	appearance	for	individuals	across	
photographs, and whether such changes in appearance caused difficulties for face-matching. Thus, this was an investigation of an unintentional change to appearance,
which	 may	 in	 turn	 influence	 identity	 judgment.	 Measuring	 face	 images	 (Experiment	 7)	
showed	 that	 changing	 camera-to-subject	 distance	 resulted	 in	 non-linear	 changes	 in	
distances	 between	 features	 of	 a	 face,	 such	 that	 ratio	 measurements	 between	 features	
were	not	preserved	when	camera-to-subject	distance	was	altered	between	photographs.	
Experiment	8	showed	that	these	differences	in	facial	configurations	across	images	caused	
matching	 difficulty	 for	 unfamiliar	 viewers.	 Performance	 for	 same	 identity	 pairs	 was	
poorer when matching across images taken from varied camera-to-subject distances, compared
to	 performance	 for	 images	 taken	 from	 the	 same	 camera-to-subject	 distance.	 Familiar	
viewers	 were	 unaffected	 by	 the	 camera-to-subject	 manipulation.	 I	 also	 showed	 that	
viewers	 compensate	 for	 changes	 to	 camera-to-subject	 distance	 when	 distance	 cues	 are	
available	(Experiment	9),	implying	a	high-level	perceptual	constancy	for	face	shape.		
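
The geometric point can be sketched with a simple pinhole-camera calculation (the feature sizes and depths below are illustrative assumptions, not measurements from Experiment 7): because facial features lie at different depths, the ratio between their projected sizes shifts as the camera moves, so configural ratios are not preserved across photographs.

```python
# Pinhole projection: image size = focal_length * true_size / depth.
def projected_size(true_size_cm, depth_cm, focal_length=1.0):
    return focal_length * true_size_cm / depth_cm

def feature_ratio(camera_distance_cm):
    # Hypothetical anatomy: interocular distance in the eye plane, and
    # cheekbone width lying 2 cm further from the camera.
    interocular = projected_size(6.3, camera_distance_cm)
    cheek_width = projected_size(13.0, camera_distance_cm + 2.0)
    return interocular / cheek_width

near = feature_ratio(50.0)    # photograph taken close up
far = feature_ratio(200.0)    # photograph taken from further back
print(round(near, 3), round(far, 3))  # the ratio is not preserved across distances
```

The change is non-linear in camera distance: it is largest for close-range photographs, where the depth differences between features are proportionally greatest.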
	
The	 focus	 of	 investigation	 moved	 next	 to	 intentional	 appearance	 changes	 –	 deliberate	
disguise.	 I	 created	 the	 FAÇADE	 image	 database	 as	 a	 resource	 that	 includes	 evasion	 and	
impersonation	 disguise	 stimuli	 as	 well	 as	 undisguised	 comparison	 images	 of	 the	 same	
identities.	 These	 images	 made	 it	 possible	 to	 perform	 a	 direct	 comparison	 between	
disguised and undisguised face-matching performance. I found that deliberate disguise
impaired	 face-matching	 for	 unfamiliar	 viewers	 (Experiment	 10),	 even	 when	 participants	
were	 informed	 of	 the	 disguise	 manipulations	 (Experiment	 11).	 Not	 all	 disguises	 caused	
equal difficulty. Evasion impaired performance more than impersonation. Moreover, matching accuracy was higher when target and model were paired at random than when they were of naturally similar appearance prior to the impersonation disguise. Interestingly, familiar viewers performed better than unfamiliar
viewers	overall	(Experiment	12),	but	they	too	were	worse	at	matching	evasion	faces	than	
undisguised	face	pairs.	Chapter	6	was	more	exploratory,	and	investigated	how	the	models	
in	the	FAÇADE	database	disguised	themselves	and	what	made	for	an	effective	disguise.	A	
social inference experiment concluded the investigation, testing whether the disguise manipulations affected the social inferences made about the individual. Viewers
made	 significantly	 different	 social	 trait	 inferences	 about	 the	 models,	 when	 they	 viewed	
an	undisguised	image	of	the	model’s	face	than	when	they	viewed	the	same	model’s	face	
in	 evasion	 disguise.	 However,	 for	 the	 impersonation	 scenario,	 social	 inference	 ratings	
were	not	significantly	more	similar	between	images	of	the	target	face	and	impersonation	
face than between the undisguised model image and impersonation image. This suggests
that	whereas	social	inference	related	changes	are	important	for	creating	evasion	disguise,	
other	factors	are	more	important	for	creating	impersonation	disguise.		
	
In	 summary,	 the	 results	 of	 this	 thesis	 show	 that	 face-matching	 performance	 can	 be	
impaired	 both	 by	 incidental	 changes	 in	 appearance	 (Chapters	 2	 &	 4)	 and	 by	 deliberate	
changes	 in	 appearance	 (Chapter	 5).	 There	 are	 several	 methods	 of	 improving	 this	 poor	
performance (Chapter 3). In some instances natural solutions can be exploited: performance can be drawn from those with high face recognition aptitude (super-recognisers, Experiment 6), and high-level perceptual constancy can compensate for camera-to-subject distance-related face changes (Chapter 4). Familiarity improved performance throughout.
I explored what people did to disguise themselves, and what worked. Disguise is more complex than previous studies have allowed for. For example, evasion and impersonation result in different levels of difficulty, and the disguises themselves were created using
different techniques and facial manipulations. Effective disguise involves more than simply the addition of props, but it is not the case that there is a simple disguise formula
which	 can	 be	 applied	 to	 all	 –	 some	 disguise	 manipulations	 and	 combinations	 work	 for	
some	 individuals	 but	 not	 for	 others.	 Not	 all	 questions	 relating	 to	 disguise	 could	 be	
answered	within	this	work.		
	

7.2 Relation	to	Previous	Research	
Previous	research	had	already	established	that	unfamiliar	face-matching	performance	is	
poor.	 All	 of	 my	 experiments	 support	 this,	 as	 I	 consistently	 found	 poor	 performance	 for	
unfamiliar	 viewers	 (Chapter	 2,	 Chapter	 4,	 Chapter	 5).	 My	 findings	 add	 to	 this	 past	
research	by	showing	that	accuracy	for	challenging	(but	realistic)	image	conditions	is	lower	
than	for	cooperative	face-matching,	something	that	was	already	known	to	be	poor	(e.g.	
Burton	 et	 al.	 1999).	 In	 support	 of	 previous	 findings,	 my	 results	 show	 much	 better	
performance	 for	 familiar	 viewers	 than	 unfamiliar	 viewers	 in	 each	 of	 my	 face-matching	
tasks (Chapter 2, Chapter 4, Chapter 5). I also found a graded familiarity effect for face-matching performance similar to that demonstrated by Clutterbuck & Johnston (2001,
2003),	and	implemented	a	new	familiarity	scale	to	capture	this	concept	(Experiment	1).	
There	 are	 also	 some	 important	 differences	 between	 my	 findings	 and	 past	 familiarity	
advantages. In past research, familiarity has generally led to ceiling-level performance
(e.g.	 Burton	 et	 al.	 1999;	 Bruce	 et	 al.	 2001),	 and	 this	 has	 made	 it	 difficult	 to	 quantify	
changes	 in	 performance.	 The	 challenging	 nature	 of	 my	 tasks	 took	 familiar	 viewer	
performance	off	ceiling	level	(Chapter	2,	Chapter	5),	revealing	important	gradations	at	the	
end	of	the	range.	Furthermore,	limits	of	familiarity	began	to	become	clear	in	Experiment	
3,	 when	 familiarity	 was	 tested	 to	 destruction.	 The	 graded	 nature	 of	 the	 familiarity	
advantage	ceased	to	exist	for	the	coarse	image	version	of	the	lookalike	task.	However	the	
advantage	did	not	break	down	completely;	extremely	familiar	viewers	outperformed	the	
unfamiliar	 viewers	 at	 the	 task,	 but	 this	 was	 only	 true	 for	 same	 person	 trials	 and	 may	
reflect	 the	 greater	 variation	 of	 appearances	 held	 for	 familiar	 faces,	 accessed	 when	 the	
identity	 of	 the	 target	 celebrity	 is	 recognised	 by	 the	 viewer.	 Familiar	 viewers	 performed	
the	same	task	with	high	levels	of	accuracy	for	both	same	and	different	face	image	pairs	
when	the	image	quality	was	good		(Experiment	1)	or	slightly	degraded	(Experiment	2)	and	
past studies report that familiar viewers remain unaffected by image degradation (Burton
et	al.	1999).	These	results	thus	suggest	that	the	combination	of	the	similarity	of	faces	and	
severely	degraded	image	quality	made	the	task	difficult	for	familiar	viewers,	rather	than	
either	of	these	factors	taken	in	isolation.		
	
Although	I	began	to	find	limits	of	familiarity	in	Experiment	3,	it	was	in	Experiment	12	that	
I	 saw	 the	 performance	 of	 familiar	 viewers	 suffer	 due	 solely	 to	 the	 model	 images	 being	
challenging,	 rather	 than	 due	 to	 challenging	 images	 combined	 with	 degraded	 image	
quality.	 In	 Experiment	 12,	 I	 found	 that	 for	 evasion	 disguise	 faces,	 familiarity	 improved	
performance	 compared	 to	 performance	 for	 unfamiliar	 viewers,	 but	 familiarity	 did	 not	
completely	 compensate	 for	 the	 effect	 of	 disguise.	 Familiar	 viewers	 made	 more	 errors	
when	matching	evasion	disguise	face	pairs	than	undisguised	versions	of	these	pairs.	The	
results	 of	 Experiments	 10,	 11	 &	 12	 demonstrate	 that	 the	 evasion	 of	 identity	 created	 a	
more	 difficult	 face-matching	 scenario	 than	 impersonation	 of	 identity.	 The	 scenario	 in	
Experiment 12 where familiar viewers performed with lower accuracy for disguised than for undisguised faces occurred for evasion disguise only. Indeed evidence from other
experiments in this thesis is consistent with the deliberate disguise findings. The celebrity lookalike images presented in Experiment 1, for which familiar participants performed with very few errors, were reflective of an impersonation scenario, as lookalikes could be considered impersonators of the celebrity. The results therefore agree that familiar viewers are generally able to perform well when matching faces that involve impersonation (Experiments 1 & 12), but evasion disguise can cause familiar viewers matching difficulty (Experiment 12).
	
Familiar	 viewers	 outperformed	 unfamiliar	 viewers	 in	 all	 tasks	 (including	 the	 evasion	
condition described above). However, there are many applied situations, e.g. passport
security	checks	and	monitoring	of	CCTV	images,	where	it	is	not	possible	for	viewers	to	be	
familiar	with	the	faces	concerned.	With	this	in	mind,	I	was	keen	to	investigate	methods	
that	had	previously	been	found	to	be	effective	in	improving	face-matching	performance	
and apply them to the more challenging tasks in this thesis. I found
that both image manipulation (blurring of pixelated images) and crowd analysis could
improve	 accuracy	 for	 pixelated	 face-matching	 (Experiments	 4	 &	 5).	 I	 also	 found	 that	
super-recognisers	made	more	accurate	judgments	than	comparison	observers.	This	result	
demonstrates	 that	 super-recognisers’	 superior	 face	 recognition	 ability	 extends	 beyond	
good	quality	images	and	implies	that	their	high	performance	does	not	rely	solely	on	fine	
scale	information	in	face	images.	This	finding	therefore	supports	the	notion	of	recruiting	
super-recognisers	 for	 face	 recognition	 roles,	 and	 extends	 the	 past	 evidence	 for	 this	
proposal	by	showing	that	super-recognisers	also	hold	a	superior	ability	for	matching	very	
challenging	 image	 pairs	 as	 well	 as	 cooperative	 image	 pairs	 (e.g.	 White	 et	 al.,	 2015).	
Additionally,	 for	 the	 specific	 case	 of	 improving	 performance	 on	 images	 with	 changed	
camera-to-subject	 distance,	 providing	 accurate	 distance	 cues	 is	 a	 means	 of	 boosting	
performance	(Experiment	9).		
	
In	 terms	 of	 past	 research	 on	 deliberate	 disguise,	 my	 findings	 support	 the	 most	 basic	
finding of previous face recognition research: the presence of disguise impaired face recognition. None of the present findings go against those of previous studies; however, my research builds upon previous disguise research in several important ways. Past
research	 had	 studied	 disguise	 with	 a	 focus	 solely	 on	 evasion.	 A	 key	 finding	 from	 the	
experiments	 presented	 here	 is	 that	 disguise	 cannot	 be	 understood	 as	 a	 unitary	
manipulation.	I	found	consistent	differences	between	evasion	and	impersonation	disguise	
in	terms	of	matching	accuracy	(higher	accuracy	for	impersonation)	and	also	ways	in	which	
the disguises were executed by the models (greater reliance on external features for impersonation). Additionally, I showed that there is more to disguise than occlusion of
facial	 features.	 Past	 research	 relied	 almost	 entirely	 on	 occlusion	 manipulations	 such	 as	
glasses	and	facial	hair.	Those	manipulations	tell	us	rather	little	about	disguise.	Any	form	
of	 occlusion	 is	 apt	 to	 reduce	 matching	 accuracy	 as	 it	 reduces	 the	 available	 information.	
Moreover, occlusion would not be an effective disguise in many security scenarios, as
the	 props	 that	 occluded	 features	 may	 have	 to	 be	 removed	 for	 identity	 decisions	 to	 be	
reached.	 Here	 I	 have	 shown	 that	 disguise	 can	 impair	 face	 recognition	 even	 without	
occluding	 features.	 In	 addition	 to	 this,	 I	 highlighted	 the	 importance	 of	 free	 disguise.	 By	
giving	 participants	 the	 freedom	 to	 disguise	 themselves	 as	 they	 wished,	 rather	 than	 as	
prescribed by the experimenters, I was able to show that people naturally use many
different	methods	to	disguise	themselves	and	that	some	methods	are	more	effective	than	
others.	 Models	 did	 far	 more	 than	 simply	 add	 props	 in	 creating	 their	 disguise	
manipulations.	 Investigation	 of	 what	 made	 a	 successful	 disguise	 showed	 that	 there	 was	
no	 simple	 disguise	 recipe.	 Contrary	 to	 the	 implicit	 assumption	 of	 previous	 studies,	
approaches	 to	 disguise	 are	 rather	 idiosyncratic	 -	 some	 disguise	 manipulations	 (e.g.	
manipulating	 hairstyle,	 copying	 or	 changing	 expression)	 were	 effective	 in	 the	 case	 of	
some identities but not for others. Overall, effective evasion involved more non-feature-based changes, whereas effective impersonation was created through internal and external featural manipulations.
	

7.3 Theoretical	Implications	
Up	 until	 now	 this	 discussion	 has	 focussed	 upon	 the	 practical	 advances	 gained	 from	 my	
research,	 realistic	 challenging	 images,	 performance	 enhancements	 and	 insights	 into	
deliberate	 disguise.	 The	 experiments	 that	 I	 conducted	 also	 resulted	 in	 implications	 for	
theory.	Throughout	the	thesis	I	argued	that	my	findings	support	a	theoretical	standpoint	
on	face	learning	that	has	within	person	variability	at	its	heart:	what	viewers	are	learning	
when they become familiar with a face is all of the different ways in which that face can look
(Jenkins	et	al.,	2011).	The	range	of	possible	appearance	can	be	constructed	as	a	form	of	
face	 space	 that	 is	 specific	 for	 that	 identity.	 Figure	 7.2	 depicts	 this	 face	 space	 as	 a	
multidimensional	volume,	encapsulating	the	range	of	accepted	appearances	for	that	face.	
Any	face	image	that	falls	inside	the	volume	is	accepted	as	that	individual,	and	any	image	
that	falls	outwith	this	volume	is	not.	Exposure	to	a	person’s	face	helps	viewers	to	refine	
the face space for that individual. Less familiar viewers also hold face space representations of individuals, but their face space is less refined as a result of their
limited	 exposure	 to	 the	 face	 and	 limited	 experience	 of	 appearances	 that	 the	 face	 can	
take,	making	unfamiliar	viewers	more	likely	to	make	identity	judgment	errors.	Face	space	
is	refined	as	exposure	increases,	therefore	refinement	of	face	space	reflects	the	graded	
familiarity	 advantage	 found	 in	 Experiments	 1	 &	 2.	 I	 found	 that	 performance	 on	 my	
celebrity	 lookalike	 task	 improved	 as	 a	 result	 of	 increasing	 familiarity	 of	 the	 faces	
concerned. For viewers who were unfamiliar with the celebrity faces, the lookalike face
image likely fell within the accepted range of faces for the celebrity, leading to an incorrect identity match decision. As familiarity increased, fewer of these errors occurred,
with	the	lookalike	image	more	often	falling	outside	the	real	celebrity’s	face	space.		
	
My findings also argue against both the configural and the featural account of
face	recognition.	Whilst	configural	and	featural	details	of	a	face	may	be	used	to	aid	face	
recognition	in	some	ways,	I	suggest	that	neither	configural	nor	featural	information	is	the	
key	 to	 face	 recognition.	 Experiment	 7	 showed	 that	 configurations	 of	 a	 face,	 as	
represented	 in	 images,	 are	 not	 constant	 (e.g.	 Kleinberg	 et	 al.	 2007).	 In	 Experiment	 7,	
distances	 between	 features	 underwent	 non-linear	 changes	 as	 a	 result	 of	 change	 in	
camera-to-subject	 distance.	 But	 only	 unfamiliar	 viewers	 were	 affected	 by	 the	 distance	
manipulation. Familiar viewers performed well regardless of the camera-to-subject
distance,	 supporting	 the	 theory	 that	 familiar	 viewers	 have	 expertise	 of	 all	 the	 different	
ways	that	a	face	can	look.	If	they	have	previously	seen	the	face	across	a	range	of	distance	
and	 image	 conditions,	 then	 photographs	 of	 the	 same	 person	 taken	 from	 different	
distances	would	still	fall	into	the	accepted	range	of	faces	that	a	familiar	viewer	holds	for	
the	face,	but	outside	the	accepted	range	of	faces	held	by	an	unfamiliar	viewer.		
	
	
	
	
	
	
Figure	7.1	Schematic	representations	of	the	disguise	manipulations	with	regards	to	face	space.	Each	bubble	
represents	one	individual’s	face	space.		
	

	

The	theoretical	standing	also	fits	alongside	the	experimental	results	for	disguise.	Evasion	
disguise	 led	 to	 more	 errors	 than	 impersonation	 disguise.	 With	 relation	 to	 face	 space,	
there are more ways that a person can leave their own face space (evade identity) than enter someone else's (impersonate identity). For example, changing hair colour may make someone look unlike themselves, but it would only help impersonation if the new colour matched that of the person you want to look like. Impersonation involves moving appearance outwith the accepted face space
for	your	own	face,	and	into	the	accepted	space	of	someone	else.	Impersonation	change	is	
thus	 limited	 to	 one	 direction	 (changes	 that	 take	 you	 towards,	 and	 ideally	 into,	 the	 face	
space	of	the	person	who	is	to	be	impersonated)	whereas	evasion	can	involve	change	in	
any	direction	as	long	as	the	face	is	removed	out	of	its	own	face	space	(see	Figure	7.1).		
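
This directional asymmetry can be illustrated with a toy numerical sketch (the dimensionality, radii and distances are arbitrary assumptions of mine, not quantities estimated in this thesis): a random appearance change almost always exits one's own face-space region, but almost never lands inside a specific other identity's region.

```python
# Toy face-space geometry: random changes easily achieve evasion (leaving
# one's own region) but rarely achieve impersonation (entering a specific
# other identity's region). All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
dims = 10                              # toy face-space dimensionality
own_centre = np.zeros(dims)            # the model's own identity
other_centre = np.full(dims, 2.0)      # a specific target identity
radius = 1.0                           # accepted range around each identity

# 10,000 random appearance changes applied to the model's own face.
changes = rng.normal(scale=1.5, size=(10_000, dims))
new_points = own_centre + changes

evaded = np.linalg.norm(new_points - own_centre, axis=1) > radius
impersonated = np.linalg.norm(new_points - other_centre, axis=1) < radius

print(evaded.mean(), impersonated.mean())  # evasion succeeds far more often
```

The sketch makes the constraint concrete: evasion succeeds for changes in almost any direction, whereas impersonation requires a change of the right direction and magnitude.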
	
Exploration	 and	 Experiment	 13	 in	 Chapter	 6	 suggested	 a	 more	 complex	 component	 to	
face	space	with	relation	to	disguise.	Many	different	techniques	were	used	by	my	models,	
and	in	various	combinations,	to	create	the	various	evasion	and	impersonation	disguises.	
Some disguises worked for some identities but not so well for others. It is possible that certain faces were more distinctive or more generic, making it easier or harder for the models to move out of their own face space or into another's. Making changes to a face that led to a change in social inference was found to be an effective way of moving a face outwith its own face space in the case of evasion disguise.
	
Evasion	disguise	was	linked	to	change	in	social	inference	whereas	impersonation	disguise	
was	 not.	 Familiar	 viewers	 were	 impaired	 only	 by	 evasion	 of	 identity,	 not	 impersonation	
disguise. Notably, evasion is the only disguise type that was linked to significant changes in social inferences. Familiar viewers had an additional advantage for identifying
impersonation	faces	over	evasion	faces.	In	impersonation	trials	the	familiar	viewers	were	
familiar	 with	 both	 the	 impersonator	 and	 the	 person	 being	 impersonated	 –	 therefore	
there	were	two	ways	that	they	could	approach	the	matching	task.	The	viewer	could	either	
perceive	 the	 true	 identity	 of	 the	 impersonator,	 or	 decide	 that	 the	 impersonator	 falls	
outside the accepted range for the person being impersonated. In future studies it would
be	possible	to	dissociate	these	possibilities	by	conducting	a	study	in	which	viewers	were	
familiar	with	only	one	of	the	faces	in	each	pair.	In	any	case	it	is	clear	that	impersonation	is	
harder	to	accomplish	than	evasion	because	the	direction	of	change	is	constrained.	
	
What	is	not	yet	known	is	exactly	what	it	is	that	people	are	learning	in	a	face	when	they	
are	 becoming	 familiar	 with	 it.	 Face	 space	 is	 proposed	 as	 a	 multidimensional	 space.	 An	
identity	may	change	in	appearance	across	many	of	these	dimensions,	but	for	recognition	
to	occur	the	match	must	presumably	fall	within	the	expected	range	for	at	least	one	of	the	
dimensions. I have talked about face space as a rather abstract representation, as described by Burton et al. (2015). Dimensions that are relatively unaffected by disguise could be identified by applying Principal Components Analysis (PCA) to multiple images
(e.g.	 20	 undisguised	 images)	 of	 the	 model	 in	 order	 to	 establish	 a	 face	 space	 for	 the	
identity.	New	images	of	the	identity	could	then	be	entered	into	this	face	space	along	with	
the	disguised	images	to	see	where	each	of	these	images	fall.	This	would	show	if	there	are	
any	 dimensions	 that	 remain	 constant	 within	 the	 disguise	 face	 images,	 and	 also	 perhaps	
reveal	which	dimensions	of	change	are	necessary	in	the	creation	of	effective	evasion	and	
impersonation disguises. As it stands, it is unclear what people are recognising when they successfully see through disguise to recognise a face. It may be that
critical	 information	 is	 the	 same	 for	 every	 case	 of	 disguise,	 although	 it	 is	 likely	 to	 be	
different	 for	 evasion	 and	 impersonation.	 This	 interpretation	 is	 based	 on	 the	 differences	
found between these types of disguises so far. It is also possible that different information is used to see through each model's disguise, because the underlying face of each model is different to begin with. This type of investigation for disguise
could	 also	 aid	 the	 understanding	 of	 undisguised	 (normal)	 face	 recognition.	 Whatever	
familiar	 viewers	 can	 ‘see	 through’	 in	 the	 case	 of	 disguise	 cannot	 be	 what	 is	 critical	 for	
identification.	
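
A minimal sketch of this PCA proposal follows (random arrays stand in for real face images; the array sizes and the number of retained dimensions are illustrative assumptions): build a per-identity face space from undisguised images, then project new or disguised images into it to see where they fall on each dimension.

```python
# Hedged sketch of the PCA face-space proposal; random data stands in for
# flattened face images of one identity.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 20 undisguised "images" of one identity, each flattened
# to a 100-element pixel vector.
undisguised = rng.normal(size=(20, 100))

# PCA via SVD on the mean-centred image set defines the identity's face space.
mean_face = undisguised.mean(axis=0)
components = np.linalg.svd(undisguised - mean_face, full_matrices=False)[2]

def project(image, n_dims=5):
    """Coordinates of an image on the leading face-space dimensions."""
    return (image - mean_face) @ components[:n_dims].T

# Project a new (e.g. disguised) image; comparing its coordinates with the
# undisguised range on each dimension would show which dimensions the
# disguise leaves intact.
disguised = rng.normal(size=100)
print(project(disguised).shape)  # (5,)
```

With real images, dimensions on which the disguised image stays inside the undisguised range would be candidates for the disguise-invariant information discussed above.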
	
	
	

	


7.4 Practical	Implications	
Several	 practical	 implications	 follow	 from	 my	 research.	 First	 of	 all	 there	 is	 a	 need	 to	
acknowledge	 that	 face-matching	 involving	 challenging	 images	 is	 less	 reliable	 than	
cooperative	face-matching.	Moreover,	merely	acknowledging	this	problem	is	not	enough.	
In	 a	 striking	 demonstration	 of	 this	 (Experiment	 11),	 being	 aware	 that	 a	 face	 may	 be	
disguised did not improve matching performance. Instead, specific methods such as
blurring	 of	 pixelated	 images	 (Experiment	 4),	 crowd	 analysis	 of	 identity	 judgments	
(Experiment	5)	and	personnel	selection	(Experiment	6)	will	likely	be	required.	Even	then	
errors	will	not	be	eliminated	entirely.			
	
Of all the manipulations in this thesis, familiarity improved performance most. It is thus recommended that familiar viewers make identity judgments for such challenging images in forensic situations. As the graded effects in Experiments 1-4 emphasise, a little
familiarity	 with	 a	 face	 is	 better	 than	 none	 at	 all,	 and	 a	 lot	 is	 better	 than	 a	 little.	 In	
summary,	the	more	familiar	a	person	is	with	the	face	or	faces	concerned,	the	more	likely	
the	correct	identity	judgment	will	be	reached.		
	
In	 Experiment	 8,	 I	 found	 that	 photographic	 conditions	 –	 specifically	 camera-to-subject	
distance	 –	 could	 impair	 identity	 judgments	 for	 unfamiliar	 faces.	 My	 Experiments	 2	 &	 3	
show a similar decrease in performance for pixelated faces. However, in Experiment 8, I confirmed that changing the distance from which a photograph is taken can make it
difficult	to	compare	identity	across	images.	This	finding	highlights	that	using	photographs	
for	 identity	 comparisons	 is	 problematic	 even	 when	 image	 quality	 is	 high.	 Distance	 cues	
can aid recognition when these are available, but cues are not readily available from most photographs, especially images used for identity confirmation, as these images are normally cropped around the face, removing any background-related cues that may have indicated camera-to-subject distance. I found that images are more accurately matched
when	 taken	 from	 the	 same	 distance.	 A	 practical	 implication	 from	 this	 would	 be	 that	
where possible, consistency should be applied when photographing individuals in security
and	forensic	scenarios.	For	example,	police	could	photograph	all	suspects	from	the	same	
distance	 across	 all	 photos	 of	 an	 individual.	 This	 standardisation	 would	 make	 it	 easier	 to	
identify	the	same	person	across	multiple	images	and	also	make	it	less	likely	that	a	suspect	
would	 be	 mistaken	 for	 another	 person	 in	 a	 side-by-side	 image	 comparison	 scenario.	
Camera-to-subject	 distance	 changes	 would	 however	 remain	 problematic	 for	 archived	
images	 and	 some	 CCTV	 footage,	 but	 introducing	 standardisation	 to	 station	 captured	
images may aid some identity scenarios. Using familiar viewers to identify faces photographed from different distances will largely overcome the problem, as they are far less affected by
camera-to-subject	distance	change.	
	
My	 disguise	 research	 has	 implications	 for	 applied	 settings.	 I	 show	 that	 people	 are	 very	
poor	 at	 matching	 disguised	 face	 images,	 suggesting	 that	 it	 is	 rather	 easy	 to	 carry	 out	 a	
successful	 disguise.	 Disguises	 that	 involve	 evading	 identity	 are	 more	 likely	 to	 attract	
errors	 than	 disguises	 that	 involve	 the	 impersonation	 of	 somebody	 else.	 There	 are	
practical	 implications	 here	 in	 terms	 of	 both	 criminal	 cases	 of	 identity	 fraud	 and	 also	
undercover	 policing.	 In	 terms	 of	 criminal	 disguise	 activity,	 my	 research	 highlights	 that	
cases of impersonation and evasion may go undetected. Successful evasion disguise is particularly concerning, as it could make it very difficult to catch a suspect.
Impersonation	 disguise	 carries	 many	 security	 threats	 in	 terms	 of	 identity	 fraud,	 which	
could have catastrophic results, particularly in terms of allowing a dangerous or unauthorised person into a country or providing them with access to information to which their true identity would not be entitled. Disguise is problematic for face recognition, and
errors can pose danger. Therefore the steps discussed above should be taken to try to
minimise	 fraud	 and	 identity	 evasion	 through	 disguise.	 The	 disguise	 results	 do	 however	
suggest	that	undercover	police	can	successfully	keep	their	true	identity	hidden	by	using	
evasion	disguise.		
	
There	 are	 also	 some	 important	 implications	 from	 this	 thesis	 in	 relation	 to	 the	
experimental psychology research practice in the study of face recognition. First, my
findings reiterate the importance of a distinction between familiar and unfamiliar face
matching, and extend this to provide further evidence for a graded familiarity advantage in face recognition. Past studies have tested face-matching performance using cooperative face-matching stimuli. I suggest that measuring performance for cooperative stimuli does not capture face-matching performance levels for uncooperative stimuli. I quantified the performance costs for various challenging situations and found lower accuracy for the images I tested than that reported by previous studies. It is important to study performance for uncooperative images, especially when face recognition is so closely linked to many security systems and to witness testimony in criminal cases. Suspects may deliberately make their own identification difficult. I also highlight important distinctions within disguise. Previous research on disguise was based on evasion. I have extended this in two ways: (i) by investigating evasion disguise as being more than just the addition of prescribed props that occlude facial features, and (ii) by looking at impersonation, which had been completely ignored by past studies of disguise. Individuals spontaneously use many different types of manipulation to disguise themselves, and these disguises impair identification performance even when there is no occlusion of internal facial features. My research has found clear distinctions between evasion and impersonation, both in terms of face-matching performance and in the kinds of disguise manipulations applied. Future research will have to acknowledge these distinctions if we are to arrive at a complete understanding of deliberate disguise.
	

7.5 Future	Directions	
This thesis expands the research on face recognition in challenging situations; however, there are areas of investigation that I believe warrant future study. Firstly, it would be
interesting	 to	 test	 face	 memory	 performance	 for	 the	 FAÇADE	 database	 images.	 I	 found	
poorer	 performance	 accuracy	 for	 matching	 disguised	 faces	 than	 undisguised	 faces.	
Matching	 faces	 is	 presumably	 an	 easier	 task	 than	 face	 recognition	 tasks	 that	 involve	 a	
memory	component	(Megreya	&	Burton,	2008).	Previous	memory	studies	have	tested	for	
recognition	 of	 disguised	 faces	 in	 direct	 comparison	 to	 performance	 for	 memory	 of	
identical images (Patterson & Baddeley, 1977). My FAÇADE database includes different
images	 of	 the	 same	 faces	 undisguised	 and	 in	 disguise,	 and	 evasion	 and	 impersonation	
disguise	 images.	 This	 database	 would	 thus	 allow	 a	 more	 thorough	 and	 experimentally	
sound	test	of	face	memory	performance	for	disguised	versus	undisguised	faces,	and	also	
allow	 performance	 for	 evasion	 disguise	 to	 be	 compared	 with	 impersonation	 disguise.	
Performance could be even worse for disguise in a memory task, since in the matching task the two images could be compared side by side. It is not obvious whether the differences in performance found for evasion and impersonation would remain in a memory scenario. I believe this to be a particularly interesting comparison, as impersonation could trick viewers into incorrectly 'remembering' the real target face and rejecting an image of the true identity presented undisguised. There are many reports of erroneous eyewitness memory in studies that do not include impersonation disguise (Bruce, 1988); impersonation disguise may therefore increase error rates further. Additionally, evasion disguise may result in rejection of the true identity face presented in an undisguised form.
	
It would also be interesting to test whether super-recognisers are able to improve performance accuracy for disguise images. If super-recognisers do perform better than our unfamiliar, or even possibly familiar, controls, then they could be of great help in identification efforts involving disguise. It would be important to explore whether super-recognisers or familiar viewers provide the more accurate viewer group to call on for making identity decisions. Based on the only directly comparable data in this thesis for an unfamiliar group of super-recognisers (Experiment 6; mean accuracy for this group = 76%) and the highest-familiarity group of comparison participants (Experiment 2; mean accuracy for this group = 76%), I would predict that unfamiliar super-recognisers and familiar comparison viewers are equally good at matching faces.
	
Another interesting avenue for future research would be to explore the effects of the methods of improving performance (Experiments 4, 5 and 6) in the combinations that were not addressed in this thesis. I found that testing methods in combination led to greater improvements in performance than any of the methods used alone. Due to the limited opportunity to work with super-recognisers I was unable to test all of these combinations. Specifically, it is currently unknown whether blurring the pixelated images would have further improved the performance of super-recognisers, and also whether combining all three methods could result in improvements greater than those from combining any two methods.
	
A key future experiment will be the Principal Component Analysis (PCA) of disguised faces outlined in the theoretical advances section of this discussion. The proposed experiment will allow theoretical advancement in terms of what it is that allows viewers to see through disguise. This has particular relevance to the theory of face space, and to the refinement of face space with familiarity. The familiarity component of this theory can be explored by manipulating the number of reference face images entered into the PCA. This will help to establish what it is about a face that stays constant in disguise, whether there are reliably constant factors at all, or whether these factors differ by identity.
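As a sketch of how such an analysis might be set up (a minimal illustration with synthetic data and NumPy, not the proposed experiment itself; the array sizes, the `n_components` value and the perturbation are arbitrary assumptions), a face space can be derived from reference images, and undisguised and disguised images of an identity can then be compared within that space. Varying the number of rows in `refs` would model the familiarity manipulation:

```python
import numpy as np

def build_face_space(reference_faces, n_components):
    # reference_faces: (n_images, n_pixels); each row is a flattened face image.
    mean_face = reference_faces.mean(axis=0)
    centred = reference_faces - mean_face
    # SVD yields the principal components without forming a covariance matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, components):
    # Coordinates of one face image within the learned face space.
    return components @ (face - mean_face)

rng = np.random.default_rng(0)
refs = rng.normal(size=(20, 64))   # hypothetical: 20 reference images, 64 "pixels" each
mean_face, comps = build_face_space(refs, n_components=5)

undisguised = refs[0]
disguised = undisguised + 0.1 * rng.normal(size=64)  # small appearance change
# Distance in face space between the disguised and undisguised instances;
# factors that stay constant in disguise would keep this distance small.
distance = np.linalg.norm(project(undisguised, mean_face, comps)
                          - project(disguised, mean_face, comps))
```

A larger reference set gives a face space tuned to more of the variability of the learnt faces, which is one way the familiarity component of the theory could be operationalised.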
	
Finally,	I	believe	it	would	be	interesting	to	investigate	machine	performance	for	disguised	
faces.	 Computer	 algorithms	 are	 now	 being	 used	 in	 many	 security	 situations	 including	
passport	 security,	 but	 as	 with	 human	 performance,	 machine	 performance	 has	 generally	
been tested using cooperative stimuli. Attempts to test machine performance for disguised faces have many of the same limitations as previous human disguise investigations, namely that testing has been limited to evasion disguise in which images were disguised through occlusion of features. The props that cause occlusion would be removed in most security scenarios, so testing machine performance on these images is of limited use for security algorithms used in passport control. Real cases of identity fraud more often involve impersonation, or the morphing of a new holder's face with the face of the true passport owner. Impersonation is therefore particularly relevant to security-related identity fraud, but has not yet been tested in terms of
machine performance. My FAÇADE database provides stimuli free from occlusion of features, and includes both evasion and impersonation disguise. Future studies could therefore use this database to explore machine performance for disguised faces more fully, and to compare it with performance for undisguised versions of the same faces. As data have already been collected for both unfamiliar and familiar human viewers, machine performance could then be compared directly with that of each of these human viewer groups, helping to answer the question of whether humans or machines are better at matching disguised faces. Greater accuracy could perhaps be achieved by fusing the performance of humans and machines; this has been a successful method of improving face recognition accuracy in past studies (O'Toole et al., 2007).
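One simple form such human–machine fusion could take is score-level fusion: standardise each source's similarity ratings and average them per face pair. This is a hypothetical sketch under stated assumptions, not the method of the cited study; the ratings below are invented, and averaging z-scores is just one fusion scheme from that literature:

```python
import statistics

def zscore(scores):
    # Standardise so sources with different rating scales are comparable.
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd for s in scores]

def fuse(human_scores, machine_scores):
    # Average the standardised human and machine similarity ratings per pair.
    return [(h + m) / 2
            for h, m in zip(zscore(human_scores), zscore(machine_scores))]

# Hypothetical similarity ratings for four face pairs (higher = more similar).
human = [3.0, 5.0, 2.0, 4.0]        # e.g. ratings on a 1-5 scale
machine = [0.20, 0.90, 0.10, 0.80]  # e.g. algorithm match scores in [0, 1]
fused = fuse(human, machine)
# A decision rule could then label pairs above some threshold "same identity".
```

The standardisation step matters: without it, whichever source uses the larger numeric scale would dominate the fused score.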
	
In summary, previous estimates of identification accuracy are likely to be overestimates if the people being identified were cooperating with their identification. Applied cases of face recognition are more likely to involve images with incidental or deliberate differences from a person's usual appearance, or similarities to another person, as investigated in this thesis. Familiar viewers are generally considered to be exceptionally good at identifying faces, and this thesis has shown that this familiarity advantage does indeed extend to both incidental and deliberate appearance change. However, challenging situations start to reveal the limits of this advantage. For example, familiar viewers' face-matching performance is impaired by evasion disguise, which raises interesting questions about how much appearance change a familiar viewer will allow while still classifying an individual as the same person. The results of this thesis suggest that familiar viewers are better than unfamiliar viewers at recognising a face that they have learnt. There are also many applied implications of this research. For example, real-world identity situations are often linked to impersonation disguise; however, there has been a considerable disconnect between lab experiments and the real-world problem of identity fraud, as impersonation had not been addressed by previous lab experiments. I found that impersonation is harder to pull off than evasion disguise. Poor performance at matching evasion disguise faces is problematic for capturing criminals or finding missing people on police watch lists, but suggests that evasion disguise by undercover police officers will likely be effective. There is still a lot more to learn about face recognition in challenging situations, especially in the case of disguise, and I hope the FAÇADE database will continue to aid this investigation.
	

References	
Alain, C., & Proteau, L. (1980). Decision making in sport. In C. H. Nadeau, K. M. Halliwell, K. M. Newell, & G. C. Roberts (Eds.), Psychology of motor behaviour and sport (pp. 465–477). Champaign, IL: Human Kinetics.
Andrews,	 T.	 J.,	 &	 Ewbank,	 M.	 P.	 (2004).	 Distinct	 representations	 for	 facial	 identity	 and	
changeable	 aspects	 of	 faces	 in	 the	 human	 temporal	 lobe.	 NeuroImage,	 23(3),	 905–913.	
doi:10.1016/j.neuroimage.2004.07.060	
Aristotle,	Jowett,	B.,	&	Davis,	H.	W.	C.	(1920).	Aristotle's	Politics.	Oxford:	At	the	Clarendon	
Press.	
Bachmann,	T.	(1991).	Identification	of	spatially	quantised	tachistoscopic	images	of	faces:	
How	many	pixels	does	it	take	to	carry	identity?	European	Journal	of	Cognitive	Psychology,	
3,	85-103.	
Behrmann, M., & Avidan, G. (2005). Congenital prosopagnosia: face-blind from birth. Trends in Cognitive Sciences, 9, 180–187.
Bindemann, M., Attard, J., Leach, A.-M., & Johnston, R. A. (2013). The effect of image pixelation on unfamiliar-face matching. Applied Cognitive Psychology, 27(6), 707–717.
Bindemann, M., Burton, A. M., Leuthold, H., & Schweinberger, S. R. (2008). Brain potential correlates of face recognition: Geometric distortions and the N250r brain response to stimulus repetitions. Psychophysiology, 45(4), 535–544.
Bobak,	 A.	 K.,	 Dowsett,	 A.	 J.,	 &	 Bate,	 S.	 (2016).	 Solving	 the	 border	 control	 problem:	
Evidence	 of	 enhanced	 face	 matching	 in	 individuals	 with	 extraordinary	 face	 recognition	
skills.	PLoS	ONE,	11(2),	1–13.	doi:10.1371/journal.pone.0148148	
Bonner,	 L.,	 Burton,	 A.M.	 &	 Bruce,	 V.	 (2003).	 Getting	 to	 know	 you:	 how	 we	 learn	 new	
faces.	Visual	Cognition.	10,	527-536.		

	

Borkenau, P., Brecke, S., Mottig, C., & Paelaecke, M. (2009). Trait inferences: Sources of validity at zero acquaintance. Journal of Personality and Social Psychology, 62, 645–657.
Bradshaw, J. L., & Wallace, G. (1971). Models for the processing and identification of faces. Perception and Psychophysics, 9, 443–448.
Brigham, J. C., & Bothwell, R. K. (1983). The ability of prospective jurors to estimate the accuracy of eyewitness identification. Law and Human Behaviour, 7, 19–30.
Bruce,	V.	(1982).		Changing	faces:	visual	and	non-visual	coding	in	face	recognition.	British	
Journal	of	Psychology.	73,	105-116.		
Bruce,	 V.	 (1986).	 Influences	 of	 familiarity	 on	 the	 processing	 of	 faces.	 Perception,	 15(4),	
387–97.	Retrieved	from	http://www.ncbi.nlm.nih.gov/pubmed/3822723	
Bruce,	V.	(1988)	Recognising	Faces.	London:	Erlbaum.		
Bruce, V. (1994). Stability from variation: The case of face recognition. The M. D. Vernon memorial lecture. The Quarterly Journal of Experimental Psychology Section A, 47(1), 5–28.
Bruce,	 V.,	 Doyle,	 T.,	 Dench,	 N.	 &	 Burton,	 M.	 (1991).	 Remembering	 facial	 configurations.	
Cognition,	38(2),	109-144.		
Bruce,	 V.,	 Henderson,	 Z,	 Greenwood,	 K.,	 Hancock,	 P.J.B.	 &	 Burton,	 A.M.	 (1999).	
Verification	 of	 face	 identities	 from	 images	 captured	 on	 video.	 Journal	 of	 Experimental	
Psychology	–	Applied,	5,	339-360.		
Bruce,	 V.,	 Henderson,	 Z.,	 Newman,	 C.,	 Burton,	 A.M.	 (2001).	 Matching	 Identities	 of	
Familiar	 and	 Unfamiliar	 Faces	 Caught	 on	 CCTV.	 Journal	 of	 Experimental	 Psychology:	
Applied,	7,	207-218.		
Bruce,	 V.	 &	 Langton,	 S.	 (1994).	 The	 use	 of	 pigmentation	 and	 shading	 information	 in	
recognising	the	sex	and	identity	of	faces.	Perception,	23,	803-822.	

	

Bruyer,	 R.	 &	 Coget,	 M.	 C.	 (1987).	 Features	 of	 laterally	 displayed	 faces:	 saliency	 or	 top	
down	processing?	Acta	Psychologica,	66,	103-114.	
Bryan,	 R.,	 Perona,	 P.	 &	 Adolphs,	 R.	 (2012).	 Perspective	 Distortions	 from	 Interpersonal	
Distance	Is	an	Implicit	Visual	Cue	for	Social	Judgments	of	Faces.	PLoS	ONE	7(9):	e45301.	
doi:10.1371/journal.pone.0045301	
Burton, A. M., Kramer, R. S. S., Ritchie, K. L., & Jenkins, R. (2015). Identity from variation: Representations of faces derived from multiple instances. Cognitive Science, 40, 1–22. doi:10.1111/cogs.12231
Burton, A. M., Schweinberger, S. R., Jenkins, R., & Kaufmann, J. M. (2015). Arguments against a configural processing account of familiar face recognition. Perspectives on Psychological Science, 10(4), 482–496. doi:10.1177/1745691615583129
Burton, A. M., & Vokey, J. R. (1998). The face-space typicality paradox: Understanding the face-space metaphor. The Quarterly Journal of Experimental Psychology Section A, 51(3), 475–483. doi:10.1080/713755768
Burton,	A.	M.,	White,	D.,	&	McNeil,	A.	(2010).	The	Glasgow	Face-matching	Test.	Behaviour	
Research	Methods,	42(1),	286-291.	
Burton, A. M., Wilson, S., Cowan, M., & Bruce, V. (1999). Face recognition in poor-quality video: Evidence from security surveillance. Psychological Science, 10, 243–248. doi:10.1111/1467-9280.00144
Clutterbuck, R., & Johnston, R. A. (2002). Exploring levels of face familiarity by using an indirect face-matching measure. Perception, 31(8), 985–994. doi:10.1068/p3335
Clutterbuck, R., & Johnston, R. A. (2004). Demonstrating the acquired familiarity of faces by using a gender-decision task. Perception, 33(2), 159–168.
Clutterbuck, R., & Johnston, R. A. (2005). Demonstrating how unfamiliar faces become familiar using a face matching task. European Journal of Cognitive Psychology, 17(1), 97–116. doi:10.1080/09541440340000439

	

Coates,	Tim	(1999).	The	strange	story	of	Adolph	Beck.	London:	Stationery	Office.		
Collishaw,	 S.	 M.,	 &	 Hole,	 G.	 J.	 (2000).	 Featural	 and	 configurational	 processes	 in	 the	
recognition	 of	 faces	 of	 different	 familiarity.	 Perception,	 29(8),	 893–909.	
doi:10.1068/p2949	
Costen, N. P., Parker, D. M., & Craw, I. (1994). Spatial content and spatial quantization effects in face recognition. Perception, 23, 129–146.
Costen,	 N.	 P.,	 Parker,	 D.	 M.,	 &	 Craw,	 I.	 (1996).	 Effects	 of	 high-pass	 and	 low-pass	 spatial	
filtering	on	face	identification.	Perception	and	Psychophysics,	58,	602-612.		
Davies,	G.,	Ellis,	H.	D.,	&	Shepherd,	J.	(1978).	Face	recognition	accuracy	as	a	function	of	
mode	 of	 representation.	 Journal	 of	 Applied	 Psychology,	 63(2),	 180–187.	
doi:10.1037//0021-9010.63.2.180	
Dhamecha, T. I., Singh, R., Vatsa, M., & Kumar, A. (2014). Recognising disguised faces: Human and machine evaluation. PLoS ONE, 9(7), e99212.
Diamond,	R.	&	Carey,	S.	(1986).	Why	faces	are	and	are	not	special:	an	effect	of	expertise.	
Journal	of	Experimental	Psychology:	General.	115,	107-117.		
Donders, F. C. (1868/1969). On the speed of mental processes. Acta Psychologica, 30, 412–431.
Dowsett, A. J., & Burton, A. M. (2014). Unfamiliar face matching: Pairs out-perform individuals and provide a route to training. British Journal of Psychology. doi:10.1111/bjop.12103
Duchaine,	B.	(2011).	Developmental	Prosopagnosia:	Cognitive,	Neural	and	Developmental	
Investigations.	In	A.	J.	Calder.,	G.	Rhodes,	M.	H.	Johnson,	&	J.	V.	Haxby	(Eds.),	The	Oxford	
Handbook	of	Face	Perception	(pp.821-838).	Oxford,	UK:	Oxford	University	Press.		
Duchaine, B., & Nakayama, K. (2006). The Cambridge Face Memory Test: Results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia, 44(4), 576–585. doi:10.1016/j.neuropsychologia.2005.07.001
Ellis,	H.D.,	Shepherd,	J.W.	&	Davies,	G.M.	(1979).	Identification	of	familiar	and	unfamiliar	
faces	 from	 internal	 and	 external	 features:	 some	 implications	 for	 theories	 of	 face	
recognition.	Perception,	8,	431-439.		
Flowe, H., & Ebbesen, E. (2007). The effect of lineup member similarity on recognition accuracy in simultaneous and sequential lineups. Law and Human Behaviour, 31(1), 33–52.
Galper,	 R.E.	 (1970)	 Recognition	 of	 faces	 in	 photographic	 negative.	 Psychonomic	
Science,19,	207-208.		
Galton,	 F.	 (1907).	 Vox	 populi	 -	 The	 wisdom	 of	 crowds.	 Nature,	 75,	 450–451.	
doi:10.1038/075450a0	
Ghent,	 L.	 (1960).	 Recognition	 by	 children	 of	 realistic	 figures	 presented	 in	 various	
orientations.	Canadian	Journal	of	Psychology,	14,	249-256.		
Goldstein, A. G. (1965). Learning of inverted and normally orientated faces in children and adults. Psychonomic Science, 3, 447–448.
Grill-Spector,	 K.,	 Kushnir,	 T.,	 Edelman,	 S.,	 Avidan,	 G.,	 Itzchak,	 Y.,	 &	 Malach,	 R.	 (1999).	
Differential	 processing	 of	 objects	 under	 various	 viewing	 conditions	 in	 the	 human	 lateral	
occipital	complex.	Neuron,	24(1),	187–203.	doi:10.1016/S0896-6273(00)80832-6	
Haig,	N.	D.	(1984).	Faces	in	Perception.	Perception,	13,	1158–1165.	
Hancock, P. J. B., Bruce, V., & Burton, A. M. (2000). Recognition of unfamiliar faces. Trends in Cognitive Sciences, 4(9), 330–337.
Harmon,	L.	&	Julesz,	B.	(1973).	Masking	in	Visual	Recognition:	Effects	of	Two-Dimensional	
Filtered	Noise.	Science.	New	Series,	180.	1194-1197.		

	

Harper, B., & Latto, R. (2001). Cyclopean vision, size estimation, and presence in orthostereoscopic images. Presence, 10(3), 312–330.
Henderson, Z., Bruce, V., & Burton, A. M. (2001). Matching the faces of robbers captured on video. Applied Cognitive Psychology, 15(4), 445–464. doi:10.1002/acp.718
Henle,	 M.	 (1942).	 An	 experimental	 investigation	 of	 past	 experience	 as	 a	 determinant	 of	
visual	perception.	Journal	of	Experimental	Psychology,	30,	1-22.	
Hill,	H.,	&	Bruce,	V.	(1996).	Effects	of	lighting	on	the	perception	of	facial	surfaces.	Journal	
of	 Experimental	 Psychology.	 Human	 Perception	 and	 Performance,	 22(4),	 986.	
doi:10.1037/0096-1523.22.4.986	
Hochberg,	 J.	 &	 Galper,	 R.E.	 (1967).	 Recognition	 of	 faces:	 An	 exploratory	 study.	
Psychonomic	Science,	9,	619-620.		
Hole,	G.	J.,	George,	P.	A.,	Eaves,	K.,	&	Rasek,	A.	(2002).	Effects	of	geometric	distortions	on	
face-recognition	performance.	Perception.	doi:10.1068/p3252	
Home Office. (2012). False ID Guidance. The Home Office. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/98108/false-id-guidance.pdf
Huff,	 CR.	 (1987).	 Wrongful	 Conviction:	 Societal	 tolerance	 of	 injustice.	 Research	 in	 social	
problems	and	public	policy.	4,	99-115.		
Jenkins,	 R.,	 &	 Burton,	 A.	 M.	 (2011).	 Stable	 face	 representations.	 Philosophical	
Transactions	 of	 the	 Royal	 Society	 of	 London.	 Series	 B,	 Biological	 Sciences,	 366(1571),	
1671–1683.	doi:10.1098/rstb.2010.0379	
Jenkins,	 R.,	 &	 Kerr,	 C.	 (2013).	 Identifiable	 images	 of	 bystanders	 extracted	 from	 corneal	
reflections.	PLoS	ONE,	8(12),	8–12.	doi:10.1371/journal.pone.0083325	
Jenkins,	R.,	White,	D.,	Van	Montfort,	X.,	&	Burton,	A.M.	(2011).	Variability	in	photos	of	the	
same	face.	Cognition,	121(3),	313–23.	doi:10.1016/j.cognition.2011.08.001	

	

Johnston, A., Hill, H., & Carman, N. (1992). Recognising faces: effects of lighting direction, inversion and brightness reversal. Perception, 21, 365–375.
Kanade,	T.	(1977).	Computer	recognition	of	human	faces.	Stuttgart:	Birkhauser	Verlag.		
Kastner,	 S.,	 Pinsk,	 M.	 A.,	 De	 Weerd,	 P.,	 Desimone,	 R.,	 &	 Ungerleider,	 L.	 G.	 (1999).	
Increased	 activity	 in	 human	 visual	 cortex	 during	 directed	 attention	 in	 the	 absence	 of	
visual	stimulation.	Neuron,	22(4),	751–761.	doi:10.1016/S0896-6273(00)80734-5	
Kelly,	 M.D.	 (1970).	 Visual	 identification	 of	 people	 by	 computer.	 Tech.	 rep.	 AI-130,	
Stanford	AI	Project,	Stanford,	CA.		
Kemp,	 R.	 I.,	 Towell,	 N.	 &	 Pike,	 G.	 (1997).	 When	 seeing	 should	 not	 be	 believing:	
Photographs,	credit	cards	and	fraud.	Applied	Cognitive	Psychology,	11,211–222.		
Kemp, R., Pike, G., White, P., & Musselman, A. (1996). Perception and recognition of normal and negative faces: the role of shape from shading and pigmentation cues. Perception, 25(1), 37–52.
Kenny, D. A., Horner, C., Kashy, D. A., & Chu, L. C. (1992). Consensus at zero acquaintance: replication, behavioral cues, and stability. Journal of Personality and Social Psychology, 62(1), 88–97. doi:10.1037/0022-3514.62.1.88
King, A. J., Cheng, L., Starke, S. D., & Myatt, J. P. (2012). Is the true “wisdom of the crowd” to copy successful individuals? Biology Letters, 8(2), 197–200. doi:10.1098/rsbl.2011.0795
Kleinberg, K. F., Vanezis, P., & Burton, A. M. (2007). Failure of anthropometry as a facial identification technique using high-quality photographs. Journal of Forensic Sciences, 52(4), 779–783. doi:10.1111/j.1556-4029.2007.00458.x
Krause,	S.,	James,	R.,	Faria,	J.	J.,	Ruxton,	G.	D.,	&	Krause,	J.	(2011).	Swarm	intelligence	in	
humans:	 Diversity	 can	 trump	 ability.	 Animal	 Behaviour,	 81(5),	 941–948.	
doi:10.1016/j.anbehav.2010.12.018	

	

Lander, K., Bruce, V., & Hill, H. (2001). Evaluating the effectiveness of pixelation and blurring on masking the identity of familiar faces. Applied Cognitive Psychology, 15, 101–116.
Leder,	 H.,	 &	 Carbon,	 C.C.	 (2006).	 Face-specific	 configural	 processing	 of	 relational	
information.	British	Journal	of	Psychology,	97,	19-29.		
Leikas,	 S.,	 Verkasalo,	 M.,	 &	 Lönnqvist,	 J.	 E.	 (2013).	 Posing	 personality:	 Is	 it	 possible	 to	
enact	the	Big	Five	traits	in	photographs?	Journal	of	Research	in	Personality,	47(1),	15–21.	
doi:10.1016/j.jrp.2012.10.012	
Light,	 L.L.;	 Kayra-Stuart,	 F.	 &	 Hollander,	 S.	 (1979)	 Recognition	 memory	 for	 typical	 and	
unusual	 faces.	 Journal	 of	 Experimental	 Psychology:	 Human	 Learning	 and	 Memory.	 Vol	
5(3),	212-228.	http://dx.doi.org/10.1037/0278-7393.5.3.212	
	
Liu, C. H. (2003). Is face recognition in pictures affected by the center of projection? IEEE International Workshop on Analysis and Modeling of Faces and Gestures (pp. 53–59). Los Alamitos, CA: IEEE Computer Society.
Loftus,	E.	&	Doyle,	J.	(1992).	Eyewitness	testimony:	Civil	and	Criminal.	Charlottesville,	VA,	
Lexis	Law.		
Martinez, A. M., & Benavente, R. (1998). The AR Face Database. CVC Technical Report, Number 24.
Maurer, D., Le Grand, R., & Mondloch, C. J. (2002). The many faces of configural processing. Trends in Cognitive Sciences, 6(6), 255–260. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12039607
Megreya, A. M., & Burton, A. M. (2007). Hits and false positives in face-matching: A familiarity-based dissociation. Perception & Psychophysics, 69, 1175–1184. doi:10.3758/bf03193954
Megreya, A. M., & Burton, A. M. (2006). Unfamiliar faces are not faces: Evidence from a matching task. Memory & Cognition, 34(4), 865–876.

	

Megreya, A. M., & Burton, A. M. (2008). Matching faces to photographs: Poor performance in eyewitness memory (without the memory). Journal of Experimental Psychology: Applied, 14(4), 364–372.
Megreya, A. M., & Bindemann, M. (2009). Revisiting the processing of internal and external features of unfamiliar faces: The headscarf effect. Perception, 38, 1831–1848.
Menon,	 A.,	 White,	 D.	 &	 Kemp.	 R.	 (2015).	 Variation	 in	 Photos	 of	 the	 Same	 Face	 Drives	
Improvements	in	Identity	Verification.	Perception.	doi:	10.1177/0301006615599902		
Morrone, M. C., Burr, D. C., & Ross, J. (1983). Added noise restores recognisability of coarse quantized images. Nature, 305, 226–228.
O’Donnell,	 C.,	 &	 Bruce,	 V.	 (2001).	 Familiarisation	 with	 faces	 selectively	 enhances	
sensitivity	to	changes	made	to	the	eyes.	Perception,	30(6),	755–764.	doi:10.1068/p3027	
O’Toole,	 A.J.,	 Edelman,	 S.	 &	 Bulthoff,	 H.H.	 (1998).	 Stimulus	 specific	 effects	 in	 face	
recognition	over	changes	in	viewpoint.	Vision	Research.	38,	2351-2363.		
Patterson, K. E., & Baddeley, A. D. (1977). When face recognition fails. Journal of Experimental Psychology: Human Learning and Memory, 3(4), 406–417.
Pirenne,	 M.	 H.	 (1970)	 Optics,	 Painting,	 &	 Photography.	 New	 York:	 Cambridge	 University	
Press.	
Pizlo, Z. (1994). A theory of shape constancy based on perspective invariants. Vision Research, 34(12), 1637–1658.
Ramanathan,	 N.,	 Chowdhury,	 A.R.	 &	 Chellappa,	 R.	 (2004).	 Facial	 Similarity	 Across	 Age,	
Disguise,	 Illumination	 and	 Pose.	 Proceedings	 of	 International	 Conference	 on	 Image	
Processing,	3,	1999-2002.		
Rhodes,	 G.,	 Brennan,	 S.	 &	 Carey,	 S.	 (1987).	 Identification	 and	 ratings	 of	 caricatures:	
Implications	for	mental	representations	of	faces.	Cognitive	Psychology.	19,	473-497.	

	

Righi, G., Peissig, J. J., & Tarr, M. J. (2012). Recognising disguised faces. Visual Cognition, 20(2), 143–169.
Robbins,	R.,	&	Mckone,	E.	(2003).	Can	holistic	processing	be	learned	for	inverted	faces?	
Cognition,	88,	79–107.	doi:10.1016/S0	
Robertson,	D.	J.,	Noyes,	E.,	Dowsett,	A.,	Jenkins,	R.,	Burton,	A.	M.,	&	Burton,	M.	(2016).	
Face	 recognition	 by	 Metropolitan	 Police	 Super-recognisers.	 PLoS	 ONE,	 11(2):	 e0150036.	
doi:10.1371/journal.pone.0150036	
Rossion, B. (2009). Distinguishing the cause and consequence of face inversion: the perceptual field hypothesis. Acta Psychologica, 132(3), 300–312. doi:10.1016/j.actpsy.2009.08.002
Rule,	N.	O.,	&	Ambady,	N.	(2011).	Judgments	of	Power	From	College	Yearbook	Photos	and	
Later	 Career	 Success.	 Social	 Psychological	 and	 Personality	 Science,	 2(2),	 154–158.	
doi:10.1177/1948550610385473	
Russell,	 R.,	 Duchaine,	 B.,	 &	 Nakayama,	 K.	 (2009).	 Super-recognizers:	 people	 with	
extraordinary	 face	 recognition	 ability.	 Psychonomic	 Bulletin	 &	 Review,	 16(2),	 252–257.	
doi:10.3758/PBR.16.2.252	
Sandford, A., & Burton, A. M. (2014). Tolerance for distorted faces: Challenges to a configural processing account of familiar face recognition. Cognition, 132(3), 262–268. doi:10.1016/j.cognition.2014.04.005
Sergent,	J.	(1986).	Microgenesis	of	face	perception.	In	Aspects	of	Face	Processing	(Ellis,	H.	
D.	et	al.,	eds).	17-33,	Martinus	Nijhoff.		
Singh,	R.,	Vatsa,	M.	&	Noore,	A.	(2009).	Face	Recognition	with	Disguise	and	Single	Gallery	
Images.	Image	and	Vision	Computing,	27(3),	245-257.		
	
	

	

Smith, E. E., & Nielsen, G. D. (1970). Representations and retrieval processes in short-term memory: Recognition and recall of faces. Journal of Experimental Psychology, 85, 397–405.
Stevenage,	 S.	 V.	 (1998).	 Which	 twin	 are	 you?	 A	 demonstration	 of	 induced	 categorical	
perception	 of	 identical	 twin	 faces.	 British	 Journal	 of	 Psychology,	 89(1),	 39–57.	
doi:10.1111/j.2044-8295.1998.tb02672.x	
Sutherland, C. A. M., Oldmeadow, J. A., Santos, I. M., Towler, J., Michael Burt, D., & Young, A. W. (2013). Social inferences from faces: ambient images generate a three-dimensional model. Cognition, 127(1), 105–118. doi:10.1016/j.cognition.2012.12.001
Tanaka,	 J.	 W.	 &	 Farah,	 M.J.	 (1993).	 Parts	 and	 wholes	 in	 face	 recognition.	 The	 Quarterly	
Journal	of	Experimental	Psychology,	46,	225-245.		
Terry, R. L. (1993). How wearing eyeglasses affects facial recognition. Current Psychology, 12(2), 151–162.
Terry, R. L. (1994). Effects of facial transformations on accuracy of face recognition. The Journal of Social Psychology, 134(4), 483–492.
Thomson, D. M. (1986). Face recognition: More than a feeling of familiarity? In H. D. Ellis, M. A. Jeeves, F. Newcombe, & A. Young (Eds.), Aspects of face processing. Dordrecht: Martinus Nijhoff.
Todorov,	A.,	Said,	C.	P.,	Engell,	A.	D.,	&	Oosterhof,	N.	N.	(2008).	Understanding	evaluation	
of	faces	on	social	dimensions,	(October),	455–460.	doi:10.1016/j.tics.2008.10.001	
Toseeb,	 U.,	 Keeble,	 D.	 R.	 T.,	 &	 Bryant,	 E.	 J.	 (2012).	 The	 significance	 of	 hair	 for	 face	
recognition.	PLoS	ONE,	7(3),	1–8.	doi:10.1371/journal.pone.0034144	
Towler,	A.,	White,	D.,	&	Kemp,	R.	I.	(2014).	Evaluating	training	methods	for	facial	image	
comparison:	 The	 face	 shape	 strategy	 does	 not	 work.	 Perception,	 43(2-3),	 214–218.	
doi:10.1068/p7676	
Troje, N. F., & Bulthoff, H. H. (1996). Face recognition under varying poses: the role of texture and shape. Vision Research, 36, 1761–1771.
Uttal, W. R., Baruch, T., & Allen, L. (1995). Sequential image degradations in a recognition task. Perception & Psychophysics, 57, 682–691.
Valentine,	 T.	 (1991).	 A	 unified	 account	 of	 the	 effects	 of	 distinctiveness,	 inversion,	 and	
race	 in	 face	 recognition.	 Quarterly	 Journal	 of	 Experimental	 Psychology	 Section	 A,	 43(2),	
161-204.		
Webster, M. A., & Mollon, J. D. (1995). Colour constancy influenced by contrast adaptation. Nature, 373, 694–698.
Wells,	 G.,	 Rydell,	 S.	 &	 Seelau,	 E.	 (1993).	 The	 selection	 of	 distractors	 for	 eyewitness	
lineups.	Journal	of	Applied	Psychology.	78(5).	835-844.	
Wells, G. L., Small, M., Penrod, S. D., Malpass, R. S., Fulero, S. M., & Brimacombe, C. A. E. (1998). Eyewitness identification procedures: recommendations for lineups and photospreads. Law and Human Behaviour, 22, 603–647.
White, D., Burton, A. M., Kemp, R. I., & Jenkins, R. (2013). Crowd effects in unfamiliar face matching. Applied Cognitive Psychology, 27(6), 769–777.
White, D., Kemp, R. I., Jenkins, R., & Burton, A. M. (2014). Feedback training for facial image comparison. Psychonomic Bulletin & Review, 21(1), 100–106. doi:10.3758/s13423-013-0475-3
White, D., Kemp, R. I., Jenkins, R., Matheson, M., & Burton, A. M. (2014). Passport officers' errors in face-matching. PLoS ONE, 9(8), e103510. doi:10.1371/journal.pone.0103510
White,	D.,	Dunn,	J.	D.,	Schmid,	A.	C.,	&	Kemp,	R.	I.	(2015).	Error	rates	in	users	of	automatic	
face	recognition	software.	PLoS	ONE,	10(10),	1–14.	doi:10.1371/journal.pone.0139827	

	

White, D., Phillips, P. J., Hahn, C. A., Hill, M., & O'Toole, A. J. (2015b). Perceptual expertise in forensic facial image comparison. Proceedings of the Royal Society B, 282, 1–8.
Winograd, E. (1976). Recognition memory for faces following nine different judgments. Bulletin of the Psychonomic Society, 8, 419–421.
Woodhead, M. M., Baddeley, A. D., & Simmonds, D. C. V. (1979). On training people to recognize faces. Ergonomics, 22(3), 333–343. doi:10.1080/00140137908924617
Yin,	R.	K.	(1969).	Looking	at	upside-down	faces.	Journal	of	Experimental	Psychology,	81(1),	
141–145.	
Young,	 A.W.,	 Hay,	 D.C.,	 McWeeny,	 K.H.,	 Flude,	 B.M.	 &	 Ellis,	 A.W.	 (1985).	 Matching	
familiar	and	unfamiliar	faces	on	internal	and	external	features.	Perception,	14,	737-746.		
	
	
	

	
