Preparing Network Data for Pajek

A network data file is prepared for input into Pajek by closely following the graph theory concepts presented above. Undirected network data consist of a vertex set and an edge set. Likewise, directed network (digraph) data consist of a vertex set and an arc set. (Pajek can handle both an edge set and an arc set in one network data file.) Below are friendship network data observed in the Hawthorne study, which gave us the famous Hawthorne effect, where the mere presence of observers increases workers' productivity. The data are taken from Roethlisberger and Dickson (1939: 501ff).

A Pajek network file is a simple text file which you can write in Notepad, TextPad or WinEdt, for instance. It must be a simple text file; a word processor such as Microsoft™ Word often inserts hidden characters, so that the file is no longer simple text, and is to be avoided. Give the file the extension .net, save it in a directory, say c:\temp, and take note of this directory. At the end of this you should have a file c:\temp\hawthorne-friend.net which you can work on later.

More about Pajek network data format

It is good practice to precede network data with comments, each line marked with /* at the beginning and */ at the end, to remind yourself of the description of your data, such as its name, sources, and other pertinent information. Follow the example below.

Pajek network data closely follow the graph theory concepts described above: the file has two parts, each marked by an asterisk. The first part, after the comments, must start with *Vertices, that is, an asterisk and Vertices with no space between them, followed by the number of vertices. Next comes a list of integers and labels, one vertex per line, with the integers numbered from 1. A label may be one word or more; a label of more than one word has to be enclosed in double quotes, like: 1 "Inspector 1". The second part is marked by *Edges, that is, an asterisk and Edges with no space between them, which is followed by the list of edges and ends with a blank line. Please include exactly one blank line at the end, i.e. your cursor must be on that blank line when you save the file; otherwise Pajek will be confused.

/* Cut from this line to the last line                 */
/* Save this in your local directory, say C:\temp      */
/* give it a name:                                     */
/* hawthorne-friend.net                                */
/* Of course you can just download this data from the  */
/*      link above by right clicking and saving it     */
/*                                                     */
/* Source: Roethlisberger and Dickson 1939 :501ff      */
/*                                                     */
/* An example of non directed or simple social network */
/* It has two parts (each marked by asterisk) which    */
/*   closely followed Graph theory definition of graph */

*Vertices 14
1 I1
2 I3
3 W1
4 W2
5 W3
6 W4
7 W5
8 W6
9 W7
10 W8
11 W9
12 S1
13 S2
14 S4
*Edges
1 5
3 5
3 6
5 6
9 10
9 11
10 11
3 12
5 12
6 12
10 14
11 14
9 12

/* Cut to the last line above. DO NOT INCLUDE THIS LINE  */

If you took the cut-and-paste route, you should now have a file with a .net extension in your directory, say c:\temp\hawthorne-friend.net. You are now ready to work with these data in Pajek.
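To check that a file follows the two-part layout before loading it into Pajek, it can help to read it back with a few lines of code. The following is only a minimal sketch in Python, not part of Pajek: the function name read_pajek_edges and the three-vertex sample are made up for illustration, and it assumes the simple layout used in this tutorial (/* ... */ comment lines, one *Vertices part, one *Edges part).

```python
import shlex

def read_pajek_edges(text):
    """Parse a simple Pajek .net text with a *Vertices and an *Edges part.

    Hypothetical helper for illustration; it only covers the layout
    described in this tutorial.
    """
    labels = {}     # vertex number -> label
    edges = []      # list of (u, v) pairs
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the /* ... */ comment lines used in this tutorial.
        if not line or line.startswith("/*"):
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()   # "*vertices" or "*edges"
            continue
        fields = shlex.split(line)              # keeps "Inspector 1" as one field
        if section == "*vertices":
            # A vertex line without a label falls back to its own number.
            labels[int(fields[0])] = fields[1] if len(fields) > 1 else fields[0]
        elif section == "*edges":
            edges.append((int(fields[0]), int(fields[1])))
    return labels, edges

# Made-up three-vertex sample in the same layout (not the Hawthorne data).
sample = """\
/* a tiny made-up example */
*Vertices 3
1 "Inspector 1"
2 W1
3 W3
*Edges
1 3
2 3
"""

labels, edges = read_pajek_edges(sample)
print(labels)
print(edges)
```

To try it on the saved file, pass the file's contents instead of the sample, for instance the text read from c:\temp\hawthorne-friend.net.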

Directed network data format

An example of a directed graph or network is taken from Luczkovich et al. (2003), which shows a Malaysian pitcher plant food web. The paper is available from Steve Borgatti (who, with Lin Freeman and Martin Everett, wrote the commercial and excellent UCINET version 6 Software for Network Analysis). The original dataset was described by Beaver (1985).

Two important notes about these data are in order. First, an arc in these data represents a predator and prey relationship, or predator → prey. However, Luczkovich et al. (2003) show the food web as a flow of energy from prey to predator. As noted in graph theory concepts underlying social network analysis, only a transpose is needed to fit this interpretation. Follow the instructions below.

Second, vertex labels are given in a separate file to illustrate that labels are not absolutely necessary. Labels, which are of secondary consideration, can be included as given in the instructions below. These data also illustrate a shortcut for listing a set of arcs. Three arcs, each of value one by default, a → b, a → c and a → d, can be summarised on one line as a b c d.

/*  malaysia.net                                              */
/*  Luczkovich, et al. 2003. Defining and measuring trophic   */
/*  role similarity in food webs using regular equivalence    */
/*  Journal of Theoretical Biology 220(3) : 303-321  page 309 */
/*  read into Pajek                                           */
/*  First (to show flow of energy):                           */
/*  do Net > Transform > Transpose                            */
/*  do control-G    to draw sociogram and move them about     */
/*                                                            */
/*  Second (to include labels):                               */
/*  do Net > Transform > Add > Vertices labels : malaylab.net */

*Vertices 19
*Arcslist
1 4 6 17 5 13 12 9 8 7
3 13 12 8 7
4 6 5 9
2 6 5
13 16
12 16
9 16
9 19
8 16 18
7 16 18
15 16 19
14 16 19
6 18
5 18
16 19 18
11 18
10 18
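
The Arcslist shortcut and the transpose can also be spelled out in a few lines of code. The following is only a sketch in Python of what the two ideas mean, not of how Pajek implements them; the function names expand_arcslist and transpose are made up for illustration. Each Arcslist line is expanded into one arc per listed target, and the transpose reverses every arc, turning predator → prey into prey → predator.

```python
def expand_arcslist(lines):
    """Expand Arcslist lines such as 'a b c d' into arcs (a, b), (a, c), (a, d)."""
    arcs = []
    for line in lines:
        numbers = [int(token) for token in line.split()]
        if not numbers:
            continue                      # ignore blank lines
        source, targets = numbers[0], numbers[1:]
        arcs.extend((source, target) for target in targets)
    return arcs

def transpose(arcs):
    """Reverse the direction of every arc."""
    return [(v, u) for (u, v) in arcs]

# Two lines taken from the Arcslist above.
rows = ["3 13 12 8 7", "13 16"]

arcs = expand_arcslist(rows)
print(arcs)             # [(3, 13), (3, 12), (3, 8), (3, 7), (13, 16)]
print(transpose(arcs))  # [(13, 3), (12, 3), (8, 3), (7, 3), (16, 13)]
```

The first printed list reads as predator → prey; the second is the same set of arcs read as a flow of energy from prey to predator.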

Further comments on network boundary: The selection of subjects to be included in the analysis of social networks, also known as the network boundary problem (Doreian and Woodard 1994; Scott 1991: 56ff; Laumann et al. 1983; Marsden, forthcoming), is not always trivial. As Scott noted, "researchers are involved in a process of conceptual elaboration and model building" when drawing network boundaries. This process, of course, is not trivial in any empirical work. For a recent overview and a suggested solution, see Marsden (forthcoming).


© G Tampubolon - 17 December 2004