The GNU Awk User's Guide


Reading the Group Database

Much of the discussion presented in Reading the User Database applies to the group database as well. Although there has traditionally been a well-known file (/etc/group) in a well-known format, the POSIX standard only provides a set of C library routines (<grp.h> and getgrent) for accessing the information. Even though this file may exist, it likely does not have complete information. Therefore, as with the user database, it is necessary to have a small C program that generates the group database as its output.

grcat, a C program that "cats" the group database, is as follows:

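/*
 * grcat.c
 *
 * Generate a printable version of the group database
 */
#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <grp.h>        /* for struct group, getgrent(), endgrent() */

int
main(void)
{
    struct group *g;
    int i;

    /* getgrent() returns the next group entry, or NULL at the end */
    while ((g = getgrent()) != NULL) {
        /* gid_t may be unsigned or wider than int, so cast for printf */
        printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
                                            (long) g->gr_gid);
        /* gr_mem is a NULL-terminated array; print it comma-separated */
        for (i = 0; g->gr_mem[i] != NULL; i++) {
            printf("%s", g->gr_mem[i]);
            if (g->gr_mem[i+1] != NULL)
                putchar(',');
        }
        putchar('\n');
    }
    endgrent();     /* close the group database */
    exit(0);
}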

Each line in the group database represents one group. The fields are separated with colons and represent the following information:

Group name
     The group's name.
Group password
     The group's encrypted password. In practice, this field is never used; it is usually empty or set to *.
Group-ID
     The group's numeric group ID number; this number should be unique within the file.
Group member list
     A comma-separated list of usernames. These users are members of the group. Modern Unix systems allow users to be members of several groups simultaneously. If your system does, then PROCINFO has elements "group1" through "groupN" holding those group ID numbers. (Note that PROCINFO is a gawk extension; see Built-in Variables.)
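To make the record layout concrete, here is a minimal C sketch that splits one line of grcat output into the four fields just described. The struct grline type and the parse_grline function are hypothetical names used only for this illustration, and the sketch assumes the line follows the name:password:GID:members layout shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of one grcat output line:
 * name:password:GID:member1,member2,...  */
struct grline {
    char *name;
    char *passwd;
    long gid;
    char *members;      /* still comma-separated; may be empty */
};

/*
 * Split LINE in place at its first three colons and fill in *OUT.
 * Returns 1 on success, or 0 if there are fewer than four fields or
 * the GID field is not a valid integer. LINE is modified either way.
 */
int
parse_grline(char *line, struct grline *out)
{
    char *fields[4];
    char *p = line;
    char *end;
    int i;

    assert(line != NULL && out != NULL);
    for (i = 0; i < 3; i++) {
        fields[i] = p;
        p = strchr(p, ':');
        if (p == NULL)
            return 0;               /* too few colons */
        *p++ = '\0';                /* terminate this field */
    }
    fields[3] = p;                  /* the rest is the member list */
    fields[3][strcspn(fields[3], "\n")] = '\0';   /* drop a newline */

    out->gid = strtol(fields[2], &end, 10);
    if (end == fields[2] || *end != '\0')
        return 0;                   /* empty or non-numeric GID */

    out->name = fields[0];
    out->passwd = fields[1];
    out->members = fields[3];
    return 1;
}
```

Because parse_grline writes into its argument, pass it a writable buffer (for example, one filled by fgets); a trailing newline, if present, is removed from the member list.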

Here is what running grcat might produce:

$ grcat
-| wheel:*:0:arnold
-| nogroup:*:65534:
-| daemon:*:1:
-| kmem:*:2:
-| staff:*:10:arnold,miriam,andy
-| other:*:20:
...

Here are the functions for obtaining information from the group database. There are several, modeled after the C library functions of the same names:

# group.awk --- functions for dealing with the group file
BEGIN    \
{
    # Change to suit your system
    _gr_awklib = "/usr/local/libexec/awk/"
}

function _gr_init(    oldfs, oldrs, olddol0, grcat,
                             using_fw, n, a, i)
{
    if (_gr_inited)
        return

    oldfs = FS
    oldrs = RS
    olddol0 = $0
    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
    FS = ":"
    RS = "\n"

    grcat = _gr_awklib "grcat"
    while ((grcat | getline) > 0) {
        if ($1 in _gr_byname)
            _gr_byname[$1] = _gr_byname[$1] "," $4
        else
            _gr_byname[$1] = $0
        if ($3 in _gr_bygid)
            _gr_bygid[$3] = _gr_bygid[$3] "," $4
        else
            _gr_bygid[$3] = $0

        n = split($4, a, "[ \t]*,[ \t]*")
        for (i = 1; i <= n; i++)
            if (a[i] in _gr_groupsbyuser)
                _gr_groupsbyuser[a[i]] = \
                    _gr_groupsbyuser[a[i]] " " $1
            else
                _gr_groupsbyuser[a[i]] = $1

        _gr_bycount[++_gr_count] = $0
    }
    close(grcat)
    _gr_count = 0
    _gr_inited++
    FS = oldfs
    if (using_fw)
        FIELDWIDTHS = FIELDWIDTHS
    RS = oldrs
    $0 = olddol0
}

The BEGIN rule sets a private variable to the directory where grcat is stored. Because it is used to help out an awk library routine, we have chosen to put it in /usr/local/libexec/awk. You might want it to be in a different directory on your system.

These routines follow the same general outline as the user database routines (see Reading the User Database). The _gr_inited variable is used to ensure that the database is scanned no more than once. The _gr_init function first saves FS, RS, and $0, notes whether FIELDWIDTHS is in effect, and then sets FS and RS to the correct values for scanning the group information.

The group information is stored in several associative arrays. The arrays are indexed by group name (_gr_byname), by group ID number (_gr_bygid), and by position in the database (_gr_bycount). An additional array, _gr_groupsbyuser, is indexed by username; each element is a space-separated list of the groups to which that user belongs.

Unlike the user database, the group database can contain multiple records for the same group. This is common when a group has a large number of members. A pair of such entries might look like the following:

tvpeople:*:101:johnny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan

For this reason, _gr_init checks whether a group name or group ID number has already been seen. If it has, the usernames are simply appended to the previous list of users. (There is actually a subtle problem with the code just presented. Suppose that the first record for a group has no member names. When a later record for that group is seen, this code adds its names with a leading comma. It also doesn't check that there is a $4.)
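One way to avoid the leading comma, and the trailing comma that an empty $4 would leave behind, is to insert a comma only when both the existing member list and the new one are non-empty. The following C sketch shows that rule. append_members is a hypothetical helper, and it assumes, as the awk code does, that the member list is the last field of the stored record, so an empty list shows up as a record ending in a colon.

```c
#include <assert.h>
#include <string.h>

/*
 * Append the comma-separated member list MORE to RECORD, a complete
 * "name:passwd:gid:members" line held in a buffer of SIZE bytes.
 * A comma is inserted only when both lists are non-empty, so neither
 * a leading nor a trailing comma can appear.
 * Returns 1 on success, 0 if the result would not fit.
 */
int
append_members(char *record, size_t size, const char *more)
{
    size_t len, morelen;
    int need_comma;

    assert(record != NULL && more != NULL);
    len = strlen(record);
    morelen = strlen(more);
    if (morelen == 0)
        return 1;                 /* nothing to add (empty $4) */

    /* The member list is the last field, so an empty list means
     * the record ends with the colon that precedes it. */
    need_comma = (len > 0 && record[len - 1] != ':');

    if (len + need_comma + morelen + 1 > size)
        return 0;                 /* would overflow the buffer */
    if (need_comma)
        record[len++] = ',';
    memcpy(record + len, more, morelen + 1);   /* copies the '\0' too */
    return 1;
}
```

Applied to the tvpeople pair above, this produces a single record listing all seven members, and a group whose first record had no members gets no stray comma.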

Finally, _gr_init closes the pipeline to grcat, restores FS (and FIELDWIDTHS if necessary), RS, and $0, initializes _gr_count to zero (it is used later), and makes _gr_inited nonzero.

The getgrnam function takes a group name as its argument and, if that group exists, returns its database entry. Otherwise, getgrnam returns the null string:

function getgrnam(group)
{
    _gr_init()
    if (group in _gr_byname)
        return _gr_byname[group]
    return ""
}

The getgrgid function is similar; it takes a numeric group ID and looks up the information associated with that group ID:

function getgrgid(gid)
{
    _gr_init()
    if (gid in _gr_bygid)
        return _gr_bygid[gid]
    return ""
}
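For readers more at home in C, the two lookups can be sketched over a small in-memory table that plays the part of _gr_byname and _gr_bygid. The struct grrec type and the lookup_by_name and lookup_by_gid functions are hypothetical illustrations, not the C library's getgrnam and getgrgid; like the awk versions, they return the full record, or an empty string when there is no match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: the two keys used by _gr_byname and
 * _gr_bygid, plus the full record the awk functions return. */
struct grrec {
    const char *name;
    long gid;
    const char *record;
};

/* Like the awk getgrnam: the record for NAME, or "" if absent. */
const char *
lookup_by_name(const struct grrec *tab, size_t n, const char *name)
{
    size_t i;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].record;
    return "";
}

/* Like the awk getgrgid: the record for GID, or "" if absent. */
const char *
lookup_by_gid(const struct grrec *tab, size_t n, long gid)
{
    size_t i;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (tab[i].gid == gid)
            return tab[i].record;
    return "";
}
```

The linear scan keeps the sketch short; in awk, the associative arrays do the searching, which is why getgrnam and getgrgid need no loop at all.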

The getgruser function does not have a C counterpart. It takes a username and returns a space-separated list of the groups that have the user as a member:

function getgruser(user)
{
    _gr_init()
    if (user in _gr_groupsbyuser)
        return _gr_groupsbyuser[user]
    return ""
}
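The work behind getgruser happens in _gr_init, which splits each member list and appends the group name to each member's entry in _gr_groupsbyuser. The C sketch below computes the same space-separated answer directly from (group, member list) pairs. struct grmembers, has_member, and groups_for_user are hypothetical names, and the blank handling is slightly more lenient than the awk split, which only discards blanks next to commas.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical (group name, member list) pair; the member list is
 * comma-separated, as in the fourth field of a group record. */
struct grmembers {
    const char *name;
    const char *members;
};

/* Return 1 if USER appears in the comma-separated list MEMBERS,
 * ignoring blanks and tabs around each entry. */
static int
has_member(const char *members, const char *user)
{
    size_t ulen = strlen(user);
    const char *p = members;

    while (*p != '\0') {
        const char *start, *end;

        p += strspn(p, " \t");              /* skip leading blanks */
        start = p;
        end = start + strcspn(start, ",");  /* end of this entry */
        p = (*end == ',') ? end + 1 : end;
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;                          /* drop trailing blanks */
        if ((size_t)(end - start) == ulen
            && strncmp(start, user, ulen) == 0)
            return 1;
    }
    return 0;
}

/*
 * Like the awk getgruser: write into OUT (SIZE bytes) a space-separated
 * list of the groups having USER as a member; OUT is "" if there are
 * none. Returns 0 if OUT is too small, 1 otherwise.
 */
int
groups_for_user(const struct grmembers *tab, size_t n, const char *user,
                char *out, size_t size)
{
    size_t i, len = 0;

    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        size_t nlen;

        if (!has_member(tab[i].members, user))
            continue;
        nlen = strlen(tab[i].name);
        if (len + (len > 0) + nlen + 1 > size)
            return 0;                       /* no room left */
        if (len > 0)
            out[len++] = ' ';               /* separate group names */
        memcpy(out + len, tab[i].name, nlen + 1);
        len += nlen;
    }
    return 1;
}
```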

The getgrent function steps through the database one entry at a time. It uses _gr_count to track its position in the list:

function getgrent()
{
    _gr_init()
    if (++_gr_count in _gr_bycount)
        return _gr_bycount[_gr_count]
    return ""
}

The endgrent function resets _gr_count to zero so that getgrent can start over again:

function endgrent()
{
    _gr_count = 0
}
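Together, getgrent and endgrent amount to a cursor over _gr_bycount. Here is a minimal C sketch of the same idea; struct grcursor, next_group, and rewind_groups are hypothetical names, and the sketch counts from zero where the awk code counts from one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor: RECORDS plays the part of _gr_bycount, and
 * POS (how many records have been returned) the part of _gr_count. */
struct grcursor {
    const char **records;   /* records in database order */
    size_t n;               /* number of records */
    size_t pos;             /* records returned so far */
};

/* Like the awk getgrent: return the next record, or "" at the end. */
const char *
next_group(struct grcursor *c)
{
    assert(c != NULL);
    if (c->pos < c->n)
        return c->records[c->pos++];
    return "";
}

/* Like the awk endgrent: start over from the first record. */
void
rewind_groups(struct grcursor *c)
{
    assert(c != NULL);
    c->pos = 0;
}
```

As in the awk version, reaching the end is signaled by an empty string rather than by a separate status value.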

As with the user database routines, each function calls _gr_init to initialize the arrays. Doing so incurs the extra overhead of running grcat only if these functions are used (as opposed to moving the body of _gr_init into a BEGIN rule).
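This deferred-initialization pattern is not specific to awk. The C sketch below shows it with a static flag standing in for _gr_inited. load_database, ensure_loaded, and group_lookup are hypothetical names, and load_database merely counts its calls instead of running grcat.

```c
/* Hypothetical counter so the effect of the guard is visible. */
static int load_count = 0;

/* Plays the role of _gr_inited: nonzero once the data is loaded. */
static int inited = 0;

/* Stand-in for the expensive step (running grcat and filling the
 * arrays); this sketch only records that it ran. */
static void
load_database(void)
{
    load_count++;
}

/* Like _gr_init: do the expensive work on first use only. */
static void
ensure_loaded(void)
{
    if (inited)
        return;
    load_database();
    inited = 1;
}

/* A public lookup calls ensure_loaded() first, just as each awk
 * function calls _gr_init(). Returns how many times the database
 * has been loaded, which stays 1 after any number of calls. */
int
group_lookup(void)
{
    ensure_loaded();
    return load_count;
}
```

A program that never calls group_lookup never pays for load_database, which is the same trade-off the awk library makes by not using a BEGIN rule.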

Most of the work is in scanning the database and building the various associative arrays. The functions that the user calls are themselves very simple, relying on awk's associative arrays to do the work.

The id program in Printing out User Information uses these functions.
