Adding new expressions to nftables

Introduction

I have recently been writing something that uses Linux’s firewall framework to perform some non-standard operations on packets. My task requires extending the kernel, but unfortunately the documentation I can find online about this topic is quite dated. These old documents mainly cover kernel 2.4 and early 2.6.x, in which new matches or targets are registered by calling ipt_register_match and ipt_register_target. The related kernel subsystem has changed a lot since then, and nftables has been introduced as the successor to iptables. Although we could use xt_register_match and xt_register_target instead, I prefer to move to the new nftables framework. Due to the lack of documentation, I had to dig into the Linux kernel source code to figure out how things work, and this post is my set of notes on that. As Linus Torvalds said in 2008, “Linux is evolution, not intelligent design”, so the design and API of nftables may change quickly. This post is therefore not only a brief review of the design and API of nftables, but also a guide on how to find the correct way of doing things by reading the kernel source code. The eager reader can go directly to the summary section. This post is based on kernel version 4.13, the most recent version when I started writing it.

In this post, we will solve a toy problem: monitor all outgoing TCP traffic from port 80 and, if a packet contains a string given by the user, log it. I don’t assume any knowledge of the design or kernel API of nftables, but I do assume the reader has read and understood the official documentation on how to use nftables.

Starting point of kernel code

The first step is to find which source files to read. Running the following command at the top of the kernel source tree gives a nice overview of nftables in the Linux kernel:

grep -P 'nftables|nf_tables' -r .

The output should look something like:

./include/net/netfilter/nf_tables_core.h:int nf_tables_core_module_init(void);
./include/net/netfilter/nf_tables_core.h:void nf_tables_core_module_exit(void);
./include/net/netfilter/nf_tables_ipv4.h:#include <net/netfilter/nf_tables.h>
./include/net/netfilter/nf_tables.h:#include <linux/netfilter/nf_tables.h>
./include/net/netfilter/nf_tables.h: * struct nft_verdict - nf_tables verdict
./include/net/netfilter/nf_tables.h: * @code: nf_tables/netfilter verdict code
./include/net/netfilter/nf_tables.h: * struct nft_regs - nf_tables register set
./include/net/netfilter/nf_tables.h: * struct nft_ctx - nf_tables rule/set context
...........

The files listed in the output are the ones to look at. A good tool for reading Linux kernel source code is FreeElectrons. We can start by looking at the file names in the directory include/net/netfilter on its website. There we can see nft_masq.h, nft_redir.h and nft_reject.h, which all correspond to actions in nftables. Following the references to the symbols defined in these files leads us to sample code showing how to create new actions. Let’s take reject as an example. In its header, we find an interesting symbol, nft_reject_init. Looking through the definitions and references of that symbol, we find the core code at net/ipv4/netfilter/nft_reject_ipv4.c, where L61-L72 reads:

static int __init nft_reject_ipv4_module_init(void)
{
	return nft_register_expr(&nft_reject_ipv4_type);
}

static void __exit nft_reject_ipv4_module_exit(void)
{
	nft_unregister_expr(&nft_reject_ipv4_type);
}

module_init(nft_reject_ipv4_module_init);
module_exit(nft_reject_ipv4_module_exit);

We can see immediately from the above code that we should call nft_register_expr to register an expression and nft_unregister_expr to unregister it.

Now let’s take a look at the prototype of nft_register_expr in nf_tables.h:

int nft_register_expr(struct nft_expr_type *);

It takes one parameter of type struct nft_expr_type *. This struct is also defined in nf_tables.h, at L681. The usage of nft_register_expr and nft_expr_type can be worked out by reading the existing examples, which can be listed by following the references to nft_register_expr. In that list, one file looks very interesting judging by its name:

net/netfilter/nf_tables_core.c, line 251

Opening that file, we can see all these basic expressions:

static struct nft_expr_type *nft_basic_types[] = {
	&nft_imm_type,
	&nft_cmp_type,
	&nft_lookup_type,
	&nft_bitwise_type,
	&nft_byteorder_type,
	&nft_payload_type,
	&nft_dynset_type,
	&nft_range_type,
};

int __init nf_tables_core_module_init(void)
{
	int err, i;

	for (i = 0; i < ARRAY_SIZE(nft_basic_types); i++) {
		err = nft_register_expr(nft_basic_types[i]);
		if (err)
			goto err;
	}
	return 0;

err:
	while (i-- > 0)
		nft_unregister_expr(nft_basic_types[i]);
	return err;
}
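
The register-then-roll-back idiom in nf_tables_core_module_init is worth isolating. The following is a user-space sketch that imitates it with mock types; mock_expr_type, mock_register_expr, mock_unregister_expr and register_all are hypothetical names invented for this illustration and are not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nft_expr_type; only what the sketch needs. */
struct mock_expr_type {
	const char *name;
	int registered;
	int fail;	/* non-zero: pretend that registration fails */
};

/* Mock of nft_register_expr(): 0 on success, negative errno value on failure. */
static int mock_register_expr(struct mock_expr_type *type)
{
	if (type->fail)
		return -EEXIST;
	type->registered = 1;
	return 0;
}

/* Mock of nft_unregister_expr(). */
static void mock_unregister_expr(struct mock_expr_type *type)
{
	type->registered = 0;
}

/*
 * Register n types in order. If one fails, unregister the ones that were
 * already registered, in reverse order, and return the error.
 */
int register_all(struct mock_expr_type **types, size_t n)
{
	size_t i;
	int err;

	assert(types != NULL || n == 0);

	for (i = 0; i < n; i++) {
		err = mock_register_expr(types[i]);
		if (err)
			goto rollback;
	}
	return 0;

rollback:
	/* i is the index that failed; everything below it succeeded. */
	while (i-- > 0)
		mock_unregister_expr(types[i]);
	return err;
}
```

With this pattern, a failure in the middle of the array leaves nothing registered, so the module either loads completely or not at all.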

Now we know what to look at. The next step is to read these examples to get a feeling for how to write our own expression.

The usage of kernel API

To learn how to use the related API, we read the reject operation for the inet family and the compare operation as sample code. The source code of reject is located at net/netfilter/nft_reject_inet.c, and the source code of compare is located at net/netfilter/nft_cmp.c. When one expression corresponds to only one operation, the usage is as shown by the source code of the reject operation at L120:

static struct nft_expr_type nft_reject_inet_type;
static const struct nft_expr_ops nft_reject_inet_ops = {
	.type		= &nft_reject_inet_type,
	.size		= NFT_EXPR_SIZE(sizeof(struct nft_reject)),
	.eval		= nft_reject_inet_eval,
	.init		= nft_reject_inet_init,
	.dump		= nft_reject_inet_dump,
	.validate	= nft_reject_validate,
};

static struct nft_expr_type nft_reject_inet_type __read_mostly = {
	.family		= NFPROTO_INET,
	.name		= "reject",
	.ops		= &nft_reject_inet_ops,
	.policy		= nft_reject_policy,
	.maxattr	= NFTA_REJECT_MAX,
	.owner		= THIS_MODULE,
};

From the above code, we know that we should create one instance each of struct nft_expr_ops and struct nft_expr_type and make them point to each other through nft_expr_ops.type and nft_expr_type.ops. When one expression corresponds to several operations, the usage is as shown by the source code of the compare operation:

static const struct nft_expr_ops nft_cmp_ops = {
	.type		= &nft_cmp_type,
	.size		= NFT_EXPR_SIZE(sizeof(struct nft_cmp_expr)),
	.eval		= nft_cmp_eval,
	.init		= nft_cmp_init,
	.dump		= nft_cmp_dump,
};
...........
const struct nft_expr_ops nft_cmp_fast_ops = {
	.type		= &nft_cmp_type,
	.size		= NFT_EXPR_SIZE(sizeof(struct nft_cmp_fast_expr)),
	.eval		= NULL,	/* inlined */
	.init		= nft_cmp_fast_init,
	.dump		= nft_cmp_fast_dump,
};
...........
static const struct nft_expr_ops *
nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
{
	struct nft_data_desc desc;
	struct nft_data data;
	enum nft_cmp_ops op;
	int err;

	if (tb[NFTA_CMP_SREG] == NULL ||
	    tb[NFTA_CMP_OP] == NULL ||
	    tb[NFTA_CMP_DATA] == NULL)
		return ERR_PTR(-EINVAL);

	op = ntohl(nla_get_be32(tb[NFTA_CMP_OP]));
	switch (op) {
	case NFT_CMP_EQ:
	case NFT_CMP_NEQ:
	case NFT_CMP_LT:
	case NFT_CMP_LTE:
	case NFT_CMP_GT:
	case NFT_CMP_GTE:
		break;
	default:
		return ERR_PTR(-EINVAL);
	}

	err = nft_data_init(NULL, &data, sizeof(data), &desc,
			    tb[NFTA_CMP_DATA]);
	if (err < 0)
		return ERR_PTR(err);

	if (desc.type != NFT_DATA_VALUE) {
		err = -EINVAL;
		goto err1;
	}

	if (desc.len <= sizeof(u32) && op == NFT_CMP_EQ)
		return &nft_cmp_fast_ops;

	return &nft_cmp_ops;
err1:
	nft_data_release(&data, desc.type);
	return ERR_PTR(-EINVAL);
}

struct nft_expr_type nft_cmp_type __read_mostly = {
	.name		= "cmp",
	.select_ops	= nft_cmp_select_ops,
	.policy		= nft_cmp_policy,
	.maxattr	= NFTA_CMP_MAX,
	.owner		= THIS_MODULE,
};

From this we can see that we should create an instance of struct nft_expr_ops for each operation and use select_ops to choose dynamically which one to use. select_ops should return a pointer to the chosen operation, or an ERR_PTR in case of error. Now let’s discuss struct nft_expr_ops and struct nft_expr_type in detail, one at a time.
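
To make the dispatch idea concrete, here is a minimal user-space sketch of a select_ops style function. Everything prefixed with mock_ or MOCK_ is hypothetical and only imitates the kernel: mock_err_ptr, mock_is_err and mock_ptr_err are simplified stand-ins for ERR_PTR, IS_ERR and PTR_ERR, and the selection rule mirrors the one in nft_cmp_select_ops (small data compared for equality takes the fast path).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum mock_cmp_op { MOCK_CMP_EQ, MOCK_CMP_NEQ, MOCK_CMP_LT };

/* Hypothetical stand-in for struct nft_expr_ops. */
struct mock_ops {
	const char *label;
};

static const struct mock_ops mock_cmp_ops = { "generic" };
static const struct mock_ops mock_cmp_fast_ops = { "fast" };

/* Simplified user-space imitation of ERR_PTR(), IS_ERR() and PTR_ERR(). */
#define MOCK_MAX_ERRNO 4095

static inline const void *mock_err_ptr(long err)
{
	return (const void *)(intptr_t)err;
}

static inline int mock_is_err(const void *ptr)
{
	/* Error codes live in the topmost MOCK_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-(intptr_t)MOCK_MAX_ERRNO;
}

static inline long mock_ptr_err(const void *ptr)
{
	assert(mock_is_err(ptr));
	return (long)(intptr_t)ptr;
}

/* Pick the ops for one expression instance, or return an encoded error. */
const struct mock_ops *mock_select_ops(int op, size_t data_len)
{
	switch (op) {
	case MOCK_CMP_EQ:
	case MOCK_CMP_NEQ:
	case MOCK_CMP_LT:
		break;
	default:
		return mock_err_ptr(-EINVAL);
	}

	/* Small operands compared for equality get the specialised ops. */
	if (data_len <= sizeof(uint32_t) && op == MOCK_CMP_EQ)
		return &mock_cmp_fast_ops;

	return &mock_cmp_ops;
}
```

The caller checks the result with mock_is_err before dereferencing it, which is how a single pointer return value can carry either a valid ops table or an error code.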

struct nft_expr_ops

Let’s take a look at struct nft_expr_ops first. Its definition is at include/net/netfilter/nf_tables.h#L722:

/**
 *	struct nft_expr_ops - nf_tables expression operations
 *
 *	@eval: Expression evaluation function
 *	@size: full expression size, including private data size
 *	@init: initialization function
 *	@destroy: destruction function
 *	@dump: function to dump parameters
 *	@type: expression type
 *	@validate: validate expression, called during loop detection
 *	@data: extra data to attach to this expression operation
 */
struct nft_expr;
struct nft_expr_ops {
	void				(*eval)(const struct nft_expr *expr,
						struct nft_regs *regs,
						const struct nft_pktinfo *pkt);
	int				(*clone)(struct nft_expr *dst,
						 const struct nft_expr *src);
	unsigned int			size;

	int				(*init)(const struct nft_ctx *ctx,
						const struct nft_expr *expr,
						const struct nlattr * const tb[]);
	void				(*destroy)(const struct nft_ctx *ctx,
						   const struct nft_expr *expr);
	int				(*dump)(struct sk_buff *skb,
						const struct nft_expr *expr);
	int				(*validate)(const struct nft_ctx *ctx,
						    const struct nft_expr *expr,
						    const struct nft_data **data);
	const struct nft_expr_type	*type;
	void				*data;
};

From the names and comments of these fields, we can see that init, destroy and clone play the roles of constructor, destructor and copy constructor. What to do in these functions is shown in the reject code above, where init is set to nft_reject_inet_init while clone and destroy are not defined. The source code of nft_reject_inet_init is located at L64:

static int nft_reject_inet_init(const struct nft_ctx *ctx,
				const struct nft_expr *expr,
				const struct nlattr * const tb[])
{
	struct nft_reject *priv = nft_expr_priv(expr);
	int icmp_code;

	if (tb[NFTA_REJECT_TYPE] == NULL)
		return -EINVAL;

	priv->type = ntohl(nla_get_be32(tb[NFTA_REJECT_TYPE]));
	switch (priv->type) {
	case NFT_REJECT_ICMP_UNREACH:
	case NFT_REJECT_ICMPX_UNREACH:
		if (tb[NFTA_REJECT_ICMP_CODE] == NULL)
			return -EINVAL;

		icmp_code = nla_get_u8(tb[NFTA_REJECT_ICMP_CODE]);
		if (priv->type == NFT_REJECT_ICMPX_UNREACH &&
		    icmp_code > NFT_REJECT_ICMPX_MAX)
			return -EINVAL;

		priv->icmp_code = icmp_code;
		break;
	case NFT_REJECT_TCP_RST:
		break;
	default:
		return -EINVAL;
	}
	return 0;
}

By reading this function and the functions involved in calling it, we can see the following: the kernel allocates memory for an instance of struct nft_reject, the struct that stores operation-specific data, at expr->data, and nft_expr_priv(expr) returns a pointer to it. For the kernel to know how much memory to allocate, the size is passed in nft_expr_ops.size, as shown in the fourth line of the nft_reject_inet_ops definition above:

.size = NFT_EXPR_SIZE(sizeof(struct nft_reject))

The init function is responsible for initializing the fields of this instance by reading attributes from netlink with functions like nla_get_<type>. The attributes received from netlink are passed in the argument tb. In case of error, the init function should return a negative number; otherwise it should return 0. At this point we don’t yet know how to tell netlink which attributes are expected and how long they are, but don’t worry: this will become clear as we keep reading. For now, let’s set this question aside.
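
The shape of such an init function can be tried out in user space. The sketch below is only an imitation under stated assumptions: struct mock_attr stands in for struct nlattr, mock_get_u32 and mock_get_u8 stand in for the nla_get_<type> helpers, and the MOCK_ constants are invented for the example. Byte-order conversion, which the kernel code performs with ntohl, is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute numbering, following the UNSPEC ... MAX idiom. */
enum { MOCK_ATTR_UNSPEC, MOCK_ATTR_TYPE, MOCK_ATTR_ICMP_CODE, __MOCK_ATTR_MAX };
#define MOCK_ATTR_MAX (__MOCK_ATTR_MAX - 1)

enum { MOCK_REJECT_ICMP_UNREACH, MOCK_REJECT_TCP_RST };

/* Stand-in for struct nlattr: just a payload pointer and its length. */
struct mock_attr {
	const void *data;
	size_t len;
};

/* Stand-in for the expression's private data (struct nft_reject). */
struct mock_reject {
	uint32_t type;
	uint8_t icmp_code;
};

static uint32_t mock_get_u32(const struct mock_attr *attr)
{
	uint32_t value;

	assert(attr->len >= sizeof(value));
	memcpy(&value, attr->data, sizeof(value));
	return value;
}

static uint8_t mock_get_u8(const struct mock_attr *attr)
{
	assert(attr->len >= 1);
	return *(const uint8_t *)attr->data;
}

/*
 * tb[] is indexed by attribute number; a NULL entry means the attribute
 * was not sent. Returns 0 on success or a negative errno value.
 */
int mock_reject_init(struct mock_reject *priv,
		     const struct mock_attr *const tb[])
{
	if (tb[MOCK_ATTR_TYPE] == NULL)
		return -EINVAL;

	priv->type = mock_get_u32(tb[MOCK_ATTR_TYPE]);
	switch (priv->type) {
	case MOCK_REJECT_ICMP_UNREACH:
		/* This type needs a second, dependent attribute. */
		if (tb[MOCK_ATTR_ICMP_CODE] == NULL)
			return -EINVAL;
		priv->icmp_code = mock_get_u8(tb[MOCK_ATTR_ICMP_CODE]);
		break;
	case MOCK_REJECT_TCP_RST:
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

The point of the pattern is that init validates the combination of attributes, not just their presence, before any of them is stored in the private data.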

Now let’s take a look at the dump field, which is implemented by nft_reject_inet_dump for the reject operation. The code is located at L96:

static int nft_reject_inet_dump(struct sk_buff *skb,
				const struct nft_expr *expr)
{
	const struct nft_reject *priv = nft_expr_priv(expr);

	if (nla_put_be32(skb, NFTA_REJECT_TYPE, htonl(priv->type)))
		goto nla_put_failure;

	switch (priv->type) {
	case NFT_REJECT_ICMP_UNREACH:
	case NFT_REJECT_ICMPX_UNREACH:
		if (nla_put_u8(skb, NFTA_REJECT_ICMP_CODE, priv->icmp_code))
			goto nla_put_failure;
		break;
	default:
		break;
	}

	return 0;

nla_put_failure:
	return -1;
}

We can see that this function sends the parameters back through netlink using functions like nla_put_<type>. It should return 0 on success and a negative number otherwise.
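
What an nla_put_<type> call does to the message can be imitated in a few lines. The following user-space sketch assumes the usual netlink attribute layout (a 4-byte header holding a 16-bit length and a 16-bit type, then the payload, padded to a multiple of 4 bytes); mock_msg, mock_nla_put and mock_reject_dump are hypothetical names, and a fixed 64-byte buffer replaces the sk_buff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_NLA_ALIGNTO 4u
#define MOCK_NLA_ALIGN(len) \
	(((len) + MOCK_NLA_ALIGNTO - 1) & ~(MOCK_NLA_ALIGNTO - 1))
#define MOCK_NLA_HDRLEN 4u

/* Stand-in for the sk_buff being filled: a small fixed buffer. */
struct mock_msg {
	unsigned char buf[64];
	size_t used;
};

/* Append one attribute; returns 0, or -EMSGSIZE if it does not fit. */
int mock_nla_put(struct mock_msg *msg, uint16_t type,
		 const void *data, uint16_t len)
{
	size_t total = MOCK_NLA_ALIGN(MOCK_NLA_HDRLEN + len);
	uint16_t nla_len = (uint16_t)(MOCK_NLA_HDRLEN + len);
	unsigned char *p;

	assert(msg != NULL && data != NULL);

	if (total > sizeof(msg->buf) - msg->used)
		return -EMSGSIZE;

	p = msg->buf + msg->used;
	memset(p, 0, total);			/* zero the padding too */
	memcpy(p, &nla_len, sizeof(nla_len));	/* length excludes padding */
	memcpy(p + 2, &type, sizeof(type));
	memcpy(p + MOCK_NLA_HDRLEN, data, len);
	msg->used += total;
	return 0;
}

/* Dump two parameters, in the goto-on-failure style of the kernel code. */
int mock_reject_dump(struct mock_msg *msg, uint32_t type, uint8_t icmp_code)
{
	if (mock_nla_put(msg, 1, &type, sizeof(type)))
		goto nla_put_failure;
	if (mock_nla_put(msg, 2, &icmp_code, sizeof(icmp_code)))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -1;
}
```

A dump of a 4-byte and a 1-byte attribute therefore occupies 16 bytes: 8 for the first, and 5 rounded up to 8 for the second.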

The function that evaluates the expression corresponds to the field eval. We can think of two types of expressions: those that match some condition, and those that do something to the packet, such as drop, reject, accept or dnat. For these two types, eval is slightly different. Here we use the source code of both compare and reject as examples. In reject, it is implemented as nft_reject_inet_eval, located at L20:

static void nft_reject_inet_eval(const struct nft_expr *expr,
				 struct nft_regs *regs,
				 const struct nft_pktinfo *pkt)
{
	struct nft_reject *priv = nft_expr_priv(expr);

	switch (nft_pf(pkt)) {
	case NFPROTO_IPV4:
		switch (priv->type) {
		case NFT_REJECT_ICMP_UNREACH:
			nf_send_unreach(pkt->skb, priv->icmp_code,
					nft_hook(pkt));
			break;
		case NFT_REJECT_TCP_RST:
			nf_send_reset(nft_net(pkt), pkt->skb, nft_hook(pkt));
			break;
		case NFT_REJECT_ICMPX_UNREACH:
			nf_send_unreach(pkt->skb,
					nft_reject_icmp_code(priv->icmp_code),
					nft_hook(pkt));
			break;
		}
		break;
	case NFPROTO_IPV6:
		switch (priv->type) {
		case NFT_REJECT_ICMP_UNREACH:
			nf_send_unreach6(nft_net(pkt), pkt->skb,
					 priv->icmp_code, nft_hook(pkt));
			break;
		case NFT_REJECT_TCP_RST:
			nf_send_reset6(nft_net(pkt), pkt->skb, nft_hook(pkt));
			break;
		case NFT_REJECT_ICMPX_UNREACH:
			nf_send_unreach6(nft_net(pkt), pkt->skb,
					 nft_reject_icmpv6_code(priv->icmp_code),
					 nft_hook(pkt));
			break;
		}
		break;
	}

	regs->verdict.code = NF_DROP;
}

In the compare operation, it is implemented as nft_cmp_eval, located at L27:

static void nft_cmp_eval(const struct nft_expr *expr,
			 struct nft_regs *regs,
			 const struct nft_pktinfo *pkt)
{
	const struct nft_cmp_expr *priv = nft_expr_priv(expr);
	int d;

	d = memcmp(&regs->data[priv->sreg], &priv->data, priv->len);
	switch (priv->op) {
	case NFT_CMP_EQ:
		if (d != 0)
			goto mismatch;
		break;
	case NFT_CMP_NEQ:
		if (d == 0)
			goto mismatch;
		break;
	case NFT_CMP_LT:
		if (d == 0)
			goto mismatch;
	case NFT_CMP_LTE:
		if (d > 0)
			goto mismatch;
		break;
	case NFT_CMP_GT:
		if (d == 0)
			goto mismatch;
	case NFT_CMP_GTE:
		if (d < 0)
			goto mismatch;
		break;
	}
	return;

mismatch:
	regs->verdict.code = NFT_BREAK;
}

From these two functions, we can see that eval tells the kernel to do something by setting regs->verdict.code, or lets evaluation continue with the next expression by leaving regs->verdict.code unchanged. For actions, regs->verdict.code should be set to one of the following values, defined at include/uapi/linux/netfilter.h#L9:

/* Responses from hook functions. */
#define NF_DROP 0
#define NF_ACCEPT 1
#define NF_STOLEN 2
#define NF_QUEUE 3
#define NF_REPEAT 4

For matches, it should be a value from enum nft_verdicts, which is defined at include/uapi/linux/netfilter/nf_tables.h#L49:

/**
 * enum nft_verdicts - nf_tables internal verdicts
 *
 * @NFT_CONTINUE: continue evaluation of the current rule
 * @NFT_BREAK: terminate evaluation of the current rule
 * @NFT_JUMP: push the current chain on the jump stack and jump to a chain
 * @NFT_GOTO: jump to a chain without pushing the current chain on the jump stack
 * @NFT_RETURN: return to the topmost chain on the jump stack
 *
 * The nf_tables verdicts share their numeric space with the netfilter verdicts.
 */
enum nft_verdicts {
	NFT_CONTINUE	= -1,
	NFT_BREAK	= -2,
	NFT_JUMP	= -3,
	NFT_GOTO	= -4,
	NFT_RETURN	= -5,
};
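
How these verdict codes steer rule evaluation can be sketched in user space. The code below is a deliberately simplified imitation, not the kernel's evaluation loop: an int stands in for the packet, jumps and gotos are ignored, and every mock_ or MOCK_ name is hypothetical. It shows the two behaviours described above: a match that fails sets a break verdict and ends the current rule, while an action sets a final verdict that ends the whole traversal.

```c
#include <assert.h>
#include <stddef.h>

enum { MOCK_NF_DROP = 0, MOCK_NF_ACCEPT = 1 };
enum { MOCK_NFT_CONTINUE = -1, MOCK_NFT_BREAK = -2 };

struct mock_regs {
	int verdict;
};

/* Stand-in for an eval callback; the "packet" is just an int here. */
typedef void (*mock_eval_fn)(struct mock_regs *regs, int pkt);

/* A rule is an ordered list of expressions. */
struct mock_rule {
	const mock_eval_fn *exprs;
	size_t n;
};

/* Walk the rules of one chain and return the final verdict. */
int mock_do_chain(const struct mock_rule *rules, size_t nrules,
		  int pkt, int policy)
{
	struct mock_regs regs;
	size_t r, e;

	assert(rules != NULL || nrules == 0);

	for (r = 0; r < nrules; r++) {
		regs.verdict = MOCK_NFT_CONTINUE;
		for (e = 0; e < rules[r].n; e++) {
			rules[r].exprs[e](&regs, pkt);
			if (regs.verdict != MOCK_NFT_CONTINUE)
				break;	/* stop evaluating this rule */
		}
		/* No match, or no verdict issued: try the next rule. */
		if (regs.verdict == MOCK_NFT_BREAK ||
		    regs.verdict == MOCK_NFT_CONTINUE)
			continue;
		return regs.verdict;
	}
	return policy;	/* no rule issued a verdict */
}

/* A "match": only even packets pass on to the next expression. */
void mock_match_even(struct mock_regs *regs, int pkt)
{
	if (pkt % 2 != 0)
		regs->verdict = MOCK_NFT_BREAK;
}

/* An "action": always drop. */
void mock_drop(struct mock_regs *regs, int pkt)
{
	(void)pkt;
	regs->verdict = MOCK_NF_DROP;
}
```

A rule built from mock_match_even followed by mock_drop drops even packets, while odd packets fall through to the chain policy.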

The field validate is used to check whether the operation is valid in its context. For example, masquerade is only available at the hook point POSTROUTING, and reject is only available at the hook points LOCAL_IN, LOCAL_OUT and FORWARD. This can be seen in the source code at net/netfilter/nft_reject.c#L29:

int nft_reject_validate(const struct nft_ctx *ctx,
			const struct nft_expr *expr,
			const struct nft_data **data)
{
	return nft_chain_validate_hooks(ctx->chain,
					(1 << NF_INET_LOCAL_IN) |
					(1 << NF_INET_FORWARD) |
					(1 << NF_INET_LOCAL_OUT));
}

The function nft_chain_validate_hooks is used to validate the hook point. There are other helper functions for validating different things; they can be listed by searching for the string “validate” in include/net/netfilter/nf_tables.h.
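
The hook check itself is a plain bitmask test. Below is a minimal user-space sketch of the idea, assuming the hook numbering of the inet hooks (pre-routing 0 through post-routing 4); mock_validate_hooks and mock_reject_validate are hypothetical names, and the real helper takes a chain rather than a bare hook number.

```c
#include <assert.h>
#include <errno.h>

/* Hook numbers, in the order used for the NF_INET_* hooks. */
enum mock_inet_hooks {
	MOCK_PRE_ROUTING,
	MOCK_LOCAL_IN,
	MOCK_FORWARD,
	MOCK_LOCAL_OUT,
	MOCK_POST_ROUTING,
	MOCK_NUMHOOKS
};

/* Return 0 if hooknum is one of the hooks set in hook_flags. */
int mock_validate_hooks(unsigned int hooknum, unsigned int hook_flags)
{
	assert(hooknum < MOCK_NUMHOOKS);

	if ((1u << hooknum) & hook_flags)
		return 0;
	return -EOPNOTSUPP;
}

/* The same mask that the reject expression allows. */
int mock_reject_validate(unsigned int hooknum)
{
	return mock_validate_hooks(hooknum,
				   (1u << MOCK_LOCAL_IN) |
				   (1u << MOCK_FORWARD) |
				   (1u << MOCK_LOCAL_OUT));
}
```

Each allowed hook contributes one bit, so adding or removing a permitted hook point is a one-line change to the mask.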

struct nft_expr_type

The definition of nft_expr_type is at include/net/netfilter/nf_tables.h#L681:

/**
 *	struct nft_expr_type - nf_tables expression type
 *
 *	@select_ops: function to select nft_expr_ops
 *	@ops: default ops, used when no select_ops functions is present
 *	@list: used internally
 *	@name: Identifier
 *	@owner: module reference
 *	@policy: netlink attribute policy
 *	@maxattr: highest netlink attribute number
 *	@family: address family for AF-specific types
 *	@flags: expression type flags
 */
struct nft_expr_type {
	const struct nft_expr_ops	*(*select_ops)(const struct nft_ctx *,
						       const struct nlattr * const tb[]);
	const struct nft_expr_ops	*ops;
	struct list_head		list;
	const char			*name;
	struct module			*owner;
	const struct nla_policy		*policy;
	unsigned int			maxattr;
	u8				family;
	u8				flags;
};

The fields ops and select_ops have already been discussed; the field list is used internally, so we need not worry about it here; the field name is the name of the expression; the field owner should be set to a pointer to the current module. These are all trivial fields. Now let’s take a look at the policy and maxattr fields. The related code in the definition of nft_reject_inet_type is:

.policy = nft_reject_policy,
.maxattr = NFTA_REJECT_MAX,

The array nft_reject_policy is defined at L23 of net/netfilter/nft_reject.c:

const struct nla_policy nft_reject_policy[NFTA_REJECT_MAX + 1] = {
	[NFTA_REJECT_TYPE]		= { .type = NLA_U32 },
	[NFTA_REJECT_ICMP_CODE]		= { .type = NLA_U8 },
};

The two array indices above, NFTA_REJECT_TYPE and NFTA_REJECT_ICMP_CODE, belong to an enum named nft_reject_attributes. The definitions of NFTA_REJECT_MAX and nft_reject_attributes are located at include/uapi/linux/netfilter/nf_tables.h#L1089:

/**
 * enum nft_reject_attributes - nf_tables reject expression netlink attributes
 *
 * @NFTA_REJECT_TYPE: packet type to use (NLA_U32: nft_reject_types)
 * @NFTA_REJECT_ICMP_CODE: ICMP code to use (NLA_U8)
 */
enum nft_reject_attributes {
	NFTA_REJECT_UNSPEC,
	NFTA_REJECT_TYPE,
	NFTA_REJECT_ICMP_CODE,
	__NFTA_REJECT_MAX
};
#define NFTA_REJECT_MAX		(__NFTA_REJECT_MAX - 1)

Recall the question we raised earlier: how does the kernel know which attributes an expression expects? The policy field is exactly the answer. Let’s now dig deeper and read the source code of netlink, starting at include/net/netlink.h#L9:

/* ========================================================================
* Netlink Messages and Attributes Interface (As Seen On TV)
* ------------------------------------------------------------------------
* Messages Interface
* ------------------------------------------------------------------------
*
* Message Format:
* <--- nlmsg_total_size(payload) --->
* <-- nlmsg_msg_size(payload) ->
* +----------+- - -+-------------+- - -+-------- - -
* | nlmsghdr | Pad | Payload | Pad | nlmsghdr
* +----------+- - -+-------------+- - -+-------- - -
* nlmsg_data(nlh)---^ ^
* nlmsg_next(nlh)-----------------------+
*
* Payload Format:
* <---------------------- nlmsg_len(nlh) --------------------->
* <------ hdrlen ------> <- nlmsg_attrlen(nlh, hdrlen) ->
* +----------------------+- - -+--------------------------------+
* | Family Header | Pad | Attributes |
* +----------------------+- - -+--------------------------------+
* nlmsg_attrdata(nlh, hdrlen)---^
*
* Data Structures:
* struct nlmsghdr netlink message header
*
* Message Construction:
* nlmsg_new() create a new netlink message
* nlmsg_put() add a netlink message to an skb
* nlmsg_put_answer() callback based nlmsg_put()
* nlmsg_end() finalize netlink message
* nlmsg_get_pos() return current position in message
* nlmsg_trim() trim part of message
* nlmsg_cancel() cancel message construction
* nlmsg_free() free a netlink message
*
* Message Sending:
* nlmsg_multicast() multicast message to several groups
* nlmsg_unicast() unicast a message to a single socket
* nlmsg_notify() send notification message
*
* Message Length Calculations:
* nlmsg_msg_size(payload) length of message w/o padding
* nlmsg_total_size(payload) length of message w/ padding
* nlmsg_padlen(payload) length of padding at tail
*
* Message Payload Access:
* nlmsg_data(nlh) head of message payload
* nlmsg_len(nlh) length of message payload
* nlmsg_attrdata(nlh, hdrlen) head of attributes data
* nlmsg_attrlen(nlh, hdrlen) length of attributes data
*
* Message Parsing:
* nlmsg_ok(nlh, remaining) does nlh fit into remaining bytes?
* nlmsg_next(nlh, remaining) get next netlink message
* nlmsg_parse() parse attributes of a message
* nlmsg_find_attr() find an attribute in a message
* nlmsg_for_each_msg() loop over all messages
* nlmsg_validate() validate netlink message incl. attrs
* nlmsg_for_each_attr() loop over all attributes
*
* Misc:
* nlmsg_report() report back to application?
*
* ------------------------------------------------------------------------
* Attributes Interface
* ------------------------------------------------------------------------
*
* Attribute Format:
* <------- nla_total_size(payload) ------->
* <---- nla_attr_size(payload) ----->
* +----------+- - -+- - - - - - - - - +- - -+-------- - -
* | Header | Pad | Payload | Pad | Header
* +----------+- - -+- - - - - - - - - +- - -+-------- - -
* <- nla_len(nla) -> ^
* nla_data(nla)----^ |
* nla_next(nla)-----------------------------'
*
* Data Structures:
* struct nlattr netlink attribute header
*
* Attribute Construction:
* nla_reserve(skb, type, len) reserve room for an attribute
* nla_reserve_nohdr(skb, len) reserve room for an attribute w/o hdr
* nla_put(skb, type, len, data) add attribute to skb
* nla_put_nohdr(skb, len, data) add attribute w/o hdr
* nla_append(skb, len, data) append data to skb
*
* Attribute Construction for Basic Types:
* nla_put_u8(skb, type, value) add u8 attribute to skb
* nla_put_u16(skb, type, value) add u16 attribute to skb
* nla_put_u32(skb, type, value) add u32 attribute to skb
* nla_put_u64_64bit(skb, type,
* value, padattr) add u64 attribute to skb
* nla_put_s8(skb, type, value) add s8 attribute to skb
* nla_put_s16(skb, type, value) add s16 attribute to skb
* nla_put_s32(skb, type, value) add s32 attribute to skb
* nla_put_s64(skb, type, value,
* padattr) add s64 attribute to skb
* nla_put_string(skb, type, str) add string attribute to skb
* nla_put_flag(skb, type) add flag attribute to skb
* nla_put_msecs(skb, type, jiffies,
* padattr) add msecs attribute to skb
* nla_put_in_addr(skb, type, addr) add IPv4 address attribute to skb
* nla_put_in6_addr(skb, type, addr) add IPv6 address attribute to skb
*
* Nested Attributes Construction:
* nla_nest_start(skb, type) start a nested attribute
* nla_nest_end(skb, nla) finalize a nested attribute
* nla_nest_cancel(skb, nla) cancel nested attribute construction
*
* Attribute Length Calculations:
* nla_attr_size(payload) length of attribute w/o padding
* nla_total_size(payload) length of attribute w/ padding
* nla_padlen(payload) length of padding
*
* Attribute Payload Access:
* nla_data(nla) head of attribute payload
* nla_len(nla) length of attribute payload
*
* Attribute Payload Access for Basic Types:
* nla_get_u8(nla) get payload for a u8 attribute
* nla_get_u16(nla) get payload for a u16 attribute
* nla_get_u32(nla) get payload for a u32 attribute
* nla_get_u64(nla) get payload for a u64 attribute
* nla_get_s8(nla) get payload for a s8 attribute
* nla_get_s16(nla) get payload for a s16 attribute
* nla_get_s32(nla) get payload for a s32 attribute
* nla_get_s64(nla) get payload for a s64 attribute
* nla_get_flag(nla) return 1 if flag is true
* nla_get_msecs(nla) get payload for a msecs attribute
*
* Attribute Misc:
* nla_memcpy(dest, nla, count) copy attribute into memory
* nla_memcmp(nla, data, size) compare attribute with memory area
* nla_strlcpy(dst, nla, size) copy attribute to a sized string
* nla_strcmp(nla, str) compare attribute with string
*
* Attribute Parsing:
* nla_ok(nla, remaining) does nla fit into remaining bytes?
* nla_next(nla, remaining) get next netlink attribute
* nla_validate() validate a stream of attributes
* nla_validate_nested() validate a stream of nested attributes
* nla_find() find attribute in stream of attributes
* nla_find_nested() find attribute in nested attributes
* nla_parse() parse and validate stream of attrs
* nla_parse_nested() parse nested attribuets
* nla_for_each_attr() loop over all attributes
* nla_for_each_nested() loop over the nested attributes
*=========================================================================
*/
/**
* Standard attribute types to specify validation policy
*/
enum {
NLA_UNSPEC,
NLA_U8,
NLA_U16,
NLA_U32,
NLA_U64,
NLA_STRING,
NLA_FLAG,
NLA_MSECS,
NLA_NESTED,
NLA_NESTED_COMPAT,
NLA_NUL_STRING,
NLA_BINARY,
NLA_S8,
NLA_S16,
NLA_S32,
NLA_S64,
__NLA_TYPE_MAX,
};
#define NLA_TYPE_MAX (__NLA_TYPE_MAX - 1)
/**
* struct nla_policy - attribute validation policy
* @type: Type of attribute or NLA_UNSPEC
* @len: Type specific length of payload
*
* Policies are defined as arrays of this struct, the array must be
* accessible by attribute type up to the highest identifier to be expected.
*
* Meaning of `len' field:
* NLA_STRING Maximum length of string
* NLA_NUL_STRING Maximum length of string (excluding NUL)
* NLA_FLAG Unused
* NLA_BINARY Maximum length of attribute payload
* NLA_NESTED Don't use `len' field -- length verification is
* done by checking len of nested header (or empty)
* NLA_NESTED_COMPAT Minimum length of structure payload
* NLA_U8, NLA_U16,
* NLA_U32, NLA_U64,
* NLA_S8, NLA_S16,
* NLA_S32, NLA_S64,
* NLA_MSECS Leaving the length field zero will verify the
* given type fits, using it verifies minimum length
* just like "All other"
* All other Minimum length of attribute payload
*
* Example:
* static const struct nla_policy my_policy[ATTR_MAX+1] = {
* [ATTR_FOO] = { .type = NLA_U16 },
* [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
* [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
* };
*/
struct nla_policy {
u16 type;
u16 len;
};

The comments explain themselves very well. The attributes of the different expressions are all defined in include/uapi/linux/netfilter/nf_tables.h. To get a feeling for how to write an enum like this, just search for the string “attributes” in this file. Every attribute enum should begin with an UNSPEC entry, which keeps attribute number 0 reserved.
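
The interplay between the attribute enum, the MAX macro and the policy array can be tried out in user space. The sketch below is a much simplified imitation of the length checks described in the netlink.h comment, not the kernel's validation code; all mock_ and MOCK_ names, including the made-up “abc” attributes, are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* A few attribute types, standing in for NLA_U8, NLA_U32 and NLA_STRING. */
enum { MOCK_NLA_UNSPEC, MOCK_NLA_U8, MOCK_NLA_U32, MOCK_NLA_STRING };

/* Stand-in for struct nla_policy. */
struct mock_policy {
	uint16_t type;
	uint16_t len;
};

/* Attribute numbers: UNSPEC first, __MAX sentinel last. */
enum mock_abc_attributes {
	MOCK_ABC_UNSPEC,
	MOCK_ABC_COUNT,
	MOCK_ABC_TEXT,
	__MOCK_ABC_MAX
};
#define MOCK_ABC_MAX (__MOCK_ABC_MAX - 1)

/* Indexed by attribute number, hence MAX + 1 entries. */
static const struct mock_policy mock_abc_policy[MOCK_ABC_MAX + 1] = {
	[MOCK_ABC_COUNT] = { .type = MOCK_NLA_U32 },
	[MOCK_ABC_TEXT]  = { .type = MOCK_NLA_STRING, .len = 8 },
};

/* Check one attribute's payload length against the policy. */
int mock_validate_attr(const struct mock_policy *policy, int maxattr,
		       int attrtype, size_t payload_len)
{
	const struct mock_policy *pt;

	assert(policy != NULL);

	/* Attribute numbers outside the table are not checked here. */
	if (attrtype <= 0 || attrtype > maxattr)
		return 0;

	pt = &policy[attrtype];
	switch (pt->type) {
	case MOCK_NLA_U8:
		return payload_len >= sizeof(uint8_t) ? 0 : -ERANGE;
	case MOCK_NLA_U32:
		return payload_len >= sizeof(uint32_t) ? 0 : -ERANGE;
	case MOCK_NLA_STRING:
		/* len is the maximum string length; 0 means unlimited. */
		if (payload_len < 1)
			return -ERANGE;
		if (pt->len && payload_len > pt->len)
			return -ERANGE;
		return 0;
	default:
		return 0;
	}
}

/* Convenience wrapper bound to the sample policy above. */
int mock_abc_validate(int attrtype, size_t payload_len)
{
	return mock_validate_attr(mock_abc_policy, MOCK_ABC_MAX,
				  attrtype, payload_len);
}
```

Because the sentinel is the last enumerator, MOCK_ABC_MAX always equals the highest real attribute number, and the policy array grows automatically when a new attribute is inserted before the sentinel.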

The field family is the address family of your expression. Possible values can be found at include/uapi/linux/netfilter.h#L59:

enum {
	NFPROTO_UNSPEC =  0,
	NFPROTO_INET   =  1,
	NFPROTO_IPV4   =  2,
	NFPROTO_ARP    =  3,
	NFPROTO_NETDEV =  5,
	NFPROTO_BRIDGE =  7,
	NFPROTO_IPV6   = 10,
	NFPROTO_DECNET = 12,
	NFPROTO_NUMPROTO,
};

The field flags holds the expression type flags. Currently only one flag is available, which marks an expression as stateful. See include/net/netfilter/nf_tables.h#L707:

#define NFT_EXPR_STATEFUL 0x1

Summary of the kernel code

Create an instance of struct nft_expr_ops for each operation of the expression and fill in its fields as described in its definition. Use init, clone and destroy to initialize, clone and destroy the object. In init, read the attributes from netlink and set up the operation’s private struct. Implement the core function of the operation in eval, and tell the kernel what to do by setting regs->verdict.code. In dump, send the attributes back through netlink. Apply constraints to the operation in validate.

Create an instance of struct nft_expr_type for your expression and fill in its fields as described in its definition. If you have multiple operations that should be selected dynamically, implement select_ops; otherwise set ops. Set name and owner according to your expression. If applicable, set the address family in family and use flags to indicate that your expression is stateful. Create an array of struct nla_policy, describe the attributes in that array, and set the array as policy. Set maxattr to the highest attribute number.

Call nft_register_expr to register your expression and nft_unregister_expr to unregister it.

Writing our own kernel code

Knowing how the kernel code is structured, we are ready to write our own module that adds our expression. Here we call our expression “abcde”.

abcde.h:

#ifndef _ABCDE_H
#define _ABCDE_H

enum nft_abcde_attributes {
	NFTA_ABCDE_UNSPEC,
	NFTA_ABCDE_TEXT,
	__NFTA_ABCDE_MAX,
};
#define NFTA_ABCDE_MAX (__NFTA_ABCDE_MAX - 1)

#endif /* _ABCDE_H */

abcde.c:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <linux/tcp.h>
#include <net/netfilter/nf_tables.h>
#include "abcde.h"

#define ABCDE_TEXT_SIZE 128

/* Private data of one "abcde" expression: the text to search for. */
struct nft_abcde {
	char text[ABCDE_TEXT_SIZE];
	int len;
};

/*
 * Return true if the TCP payload contains priv->text.
 * Only the linear part of the skb (up to skb_tail_pointer()) is scanned.
 */
static inline bool match_packet(const struct nft_abcde *priv,
				const struct sk_buff *skb)
{
	const struct tcphdr *tcph = tcp_hdr(skb);
	/* doff is the TCP header length in 32-bit words */
	const char *user_data = (const char *)tcph + tcph->doff * 4;
	const char *tail = (const char *)skb_tail_pointer(skb);
	const char *p;

	/* "<=" so that a match ending exactly at the tail is not missed */
	for (p = user_data; p + priv->len <= tail; p++) {
		if (memcmp(p, priv->text, priv->len) == 0)
			return true;
	}
	return false;
}

/* NLA_STRING length excludes the trailing NUL, so leave room for it. */
static const struct nla_policy nft_abcde_policy[NFTA_ABCDE_MAX + 1] = {
	[NFTA_ABCDE_TEXT] = { .type = NLA_STRING, .len = ABCDE_TEXT_SIZE - 1 },
};

/* Continue with the next expression on a match, otherwise stop evaluating this rule. */
static void nft_abcde_eval(const struct nft_expr *expr, struct nft_regs *regs,
			   const struct nft_pktinfo *pkt)
{
	struct nft_abcde *priv = nft_expr_priv(expr);
	struct sk_buff *skb = pkt->skb;

	if (match_packet(priv, skb))
		regs->verdict.code = NFT_CONTINUE;
	else
		regs->verdict.code = NFT_BREAK;
}

/* Copy the text attribute sent by user space into the private data. */
static int nft_abcde_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
			  const struct nlattr * const tb[])
{
	struct nft_abcde *priv = nft_expr_priv(expr);

	if (tb[NFTA_ABCDE_TEXT] == NULL)
		return -EINVAL;
	nla_strlcpy(priv->text, tb[NFTA_ABCDE_TEXT], ABCDE_TEXT_SIZE);
	priv->len = strlen(priv->text);
	return 0;
}

/* Send the text back to user space, e.g. when the ruleset is listed. */
static int nft_abcde_dump(struct sk_buff *skb, const struct nft_expr *expr)
{
	const struct nft_abcde *priv = nft_expr_priv(expr);

	if (nla_put_string(skb, NFTA_ABCDE_TEXT, priv->text))
		return -1;
	return 0;
}

static struct nft_expr_type nft_abcde_type;

static const struct nft_expr_ops nft_abcde_op = {
	.eval = nft_abcde_eval,
	.size = sizeof(struct nft_abcde),
	.init = nft_abcde_init,
	.dump = nft_abcde_dump,
	.type = &nft_abcde_type,
};

static struct nft_expr_type nft_abcde_type __read_mostly = {
	.ops = &nft_abcde_op,
	.name = "abcde",
	.owner = THIS_MODULE,
	.policy = nft_abcde_policy,
	.maxattr = NFTA_ABCDE_MAX,
};

static int __init nft_abcde_module_init(void)
{
	return nft_register_expr(&nft_abcde_type);
}

static void __exit nft_abcde_module_exit(void)
{
	nft_unregister_expr(&nft_abcde_type);
}

module_init(nft_abcde_module_init);
module_exit(nft_abcde_module_exit);

MODULE_AUTHOR("Xiang Gao");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("A sample nftables expression.");
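
The scanning logic of match_packet can be tried out in user space before loading anything into the kernel. The following is only a sketch: payload_contains is a hypothetical helper that is not part of the module, and it works on a plain buffer instead of an sk_buff. It uses an inclusive upper bound, so a match that ends exactly at the last byte of the buffer is still found.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user space stand-in for match_packet(): return true if the
 * buffer data[0..data_len) contains the byte sequence text[0..text_len).
 */
static bool payload_contains(const char *data, size_t data_len,
                             const char *text, size_t text_len)
{
    size_t i;

    /* A pattern longer than the buffer can never match. */
    if (text_len > data_len)
        return false;

    /* i + text_len <= data_len keeps every comparison inside the buffer. */
    for (i = 0; i + text_len <= data_len; i++) {
        if (memcmp(data + i, text, text_len) == 0)
            return true;
    }
    return false;
}
```

For example, payload_contains("xxdarkhttpd", 11, "darkhttpd", 9) is true even though the match sits at the very end of the buffer, while a buffer that holds only "darkhttp" does not match.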

Makefile:

obj-m = abcde.o
KVERSION = $(shell uname -r)

all:
	$(MAKE) -C /lib/modules/$(KVERSION)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(KVERSION)/build M=$(PWD) clean

The complete source code for this example can be found on GitHub:
https://github.com/zasdfgbnm/nftables-abcde

Modify user space tool

To use our new expression “abcde” conveniently, we should also modify the source code of the user space tool, i.e. the nft command, to make it aware of the new expression. Extending the user space tool is easier than extending the kernel. We first clone its git repository and switch to tag v0.7 (the newest release when this article was written):

git clone git://git.netfilter.org/nftables
cd nftables
git checkout v0.7 -b abcde

To figure out what to modify, let’s run grep to see how the expression reject is implemented:

grep -i reject -r src include

The above command will output something like:

src/datatype.c: SYMBOL("reject-route", ICMPV6_REJECT_ROUTE),
......
src/evaluate.c:static int reject_payload_gen_dependency_tcp(struct eval_ctx *ctx,
......
src/netlink_delinearize.c:static void netlink_parse_reject(struct netlink_parse_ctx *ctx,
......
src/parser_bison.y:%token _REJECT "reject"
......
src/scanner.l:"reject" { return _REJECT; }
src/statement.c:static void reject_stmt_print(const struct stmt *stmt)
......
include/linux/netfilter/nf_tables.h: * enum nft_reject_types - nf_tables reject expression reject types
......
include/statement.h:struct reject_stmt {
......

This tells us which files we may want to modify. A good starting point is scanner.l and parser_bison.y. We can copy the code for reject and adapt it to our own expression.

After some trial and error, we end up with the following patch, generated by git diff v0.7:

diff --git a/include/statement.h b/include/statement.h
index 277ff2f..9043790 100644
--- a/include/statement.h
+++ b/include/statement.h
@@ -76,6 +76,12 @@ struct reject_stmt {
extern struct stmt *reject_stmt_alloc(const struct location *loc);
+struct abcde_stmt {
+ const char * text;
+};
+
+extern struct stmt *abcde_stmt_alloc(const struct location *loc);
+
struct nat_stmt {
enum nft_nat_types type;
struct expr *addr;
@@ -199,6 +205,7 @@ extern struct stmt *xt_stmt_alloc(const struct location *loc);
* @STMT_LIMIT: limit statement
* @STMT_LOG: log statement
* @STMT_REJECT: REJECT statement
+ * @STMT_ABCDE: abcde statement
* @STMT_NAT: NAT statement
* @STMT_MASQ: masquerade statement
* @STMT_REDIR: redirect statement
@@ -222,6 +229,7 @@ enum stmt_types {
STMT_LIMIT,
STMT_LOG,
STMT_REJECT,
+ STMT_ABCDE,
STMT_NAT,
STMT_MASQ,
STMT_REDIR,
@@ -280,6 +288,7 @@ struct stmt {
struct log_stmt log;
struct limit_stmt limit;
struct reject_stmt reject;
+ struct abcde_stmt abcde;
struct nat_stmt nat;
struct masq_stmt masq;
struct redir_stmt redir;
diff --git a/src/evaluate.c b/src/evaluate.c
index 8a3da54..751d702 100644
--- a/src/evaluate.c
+++ b/src/evaluate.c
@@ -2495,6 +2495,8 @@ int stmt_evaluate(struct eval_ctx *ctx, struct stmt *stmt)
return stmt_evaluate_ct(ctx, stmt);
case STMT_LOG:
return stmt_evaluate_log(ctx, stmt);
+ case STMT_ABCDE:
+ return 0;
case STMT_REJECT:
return stmt_evaluate_reject(ctx, stmt);
case STMT_NAT:
diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c
index cb0f6ac..f8b83a6 100644
--- a/src/netlink_delinearize.c
+++ b/src/netlink_delinearize.c
@@ -799,6 +799,16 @@ static void netlink_parse_reject(struct netlink_parse_ctx *ctx,
ctx->stmt = stmt;
}
+static void netlink_parse_abcde(struct netlink_parse_ctx *ctx,
+ const struct location *loc,
+ const struct nftnl_expr *expr)
+{
+ struct stmt *stmt;
+ stmt = abcde_stmt_alloc(loc);
+ stmt->abcde.text = xstrdup(nftnl_expr_get_str(expr, NFTNL_EXPR_ABCDE_TEXT));
+ ctx->stmt = stmt;
+}
+
static void netlink_parse_nat(struct netlink_parse_ctx *ctx,
const struct location *loc,
const struct nftnl_expr *nle)
@@ -1144,6 +1154,7 @@ static const struct {
{ .name = "limit", .parse = netlink_parse_limit },
{ .name = "range", .parse = netlink_parse_range },
{ .name = "reject", .parse = netlink_parse_reject },
+ { .name = "abcde", .parse = netlink_parse_abcde },
{ .name = "nat", .parse = netlink_parse_nat },
{ .name = "notrack", .parse = netlink_parse_notrack },
{ .name = "masq", .parse = netlink_parse_masq },
diff --git a/src/netlink_linearize.c b/src/netlink_linearize.c
index 0915038..893ae7e 100644
--- a/src/netlink_linearize.c
+++ b/src/netlink_linearize.c
@@ -874,6 +874,18 @@ static void netlink_gen_reject_stmt(struct netlink_linearize_ctx *ctx,
nftnl_rule_add_expr(ctx->nlr, nle);
}
+static void netlink_gen_abcde_stmt(struct netlink_linearize_ctx *ctx,
+ const struct stmt *stmt)
+{
+ struct nftnl_expr *nle;
+ nle = alloc_nft_expr("abcde");
+
+ if (stmt->abcde.text != NULL) {
+ nftnl_expr_set_str(nle, NFTNL_EXPR_ABCDE_TEXT, stmt->abcde.text);
+ }
+ nftnl_rule_add_expr(ctx->nlr, nle);
+}
+
static void netlink_gen_nat_stmt(struct netlink_linearize_ctx *ctx,
const struct stmt *stmt)
{
@@ -1200,6 +1212,8 @@ static void netlink_gen_stmt(struct netlink_linearize_ctx *ctx,
return netlink_gen_log_stmt(ctx, stmt);
case STMT_REJECT:
return netlink_gen_reject_stmt(ctx, stmt);
+ case STMT_ABCDE:
+ return netlink_gen_abcde_stmt(ctx, stmt);
case STMT_NAT:
return netlink_gen_nat_stmt(ctx, stmt);
case STMT_MASQ:
diff --git a/src/parser_bison.y b/src/parser_bison.y
index deaaf06..ac16c72 100644
--- a/src/parser_bison.y
+++ b/src/parser_bison.y
@@ -399,6 +399,7 @@ static void location_update(struct location *loc, struct location *rhs, int n)
%token RANDOM "random"
%token FULLY_RANDOM "fully-random"
%token PERSISTENT "persistent"
+%token ABCDE "abcde"
%token QUEUE "queue"
%token QUEUENUM "num"
@@ -499,6 +500,8 @@ static void location_update(struct location *loc, struct location *rhs, int n)
%type <val> set_stmt_op
%type <stmt> flow_stmt flow_stmt_alloc
%destructor { stmt_free($$); } flow_stmt flow_stmt_alloc
+%type <stmt> abcde_stmt abcde_stmt_alloc
+%destructor { stmt_free($$); } abcde_stmt abcde_stmt_alloc
%type <expr> symbol_expr verdict_expr integer_expr variable_expr
%destructor { expr_free($$); } symbol_expr verdict_expr integer_expr variable_expr
@@ -1400,6 +1403,7 @@ stmt : verdict_stmt
| limit_stmt
| quota_stmt
| reject_stmt
+ | abcde_stmt
| nat_stmt
| queue_stmt
| ct_stmt
@@ -1736,6 +1740,21 @@ reject_opts : /* empty */
}
;
+abcde_stmt : abcde_stmt_alloc abcde_opts
+ ;
+
+abcde_stmt_alloc : ABCDE
+ {
+ $$ = abcde_stmt_alloc(&@$);
+ }
+ ;
+
+abcde_opts : string
+ {
+ $<stmt>0->abcde.text = $1;
+ }
+ ;
+
nat_stmt : nat_stmt_alloc nat_stmt_args
;
diff --git a/src/scanner.l b/src/scanner.l
index 625023f..595db76 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -333,6 +333,7 @@ addrstring ({macaddr}|{ip4addr}|{ip6addr})
"random" { return RANDOM; }
"fully-random" { return FULLY_RANDOM; }
"persistent" { return PERSISTENT; }
+"abcde" { return ABCDE; }
"ll" { return LL_HDR; }
"nh" { return NETWORK_HDR; }
diff --git a/src/statement.c b/src/statement.c
index e70eb51..29b8015 100644
--- a/src/statement.c
+++ b/src/statement.c
@@ -417,6 +417,28 @@ struct stmt *reject_stmt_alloc(const struct location *loc)
return stmt_alloc(loc, &reject_stmt_ops);
}
+static void abcde_stmt_print(const struct stmt *stmt)
+{
+ printf("abcde \"%s\"", stmt->abcde.text);
+}
+
+static void abcde_stmt_destroy(struct stmt *stmt)
+{
+ xfree(stmt->abcde.text);
+}
+
+static const struct stmt_ops abcde_stmt_ops = {
+ .type = STMT_ABCDE,
+ .name = "abcde",
+ .print = abcde_stmt_print,
+ .destroy = abcde_stmt_destroy,
+};
+
+struct stmt *abcde_stmt_alloc(const struct location *loc)
+{
+ return stmt_alloc(loc, &abcde_stmt_ops);
+}
+
static void print_nf_nat_flags(uint32_t flags)
{
const char *delim = " ";
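
The part of the patch that touches statement.c follows a common pattern: each statement type supplies a table of callbacks, and generic code calls through that table without knowing the concrete type. The following is a minimal sketch of that pattern only. The types below are simplified stand-ins invented for illustration, not the real nftables definitions; in particular the real print callback writes to stdout, while this one fills a buffer so that it is easier to check.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct stmt;

/* Simplified stand-in for a per-statement callback table. */
struct stmt_ops {
    const char *name;
    int (*print)(const struct stmt *stmt, char *buf, size_t size);
    void (*destroy)(struct stmt *stmt);
};

/* Simplified stand-in for a statement carrying the abcde text. */
struct stmt {
    const struct stmt_ops *ops;
    char *text;
};

static int abcde_print(const struct stmt *stmt, char *buf, size_t size)
{
    return snprintf(buf, size, "abcde \"%s\"", stmt->text);
}

static void abcde_destroy(struct stmt *stmt)
{
    free(stmt->text);
}

static const struct stmt_ops abcde_ops = {
    .name    = "abcde",
    .print   = abcde_print,
    .destroy = abcde_destroy,
};

/* Allocate a statement and give it its own copy of the text. */
static struct stmt *abcde_alloc(const char *text)
{
    size_t len = strlen(text) + 1;
    struct stmt *stmt = calloc(1, sizeof(*stmt));

    if (stmt == NULL)
        return NULL;
    stmt->text = malloc(len);
    if (stmt->text == NULL) {
        free(stmt);
        return NULL;
    }
    memcpy(stmt->text, text, len);
    stmt->ops = &abcde_ops;
    return stmt;
}

/* Generic code: release type-specific data first, then the statement. */
static void stmt_release(struct stmt *stmt)
{
    if (stmt == NULL)
        return;
    if (stmt->ops->destroy != NULL)
        stmt->ops->destroy(stmt);
    free(stmt);
}
```

Adding a new statement type then means writing the callbacks and one ops table, which is what the abcde_stmt_ops hunk in the patch does.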

The same applies to libnftnl: we clone the repository and check out the tag libnftnl-1.0.7:

git clone git://git.netfilter.org/libnftnl
cd libnftnl
git checkout libnftnl-1.0.7 -b abcde

After some trial and error, we end up with the following patch, generated by git diff libnftnl-1.0.7:

diff --git a/include/libnftnl/expr.h b/include/libnftnl/expr.h
index 74e986d..26259a7 100644
--- a/include/libnftnl/expr.h
+++ b/include/libnftnl/expr.h
@@ -186,6 +186,10 @@ enum {
NFTNL_EXPR_REJECT_CODE,
};
+enum {
+ NFTNL_EXPR_ABCDE_TEXT = NFTNL_EXPR_BASE,
+};
+
enum {
NFTNL_EXPR_QUEUE_NUM = NFTNL_EXPR_BASE,
NFTNL_EXPR_QUEUE_TOTAL,
diff --git a/include/linux/netfilter/abcde.h b/include/linux/netfilter/abcde.h
new file mode 100644
index 0000000..eb027a7
--- /dev/null
+++ b/include/linux/netfilter/abcde.h
@@ -0,0 +1,12 @@
+#ifndef _ABCDE_H
+#define _ABCDE_H
+
+enum nft_abcde_attributes {
+ NFTA_ABCDE_UNSPEC,
+ NFTA_ABCDE_TEXT,
+ __NFTA_ABCDE_MAX,
+};
+
+#define NFTA_ABCDE_MAX (__NFTA_ABCDE_MAX - 1)
+
+#endif /* _ABCDE_H */
diff --git a/src/Makefile.am b/src/Makefile.am
index 485a8c4..a9cb87d 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -35,6 +35,7 @@ libnftnl_la_SOURCES = utils.c \
expr/fwd.c \
expr/limit.c \
expr/log.c \
+ expr/abcde.c \
expr/lookup.c \
expr/dynset.c \
expr/immediate.c \
diff --git a/src/expr/abcde.c b/src/expr/abcde.c
new file mode 100644
index 0000000..e76abd4
--- /dev/null
+++ b/src/expr/abcde.c
@@ -0,0 +1,183 @@
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <errno.h>
+#include <linux/netfilter/nf_tables.h>
+#include <linux/netfilter/abcde.h>
+
+#include "internal.h"
+#include <libmnl/libmnl.h>
+#include <libnftnl/expr.h>
+#include <libnftnl/rule.h>
+
+struct nftnl_expr_abcde {
+ const char *text;
+};
+
+static int nftnl_expr_abcde_set(struct nftnl_expr *e, uint16_t type,
+ const void *data, uint32_t data_len)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+ switch(type){
+ case NFTNL_EXPR_ABCDE_TEXT:
+ abcde->text = strdup(data);
+ if (!abcde->text)
+ return -1;
+ break;
+ }
+ return 0;
+}
+
+static const void *
+nftnl_expr_abcde_get(const struct nftnl_expr *e, uint16_t type,
+ uint32_t *data_len)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+
+ switch(type) {
+ case NFTNL_EXPR_ABCDE_TEXT:
+ *data_len = strlen(abcde->text)+1;
+ return abcde->text;
+ }
+ return NULL;
+}
+
+static int nftnl_expr_abcde_cb(const struct nlattr *attr, void *data)
+{
+ const struct nlattr **tb = data;
+ int type = mnl_attr_get_type(attr);
+
+ if (mnl_attr_type_valid(attr, NFTA_ABCDE_MAX) < 0)
+ return MNL_CB_OK;
+
+ switch(type) {
+ case NFTNL_EXPR_ABCDE_TEXT:
+ if (mnl_attr_validate(attr, MNL_TYPE_STRING) < 0)
+ abi_breakage();
+ break;
+ }
+
+ tb[type] = attr;
+ return MNL_CB_OK;
+}
+
+static void
+nftnl_expr_abcde_build(struct nlmsghdr *nlh, const struct nftnl_expr *e)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+
+ if (e->flags & (1 << NFTNL_EXPR_ABCDE_TEXT))
+ mnl_attr_put_strz(nlh, NFTNL_EXPR_ABCDE_TEXT, abcde->text);
+}
+
+static int
+nftnl_expr_abcde_parse(struct nftnl_expr *e, struct nlattr *attr)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+ struct nlattr *tb[NFTA_ABCDE_MAX+1] = {};
+
+ if (mnl_attr_parse_nested(attr, nftnl_expr_abcde_cb, tb) < 0)
+ return -1;
+
+ if (tb[NFTNL_EXPR_ABCDE_TEXT]) {
+ if (abcde->text)
+ xfree(abcde->text);
+
+ abcde->text = strdup(mnl_attr_get_str(tb[NFTNL_EXPR_ABCDE_TEXT]));
+ if (!abcde->text)
+ return -1;
+ e->flags |= (1 << NFTNL_EXPR_ABCDE_TEXT);
+ }
+
+ return 0;
+}
+
+static int nftnl_expr_abcde_json_parse(struct nftnl_expr *e, json_t *root,
+ struct nftnl_parse_err *err)
+{
+#ifdef JSON_PARSING
+ const char *text;
+ uint16_t group, qthreshold;
+
+ text = nftnl_jansson_parse_str(root, "text", err);
+ if (text != NULL)
+ nftnl_expr_set_str(e, NFTNL_EXPR_ABCDE_TEXT, text);
+
+ return 0;
+#else
+ errno = EOPNOTSUPP;
+ return -1;
+#endif
+}
+
+static int nftnl_expr_abcde_snprintf_default(char *buf, size_t size,
+ const struct nftnl_expr *e)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+ int ret, offset = 0, len = size;
+
+ if (e->flags & (1 << NFTNL_EXPR_ABCDE_TEXT)) {
+ ret = snprintf(buf, len, "text %s ", abcde->text);
+ SNPRINTF_BUFFER_SIZE(ret, size, len, offset);
+ }
+
+ return offset;
+}
+
+static int nftnl_expr_abcde_export(char *buf, size_t size,
+ const struct nftnl_expr *e, int type)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+ NFTNL_BUF_INIT(b, buf, size);
+
+ if (e->flags & (1 << NFTNL_EXPR_ABCDE_TEXT))
+ nftnl_buf_str(&b, type, abcde->text, "text");
+
+ return nftnl_buf_done(&b);
+}
+
+static int
+nftnl_expr_abcde_snprintf(char *buf, size_t len, uint32_t type,
+ uint32_t flags, const struct nftnl_expr *e)
+{
+ switch(type) {
+ case NFTNL_OUTPUT_DEFAULT:
+ return nftnl_expr_abcde_snprintf_default(buf, len, e);
+ case NFTNL_OUTPUT_XML:
+ case NFTNL_OUTPUT_JSON:
+ return nftnl_expr_abcde_export(buf, len, e, type);
+ default:
+ break;
+ }
+ return -1;
+}
+
+static void nftnl_expr_abcde_free(const struct nftnl_expr *e)
+{
+ struct nftnl_expr_abcde *abcde = nftnl_expr_data(e);
+
+ xfree(abcde->text);
+}
+
+static bool nftnl_expr_abcde_cmp(const struct nftnl_expr *e1,
+ const struct nftnl_expr *e2)
+{
+ struct nftnl_expr_abcde *l1 = nftnl_expr_data(e1);
+ struct nftnl_expr_abcde *l2 = nftnl_expr_data(e2);
+ return !strcmp(l1->text, l2->text);
+}
+
+struct expr_ops expr_ops_abcde = {
+ .name = "abcde",
+ .alloc_len = sizeof(struct nftnl_expr_abcde),
+ .max_attr = NFTA_ABCDE_MAX,
+ .free = nftnl_expr_abcde_free,
+ .cmp = nftnl_expr_abcde_cmp,
+ .set = nftnl_expr_abcde_set,
+ .get = nftnl_expr_abcde_get,
+ .parse = nftnl_expr_abcde_parse,
+ .build = nftnl_expr_abcde_build,
+ .snprintf = nftnl_expr_abcde_snprintf,
+ .json_parse = nftnl_expr_abcde_json_parse,
+};
diff --git a/src/expr_ops.c b/src/expr_ops.c
index 7a0e1e3..a02878c 100644
--- a/src/expr_ops.c
+++ b/src/expr_ops.c
@@ -33,6 +33,7 @@ extern struct expr_ops expr_ops_target;
extern struct expr_ops expr_ops_dynset;
extern struct expr_ops expr_ops_hash;
extern struct expr_ops expr_ops_fib;
+extern struct expr_ops expr_ops_abcde;
static struct expr_ops expr_ops_notrack = {
.name = "notrack",
@@ -69,6 +70,7 @@ static struct expr_ops *expr_ops[] = {
&expr_ops_hash,
&expr_ops_fib,
&expr_ops_objref,
+ &expr_ops_abcde,
NULL,
};
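
The build callback in the patch hands the text to mnl_attr_put_strz, which appends it to the message as a NUL-terminated string attribute. On the wire, a netlink attribute is a 4-byte header (a 16-bit length and a 16-bit type) followed by the payload, padded to a multiple of 4 bytes. The sketch below mimics that layout in plain C without libmnl, to show what the kernel side receives in tb[NFTA_ABCDE_TEXT]. The names attr_hdr and put_strz_attr are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Attributes are padded to a multiple of 4 bytes. */
#define ATTR_ALIGNTO ((size_t)4)
#define ATTR_ALIGN(len) (((len) + ATTR_ALIGNTO - 1) & ~(ATTR_ALIGNTO - 1))

/* Same layout as the kernel's struct nlattr: two 16-bit fields. */
struct attr_hdr {
    uint16_t len;   /* header plus payload, padding not included */
    uint16_t type;  /* attribute type, e.g. NFTA_ABCDE_TEXT */
};

/*
 * Hypothetical helper: encode str (including its trailing NUL) as one
 * attribute at the start of buf. Returns the number of bytes used,
 * padding included, or 0 if the attribute does not fit.
 */
static size_t put_strz_attr(unsigned char *buf, size_t size,
                            uint16_t type, const char *str)
{
    size_t payload = strlen(str) + 1;
    size_t unpadded = sizeof(struct attr_hdr) + payload;
    size_t total = ATTR_ALIGN(unpadded);
    struct attr_hdr hdr;

    if (unpadded > UINT16_MAX || total > size)
        return 0;

    hdr.len = (uint16_t)unpadded;
    hdr.type = type;

    memset(buf, 0, total);                    /* zero the padding bytes */
    memcpy(buf, &hdr, sizeof(hdr));           /* header first */
    memcpy(buf + sizeof(hdr), str, payload);  /* then the string and its NUL */
    return total;
}
```

For the string darkhttpd the payload is 10 bytes including the NUL, so the length field is 14 and the attribute occupies 16 bytes once padded.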

The abcde branches of nftables and libnftnl can be found on GitHub:
https://github.com/zasdfgbnm/nftables/tree/abcde
https://github.com/zasdfgbnm/libnftnl/tree/abcde

Test

To test the new expression, we load the kernel module built above and then use our self-compiled nft tool to add a rule that looks like:

PREFIX=/home/gaoxiang/tmp/test_nftables
export LD_LIBRARY_PATH=$PREFIX/lib
$PREFIX/sbin/nft add table ip test
$PREFIX/sbin/nft add chain test test \{ type filter hook postrouting priority 0\; \}
$PREFIX/sbin/nft add rule ip test test tcp sport 4000 abcde darkhttpd log prefix "darkhttpd___"

Start a darkhttpd server and send it a request; the output of dmesg will then look like:

[ 2427.056229] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=43690 RES=0x00 ACK SYN URGP=0
[ 2427.094038] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=9012 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=357 RES=0x00 ACK URGP=0
[ 2427.094162] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=269 TOS=0x00 PREC=0x00 TTL=64 ID=9013 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=357 RES=0x00 ACK PSH URGP=0
[ 2427.094216] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=97 TOS=0x00 PREC=0x00 TTL=64 ID=9014 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=357 RES=0x00 ACK PSH URGP=0
[ 2427.361198] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=9015 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=372 RES=0x00 ACK PSH URGP=0
[ 2427.361215] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=258 TOS=0x00 PREC=0x00 TTL=64 ID=9016 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=372 RES=0x00 ACK PSH URGP=0
[ 2475.658088] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=9017 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=372 RES=0x00 ACK URGP=0
[ 2521.747363] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=9018 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=372 RES=0x00 ACK URGP=0
[ 2567.826378] darkhttpd___IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=9019 DF PROTO=TCP SPT=4000 DPT=36840 WINDOW=372 RES=0x00 ACK URGP=0