p4-dev@lists.p4.org

list for questions/discussion of p4 programs and tools

View all threads

Same hash value for different parameters.

SG
Sahil Gupta
Fri, Feb 19, 2021 8:16 AM

Hi all,
I was storing values in a register using the CRC16 hash function of flow
tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET
request are having the same hash values.
I want the retransmitted packet not to be processed by my P4 program logic
as the first copy of the packet already processed by it.
But since, hash of SYN and HTTP GET packet fields are the same for some
connections, my logic to classify original and retransmitted copy is
failing.
Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg
function, s1.log and pcap for reference.

Case 1: (Hash is same)
retransmitted_register_index: 7 for both SYN and HTTP GET request though
ipv4 total length is 60 and 126 respectively.
SYN pkd id = 56199
HTTP pkt id = 56201

Case 2: (Hash is different)
retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4
total length is 60 and 126)
SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16,
(bit<32>)0,{hdr.ipv4.srcAddr,
hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards
Sahil Gupta

Hi all, I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values. I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it. But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing. Any suggestions how I can avoid this hash collision? Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference. Case 1: (Hash is same) *retransmitted_register_index*: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively. SYN pkd id = 56199 HTTP pkt id = 56201 Case 2: (Hash is different) *retransmitted_register_index*: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126) SYN pkt id = 45176 HTTP pkt id = 45178 *Hash function and parameter used*: hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); Regards Sahil Gupta
EO
Eder Ollora Zaballa
Fri, Feb 19, 2021 6:04 PM

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below.

[cid:513f4074-506d-47bf-be81-c78cfb4a7388]

I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer.  🙂

Regards,


From: Sahil Gupta sg5414@rit.edu
Sent: 19 February 2021 09:16
To: p4-dev p4-dev@lists.p4.org
Subject: [P4-dev] Same hash value for different parameters.

Hi all,
I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values.
I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it.
But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing.
Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference.

Case 1: (Hash is same)
retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively.
SYN pkd id = 56199
HTTP pkt id = 56201

Case 2: (Hash is different)
retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126)
SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards
Sahil Gupta

Hi, I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below. [cid:513f4074-506d-47bf-be81-c78cfb4a7388] I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision. I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer. 🙂 Regards, ________________________________ From: Sahil Gupta <sg5414@rit.edu> Sent: 19 February 2021 09:16 To: p4-dev <p4-dev@lists.p4.org> Subject: [P4-dev] Same hash value for different parameters. Hi all, I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values. I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it. But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing. Any suggestions how I can avoid this hash collision? Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference. Case 1: (Hash is same) retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively. SYN pkd id = 56199 HTTP pkt id = 56201 Case 2: (Hash is different) retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126) SYN pkt id = 45176 HTTP pkt id = 45178 Hash function and parameter used: hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); Regards Sahil Gupta
SG
Sahil Gupta
Fri, Feb 19, 2021 8:12 PM

Thanks Eder for analytics over CRC16 hash collision probability.
I don't want to involve the controller since my approach itself induces
delays in communications.
This packet classification exercise is to reduce this delay.
With this classification some unwanted packets will not be processed by my
dataplane program.
Involving controller is suicide for dataplane router.
Let's see what others have to add.

Thanks
Sahil Gupta

On Fri, Feb 19, 2021 at 1:05 PM Eder Ollora Zaballa eoza@fotonik.dtu.dk
wrote:

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to
the collision probability. I did a fast calculation and the collision for
something like CRC16 with only 2 different input values is very low. The
problem is that when you have multiple different values, that is, multiple
combinations of different tuples, then the probability is fairly high. With
only 300 different combinations, you stand a collision chance of 50%. You
can see it below.

I am not aware of many ways to keep a unique identification (per flow) and
done in the data plane. As you said, you use the 16 bit value as an index
of a register. Values of 32 bits have better collision probabilities, but
still, might either be difficult to use it as a register index or some
cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first
packet arrives and let the controller decide an ID (if your idea can stand
this case). Then use this ID at the data plane, maybe as parameter of a
table when the tuple matches. If involving the controller is fast enough it
might work for you. I think some more experienced P4 developers might have
better ideas and might give you a better answer.  🙂

Regards,

From: Sahil Gupta sg5414@rit.edu
Sent: 19 February 2021 09:16
To: p4-dev p4-dev@lists.p4.org
Subject: [P4-dev] Same hash value for different parameters.

Hi all,
I was storing values in a register using the CRC16 hash function of flow
tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET
request are having the same hash values.
I want the retransmitted packet not to be processed by my P4 program logic
as the first copy of the packet already processed by it.
But since, hash of SYN and HTTP GET packet fields are the same for some
connections, my logic to classify original and retransmitted copy is
failing.
Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg
function, s1.log and pcap for reference.

Case 1: (Hash is same)
retransmitted_register_index: 7 for both SYN and HTTP GET request
though ipv4 total length is 60 and 126 respectively.
SYN pkd id = 56199
HTTP pkt id = 56201

Case 2: (Hash is different)
retransmitted_register_index: 6 and 0 for SYN and HTTP GET request.
(ipv4 total length is 60 and 126)
SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16,
(bit<32>)0,{hdr.ipv4.srcAddr,
hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards
Sahil Gupta

Thanks Eder for analytics over CRC16 hash collision probability. I don't want to involve the controller since my approach itself induces delays in communications. This packet classification exercise is to reduce this delay. With this classification some unwanted packets will not be processed by my dataplane program. Involving controller is suicide for dataplane router. Let's see what others have to add. ☺ Thanks Sahil Gupta On Fri, Feb 19, 2021 at 1:05 PM Eder Ollora Zaballa <eoza@fotonik.dtu.dk> wrote: > Hi, > > I could be wrong but I think CRC16 follows the birth paradox in regard to > the collision probability. I did a fast calculation and the collision for > something like CRC16 with only 2 different input values is very low. The > problem is that when you have multiple different values, that is, multiple > combinations of different tuples, then the probability is fairly high. With > only 300 different combinations, you stand a collision chance of 50%. You > can see it below. > > > > I am not aware of many ways to keep a unique identification (per flow) and > done in the data plane. As you said, you use the 16 bit value as an index > of a register. Values of 32 bits have better collision probabilities, but > still, might either be difficult to use it as a register index or some > cases might still not stand a single collision. > > I could imagine, though, sending a digest to the controller when a first > packet arrives and let the controller decide an ID (if your idea can stand > this case). Then use this ID at the data plane, maybe as parameter of a > table when the tuple matches. If involving the controller is fast enough it > might work for you. I think some more experienced P4 developers might have > better ideas and might give you a better answer. 🙂 > > Regards, > ------------------------------ > *From:* Sahil Gupta <sg5414@rit.edu> > *Sent:* 19 February 2021 09:16 > *To:* p4-dev <p4-dev@lists.p4.org> > *Subject:* [P4-dev] Same hash value for different parameters. > > Hi all, > I was storing values in a register using the CRC16 hash function of flow > tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET > request are having the same hash values. > I want the retransmitted packet not to be processed by my P4 program logic > as the first copy of the packet already processed by it. > But since, hash of SYN and HTTP GET packet fields are the same for some > connections, my logic to classify original and retransmitted copy is > failing. > Any suggestions how I can avoid this hash collision? > > > > Attaching my customized pkt lifecycle log file generated from log_msg > function, s1.log and pcap for reference. > > Case 1: (Hash is same) > *retransmitted_register_index*: 7 for both SYN and HTTP GET request > though ipv4 total length is 60 and 126 respectively. > SYN pkd id = 56199 > HTTP pkt id = 56201 > > Case 2: (Hash is different) > *retransmitted_register_index*: 6 and 0 for SYN and HTTP GET request. > (ipv4 total length is 60 and 126) > SYN pkt id = 45176 > HTTP pkt id = 45178 > > *Hash function and parameter used*: > hash(retransmitted_register_index, HashAlgorithm.crc16, > (bit<32>)0,{hdr.ipv4.srcAddr, > hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); > > > Regards > Sahil Gupta > >
H
hemant@mnkcg.com
Sat, Feb 20, 2021 2:26 PM

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP, destination IP, source layer-4 port, destination layer-4 port, and protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.  If the traffic for HTTP and TCP uses identical source and destination IP and layer-4 ports, why would hashed value for HTTP and TCP value be different?

The issue is not a P4 issue.  Any tuple hash in C would have the same problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev p4-dev@lists.p4.org
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev p4-dev@lists.p4.org; sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below.

I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer.  🙂

Regards,


From: Sahil Gupta <sg5414@rit.edu mailto:sg5414@rit.edu >
Sent: 19 February 2021 09:16
To: p4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta

What is your flow tuple? A common data plane classification tuple is the 5-tuple (source IP, destination IP, source layer-4 port, destination layer-4 port, and protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. If the traffic for HTTP and TCP uses identical source and destination IP and layer-4 ports, why would hashed value for HTTP and TCP value be different? The issue is not a P4 issue. Any tuple hash in C would have the same problem. A good implementations of tuple hash in C is OVS. https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 The theory behind tuple hashes is described in this paper. https://tinyurl.com/yrs65622 Hemant From: Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org> Sent: Friday, February 19, 2021 1:05 PM To: p4-dev <p4-dev@lists.p4.org>; sg5414@rit.edu Subject: [P4-dev] Re: Same hash value for different parameters. Hi, I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below. I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision. I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer. 🙂 Regards, _____ From: Sahil Gupta <sg5414@rit.edu <mailto:sg5414@rit.edu> > Sent: 19 February 2021 09:16 To: p4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> > Subject: [P4-dev] Same hash value for different parameters. Hi all, I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values. I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it. But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing. Any suggestions how I can avoid this hash collision? Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference. Case 1: (Hash is same) retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively. SYN pkd id = 56199 HTTP pkt id = 56201 Case 2: (Hash is different) retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126) SYN pkt id = 45176 HTTP pkt id = 45178 Hash function and parameter used: hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); Regards Sahil Gupta
SG
Sahil Gupta
Sat, Feb 20, 2021 2:30 PM

I am using following values to create hash.
Source IP, source port, destination IP, destination port, and ipv4 total
length.

Since ipv4 total length field is different for SYN and HTTP packet, I am
expecting different hashes.

I will look into your papers as well for more theory on this.

Thanks
Sahil Gupta

On Sat, Feb 20, 2021, 9:27 AM hemant@mnkcg.com wrote:

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP,
destination IP, source layer-4 port, destination layer-4 port, and
protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.
If the traffic for HTTP and TCP uses identical source and destination IP
and layer-4 ports, why would hashed value for HTTP and TCP value be
different?

The issue is not a P4 issue.  Any tuple hash in C would have the same
problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev p4-dev@lists.p4.org
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev p4-dev@lists.p4.org; sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to
the collision probability. I did a fast calculation and the collision for
something like CRC16 with only 2 different input values is very low. The
problem is that when you have multiple different values, that is, multiple
combinations of different tuples, then the probability is fairly high. With
only 300 different combinations, you stand a collision chance of 50%. You
can see it below.

I am not aware of many ways to keep a unique identification (per flow) and
done in the data plane. As you said, you use the 16 bit value as an index
of a register. Values of 32 bits have better collision probabilities, but
still, might either be difficult to use it as a register index or some
cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first
packet arrives and let the controller decide an ID (if your idea can stand
this case). Then use this ID at the data plane, maybe as parameter of a
table when the tuple matches. If involving the controller is fast enough it
might work for you. I think some more experienced P4 developers might have
better ideas and might give you a better answer.  🙂

Regards,

From: Sahil Gupta sg5414@rit.edu
Sent: 19 February 2021 09:16
To: p4-dev p4-dev@lists.p4.org
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow
tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET
request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program logic
as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some
connections, my logic to classify original and retransmitted copy is
failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg
function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request
though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request.
(ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16,
(bit<32>)0,{hdr.ipv4.srcAddr,
hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta

I am using following values to create hash. Source IP, source port, destination IP, destination port, and ipv4 total length. Since ipv4 total length field is different for SYN and HTTP packet, I am expecting different hashes. I will look into your papers as well for more theory on this. Thanks Sahil Gupta On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com> wrote: > What is your flow tuple? > > > > A common data plane classification tuple is the 5-tuple (source IP, > destination IP, source layer-4 port, destination layer-4 port, and > protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. > If the traffic for HTTP and TCP uses identical source and destination IP > and layer-4 ports, why would hashed value for HTTP and TCP value be > different? > > > > The issue is not a P4 issue. Any tuple hash in C would have the same > problem. > > > > A good implementations of tuple hash in C is OVS. > > > > > https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 > > > > The theory behind tuple hashes is described in this paper. > > > > https://tinyurl.com/yrs65622 > > > > Hemant > > > > *From:* Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org> > *Sent:* Friday, February 19, 2021 1:05 PM > *To:* p4-dev <p4-dev@lists.p4.org>; sg5414@rit.edu > *Subject:* [P4-dev] Re: Same hash value for different parameters. > > > > Hi, > > > > I could be wrong but I think CRC16 follows the birth paradox in regard to > the collision probability. I did a fast calculation and the collision for > something like CRC16 with only 2 different input values is very low. The > problem is that when you have multiple different values, that is, multiple > combinations of different tuples, then the probability is fairly high. With > only 300 different combinations, you stand a collision chance of 50%. You > can see it below. > > > > > > I am not aware of many ways to keep a unique identification (per flow) and > done in the data plane. As you said, you use the 16 bit value as an index > of a register. Values of 32 bits have better collision probabilities, but > still, might either be difficult to use it as a register index or some > cases might still not stand a single collision. > > > > I could imagine, though, sending a digest to the controller when a first > packet arrives and let the controller decide an ID (if your idea can stand > this case). Then use this ID at the data plane, maybe as parameter of a > table when the tuple matches. If involving the controller is fast enough it > might work for you. I think some more experienced P4 developers might have > better ideas and might give you a better answer. 🙂 > > > > Regards, > ------------------------------ > > *From:* Sahil Gupta <sg5414@rit.edu> > *Sent:* 19 February 2021 09:16 > *To:* p4-dev <p4-dev@lists.p4.org> > *Subject:* [P4-dev] Same hash value for different parameters. > > > > Hi all, > > I was storing values in a register using the CRC16 hash function of flow > tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET > request are having the same hash values. > > I want the retransmitted packet not to be processed by my P4 program logic > as the first copy of the packet already processed by it. > > But since, hash of SYN and HTTP GET packet fields are the same for some > connections, my logic to classify original and retransmitted copy is > failing. > > Any suggestions how I can avoid this hash collision? > > > > > Attaching my customized pkt lifecycle log file generated from log_msg > function, s1.log and pcap for reference. > > Case 1: (Hash is same) > > *retransmitted_register_index*: 7 for both SYN and HTTP GET request > though ipv4 total length is 60 and 126 respectively. > > SYN pkd id = 56199 > > HTTP pkt id = 56201 > > Case 2: (Hash is different) > > *retransmitted_register_index*: 6 and 0 for SYN and HTTP GET request. > (ipv4 total length is 60 and 126) > > SYN pkt id = 45176 > HTTP pkt id = 45178 > > > *Hash function and parameter used*: > hash(retransmitted_register_index, HashAlgorithm.crc16, > (bit<32>)0,{hdr.ipv4.srcAddr, > hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); > > Regards > > Sahil Gupta >
H
hemant@mnkcg.com
Sat, Feb 20, 2021 3:05 PM

Clearly, the ipv4 packet length doesn’t suffice if all other fields of the tuple are identical.  Use HTTP and UDP and swap IP length with protocol in tuple and see if you get a collision.

Even the Varghese paper has to use a near-perfect hash.  A good source for perfect hash with test code is here:

https://burtleburtle.net/bob/hash/perfect.html

You should play with such code to first make sure what collides or not.  Then, I would get back to P4.

Hemant

From: Sahil Gupta sg5414@rit.edu
Sent: Saturday, February 20, 2021 9:31 AM
To: hemant@mnkcg.com
Cc: Eder Ollora Zaballa eoza@fotonik.dtu.dk; p4-dev p4-dev@lists.p4.org
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

I am using following values to create hash.

Source IP, source port, destination IP, destination port, and ipv4 total length.

Since ipv4 total length field is different for SYN and HTTP packet, I am expecting different hashes.

I will look into your papers as well for more theory on this.

Thanks

Sahil Gupta

On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com mailto:hemant@mnkcg.com > wrote:

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP, destination IP, source layer-4 port, destination layer-4 port, and protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.  If the traffic for HTTP and TCP uses identical source and destination IP and layer-4 ports, why would hashed value for HTTP and TCP value be different?

The issue is not a P4 issue.  Any tuple hash in C would have the same problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >; sg5414@rit.edu mailto:sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below.

I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer.  🙂

Regards,


From: Sahil Gupta <sg5414@rit.edu mailto:sg5414@rit.edu >
Sent: 19 February 2021 09:16
To: p4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta

Clearly, the ipv4 packet length doesn’t suffice if all other fields of the tuple are identical. Use HTTP and UDP and swap IP length with protocol in tuple and see if you get a collision. Even the Varghese paper has to use a near-perfect hash. A good source for perfect hash with test code is here: https://burtleburtle.net/bob/hash/perfect.html You should play with such code to first make sure what collides or not. Then, I would get back to P4. Hemant From: Sahil Gupta <sg5414@rit.edu> Sent: Saturday, February 20, 2021 9:31 AM To: hemant@mnkcg.com Cc: Eder Ollora Zaballa <eoza@fotonik.dtu.dk>; p4-dev <p4-dev@lists.p4.org> Subject: Re: [P4-dev] Re: Same hash value for different parameters. I am using following values to create hash. Source IP, source port, destination IP, destination port, and ipv4 total length. Since ipv4 total length field is different for SYN and HTTP packet, I am expecting different hashes. I will look into your papers as well for more theory on this. Thanks Sahil Gupta On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com <mailto:hemant@mnkcg.com> > wrote: What is your flow tuple? A common data plane classification tuple is the 5-tuple (source IP, destination IP, source layer-4 port, destination layer-4 port, and protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. If the traffic for HTTP and TCP uses identical source and destination IP and layer-4 ports, why would hashed value for HTTP and TCP value be different? The issue is not a P4 issue. Any tuple hash in C would have the same problem. A good implementations of tuple hash in C is OVS. https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 The theory behind tuple hashes is described in this paper. https://tinyurl.com/yrs65622 Hemant From: Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> > Sent: Friday, February 19, 2021 1:05 PM To: p4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> >; sg5414@rit.edu <mailto:sg5414@rit.edu> Subject: [P4-dev] Re: Same hash value for different parameters. Hi, I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below. I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision. I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer. 🙂 Regards, _____ From: Sahil Gupta <sg5414@rit.edu <mailto:sg5414@rit.edu> > Sent: 19 February 2021 09:16 To: p4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> > Subject: [P4-dev] Same hash value for different parameters. Hi all, I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values. I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it. But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing. Any suggestions how I can avoid this hash collision? Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference. Case 1: (Hash is same) retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively. SYN pkd id = 56199 HTTP pkt id = 56201 Case 2: (Hash is different) retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126) SYN pkt id = 45176 HTTP pkt id = 45178 Hash function and parameter used: hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); Regards Sahil Gupta
SG
Sahil Gupta
Sat, Feb 20, 2021 3:36 PM

Thanks hemant.
I will look into it.

On Sat, Feb 20, 2021, 10:05 AM hemant@mnkcg.com wrote:

Clearly, the ipv4 packet length doesn’t suffice if all other fields of the
tuple are identical.  Use HTTP and UDP and swap IP length with protocol in
tuple and see if you get a collision.

Even the Varghese paper has to use a near-perfect hash.  A good source for
perfect hash with test code is here:

https://burtleburtle.net/bob/hash/perfect.html

You should play with such code to first make sure what collides or not.
Then, I would get back to P4.

Hemant

From: Sahil Gupta sg5414@rit.edu
Sent: Saturday, February 20, 2021 9:31 AM
To: hemant@mnkcg.com
Cc: Eder Ollora Zaballa eoza@fotonik.dtu.dk; p4-dev <
p4-dev@lists.p4.org>
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

I am using following values to create hash.

Source IP, source port, destination IP, destination port, and ipv4 total
length.

Since ipv4 total length field is different for SYN and HTTP packet, I am
expecting different hashes.

I will look into your papers as well for more theory on this.

Thanks

Sahil Gupta

On Sat, Feb 20, 2021, 9:27 AM hemant@mnkcg.com wrote:

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP,
destination IP, source layer-4 port, destination layer-4 port, and
protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.
If the traffic for HTTP and TCP uses identical source and destination IP
and layer-4 ports, why would hashed value for HTTP and TCP value be
different?

The issue is not a P4 issue.  Any tuple hash in C would have the same
problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev p4-dev@lists.p4.org
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev p4-dev@lists.p4.org; sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to
the collision probability. I did a fast calculation and the collision for
something like CRC16 with only 2 different input values is very low. The
problem is that when you have multiple different values, that is, multiple
combinations of different tuples, then the probability is fairly high. With
only 300 different combinations, you stand a collision chance of 50%. You
can see it below.

I am not aware of many ways to keep a unique identification (per flow) and
done in the data plane. As you said, you use the 16 bit value as an index
of a register. Values of 32 bits have better collision probabilities, but
still, might either be difficult to use it as a register index or some
cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first
packet arrives and let the controller decide an ID (if your idea can stand
this case). Then use this ID at the data plane, maybe as parameter of a
table when the tuple matches. If involving the controller is fast enough it
might work for you. I think some more experienced P4 developers might have
better ideas and might give you a better answer.  🙂

Regards,

From: Sahil Gupta sg5414@rit.edu
Sent: 19 February 2021 09:16
To: p4-dev p4-dev@lists.p4.org
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow
tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET
request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program logic
as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some
connections, my logic to classify original and retransmitted copy is
failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg
function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request
though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request.
(ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16,
(bit<32>)0,{hdr.ipv4.srcAddr,
hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta

Thanks hemant. I will look into it. On Sat, Feb 20, 2021, 10:05 AM <hemant@mnkcg.com> wrote: > Clearly, the ipv4 packet length doesn’t suffice if all other fields of the > tuple are identical. Use HTTP and UDP and swap IP length with protocol in > tuple and see if you get a collision. > > > > Even the Varghese paper has to use a near-perfect hash. A good source for > perfect hash with test code is here: > > > > https://burtleburtle.net/bob/hash/perfect.html > > > > You should play with such code to first make sure what collides or not. > Then, I would get back to P4. > > > > Hemant > > > > *From:* Sahil Gupta <sg5414@rit.edu> > *Sent:* Saturday, February 20, 2021 9:31 AM > *To:* hemant@mnkcg.com > *Cc:* Eder Ollora Zaballa <eoza@fotonik.dtu.dk>; p4-dev < > p4-dev@lists.p4.org> > *Subject:* Re: [P4-dev] Re: Same hash value for different parameters. > > > > I am using following values to create hash. > > Source IP, source port, destination IP, destination port, and ipv4 total > length. > > > > Since ipv4 total length field is different for SYN and HTTP packet, I am > expecting different hashes. > > > > I will look into your papers as well for more theory on this. > > > > > > Thanks > > Sahil Gupta > > > > > > On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com> wrote: > > What is your flow tuple? > > > > A common data plane classification tuple is the 5-tuple (source IP, > destination IP, source layer-4 port, destination layer-4 port, and > protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. > If the traffic for HTTP and TCP uses identical source and destination IP > and layer-4 ports, why would hashed value for HTTP and TCP value be > different? > > > > The issue is not a P4 issue. Any tuple hash in C would have the same > problem. > > > > A good implementations of tuple hash in C is OVS. > > > > > https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 > > > > The theory behind tuple hashes is described in this paper. > > > > https://tinyurl.com/yrs65622 > > > > Hemant > > > > *From:* Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org> > *Sent:* Friday, February 19, 2021 1:05 PM > *To:* p4-dev <p4-dev@lists.p4.org>; sg5414@rit.edu > *Subject:* [P4-dev] Re: Same hash value for different parameters. > > > > Hi, > > > > I could be wrong but I think CRC16 follows the birth paradox in regard to > the collision probability. I did a fast calculation and the collision for > something like CRC16 with only 2 different input values is very low. The > problem is that when you have multiple different values, that is, multiple > combinations of different tuples, then the probability is fairly high. With > only 300 different combinations, you stand a collision chance of 50%. You > can see it below. > > > > > > I am not aware of many ways to keep a unique identification (per flow) and > done in the data plane. As you said, you use the 16 bit value as an index > of a register. Values of 32 bits have better collision probabilities, but > still, might either be difficult to use it as a register index or some > cases might still not stand a single collision. > > > > I could imagine, though, sending a digest to the controller when a first > packet arrives and let the controller decide an ID (if your idea can stand > this case). Then use this ID at the data plane, maybe as parameter of a > table when the tuple matches. If involving the controller is fast enough it > might work for you. I think some more experienced P4 developers might have > better ideas and might give you a better answer. 🙂 > > > > Regards, > ------------------------------ > > *From:* Sahil Gupta <sg5414@rit.edu> > *Sent:* 19 February 2021 09:16 > *To:* p4-dev <p4-dev@lists.p4.org> > *Subject:* [P4-dev] Same hash value for different parameters. > > > > Hi all, > > I was storing values in a register using the CRC16 hash function of flow > tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET > request are having the same hash values. > > I want the retransmitted packet not to be processed by my P4 program logic > as the first copy of the packet already processed by it. > > But since, hash of SYN and HTTP GET packet fields are the same for some > connections, my logic to classify original and retransmitted copy is > failing. > > Any suggestions how I can avoid this hash collision? > > > > > Attaching my customized pkt lifecycle log file generated from log_msg > function, s1.log and pcap for reference. > > Case 1: (Hash is same) > > *retransmitted_register_index*: 7 for both SYN and HTTP GET request > though ipv4 total length is 60 and 126 respectively. > > SYN pkd id = 56199 > > HTTP pkt id = 56201 > > Case 2: (Hash is different) > > *retransmitted_register_index*: 6 and 0 for SYN and HTTP GET request. > (ipv4 total length is 60 and 126) > > SYN pkt id = 45176 > HTTP pkt id = 45178 > > > *Hash function and parameter used*: > hash(retransmitted_register_index, HashAlgorithm.crc16, > (bit<32>)0,{hdr.ipv4.srcAddr, > hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); > > Regards > > Sahil Gupta > >
AF
Andy Fingerhut
Sun, Feb 21, 2021 6:06 PM

If you have some data plane algorithm where hash collisions mean that your
algorithm does not work at all, then perhaps you should be using a P4 table
with exact match keys instead of hashing.

Every hash function from an input space with W-bit wide keys down to a
16-bit output or 32-bit output is going to have collisions, unless you have
some special situation that you have guaranteed in advance that the W-bit
wide keys are restricted to a special subset of them, such that the special
subset plus the hash function you have chosen means they cannot collide.
If you don't have any such restrictions on the input values, collisions can
and will happen.  You should plan for them in whatever algorithm you are
using.

Andy

On Sat, Feb 20, 2021 at 8:12 AM Sahil Gupta sg5414@rit.edu wrote:

Thanks hemant.
I will look into it.

On Sat, Feb 20, 2021, 10:05 AM hemant@mnkcg.com wrote:

Clearly, the ipv4 packet length doesn’t suffice if all other fields of
the tuple are identical.  Use HTTP and UDP and swap IP length with protocol
in tuple and see if you get a collision.

Even the Varghese paper has to use a near-perfect hash.  A good source
for perfect hash with test code is here:

https://burtleburtle.net/bob/hash/perfect.html

You should play with such code to first make sure what collides or not.
Then, I would get back to P4.

Hemant

From: Sahil Gupta sg5414@rit.edu
Sent: Saturday, February 20, 2021 9:31 AM
To: hemant@mnkcg.com
Cc: Eder Ollora Zaballa eoza@fotonik.dtu.dk; p4-dev <
p4-dev@lists.p4.org>
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

I am using following values to create hash.

Source IP, source port, destination IP, destination port, and ipv4 total
length.

Since ipv4 total length field is different for SYN and HTTP packet, I am
expecting different hashes.

I will look into your papers as well for more theory on this.

Thanks

Sahil Gupta

On Sat, Feb 20, 2021, 9:27 AM hemant@mnkcg.com wrote:

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP,
destination IP, source layer-4 port, destination layer-4 port, and
protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.
If the traffic for HTTP and TCP uses identical source and destination IP
and layer-4 ports, why would hashed value for HTTP and TCP value be
different?

The issue is not a P4 issue.  Any tuple hash in C would have the same
problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev p4-dev@lists.p4.org
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev p4-dev@lists.p4.org; sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to
the collision probability. I did a fast calculation and the collision for
something like CRC16 with only 2 different input values is very low. The
problem is that when you have multiple different values, that is, multiple
combinations of different tuples, then the probability is fairly high. With
only 300 different combinations, you stand a collision chance of 50%. You
can see it below.

I am not aware of many ways to keep a unique identification (per flow)
and done in the data plane. As you said, you use the 16 bit value as an
index of a register. Values of 32 bits have better collision probabilities,
but still, might either be difficult to use it as a register index or some
cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first
packet arrives and let the controller decide an ID (if your idea can stand
this case). Then use this ID at the data plane, maybe as parameter of a
table when the tuple matches. If involving the controller is fast enough it
might work for you. I think some more experienced P4 developers might have
better ideas and might give you a better answer.  🙂

Regards,

From: Sahil Gupta sg5414@rit.edu
Sent: 19 February 2021 09:16
To: p4-dev p4-dev@lists.p4.org
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow
tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET
request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program
logic as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some
connections, my logic to classify original and retransmitted copy is
failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg
function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request
though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request.
(ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16,
(bit<32>)0,{hdr.ipv4.srcAddr,
hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta


P4-dev mailing list -- p4-dev@lists.p4.org
To unsubscribe send an email to p4-dev-leave@lists.p4.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

If you have some data plane algorithm where hash collisions mean that your algorithm does not work at all, then perhaps you should be using a P4 table with exact match keys instead of hashing. Every hash function from an input space with W-bit wide keys down to a 16-bit output or 32-bit output is going to have collisions, unless you have some special situation that you have guaranteed in advance that the W-bit wide keys are restricted to a special subset of them, such that the special subset plus the hash function you have chosen means they cannot collide. If you don't have any such restrictions on the input values, collisions can and will happen. You should plan for them in whatever algorithm you are using. Andy On Sat, Feb 20, 2021 at 8:12 AM Sahil Gupta <sg5414@rit.edu> wrote: > Thanks hemant. > I will look into it. > > > > On Sat, Feb 20, 2021, 10:05 AM <hemant@mnkcg.com> wrote: > >> Clearly, the ipv4 packet length doesn’t suffice if all other fields of >> the tuple are identical. Use HTTP and UDP and swap IP length with protocol >> in tuple and see if you get a collision. >> >> >> >> Even the Varghese paper has to use a near-perfect hash. A good source >> for perfect hash with test code is here: >> >> >> >> https://burtleburtle.net/bob/hash/perfect.html >> >> >> >> You should play with such code to first make sure what collides or not. >> Then, I would get back to P4. >> >> >> >> Hemant >> >> >> >> *From:* Sahil Gupta <sg5414@rit.edu> >> *Sent:* Saturday, February 20, 2021 9:31 AM >> *To:* hemant@mnkcg.com >> *Cc:* Eder Ollora Zaballa <eoza@fotonik.dtu.dk>; p4-dev < >> p4-dev@lists.p4.org> >> *Subject:* Re: [P4-dev] Re: Same hash value for different parameters. >> >> >> >> I am using following values to create hash. >> >> Source IP, source port, destination IP, destination port, and ipv4 total >> length. >> >> >> >> Since ipv4 total length field is different for SYN and HTTP packet, I am >> expecting different hashes. >> >> >> >> I will look into your papers as well for more theory on this. >> >> >> >> >> >> Thanks >> >> Sahil Gupta >> >> >> >> >> >> On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com> wrote: >> >> What is your flow tuple? >> >> >> >> A common data plane classification tuple is the 5-tuple (source IP, >> destination IP, source layer-4 port, destination layer-4 port, and >> protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. >> If the traffic for HTTP and TCP uses identical source and destination IP >> and layer-4 ports, why would hashed value for HTTP and TCP value be >> different? >> >> >> >> The issue is not a P4 issue. Any tuple hash in C would have the same >> problem. >> >> >> >> A good implementations of tuple hash in C is OVS. >> >> >> >> >> https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 >> >> >> >> The theory behind tuple hashes is described in this paper. >> >> >> >> https://tinyurl.com/yrs65622 >> >> >> >> Hemant >> >> >> >> *From:* Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org> >> *Sent:* Friday, February 19, 2021 1:05 PM >> *To:* p4-dev <p4-dev@lists.p4.org>; sg5414@rit.edu >> *Subject:* [P4-dev] Re: Same hash value for different parameters. >> >> >> >> Hi, >> >> >> >> I could be wrong but I think CRC16 follows the birth paradox in regard to >> the collision probability. I did a fast calculation and the collision for >> something like CRC16 with only 2 different input values is very low. The >> problem is that when you have multiple different values, that is, multiple >> combinations of different tuples, then the probability is fairly high. With >> only 300 different combinations, you stand a collision chance of 50%. You >> can see it below. >> >> >> >> >> >> I am not aware of many ways to keep a unique identification (per flow) >> and done in the data plane. As you said, you use the 16 bit value as an >> index of a register. Values of 32 bits have better collision probabilities, >> but still, might either be difficult to use it as a register index or some >> cases might still not stand a single collision. >> >> >> >> I could imagine, though, sending a digest to the controller when a first >> packet arrives and let the controller decide an ID (if your idea can stand >> this case). Then use this ID at the data plane, maybe as parameter of a >> table when the tuple matches. If involving the controller is fast enough it >> might work for you. I think some more experienced P4 developers might have >> better ideas and might give you a better answer. 🙂 >> >> >> >> Regards, >> ------------------------------ >> >> *From:* Sahil Gupta <sg5414@rit.edu> >> *Sent:* 19 February 2021 09:16 >> *To:* p4-dev <p4-dev@lists.p4.org> >> *Subject:* [P4-dev] Same hash value for different parameters. >> >> >> >> Hi all, >> >> I was storing values in a register using the CRC16 hash function of flow >> tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET >> request are having the same hash values. >> >> I want the retransmitted packet not to be processed by my P4 program >> logic as the first copy of the packet already processed by it. >> >> But since, hash of SYN and HTTP GET packet fields are the same for some >> connections, my logic to classify original and retransmitted copy is >> failing. >> >> Any suggestions how I can avoid this hash collision? >> >> >> >> >> Attaching my customized pkt lifecycle log file generated from log_msg >> function, s1.log and pcap for reference. >> >> Case 1: (Hash is same) >> >> *retransmitted_register_index*: 7 for both SYN and HTTP GET request >> though ipv4 total length is 60 and 126 respectively. >> >> SYN pkd id = 56199 >> >> HTTP pkt id = 56201 >> >> Case 2: (Hash is different) >> >> *retransmitted_register_index*: 6 and 0 for SYN and HTTP GET request. >> (ipv4 total length is 60 and 126) >> >> SYN pkt id = 45176 >> HTTP pkt id = 45178 >> >> >> *Hash function and parameter used*: >> hash(retransmitted_register_index, HashAlgorithm.crc16, >> (bit<32>)0,{hdr.ipv4.srcAddr, >> hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); >> >> Regards >> >> Sahil Gupta >> >> _______________________________________________ > P4-dev mailing list -- p4-dev@lists.p4.org > To unsubscribe send an email to p4-dev-leave@lists.p4.org > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
H
hemant@mnkcg.com
Sun, Feb 21, 2021 9:33 PM

Exact match is tricky with Sahil’s traffic.  His traffic is HTTP at layer-7 and TCP at layer-4.  If an implementation sticks to using layer-4 or lower layers (3 (IP) and laye-2 (MAC)),  HTTP and TCP have same IP protocol.  If HTTP and TCP have identical source/destination IP and TCP ports, a 5-tuple fails to separate these two kind of traffic.  So, looks like Sahil decided to swap protocol with packet length, but the 4-tuple has identical bits and the hash is unable to differentiate two different packet lengths and causes collisions. I don’t even know what is one trying to do to separate HTTP and TCP traffic especially when today,  HTTP traffic uses HTTPS which encrypts HTTP traffic.

Hemant

From: Andy Fingerhut andy.fingerhut@gmail.com
Sent: Sunday, February 21, 2021 1:06 PM
To: Sahil Gupta sg5414@rit.edu
Cc: hemant@mnkcg.com; p4-dev p4-dev@lists.p4.org
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

If you have some data plane algorithm where hash collisions mean that your algorithm does not work at all, then perhaps you should be using a P4 table with exact match keys instead of hashing.

Every hash function from an input space with W-bit wide keys down to a 16-bit output or 32-bit output is going to have collisions, unless you have some special situation that you have guaranteed in advance that the W-bit wide keys are restricted to a special subset of them, such that the special subset plus the hash function you have chosen means they cannot collide.  If you don't have any such restrictions on the input values, collisions can and will happen.  You should plan for them in whatever algorithm you are using.

Andy

On Sat, Feb 20, 2021 at 8:12 AM Sahil Gupta <sg5414@rit.edu mailto:sg5414@rit.edu > wrote:

Thanks hemant.

I will look into it.

On Sat, Feb 20, 2021, 10:05 AM <hemant@mnkcg.com mailto:hemant@mnkcg.com > wrote:

Clearly, the ipv4 packet length doesn’t suffice if all other fields of the tuple are identical.  Use HTTP and UDP and swap IP length with protocol in tuple and see if you get a collision.

Even the Varghese paper has to use a near-perfect hash.  A good source for perfect hash with test code is here:

https://burtleburtle.net/bob/hash/perfect.html

You should play with such code to first make sure what collides or not.  Then, I would get back to P4.

Hemant

From: Sahil Gupta <sg5414@rit.edu mailto:sg5414@rit.edu >
Sent: Saturday, February 20, 2021 9:31 AM
To: hemant@mnkcg.com mailto:hemant@mnkcg.com
Cc: Eder Ollora Zaballa <eoza@fotonik.dtu.dk mailto:eoza@fotonik.dtu.dk >; p4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

I am using following values to create hash.

Source IP, source port, destination IP, destination port, and ipv4 total length.

Since ipv4 total length field is different for SYN and HTTP packet, I am expecting different hashes.

I will look into your papers as well for more theory on this.

Thanks

Sahil Gupta

On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com mailto:hemant@mnkcg.com > wrote:

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP, destination IP, source layer-4 port, destination layer-4 port, and protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.  If the traffic for HTTP and TCP uses identical source and destination IP and layer-4 ports, why would hashed value for HTTP and TCP value be different?

The issue is not a P4 issue.  Any tuple hash in C would have the same problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >; sg5414@rit.edu mailto:sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below.

I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer.  🙂

Regards,


From: Sahil Gupta <sg5414@rit.edu mailto:sg5414@rit.edu >
Sent: 19 February 2021 09:16
To: p4-dev <p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org >
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta


P4-dev mailing list -- p4-dev@lists.p4.org mailto:p4-dev@lists.p4.org
To unsubscribe send an email to p4-dev-leave@lists.p4.org mailto:p4-dev-leave@lists.p4.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

Exact match is tricky with Sahil’s traffic. His traffic is HTTP at layer-7 and TCP at layer-4. If an implementation sticks to using layer-4 or lower layers (3 (IP) and laye-2 (MAC)), HTTP and TCP have same IP protocol. If HTTP and TCP have identical source/destination IP and TCP ports, a 5-tuple fails to separate these two kind of traffic. So, looks like Sahil decided to swap protocol with packet length, but the 4-tuple has identical bits and the hash is unable to differentiate two different packet lengths and causes collisions. I don’t even know what is one trying to do to separate HTTP and TCP traffic especially when today, HTTP traffic uses HTTPS which encrypts HTTP traffic. Hemant From: Andy Fingerhut <andy.fingerhut@gmail.com> Sent: Sunday, February 21, 2021 1:06 PM To: Sahil Gupta <sg5414@rit.edu> Cc: hemant@mnkcg.com; p4-dev <p4-dev@lists.p4.org> Subject: Re: [P4-dev] Re: Same hash value for different parameters. If you have some data plane algorithm where hash collisions mean that your algorithm does not work at all, then perhaps you should be using a P4 table with exact match keys instead of hashing. Every hash function from an input space with W-bit wide keys down to a 16-bit output or 32-bit output is going to have collisions, unless you have some special situation that you have guaranteed in advance that the W-bit wide keys are restricted to a special subset of them, such that the special subset plus the hash function you have chosen means they cannot collide. If you don't have any such restrictions on the input values, collisions can and will happen. You should plan for them in whatever algorithm you are using. Andy On Sat, Feb 20, 2021 at 8:12 AM Sahil Gupta <sg5414@rit.edu <mailto:sg5414@rit.edu> > wrote: Thanks hemant. I will look into it. On Sat, Feb 20, 2021, 10:05 AM <hemant@mnkcg.com <mailto:hemant@mnkcg.com> > wrote: Clearly, the ipv4 packet length doesn’t suffice if all other fields of the tuple are identical. Use HTTP and UDP and swap IP length with protocol in tuple and see if you get a collision. Even the Varghese paper has to use a near-perfect hash. A good source for perfect hash with test code is here: https://burtleburtle.net/bob/hash/perfect.html You should play with such code to first make sure what collides or not. Then, I would get back to P4. Hemant From: Sahil Gupta <sg5414@rit.edu <mailto:sg5414@rit.edu> > Sent: Saturday, February 20, 2021 9:31 AM To: hemant@mnkcg.com <mailto:hemant@mnkcg.com> Cc: Eder Ollora Zaballa <eoza@fotonik.dtu.dk <mailto:eoza@fotonik.dtu.dk> >; p4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> > Subject: Re: [P4-dev] Re: Same hash value for different parameters. I am using following values to create hash. Source IP, source port, destination IP, destination port, and ipv4 total length. Since ipv4 total length field is different for SYN and HTTP packet, I am expecting different hashes. I will look into your papers as well for more theory on this. Thanks Sahil Gupta On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com <mailto:hemant@mnkcg.com> > wrote: What is your flow tuple? A common data plane classification tuple is the 5-tuple (source IP, destination IP, source layer-4 port, destination layer-4 port, and protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. If the traffic for HTTP and TCP uses identical source and destination IP and layer-4 ports, why would hashed value for HTTP and TCP value be different? The issue is not a P4 issue. Any tuple hash in C would have the same problem. A good implementations of tuple hash in C is OVS. https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 The theory behind tuple hashes is described in this paper. https://tinyurl.com/yrs65622 Hemant From: Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> > Sent: Friday, February 19, 2021 1:05 PM To: p4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> >; sg5414@rit.edu <mailto:sg5414@rit.edu> Subject: [P4-dev] Re: Same hash value for different parameters. Hi, I could be wrong but I think CRC16 follows the birth paradox in regard to the collision probability. I did a fast calculation and the collision for something like CRC16 with only 2 different input values is very low. The problem is that when you have multiple different values, that is, multiple combinations of different tuples, then the probability is fairly high. With only 300 different combinations, you stand a collision chance of 50%. You can see it below. I am not aware of many ways to keep a unique identification (per flow) and done in the data plane. As you said, you use the 16 bit value as an index of a register. Values of 32 bits have better collision probabilities, but still, might either be difficult to use it as a register index or some cases might still not stand a single collision. I could imagine, though, sending a digest to the controller when a first packet arrives and let the controller decide an ID (if your idea can stand this case). Then use this ID at the data plane, maybe as parameter of a table when the tuple matches. If involving the controller is fast enough it might work for you. I think some more experienced P4 developers might have better ideas and might give you a better answer. 🙂 Regards, _____ From: Sahil Gupta <sg5414@rit.edu <mailto:sg5414@rit.edu> > Sent: 19 February 2021 09:16 To: p4-dev <p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> > Subject: [P4-dev] Same hash value for different parameters. Hi all, I was storing values in a register using the CRC16 hash function of flow tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET request are having the same hash values. I want the retransmitted packet not to be processed by my P4 program logic as the first copy of the packet already processed by it. But since, hash of SYN and HTTP GET packet fields are the same for some connections, my logic to classify original and retransmitted copy is failing. Any suggestions how I can avoid this hash collision? Attaching my customized pkt lifecycle log file generated from log_msg function, s1.log and pcap for reference. Case 1: (Hash is same) retransmitted_register_index: 7 for both SYN and HTTP GET request though ipv4 total length is 60 and 126 respectively. SYN pkd id = 56199 HTTP pkt id = 56201 Case 2: (Hash is different) retransmitted_register_index: 6 and 0 for SYN and HTTP GET request. (ipv4 total length is 60 and 126) SYN pkt id = 45176 HTTP pkt id = 45178 Hash function and parameter used: hash(retransmitted_register_index, HashAlgorithm.crc16, (bit<32>)0,{hdr.ipv4.srcAddr, hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); Regards Sahil Gupta _______________________________________________ P4-dev mailing list -- p4-dev@lists.p4.org <mailto:p4-dev@lists.p4.org> To unsubscribe send an email to p4-dev-leave@lists.p4.org <mailto:p4-dev-leave@lists.p4.org> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
AF
Andy Fingerhut
Sun, Feb 21, 2021 10:22 PM

If Sahil is willing to publish the problem he is trying to solve, perhaps
that would be a better starting point for discussing reasonable solutions
using P4 mechanisms, versus starting with "hashing is not doing what I
hope".  I am not clear on the actual application Sahil is trying to develop.

Andy

On Sun, Feb 21, 2021 at 1:33 PM hemant@mnkcg.com wrote:

Exact match is tricky with Sahil’s traffic.  His traffic is HTTP at
layer-7 and TCP at layer-4.  If an implementation sticks to using layer-4
or lower layers (3 (IP) and laye-2 (MAC)),  HTTP and TCP have same IP
protocol.  If HTTP and TCP have identical source/destination IP and TCP
ports, a 5-tuple fails to separate these two kind of traffic.  So, looks
like Sahil decided to swap protocol with packet length, but the 4-tuple has
identical bits and the hash is unable to differentiate two different packet
lengths and causes collisions. I don’t even know what is one trying to do
to separate HTTP and TCP traffic especially when today,  HTTP traffic uses
HTTPS which encrypts HTTP traffic.

Hemant

From: Andy Fingerhut andy.fingerhut@gmail.com
Sent: Sunday, February 21, 2021 1:06 PM
To: Sahil Gupta sg5414@rit.edu
Cc: hemant@mnkcg.com; p4-dev p4-dev@lists.p4.org
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

If you have some data plane algorithm where hash collisions mean that your
algorithm does not work at all, then perhaps you should be using a P4 table
with exact match keys instead of hashing.

Every hash function from an input space with W-bit wide keys down to a
16-bit output or 32-bit output is going to have collisions, unless you have
some special situation that you have guaranteed in advance that the W-bit
wide keys are restricted to a special subset of them, such that the special
subset plus the hash function you have chosen means they cannot collide.
If you don't have any such restrictions on the input values, collisions can
and will happen.  You should plan for them in whatever algorithm you are
using.

Andy

On Sat, Feb 20, 2021 at 8:12 AM Sahil Gupta sg5414@rit.edu wrote:

Thanks hemant.

I will look into it.

On Sat, Feb 20, 2021, 10:05 AM hemant@mnkcg.com wrote:

Clearly, the ipv4 packet length doesn’t suffice if all other fields of the
tuple are identical.  Use HTTP and UDP and swap IP length with protocol in
tuple and see if you get a collision.

Even the Varghese paper has to use a near-perfect hash.  A good source for
perfect hash with test code is here:

https://burtleburtle.net/bob/hash/perfect.html

You should play with such code to first make sure what collides or not.
Then, I would get back to P4.

Hemant

From: Sahil Gupta sg5414@rit.edu
Sent: Saturday, February 20, 2021 9:31 AM
To: hemant@mnkcg.com
Cc: Eder Ollora Zaballa eoza@fotonik.dtu.dk; p4-dev <
p4-dev@lists.p4.org>
Subject: Re: [P4-dev] Re: Same hash value for different parameters.

I am using following values to create hash.

Source IP, source port, destination IP, destination port, and ipv4 total
length.

Since ipv4 total length field is different for SYN and HTTP packet, I am
expecting different hashes.

I will look into your papers as well for more theory on this.

Thanks

Sahil Gupta

On Sat, Feb 20, 2021, 9:27 AM hemant@mnkcg.com wrote:

What is your flow tuple?

A common data plane classification tuple is the 5-tuple (source IP,
destination IP, source layer-4 port, destination layer-4 port, and
protocol).  HTTP uses TCP, so both HTTP and TCP have the same IP protocol.
If the traffic for HTTP and TCP uses identical source and destination IP
and layer-4 ports, why would hashed value for HTTP and TCP value be
different?

The issue is not a P4 issue.  Any tuple hash in C would have the same
problem.

A good implementations of tuple hash in C is OVS.

https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221

The theory behind tuple hashes is described in this paper.

https://tinyurl.com/yrs65622

Hemant

From: Eder Ollora Zaballa via P4-dev p4-dev@lists.p4.org
Sent: Friday, February 19, 2021 1:05 PM
To: p4-dev p4-dev@lists.p4.org; sg5414@rit.edu
Subject: [P4-dev] Re: Same hash value for different parameters.

Hi,

I could be wrong but I think CRC16 follows the birth paradox in regard to
the collision probability. I did a fast calculation and the collision for
something like CRC16 with only 2 different input values is very low. The
problem is that when you have multiple different values, that is, multiple
combinations of different tuples, then the probability is fairly high. With
only 300 different combinations, you stand a collision chance of 50%. You
can see it below.

I am not aware of many ways to keep a unique identification (per flow) and
done in the data plane. As you said, you use the 16 bit value as an index
of a register. Values of 32 bits have better collision probabilities, but
still, might either be difficult to use it as a register index or some
cases might still not stand a single collision.

I could imagine, though, sending a digest to the controller when a first
packet arrives and let the controller decide an ID (if your idea can stand
this case). Then use this ID at the data plane, maybe as parameter of a
table when the tuple matches. If involving the controller is fast enough it
might work for you. I think some more experienced P4 developers might have
better ideas and might give you a better answer.  🙂

Regards,

From: Sahil Gupta sg5414@rit.edu
Sent: 19 February 2021 09:16
To: p4-dev p4-dev@lists.p4.org
Subject: [P4-dev] Same hash value for different parameters.

Hi all,

I was storing values in a register using the CRC16 hash function of flow
tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET
request are having the same hash values.

I want the retransmitted packet not to be processed by my P4 program logic
as the first copy of the packet already processed by it.

But since, hash of SYN and HTTP GET packet fields are the same for some
connections, my logic to classify original and retransmitted copy is
failing.

Any suggestions how I can avoid this hash collision?

Attaching my customized pkt lifecycle log file generated from log_msg
function, s1.log and pcap for reference.

Case 1: (Hash is same)

retransmitted_register_index: 7 for both SYN and HTTP GET request
though ipv4 total length is 60 and 126 respectively.

SYN pkd id = 56199

HTTP pkt id = 56201

Case 2: (Hash is different)

retransmitted_register_index: 6 and 0 for SYN and HTTP GET request.
(ipv4 total length is 60 and 126)

SYN pkt id = 45176
HTTP pkt id = 45178

Hash function and parameter used:
hash(retransmitted_register_index, HashAlgorithm.crc16,
(bit<32>)0,{hdr.ipv4.srcAddr,
hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10);

Regards

Sahil Gupta


P4-dev mailing list -- p4-dev@lists.p4.org
To unsubscribe send an email to p4-dev-leave@lists.p4.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

If Sahil is willing to publish the problem he is trying to solve, perhaps that would be a better starting point for discussing reasonable solutions using P4 mechanisms, versus starting with "hashing is not doing what I hope". I am not clear on the actual application Sahil is trying to develop. Andy On Sun, Feb 21, 2021 at 1:33 PM <hemant@mnkcg.com> wrote: > Exact match is tricky with Sahil’s traffic. His traffic is HTTP at > layer-7 and TCP at layer-4. If an implementation sticks to using layer-4 > or lower layers (3 (IP) and laye-2 (MAC)), HTTP and TCP have same IP > protocol. If HTTP and TCP have identical source/destination IP and TCP > ports, a 5-tuple fails to separate these two kind of traffic. So, looks > like Sahil decided to swap protocol with packet length, but the 4-tuple has > identical bits and the hash is unable to differentiate two different packet > lengths and causes collisions. I don’t even know what is one trying to do > to separate HTTP and TCP traffic especially when today, HTTP traffic uses > HTTPS which encrypts HTTP traffic. > > > > Hemant > > > > *From:* Andy Fingerhut <andy.fingerhut@gmail.com> > *Sent:* Sunday, February 21, 2021 1:06 PM > *To:* Sahil Gupta <sg5414@rit.edu> > *Cc:* hemant@mnkcg.com; p4-dev <p4-dev@lists.p4.org> > *Subject:* Re: [P4-dev] Re: Same hash value for different parameters. > > > > If you have some data plane algorithm where hash collisions mean that your > algorithm does not work at all, then perhaps you should be using a P4 table > with exact match keys instead of hashing. > > > > Every hash function from an input space with W-bit wide keys down to a > 16-bit output or 32-bit output is going to have collisions, unless you have > some special situation that you have guaranteed in advance that the W-bit > wide keys are restricted to a special subset of them, such that the special > subset plus the hash function you have chosen means they cannot collide. > If you don't have any such restrictions on the input values, collisions can > and will happen. You should plan for them in whatever algorithm you are > using. > > > > Andy > > > > > > On Sat, Feb 20, 2021 at 8:12 AM Sahil Gupta <sg5414@rit.edu> wrote: > > Thanks hemant. > > I will look into it. > > > > > > > > On Sat, Feb 20, 2021, 10:05 AM <hemant@mnkcg.com> wrote: > > Clearly, the ipv4 packet length doesn’t suffice if all other fields of the > tuple are identical. Use HTTP and UDP and swap IP length with protocol in > tuple and see if you get a collision. > > > > Even the Varghese paper has to use a near-perfect hash. A good source for > perfect hash with test code is here: > > > > https://burtleburtle.net/bob/hash/perfect.html > > > > You should play with such code to first make sure what collides or not. > Then, I would get back to P4. > > > > Hemant > > > > *From:* Sahil Gupta <sg5414@rit.edu> > *Sent:* Saturday, February 20, 2021 9:31 AM > *To:* hemant@mnkcg.com > *Cc:* Eder Ollora Zaballa <eoza@fotonik.dtu.dk>; p4-dev < > p4-dev@lists.p4.org> > *Subject:* Re: [P4-dev] Re: Same hash value for different parameters. > > > > I am using following values to create hash. > > Source IP, source port, destination IP, destination port, and ipv4 total > length. > > > > Since ipv4 total length field is different for SYN and HTTP packet, I am > expecting different hashes. > > > > I will look into your papers as well for more theory on this. > > > > > > Thanks > > Sahil Gupta > > > > > > On Sat, Feb 20, 2021, 9:27 AM <hemant@mnkcg.com> wrote: > > What is your flow tuple? > > > > A common data plane classification tuple is the 5-tuple (source IP, > destination IP, source layer-4 port, destination layer-4 port, and > protocol). HTTP uses TCP, so both HTTP and TCP have the same IP protocol. > If the traffic for HTTP and TCP uses identical source and destination IP > and layer-4 ports, why would hashed value for HTTP and TCP value be > different? > > > > The issue is not a P4 issue. Any tuple hash in C would have the same > problem. > > > > A good implementations of tuple hash in C is OVS. > > > > > https://github.com/openvswitch/ovs/blob/79349cbab0b2a755140eedb91833ad2760520a83/lib/flow.c#L2221 > > > > The theory behind tuple hashes is described in this paper. > > > > https://tinyurl.com/yrs65622 > > > > Hemant > > > > *From:* Eder Ollora Zaballa via P4-dev <p4-dev@lists.p4.org> > *Sent:* Friday, February 19, 2021 1:05 PM > *To:* p4-dev <p4-dev@lists.p4.org>; sg5414@rit.edu > *Subject:* [P4-dev] Re: Same hash value for different parameters. > > > > Hi, > > > > I could be wrong but I think CRC16 follows the birth paradox in regard to > the collision probability. I did a fast calculation and the collision for > something like CRC16 with only 2 different input values is very low. The > problem is that when you have multiple different values, that is, multiple > combinations of different tuples, then the probability is fairly high. With > only 300 different combinations, you stand a collision chance of 50%. You > can see it below. > > > > > > I am not aware of many ways to keep a unique identification (per flow) and > done in the data plane. As you said, you use the 16 bit value as an index > of a register. Values of 32 bits have better collision probabilities, but > still, might either be difficult to use it as a register index or some > cases might still not stand a single collision. > > > > I could imagine, though, sending a digest to the controller when a first > packet arrives and let the controller decide an ID (if your idea can stand > this case). Then use this ID at the data plane, maybe as parameter of a > table when the tuple matches. If involving the controller is fast enough it > might work for you. I think some more experienced P4 developers might have > better ideas and might give you a better answer. 🙂 > > > > Regards, > ------------------------------ > > *From:* Sahil Gupta <sg5414@rit.edu> > *Sent:* 19 February 2021 09:16 > *To:* p4-dev <p4-dev@lists.p4.org> > *Subject:* [P4-dev] Same hash value for different parameters. > > > > Hi all, > > I was storing values in a register using the CRC16 hash function of flow > tuples and ipv4 total length field. To my surprise SYN packet and HTTP GET > request are having the same hash values. > > I want the retransmitted packet not to be processed by my P4 program logic > as the first copy of the packet already processed by it. > > But since, hash of SYN and HTTP GET packet fields are the same for some > connections, my logic to classify original and retransmitted copy is > failing. > > Any suggestions how I can avoid this hash collision? > > > > > Attaching my customized pkt lifecycle log file generated from log_msg > function, s1.log and pcap for reference. > > Case 1: (Hash is same) > > *retransmitted_register_index*: 7 for both SYN and HTTP GET request > though ipv4 total length is 60 and 126 respectively. > > SYN pkd id = 56199 > > HTTP pkt id = 56201 > > Case 2: (Hash is different) > > *retransmitted_register_index*: 6 and 0 for SYN and HTTP GET request. > (ipv4 total length is 60 and 126) > > SYN pkt id = 45176 > HTTP pkt id = 45178 > > > *Hash function and parameter used*: > hash(retransmitted_register_index, HashAlgorithm.crc16, > (bit<32>)0,{hdr.ipv4.srcAddr, > hdr.tcp.srcPort,hdr.ipv4.dstAddr,hdr.tcp.dstPort,hdr.ipv4.totalLen},(bit<32>)10); > > Regards > > Sahil Gupta > > _______________________________________________ > P4-dev mailing list -- p4-dev@lists.p4.org > To unsubscribe send an email to p4-dev-leave@lists.p4.org > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s > >