[P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

Andy Fingerhut (jafinger) jafinger at cisco.com
Sun Mar 25 22:34:23 EDT 2018


Vladimir, do you have a specific recommendation on what you think the language spec should say about the ‘size’ table attribute?  That is what I was going for in part of my most recent message, and I don’t think it was trying to tie the hands of implementers much at all.

Andy

From: Vladimir Gurevich <vag at barefootnetworks.com>
Date: Sunday, March 25, 2018 at 7:31 PM
To: "Andy Fingerhut (jafinger)" <jafinger at cisco.com>
Cc: Mihai Budiu <mbudiu at vmware.com>, p4-design <p4-design at lists.p4.org>
Subject: Re: [P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

Hello,

I think we need to re-state the basic goals of our activities. Specifically

  1.  Are we trying to define a super-formal language that will allow one to create not only super-portable programs, but even super-portable APIs that control those programs to the extent that you can swap targets (ranging from high-speed ASICs to fully SW implementations and anything in-between) as easily as one swaps one pair of gloves for another or
  2.  Are we trying to define a practical language that would allow people to easily create data plane programs on target device of their choice and generate a corresponding control plane API?
I think that goal #2 is quite achievable, but the goal #1 is, unfortunately, not. Moreover, trying to get closer to goal #1 will probably take us further away from being able to define a practical, usable language that gives data plane designers the ability to get things done.

I totally agree that it is impossible to formally define what "size" is. The question is: should that preclude us from moving forward and having this in the language? Even in Euclidian geometry there are terms that are not defined, e.g a point and a straight line. That's ok. Having "size" and, perhaps, other similar attributes in the language makes programs generally more readable and understandable. In fact, it even makes them more target-independent and more suitable for optimization, because it would not require people to use the externs, requiring a very specific implementation (as you suggested).

I can certainly see the value of implementation externs, especially if they would allow us to use fewer annotations, but unfortunately they will not, because any real implementation(e.g. for hash-tables) will require you to specify anywhere from one to dozens of parameters per table, most of of them optional -- something externs are not very good at (not to mention that it would be nice if they were named).

I urge everyone to keep P4 a practical language and sometimes that means that we might need to have things there that we might not be able to formally define. That's OK.

Thanks,
Vladiimir


On Sun, Mar 25, 2018 at 5:55 PM, Andy Fingerhut (jafinger) <jafinger at cisco.com<mailto:jafinger at cisco.com>> wrote:
If ‘size’ were an integer that meant, ‘the compiler will implement this table with something that can in some cases install up to N entries, but how likely being able to reach N entries is implementation- and table-specific’, then the value of N could be used as the “raw capacity” of a hash table, for example.

This might be what Vladimir means when he says “up to N”.  It seems a bit odd to me for an implementation to use, say, a size of 16384 for a hash table if then takes that value and uses it as the raw theoretical maximum capacity of a hash table, where in practice that could mean that adding a set of 12,000 keys causes the 12,000th one to fail to add because of too many collisions with earlier added entries.  75% to 95% typical usable capacity from a medium to large hash table is about the best I’ve seen (where the actual fraction depends on the raw size, how many hash functions, overflow TCAM size used, if any, etc.).

I think it would be reasonable to define ‘size’ to take an integer type value in the language spec, and leave the meaning of how that value is used up to the implementation, with just a general guideline of “it is recommended if you can make a table / parser value set that guarantees that many exact, lpm, and/or ternary entries can be added, but an implementation is allowed to use implementations that can fail to add entries at lower number of entries than the given size’.

Andy

From: Mihai Budiu <mbudiu at vmware.com<mailto:mbudiu at vmware.com>>
Date: Sunday, March 25, 2018 at 5:17 PM
To: "Andy Fingerhut (jafinger)" <jafinger at cisco.com<mailto:jafinger at cisco.com>>, Vladimir Gurevich <vag at barefootnetworks.com<mailto:vag at barefootnetworks.com>>
Cc: p4-design <p4-design at lists.p4.org<mailto:p4-design at lists.p4.org>>
Subject: RE: [P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

The conclusion is that it is impossible to give a simple description of the meaning of “size”.

There are two possibilities going forward:

  *   We specify in the spec that the tables and value_sets can have a size integer parameter, but the meaning is target-dependent
  *   We do not standardize these at all. Tables can still have a size parameter (it is not mandated or forbidden by the spec, as it is today), but each target can choose a different way to specify sizes. An example is given below. For value_sets we have to use an annotation.

For tables I will submit again the example in ebpf_model, which is perfect for this particular target:

/*
Each table must have an implementation property which is either an array_table
or a hash_table.
*/

/**
Implementation property for tables indicating that tables must be implemented
using EBPF array map.
*/
extern array_table {
    /// @param size: maximum number of entries in table
    array_table(bit<32> size);
}

/**
Implementation property for tables indicating that tables must be implemented
using EBPF hash map.
*/
extern hash_table {
    /// @param size: maximum number of entries in table
    hash_table(bit<32> size);
}

table t {
   keys = …
   implementation = hash_table(32);
}

The benefit of this approach is that an architecture can define whatever parameters it deems necessary for describing the implementation. If Vladimir likes max_size, min_size, average_size and other parameters, he can define an extern which has all these as constructor arguments for his architecture.

The downside is that this is certainly target-specific, and thus non-portable.

But it looks from Andy’s description that the portability of making size a standard feature is illusory.

Mihai

From: P4-design [mailto:p4-design-bounces at lists.p4.org<mailto:p4-design-bounces at lists.p4.org>] On Behalf Of Andy Fingerhut (jafinger)
Sent: Saturday, March 24, 2018 5:10 PM
To: Vladimir Gurevich <vag at barefootnetworks.com<mailto:vag at barefootnetworks.com>>
Cc: p4-design <p4-design at lists.p4.org<mailto:p4-design at lists.p4.org>>
Subject: Re: [P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

If by “up to N” you mean “up to N with high probability”, or “up to N for most practical sets of keys you probably want to install”, yes that is what I was trying to convey in the article for hash tables.  I wasn’t trying to say that ATCAMs are any worse than hash tables in that regard (although some variations I have worked with _are_), just that they are no better in them, either.

Andy

From: Vladimir Gurevich <vag at barefootnetworks.com<mailto:vag at barefootnetworks.com>>
Date: Saturday, March 24, 2018 at 4:36 PM
To: "Andy Fingerhut (jafinger)" <jafinger at cisco.com<mailto:jafinger at cisco.com>>
Cc: p4-design <p4-design at lists.p4.org<mailto:p4-design at lists.p4.org>>
Subject: Re: [P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

Hi Andy,

I totally agree that the practical ATCAM capacity from the control plane point of view (e.g. how many routes you can put into a ATCAM with N entries divided into M partitions) is difficult to predict, but "up to N" statement can still be holding some water :) I think that this is an inherent ATCAM property, much like the fact that you can't generally guarantee any specific occupancy for a hashed table (as you clearly outlined in the article).

Thanks,
Vladmir

On Sat, Mar 24, 2018 at 10:40 AM, Andy Fingerhut (jafinger) <jafinger at cisco.com<mailto:jafinger at cisco.com>> wrote:
Thanks for the comments, Vladimir.  I agree with just about every word, and would be happy to incorporate your comments into it, perhaps in another week or so after I get some other things done.

One quick response here.  The article does not say that algorithmic TCAMs are not “fast”.  It says: ‘All techniques I am aware of here either cannot go "fast" as defined above, or have capacity that is dependent upon the contents of the table entries.’.  If you know of an algorithmic TCAM method that is both fast _and_ has capacity independent of the contents of the table entries, then I really want to hear about it.  I’ve spent way too much of my life dealing with the non-deterministic capacity of algorithmic TCAMs, for Cisco platforms where they wish they could claim the capacity was a fixed number N.

Thanks,
Andy

From: Vladimir Gurevich <vag at barefootnetworks.com<mailto:vag at barefootnetworks.com>>
Date: Saturday, March 24, 2018 at 9:06 AM
To: "Andy Fingerhut (jafinger)" <jafinger at cisco.com<mailto:jafinger at cisco.com>>
Cc: p4-design <p4-design at lists.p4.org<mailto:p4-design at lists.p4.org>>
Subject: Re: [P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

Hi Andy,

Great stuff, worthy of a book chapter or an encyclopedia article.

If anything, I'd just say that some of the assumptions are a little "rosy" meaning that they cover the "classic" behaviors, but do not account for a variety of idiosyncratic behaviors that are often encountered in real life. Call them bugs, features or both, but they do have an effect on table capacity. One big topic you sort of omitted is the question of how to count default actions: are they entries or not? do they need to be included in total entry count or not (i.e. should the "natural" size be 2^N, 2^N-1 or 2^N+1)? Even in "good" cases, such as TCAMs or indexed tables, I am sure that different implementations will answer this question differently. In other cases control plane might need to reserve additional entries for a variety of technical/efficiency reasons (faster entry moves in TCAMs, atomic updates, garbage collection (in action profiles), etc.).

Therefore, people usually advertise their table capacities as being "up to N entries", with that "up to" being the key, that allows customers to reasonably judge the device capacity, while freeing the vendors from frivolous lawsuits :)

I know, there are different opinions in this area (mostly guided by (a) the n-to-1 case you described and (b) the fact that SW implementations might not need size indication at all), but coming from the ASIC background, I firmly believe that the basic "up to" size attribute is vital for both tables and value_sets and is preferred to an annotation, because it expresses a basic intent of the data plane program designer that the compiler cannot reasonably guess (as opposed to many technical allocation parameters (e.g. the number of ways in a hash table) that the compilers can determine automatically, often even better than humans). P4_14 even has min_size and max_size attributes in addition to size and I think that was not a bad idea (although I am not sure if they were implemented anywhere).

Last, but not least, I am not quite sure what to make of the comment at the end of the article that assert that ATCAMs are not "fast". I think everything depends on the implementation, but if you can implement a multi-way hash table with O(1) lookup performance, there is no reason why you can't implement an ATCAM with the same performance either.

Happy hacking,
Vladimir



On Fri, Mar 23, 2018 at 9:49 PM, Andy Fingerhut (jafinger) <jafinger at cisco.com<mailto:jafinger at cisco.com>> wrote:
In the last P4 language design WG meeting, I volunteered to write up some notes on P4 table sizes, and what “size” might mean.

I have written up a discussion of the issues around P4 table and parser value set sizes here.  It is more a tutorial style directed at people who might be unfamiliar with issues around high speed implementations of P4-style matching, and what their consequences are for specifying table sizes, especially ones where the actual capacity of a table is not really predictable.

https://github.com/jafingerhut/p4-guide/blob/master/docs/p4-table-and-parser-value-set-sizes.md<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_jafingerhut_p4-2Dguide_blob_master_docs_p4-2Dtable-2Dand-2Dparser-2Dvalue-2Dset-2Dsizes.md&d=DwMGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=tGW6TKXajnoXSyy1S1P4DHGPe8sj54GGvw-b21n7aWg&m=igHNDkIsLYagXgGCPpIvkY56e6q_4mGo3TN4g8WOQYk&s=Mm_IWg6WCXnjrb1Vh5lONc_7WE2uO3BgmENj5HnM_BQ&e=>

Corrections/questions/comments welcome.

Andy


_______________________________________________
P4-design mailing list
P4-design at lists.p4.org<mailto:P4-design at lists.p4.org>
http://lists.p4.org/mailman/listinfo/p4-design_lists.p4.org<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.p4.org_mailman_listinfo_p4-2Ddesign-5Flists.p4.org&d=DwMGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=tGW6TKXajnoXSyy1S1P4DHGPe8sj54GGvw-b21n7aWg&m=igHNDkIsLYagXgGCPpIvkY56e6q_4mGo3TN4g8WOQYk&s=Avv9VPJx5NI3SRUQy9aN7lk_rDLuHqX0_IwMBoDIA2E&e=>



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.p4.org/pipermail/p4-design_lists.p4.org/attachments/20180326/34d5d87a/attachment-0001.html>


More information about the P4-design mailing list