s1m0n

📅 Joined in 2010

🔼 96 Karma

โœ๏ธ 30 posts

🌀 15 latest posts


👤s1m0n🕑5y🔼2🗨️0

(Replying to PARENT post)

It was at this point that I realized that the performance of user-land code in general (including NGINX) could be much better if it weren't for the legacy network stack in the kernel, which does a bad job at certain things: accept()ing new sockets is slow, and every packet read costs yet another system call that unnecessarily copies memory. Why not use shared memory instead? That's when I started to get interested in C10M... :-)
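
To make that per-packet overhead concrete, here is a minimal sketch of the conventional kernel-socket receive path (my own illustration, not code from any system mentioned above). Every datagram costs at least one system call plus a kernel-to-user copy; kernel-bypass designs such as those behind C10M instead map packet buffers into user space as shared memory, so packets can be consumed in batches without a per-packet syscall or copy.

    /* Conventional receive loop over the kernel network stack. */
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(9000);   /* arbitrary example port */
        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
            perror("socket/bind");
            return 1;
        }

        for (;;) {
            char buf[2048];
            /* One system call and one kernel-to-user memory copy per packet:
             * this is the per-packet overhead that shared-memory /
             * kernel-bypass designs try to eliminate. */
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0) {
                perror("recvfrom");
                break;
            }
            /* ... process the packet ... */
        }
        close(fd);
        return 0;
    }
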
👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

Thanks!

I also forgot to mention performance. Whether you write *foo or the easier-to-comprehend and safer(?) foo[i], the compiler still does an awesome job of optimizing. However, it's much easier to assert() that variable i is in range than to assert anything useful about *foo. And it's also easier to read variable i in a verbose log (usually a 'human-readable' integer) than to read a logged pointer (a 'non-human-readable' long hex number).

At the time we wrote a lot of network daemons for the cloud and performance tested them against other freely available code to ensure that we weren't just re-inventing the wheel. NGINX seemed to be the next fastest, but our 'dumbed down' version of C ran about twice as fast as NGINX according to various benchmarks at the time. Looking back, I think that's because we performance tested our code from the first lines of production code, so there was no chance for even a bit of puppy fat to creep in. Think: 'Look after the cents and the dollars look after themselves' :-) Plus NGINX also has to handle the generic case, whereas we only needed to handle a specific subset of HTTP / HTTPS.
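
As a small illustration of the index-versus-pointer point (my own example, not code from the daemons mentioned above): an index is trivial to range-check with assert() and trivial to read in a log line, whereas the raw-pointer equivalent can only be logged as an opaque hex address.

    #include <assert.h>
    #include <stdio.h>

    #define FOO_COUNT 16
    static int foo[FOO_COUNT];

    /* Index style: easy to range-check, easy to read in a verbose log. */
    static void set_foo_by_index(size_t i, int value)
    {
        assert(i < FOO_COUNT);                              /* meaningful check */
        fprintf(stderr, "set foo[%zu] = %d\n", i, value);   /* readable log */
        foo[i] = value;
    }

    /* Pointer style: the only thing to log is a long hex number. */
    static void set_foo_by_pointer(int *p, int value)
    {
        fprintf(stderr, "set *%p = %d\n", (void *)p, value); /* opaque log */
        *p = value;
    }

    int main(void)
    {
        set_foo_by_index(3, 42);
        set_foo_by_pointer(&foo[4], 43);
        return 0;
    }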

👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

We do agree that experienced developers are necessary alongside the less experienced developers, at least while the project is getting off the ground and the less experienced developers are still learning the ropes.
👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

Do you happen to know the names of the companies where they work?
👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

Thanks! And I have not seen any teams doing this kind of thing either.

Another tidbit: the developer pairs were responsible for writing both the production code and the associated automated tests; we had no 'QA' / test developers. All code was reviewed by a third developer prior to check-in. At one stage we tried developing all the test code in a high-level scripting language, with the idea that it would be faster to write the tests and would require fewer lines of test source code. However, after doing several projects like this, we noticed that there was no advantage to writing tests in a scripting language: the ratio of production C source lines to test source lines was about the same whether the test code was written in C or in a scripting language. Furthermore, there was an advantage to writing the tests in C because they ran much faster. We had some tens of thousands of tests, and all of them could compile and run in under two minutes total, and that includes compiling three versions of the sources (production, debug, and code coverage builds) and running the tests against each one. Because the entire test cycle was so fast, developers could do 'merciless refactoring'.
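
For what it's worth, here is a minimal sketch of what such plain-C assert()-based tests can look like; the function under test, the test names, and the build flags are all my own assumptions, since the post doesn't show the team's real harness. The same test driver could then be compiled and run against production, debug, and code-coverage builds, e.g. with gcc: -O2 for production, -O0 -g -DDEBUG for debug, and -O0 --coverage for coverage.

    #include <assert.h>
    #include <stdio.h>

    /* Stand-in for a production function under test (kept in this file so
     * the sketch compiles on its own). */
    static int clamp(int value, int lo, int hi)
    {
        if (value < lo) return lo;
        if (value > hi) return hi;
        return value;
    }

    static void test_clamp_passes_in_range_value_through(void)
    {
        assert(clamp(5, 0, 10) == 5);
    }

    static void test_clamp_limits_out_of_range_values(void)
    {
        assert(clamp(-3, 0, 10) == 0);
        assert(clamp(99, 0, 10) == 10);
    }

    int main(void)
    {
        test_clamp_passes_in_range_value_through();
        test_clamp_limits_out_of_range_values();
        puts("all tests passed");
        return 0;
    }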

👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

"I have never professionally audited a C project and not found a vulnerability."

Just out of interest: How many C projects have you audited? And have you ever looked for a relationship between the number of vulnerabilities and the percentage code coverage from automated tests?

👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

Not every C developer needs to be in the top 0.001% to write high-performance, rock-solid code. Why?

I'm a very experienced C programmer, and one day my boss came to me and said that the sales guys had already sold a non-existent client-side module to a household-name appliance manufacturer. The deal was inked and it had to be ready in only 3 months. Even worse, it had to run in the Unix kernel of the appliance and therefore be rock solid so as not to take the whole appliance down. It also had to be ultra high performance because the appliance was ultra high performance and very expensive. Now the really bad news: I had a team made up of 3 experienced C developers (including myself) and 3 very inexperienced C developers. We estimated that coding all the functionality would take at least 4 months, so we added another 4 less experienced C developers (the office didn't have a lot of C developers). The project was completed on time, it was a success, and almost no bugs were found, yet many developers without much C experience worked on it. How?

(a) No dynamic memory allocation was used at run-time and therefore we never had to worry about memory leaks.

(b) Very, very few pointers were used; instead, mainly arrays. Array syntax, e.g. myarray[i].member = 1, is understood by more than just C developers :-) As a result, we never had to worry about invalid pointers.

(c) Source changes could only be committed together with automated tests resulting in 100% code coverage, and only after peer review. This meant that most bugs were discovered immediately after being created, before being checked in to the source repository. We achieved 100% code coverage with an approx. 1:1 ratio of production C source code to test C source code.

(d) All code was written using pair programming.

(e) Automated performance tests were run on each code commit to immediately spot any new code causing a performance problem.

(f) All code was written from scratch to the C89 standard for embedding in the kernel. About a dozen interface functions were identified, which allowed us to develop the code in isolation from the appliance without having to learn the appliance itself.

(g) There was a debug version of the code littered with assert()s and very verbose logging (there's a small sketch of this kind of debug build just after this list), so we never needed to use a traditional debugger. The verbose logging also allowed us to debug the multi-core code. Regular developers were not allowed to use mutexes etc. directly in source code; instead, generic higher-level constructs were used to achieve multi-core parallelism. My impression is that debugging via sophisticated log files is faster than using a debugger.

(h) We automated the Makefile machinery so that developers could create new source files and/or libraries on the fly without having to understand make voodoo. C header files were also auto-generated to increase developer productivity.

(i) Naming conventions for folders, files, and C source code were enforced programmatically and by reviews. In this way it was easier for developers to name things and comprehend the code of others.
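
Regarding (g), here is a minimal sketch of that kind of debug build. The macro name and build flags are my own illustration rather than the team's real ones, and the variadic macro is C99 just to keep the sketch short: compile the debug version with -DDEBUG so the verbose logging is emitted, and the production version with -DNDEBUG so that assert() (and the logging) compile away to nothing.

    #include <stdio.h>
    #include <assert.h>

    #ifdef DEBUG
    #define VERBOSE_LOG(...)                                  \
        do {                                                  \
            fprintf(stderr, "%s:%d: ", __FILE__, __LINE__);   \
            fprintf(stderr, __VA_ARGS__);                     \
            fputc('\n', stderr);                              \
        } while (0)
    #else
    #define VERBOSE_LOG(...) do { } while (0)   /* no code in production */
    #endif

    #define QUEUE_MAX 128
    static int queue[QUEUE_MAX];
    static int queue_len;

    static void queue_push(int value)
    {
        assert(queue_len < QUEUE_MAX);          /* compiled out with -DNDEBUG */
        VERBOSE_LOG("push value=%d len=%d", value, queue_len);
        queue[queue_len++] = value;
    }

    int main(void)
    {
        queue_push(42);
        queue_push(43);
        printf("queue_len=%d\n", queue_len);
        return 0;
    }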

In essence, we created a kind of 'dumbed down' version of C that approached being as easy to code in as a high-level scripting language. Developers found themselves empowered to write a lot of code very quickly because they could rely on the automated testing to ensure that they hadn't inadvertently broken something, even in parts of the code that they knew little about. This only worked well because there was a clear architecture and code skeleton; the rest was like 'painting by numbers' for the majority of developers, who had little experience with C.
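
To make points (a) and (b) concrete, here is a minimal sketch of that style, with all names and sizes invented for illustration: storage is a fixed-size array allocated at compile time, and code refers to slots by integer index rather than by pointer, so there is nothing to leak and indexes are trivial to validate.

    #include <assert.h>
    #include <stdio.h>

    #define MAX_SESSIONS 1024            /* fixed capacity, chosen up front */

    struct session {
        int in_use;
        int bytes_seen;
    };

    static struct session sessions[MAX_SESSIONS];   /* no malloc(), no free() */

    /* Returns a slot index, or -1 if the table is full. */
    static int session_open(void)
    {
        int i;
        for (i = 0; i < MAX_SESSIONS; i++) {
            if (!sessions[i].in_use) {
                sessions[i].in_use = 1;
                sessions[i].bytes_seen = 0;
                return i;
            }
        }
        return -1;
    }

    static void session_add_bytes(int i, int n)
    {
        assert(i >= 0 && i < MAX_SESSIONS);   /* indexes are easy to validate */
        assert(sessions[i].in_use);
        sessions[i].bytes_seen += n;
    }

    static void session_close(int i)
    {
        assert(i >= 0 && i < MAX_SESSIONS);
        sessions[i].in_use = 0;               /* "freeing" cannot leak or dangle */
    }

    int main(void)
    {
        int s = session_open();
        session_add_bytes(s, 512);
        printf("session %d saw %d bytes\n", s, sessions[s].bytes_seen);
        session_close(s);
        return 0;
    }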

The same team went on to develop more C software using the same technique and with great success.

👤s1m0n🕑9y🔼0🗨️0

(Replying to PARENT post)

:-)
👤s1m0n🕑10y🔼0🗨️0

(Replying to PARENT post)

Reason #4: Another problem is that the legacy network stack in the kernel was never designed to do internet on the mass scale desired today. Companies like WhatsApp devoted lots of time to getting e.g. 2M concurrent TCP connections (considered good) running on a single box, mainly because of the greedy overhead and design of that legacy stack, whereas in theory it should be possible to have 10M or more concurrent TCP connections on average modern hardware. So from this POV, the legacy network stack is to networking what bloated, memory-greedy Java is to software development. See http://c10m.robertgraham.com/p/manifesto.html
👤s1m0n🕑10y🔼0🗨️0

(Replying to PARENT post)

Looks interesting. I'll take a look. You might also be interested in mTCP (https://github.com/eunyoung14/mtcp), or possibly adding mTCP functionality to packet bricks?
👤s1m0n🕑10y🔼0🗨️0

(Replying to PARENT post)

Wouldn't it be more efficient to just have all or nearly all packets bypass the network kernel? Why compromise?
👤s1m0n🕑10y🔼0🗨️0

(Replying to PARENT post)

The title of the article does not mention CloudFlare; only bypassing. The fact that the CloudFlare architecture pushes a higher bandwidth of packets into the kernel network stack and bypasses it for the rest does not make it a good or recommended technique. If you are primarily interested in the best performance with a single-NIC solution then I believe it is suboptimal. Why? You are asking the CPU to do two different types of work, optimized and unoptimized, and because of cache line pollution the "unoptimized" work going via the kernel network stack will pollute the optimized work. I may be wrong, but I would bet you'd get better performance by separating your CloudFlare-specific workload onto two boxes, each with one NIC. In this scenario no cache line pollution can occur. Of course, these two boxes might not be easily possible within the existing CloudFlare architecture. But this has nothing to do with the general idea of packets bypassing the kernel. After the bypass you want the CPU to process those packets in the most efficient way...
👤s1m0n🕑10y🔼0🗨️0

(Replying to PARENT post)

If you are only interested in pushing infrequently used ssh packets into the kernel for e.g. low-bandwidth health monitoring -- while all other packets bypass the kernel -- then why would this be considered a "toy app"? Surely it's a useful technique because it allows netmap to be used on the very many cheap dedicated hosts for rent where only one NIC is available and you have no control over the hardware, or?
👤s1m0n🕑10y🔼0🗨️0

(Replying to PARENT post)

This is the inaccurate sentence: "Snabbswitch, DPDK and netmap take over the whole network card, not allowing any traffic on that NIC to reach the kernel." Obviously, with netmap, traffic arriving on the NIC may still reach the kernel...
👤s1m0n🕑10y🔼0🗨️0