DNS Working Group
Thursday, 14 May, 2015, at 9:00 a.m.:
JIM REID: Good morning, everybody. This is the second session for the DNS Working Group, and I hope you are all bright and refreshed at this unreasonably early hour of the morning.
Just before we get started, could I please ask you all to make sure your mobile devices and other gadgets that beep and make noises are off.
We have a few talks following on from yesterday about the Knot resolver. Marek will give us an update right now about the new resolver from the Knot team, which is called -- Marek.
MAREK VAVRUSA: So, good morning, everyone. Six or seven years ago, I wrote bindings for Unbound and I thought my recursive DNS career was over. It turns out I was wrong: five years ago, when I joined CZ.NIC, I started working on Knot DNS, the authoritative part, and last October I started on the recursive resolver, which is the stuff I am presenting today; it's currently a work in progress.
So, for the impatient ones with screens in front of you, you can go to the website and just glide through the documentation to get a feel for what it's like.
So, it looks a little bit dreary. In the next ten minutes I am going to go through a description of the library, the daemon and the extensions for the daemon, just the concepts and the stuff that works already. And then I am going to give you a quick demo; it's going to be a fake one, but a demo is a demo.
So, why did we build our own library anyway? The obvious question is: could you not just use something that is already present and slap a network daemon onto it? The thing about the libraries today is that they have this thing I call the Telegraph Road effect. Back in the day, when the telegraph was the tech thing, they started building these poles to connect the cities. But in order to build them they needed to build the infrastructure first, the roads and the towns and all the stuff that comes along. But then the telegraph became sort of obsolete and there was no more use for those towns and roads. When you go back to the libraries, they are usually built around a specific technology, like a storage engine or something, but when you need to replace it with something else, you need to replace the infrastructure with it. So it turns out it's much easier to write your own library anyway.
So, the library that we built provides two different APIs for the resolution: one which is just like a getaddrinfo-style call, and one which is a state machine. The library provides a few services, like the resolution itself, a system for extensions, a cache, and a reputation system for the name servers, so we can blame them for answering badly or doing nasty things.
So, this is a slightly simplified diagram of what the name resolution looks like. When I drew it, I realised that name resolution is really a data transformation in disguise. You don't have to read it all, because the most interesting parts are these two, and so the library implements a system of layers that act like tiny little state machines describing how the data is processed and generated.
So, the advantage, compared to monolithic libraries, is that you can mix and match layers: if you are a local or end user, you can run a stub resolver and some cache; if you run a big hosting farm, you can add different layers of caches, a module for statistics or some filters against DDoS or something else. The thing is, everyone needs something a little bit different, and that's possible with the layers.
It also means it's a little bit more secure, because there is less active code in each installation, and that means less attack surface. There are three basic layers that I implemented in the library: the iterator, the cache for records and the cache for packets, which is used for negative and positive caching as well.
Just to give you a quick example of the layers: the iterator layer drives the resolution, cooperates with the caches and does what I call best-effort QNAME minimisation. It's best effort because it's not really full QNAME minimisation: it stops minimising the query when it reaches a zone cut. The reason we do this is that some CDNs are broken and give you answers they shouldn't -- and the thing is, things might break during the resolution, so you have to stop minimising at some point. It might leak some information to the name server that is one step above, but it usually doesn't.
The cache isn't built around a specific storage engine; it's just an interface. Right now we have implemented an LMDB-based back end, because I figure that is what most people would use. It's persistent, so if you take the daemon down and restart it, you still have the cache; it doesn't wipe its brains out. It's used for pretty much anything: whether it's a record, whether it's secure, whether it's DNSSEC data or something like this. You can replace it during the resolution. And, as I said, it's persistent.
There is also a system for name server reputation, because you have to identify which name servers answer badly, or have a long round trip time, or something else. I figured it might be an interesting thing to sort of map the health of the Internet infrastructure, to make a map and maybe mail the individuals who commit crimes against the DNS. But I might be charged with something.
I don't know if it's actually legal to mail people like that.
So, moving to the daemon: it's written in C, because I know C, and I quite like Lua, so I tried to dabble with it, honestly. Lua is actually used for pretty much everything, from the configuration to the extensions, the interface and all the interactions. You can even write the layers that I mentioned before, so you can tap into the query resolution itself. I figured that might be interesting, for example, against DDoS attacks, because the attack pattern changes very rapidly and it's not very convenient to recompile the code and distribute it everywhere; if you just make a Lua package, you can push it to all the servers in the farm, and they can load it and be protected until the next attack comes.
The configuration is dynamic, because it's really Lua behind the scenes, but you can configure it in the declarative manner you are pretty much used to.
In addition, you can enumerate the network interfaces and set up events. A lot of people template their configuration files; you don't have to do that here, because you can check whether the host name matches something, evaluate preconditions, iterate and so on, directly in the configuration. For example, if you have a public, network-facing machine and an internal machine, you can alter the configuration based on that.
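A minimal sketch of what such a declarative-looking Lua configuration might look like, written here as a shell snippet that creates the config file and starts the daemon. The function names (net.listen, modules, hostname) are assumptions based on the project's later documentation, not necessarily the API as it existed at the time of the talk:

    # Hypothetical kresd-style configuration; names are assumptions, not the talk's exact API.
    cat > config.lua <<'EOF'
    -- listen on localhost only by default
    net.listen('127.0.0.1', 53)
    -- load a couple of the modules mentioned in the talk
    modules = { 'hints', 'cachectl' }
    -- alter the configuration depending on which machine this is
    if hostname() == 'public.example.net' then
      net.listen('192.0.2.1', 53)   -- public-facing instance
    end
    EOF
    kresd -c config.lua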
The resolver supports modules written in C, Lua and a limited subset of Go, because I realised it's pretty much impossible to go from C to Go and the other way around, but I wanted to try anyway.
As I said, you can tap directly into the name resolution, so with Lua that makes it a sort of OpenResty of DNS: you can script pretty much anything. You can subscribe to data: if you have an etcd daemon and you want to configure all the resolvers to the same settings, you can subscribe to it. You can also publish data, because I realised a lot of DNS companies are actually data companies, so you can tap into pretty much anything: the round trip times, the resolution, the behaviour of the users, the contents of the caches or anything else.
With the extension system, we have included a few modules that you might use: one for static hints, which is going to be showcased. The second one is just a test of whether it works with the etcd daemon; it does, it can update its configuration from its peers. There is a cache control module for operating the cache, and I am writing support for memcached instead of LMDB for people who might want to use it.
In general, you can replace pretty much any part of the library, so I figured it makes it sort of ‑‑ kind of thing.
So, to give a recap: we made a resolver library with a state-machine-like API, a scriptable daemon and a bunch of modules. We have a quarterly release plan, but things are not stable yet, so the APIs might break and it might eat you in your sleep; I can't guarantee anything.
To put it in some sort of perspective: we have built a building with the walls and the roof, but the furniture is not there yet, so it's not very comfortable.
And now to the demo. Obviously, this is my terminal, I am doing it live right now. So we just compile the daemon, quite nice, with all the dependencies, and now we can start it. We don't have any configuration yet, so it just listens on localhost and some special ports, and it gives us this command line where you can use it as a calculator or just evaluate commands. And so we did just that: we enumerated all the interfaces that the daemon is bound to, and we can load some modules, like the static hints and the cache control, and load some hints. We are not limited to that. Well, before, when I was talking about the layers -- this actually lists the layers that are active right now in the resolution, so you can see it falls through the iteration and checks the caches and the hints, in this order.
So now we can dig it; it answers recursive queries. We can look at what it did: it planned the query, then followed the search path and finally arrived at an answer. Great.
So, back to the configuration now. There is some domain that we don't want to translate to its correct address, so we just set up a hint for a bad guy to translate to localhost. We can query whether it's active: yes, it is. And now we take it to the resolver; it's going to translate the bad guy, with no TTL, to the address from the hint. We can check it from the logs, because there was no search path and it was answered directly from the hints. So that is the basic resolution in practice.
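A rough reconstruction of that demo flow as a shell session; the interactive commands are typed at the daemon's own prompt, and the exact module and hint syntax is an assumption rather than a verbatim copy of the demo:

    # Start the daemon with no configuration; it listens on localhost.
    ./daemon/kresd &
    # At the daemon's interactive prompt (Lua), load modules and add a hint:
    #   modules.load('hints')                    -- static hints module
    #   modules.load('cachectl')                 -- cache control module
    #   hints['badguy.example'] = '127.0.0.1'    -- map the bad guy to localhost
    # Then query it from another terminal:
    dig @127.0.0.1 badguy.example A     # answered directly from the hints
    dig @127.0.0.1 www.ripe.net A       # normal recursive resolution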
You can have a look at the web pages where you can find the project: it's on GitHub, and we are building on Travis, which I am very thankful for; the scan just checks for mistakes and we build the documentation on the -- Do you have any questions?
JIM REID: Any questions for Marek?
ED LEWIS: You didn't mention DNSSEC, I don't think, in the slides. I am not asking about that, but I want to ask about trust anchor management; I have a talk later about -- I want to make sure that when you build these new recursive tools, they are able to learn trust anchors and so on; that hasn't happened all that much. I want to put in a plug for adding, at some point, trust anchor management in the course of DNSSEC, which I am sure you are going to do.
JAN VCELAK: It doesn't validate yet so we have a module that is being built right now. But thank you.
JIM REID: If I can take time to ask a question myself, Marek: when do you think the software is actually going to be of production quality, released and supported? Or will that wait until such time as you have the DNSSEC validation element completed?
JAN VCELAK: Yes, I would like to have the production-ready version at the end of the year, maybe earlier. We actually have a release plan on our website which you can have a look at.
JIM REID: OK. Thank you very much. Anybody else? OK. Thank you, Marek.
(Applause)
Next talk this morning is from Marco Prause, who is going to be telling us some interesting things found doing zone transfers with rather strange connectivity properties.
MARCO PRAUSE: I am with DENIC, and today I would like to talk about the experiences we made with zone transfers over long fat networks, lossy networks, to locations far, far away.
Every presentation needs an agenda, so let's just start with the introduction. DENIC is the registry for the top-level domain .de. At the moment we are holding over 15 million domains in our database. We have got 16 name server locations, most of them Anycast set-ups. The size of our zone file is around about 1.5 gigabytes at the moment and, yeah, we are holding around about 20,000 DNSSEC domains. An average incremental zone transfer is around about 185 megabytes. So things are looking fine so far.
Why should we take a deeper look at the network? Well, with an increasing zone file and DNSSEC enabled, our incremental zone transfers are growing too, of course. And to locations far, far away, we of course saw that transfers last longer. But unfortunately, in some cases the transfers didn't really fit very well into our zone generation cycle or, even worse, the incremental zone transfer was cancelled and a full AXFR transfer was started. This is kind of bad in a situation where we already hit high packet loss rates. So besides latency we also see packet loss on some paths, which also decreases our throughput, of course.
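To get a feel for why latency plus loss hurts so much, here is a rough back-of-the-envelope estimate using the well-known Mathis approximation for loss-limited TCP throughput (rate is roughly MSS divided by RTT times the square root of the loss rate). The RTT and loss figures are illustrative assumptions, not DENIC's measured values:

    # Loss-limited TCP throughput estimate (Mathis approximation), purely illustrative:
    # MSS 1260 bytes, RTT 250 ms, 1% packet loss
    echo '1260 / (0.250 * sqrt(0.01)) * 8 / 1000000' | bc -l   # ~0.40 Mbit/s
    # At that rate a 185 MB incremental transfer takes roughly an hour:
    echo '185 * 8 / 0.40 / 60' | bc -l                         # ~62 minutes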
Here is an example of the times we see from generation of the zone to the fully transferred zone file. Let's say the generation starts at 11:00 a.m.; it takes roughly one to one and a half minutes to be generated out of our database. After that, the zone is signed and double-checked, with some consistency checks afterwards. This amount of time is roughly 45 minutes. After that, the signed zone is transferred to the locations. And as you can see, when zone number 2 is generated at, let's say, 1 o'clock p.m., we run into a kind of race condition with upcoming zones in case of high round trip times and packet loss during the transfer.
OK. What did we do? First of all, we had a look at path MTU discovery and the maximum segment size. We deliver this zone over a VPN tap interface -- we have a decreased MTU in that case. The good news is that path MTU discovery is working like a gem, and the maximum segment size is also adjusted by the interface MTU. But unfortunately, we also saw that path MTU discovery is not influencing the maximum segment size; only the fixed MTU of the interface is taken to compute the maximum segment size. So we had two possibilities to fix that issue and possibly save packets.
We could have configured a fixed MTU of 1300 on the interface, but then this would also be used for the LAN traffic and would therefore also decrease the MTU on the LAN.
The second choice we had was to let our VPN concentrator change the maximum segment size inside the flow: thanks to MSS clamping, we could rewrite the maximum segment size during the initial TCP handshake so that both end points learn the correct maximum segment size.
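DENIC did this on their VPN concentrator; purely as an illustration, on a Linux router the equivalent MSS clamping is typically done with iptables. The interface name and the 1260-byte value (a 1300-byte tunnel MTU minus 40 bytes of IPv4/TCP headers) are assumptions, not their actual settings:

    # Clamp the MSS of SYNs crossing the tunnel so both endpoints negotiate
    # a segment size that fits the reduced path MTU.
    iptables -t mangle -A FORWARD -o tun0 -p tcp --tcp-flags SYN,RST SYN \
             -j TCPMSS --set-mss 1260
    # Or derive it automatically from the path MTU of the outgoing route:
    # iptables -t mangle -A FORWARD -o tun0 -p tcp --tcp-flags SYN,RST SYN \
    #          -j TCPMSS --clamp-mss-to-pmtu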
After enabling MSS clamping we saw a small improvement concerning fragmentation, so we saved a few packets -- and, yeah, a saved packet is a good packet, right, in case of packet loss -- but it was not enough to handle the traffic to our locations with high latency and additional packet loss.
So, what was the next level in the game? At the next level we had a short look at the TCP congestion control algorithms involved. As you know, there are a few algorithms out there in the wild; just to mention a few of them: BIC, CUBIC, Veno, and so on and so forth. We focused on the most promising three: CUBIC, Illinois and Hybla. Why these three?
CUBIC seems to focus on high-bandwidth networks and it also allows very fast window expansion, so both could fit our situation.
TCP-Illinois seems to target high-speed, long-distance networks and it should achieve a higher average throughput. So, that sounds quite good -- yeah, it could help in our situation.
And last but not least, TCP-Hybla also focuses on longer round trip times and packet loss due to errors. That sounds quite good as well.
Although Hybla seems to have been developed with terrestrial and satellite radio links in mind, at locations far, far away we seem to meet the same conditions and, to be honest, we can't actually know whether there are satellite links involved, because we are using the public Internet or VPNs; maybe, maybe not.
So these were the three algorithms we tested in a small test set‑up because, yeah, testing in production is a bad thing.
For this test we took two Linux VMs and one FreeBSD VM. The first Linux VM acted as Zone Master, the second Linux VM acted as zone receiver, and the FreeBSD VM in the middle was responsible for the delay and the packet loss; it does its job quite well with ipfw and the dummynet module, so for injecting the delay and packet loss it was, I think, just two lines of policy to activate it.
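A sketch of what those couple of ipfw lines typically look like with FreeBSD's dummynet; the delay, loss and bandwidth values here are illustrative, not the ones used in DENIC's test:

    # Load dummynet and push all traffic through a pipe that emulates the WAN path
    kldload dummynet
    ipfw pipe 1 config bw 10Mbit/s delay 250ms plr 0.01   # bandwidth, one-way delay, 1% loss
    ipfw add 100 pipe 1 ip from any to any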
Yeah, and the winner is TCP-Hybla; although they are quite close together, Hybla did the best job on the simulated lossy LFN.
The congestion control algorithm was quite easy to activate on the Linux boxes; we just had a short look to check that the modules are there. We load the module and after that we just activate it via the proc file system, or you could also use sysctl, it's your choice. It's a good idea to activate selective ACK and window scaling on the receiver; in this case it was activated by default.
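The steps described translate to roughly the following on a stock Linux kernel of that era; this is a generic sketch, not DENIC's exact configuration:

    # Load the Hybla module and check it is offered by the kernel
    modprobe tcp_hybla
    sysctl net.ipv4.tcp_available_congestion_control
    # Switch the congestion control algorithm, via sysctl...
    sysctl -w net.ipv4.tcp_congestion_control=hybla
    # ...or equivalently via the proc file system
    echo hybla > /proc/sys/net/ipv4/tcp_congestion_control
    # Make sure SACK and window scaling are enabled on the receiver (usually the default)
    sysctl -w net.ipv4.tcp_sack=1
    sysctl -w net.ipv4.tcp_window_scaling=1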
So let's have a look at how the new congestion control algorithm works in reality. Here you can see the zone transfer times before TCP-Hybla was activated. The zone receiver was configured against the old Zone Master and, to be honest, I had quite a hard time figuring out which congestion control algorithm it used; the closest information I could get was that it could be TCP --.
Nevertheless, at this point we had also talked to our transit providers about changing the path. Unfortunately, at this location, this point of presence in Seoul in South Korea, the problem we met was quite prominent on nearly all paths we could see; we had only one or two paths we could use where the real problematic part was not involved. So we asked our transit provider to change the path to this location, and we did see a small amount of improvement, but at high-traffic times we still met packet loss to this location and, yeah, also delay of our zone transfers.
Then we rebuilt the location and configured it against the newly set up, Hybla-optimised Zone Master and, yeah, we saw quite good transfer times and we also saw that there was no lack of zone delivery.
At this point, we built our syslog server, and then something funny happened: one morning we had a look at the graphs and thought, what the heck happened here, did we bet on a dead horse or -- well, the explanation was quite simple: unfortunately, the location had been rebuilt and, unfortunately, with the old config. So here you see the same traffic pattern we had without the TCP-Hybla-enabled Zone Master looped in. After fixing this, we again see the quite nice traffic pattern we had with Hybla.
Also, other locations were configured against the new Zone Master. As you can see, Beijing also had the problem with round trip time and packet loss, and we were missing some zone file deliveries before Hybla; after using the new Zone Master with Hybla, we saw quite stable delivery of our zones. Even at really high packet loss times, we were still able to deliver the zone in time.
Then, a few weeks ago, we were quite happy to see packet loss to another location, in Hong Kong, where we could check the correlation with the packet loss: at that moment we had around about 5% packet loss and we didn't see any influence on the zone transfers. So, with this new looped-in Zone Master with TCP-Hybla activated, we were able to re-establish stable delivery of our zones to locations far, far away.
That's it. Thanks for your attention. Questions? Comments? Recommendations?
JIM REID: Are there any questions?
ED LEWIS: ICANN. At the beginning of the talk, I liked that you even looked at this: if it's going to happen every two hours, it had better be done in less than two hours. I have seen people forget about that and things blow up. My question is about the transfer size you mentioned in the problem set-up. What would be interesting is what records are in there: in an average IXFR, how many records are you sending back and forth, and of what kind? The reason why: I remember that at the IETF, not too long ago, someone named Matthijs talked about trying to optimise IXFR -- for example, can we just not send the removed RRSIGs? This is an algorithmic way to help with the same problem, to lighten the load, because this is going to grow over time. I think it's good to use both approaches, and I want to bring up Matthijs's work here; this is operational data that could feed into the IETF side. That is a good idea.
MARCO PRAUSE: The question was what kind of data is in the incremental zone transfer, right?
ED LEWIS: If you could also put in what the record types are -- is it mostly RRSIGs that are being deleted, which are garbage, and if we got rid of those the load would not be as damaging -- you would still have the problem with the long delay. I am not saying that is a better solution, but it would be helpful to quantify, because your zone is huge and you are doing DNSSEC, so it's -- yes.
MARCO PRAUSE: Most of it is RRSIGs at this point, because we are signing the zone every two hours.
JIM REID: I think ‑‑
SPEAKER: Matthijs. Thanks, Ed, for mentioning that. This is indeed something I looked into while I was working on it, and we saw these incremental transfers of signed zones going out with lots of signatures; for the deleted signatures of removed records you are sending a lot of data which you could say in a couple of bytes, right? So I looked into it and thought of ways you could improve this, but I never -- I tried to find out whether there was an actual real-life scenario that suffers from this, and now there is one, so thank you for that.
SPEAKER: Laurence, from Netnod. These patches to the kernel -- the Linux kernel in your case, I suppose, the TCP variations -- are they available for other operating systems, other kernels as well, or is this a Linux-specific thing?
MARCO PRAUSE: It's not quite a Linux-specific thing. Hybla is available for all kinds of Linux distributions. And I had a look at, let's say, OpenBSD, for example; there are possibilities to change the congestion control algorithm, for example, for FreeBSD as well. At the moment I am not quite sure whether TCP-Hybla is available there, so I would have to look again.
SPEAKER: Thanks, that's fine.
JIM REID: Thank you. One last question from myself again. Were there any other ways you could have tried to tune the TCP performance, by changing the values of some of the variables or other parameters? Was it really necessary to look at a new congestion control algorithm?
MARCO PRAUSE: Yeah, I tried to change the window size and so on in the beginning, but I still hit this issue, so, yeah, the next step was to look at the congestion control algorithm.
JIM REID: OK. No more questions? Thank you very much.
(Applause)
MARCO PRAUSE: Thank you.
JIM REID: After Marco we have another well-known face, Ed Lewis; he is going to talk about the key rollover for the root zone.
ED LEWIS: Good morning, Ed Lewis from ICANN. Always at a microphone. I have an agenda; it's not hidden, but I am not going to talk through it. So, for the background: I am talking about some of the events that could happen involving the root zone's KSK right about now; they have started already, and they are going to go on for some time. The background: the root zone KSK is the trust anchor for DNSSEC. If you are doing validation from the top down in the DNS, the root zone KSK matters to you in some way. It's been in place for five years; we have been using the same key for almost the last five years, and up until recently we also had the same critical hardware, the HSMs, in place. After five years in operation, there are some concerns over the HSMs' battery life. I would say it's more of a perception than an actual problem -- we haven't had evidence that they were failing -- but people are concerned about the batteries. And there is this overhanging requirement to roll the KSK, if there were not other reasons to go ahead and do that.
Now, the players involved in this activity: there are three root zone management partners. There is ICANN -- our main concern here is the KSK, amongst other things. There is the NTIA, who is working with all of this, and then there is Verisign, who is on the ZSK side. We split the keys between the two organisations. In addition to these, the so-called root zone partners, we have an external design team that we have recruited to give us some more opinions on what is going on, and I hope to involve them at the end of the slides to have a discussion in the room today.
By the way, I have been asked to mention, too, that ICANN is doing all of this work, the KSK and DNSSEC work among other functions, under the IANA functions contract.
For those who don't know, and I am sure most people do know: a KSK is what signs a zone's keys, and the root zone KSK is the one at the top. It's the one key that everyone has to copy in order to start validation; you can't learn it any other way within the DNS, it has to come from some out-of-band place or mechanism into what you have. And the private key for all this is held in some HSMs in different facilities; it's only ever used in those facilities. An HSM is specialised hardware for handling private keys, and its basic job is to make sure no one sees the private key for any reason.
So, in terms of public impact, why do we have a presentation on this? The HSM change: you should know we are doing it, but you shouldn't actually sense it in the network. If all goes well, you will never know. The biggest thing is the perception that the batteries may be becoming an issue, and we are addressing that by having new machines and new batteries, mixing up the batteries and so on. The KSK roll is a much bigger deal: it impacts anyone who is validating DNSSEC. It doesn't really affect those signing or running authoritative servers, but it definitely matters on the validation side. If you are validating DNSSEC today, you have a copy of this key -- well, unless you are anchored somewhere down the tree -- and that is going to have to be updated somehow, and that is the job ahead of us, the root zone management partners.
This presentation -- I am actually halfway through my slides right now -- I want to impart some information about what is going on so you are up to date, and I want to get reaction and feedback on this. One of the things here is that I want to make sure that everyone is happy with all of this going forward, so we are going out to talk to people about it. I do want to call attention to a specific thing coming up next month: ICANN has its public comment periods, a formal way to respond to these things. We will let you know when that is going to happen and have information about how to be part of it, but I want to make sure you know there is a public comment period, which is the formal way to send feedback in here. Informal ways are also accepted, and that is part of today. You can send comments anywhere that someone is listening and we will try to address them. There is no guarantee we hear it if it's in an informal area, but we are doing our best to cover as much as we can.
The HSM change is actually a pretty straight tech refresh of the equipment we have out there. For the most part we are taking the same brand that we have been using and getting newer models. There are details at the link below about the background and the choices and such, if you really want to go deeper. The status of that right now: we have brought in two new HSMs already -- they are active, or rather they are in a box right now but will be active, in our east coast facility in Culpeper, Virginia -- and the west coast follows about two or three months from yesterday.
The KSK roll is a much bigger issue and one we need to be concerned about: greater public impact, a lot of options. Over the years ICANN has started looking at this. In 2012 we had a public consultation which started the process of deciding what we should do. There was an engineering effort in 2013 which designed a way to go ahead and do this, and now we are bringing in external folks; we asked for people to come and volunteer, do some more background work and make sure we have covered all of the angles. The milestones for this: in June we are going to release some documentation on the study that has been going on for the past couple of months, let people comment on it, clean up the reports and use that to help develop the plan that we will end up following.
The design team roster is these folks; many of them are familiar faces in the room. Jaap is here, Andrei is here and Geoff is here. And there are four others. The three of us will come up here and talk, or -- three more will come and join. Plus the participation of the players mentioned earlier.
So, in theory, KSK rolls are a known quantity; there have been many out there. TLDs have done them and we have looked at them. Some haven't gone very well, which is a good thing in the sense that we know what makes things go bad. We have seen a lot more good ones out there, and if you do things right, it works. The root zone is a little different. The important element here is that there is no one above it to say 'here is the key' so that everyone gets it; we have to tell everybody about the key, and that makes us a little bit different from the traditional roll. However, we do have RFC 5011, which is a trust anchor management standard, and that helps mitigate this. It's not the only way to do it, but it's one big, important reason why we believe this can be pulled off.
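For context, this is roughly what RFC 5011 tracking looks like on the validator side, using Unbound as one example; it only illustrates the mechanism being referred to and is not part of the root zone partners' plan:

    # Prime the root trust anchor once, out of band, then let the resolver track
    # RFC 5011 key additions and revocations in that file by itself.
    unbound-anchor -a /var/lib/unbound/root.key
    cat >> /etc/unbound/unbound.conf <<'EOF'
    server:
        auto-trust-anchor-file: "/var/lib/unbound/root.key"
    EOF
    unbound-control reload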
Now, in practice, any plan we are going to have is going to have challenges. To quote Geoff, things will break; it's a matter of limiting that and knowing how to react to the breakage out there. The questions go a bit deeper. Will validators be able to receive these messages -- are we going to expand the size beyond what firewalls will let in? Will the automated trust updates work? And not just 'does 5011, the RFC standard, work', but does all the software that says it does this actually do it? We are trying to make sure people have tested this, that we won't have problems, and to find out which versions of software it will work in. Beyond that, operators need to know what happens when something goes bad: who do they turn to? And do all the code paths work?
I am trying to avoid calling the ISP operators and saying please fix this for me. It's not the ISPs' problem or fault.
So, this presentation is to inform and invite participation. I would like some of the folks from the design team, if you want to come up here or be available for questions, to come on up here.
Typically we have been looking at the size of messages, alternatives to the algorithms out there, 5011 and what goes on there. I will run through the next couple of slides; these are some support slides that were added to give the audience a bit more of an idea of what to think about and what kind of questions to ask. This isn't necessarily the plan we are going to follow; this is just some of the earlier work we have done. The first one shows the first event being that, at some point, we are going to have a new KSK put into the root zone, and when we do that, the responses will get bigger.
Later on, we are going to do ZSK rolls in the middle of this -- if you know anything about RFC 5011, it takes a long time -- and we may have an even bigger packet size. And again, bigger responses hitting a new high: maybe that is a concern.
At some point, we take the old KSK out, and when that happens, anyone still looking at the old KSK and not the new one is going to see everything go ServFail; it goes black for them. We don't want that to happen. And finally, we are also throwing in a little twist on 5011, which is to delay the revocation piece. Revocation means 'I am not going to use this key any more'. If we just take it away and stop using it and then have to go back to it, we still have it and can go back to it. But if things go well, we start the revocation and get it out of there, because you do need to remove the garbage from the system at some point.
The other slide here is that we have been looking at the response size; that has been one of the things we have been kind of concerned about -- what is the right size to limit things to? On this chart up here, which is more of an engineering chart than a colourful display chart, the two vertical bars are at 512 bytes and 1,500 bytes, which are two significant sizes on the Internet, and the histogram shows you where current TLDs' sizes were on the day we measured this. Some of those are outside those bars. We don't know that those two bars mean something significant, but we are anticipating that they might. So, I have run through this kind of quickly. I didn't go into much detail; I would like to get people to come up and say what you want about the process and the things that you think we are not thinking about -- anything you feel is going to be a problem. I invite you to come on up and we can discuss it with the design team as well.
JIM REID: Ten minutes.
ED LEWIS: We have ten minutes for discussion here.
JIM REID: Does anyone have any comments or questions or observations to make?
MAREK VAVRUSA: Do you have plans to change the key parameters, or just to roll to a new one? I am asking about the algorithm, for example.
ED LEWIS: So, in terms of changing the algorithm: it's being considered. I am not going to say no to anything; I actually want to hear concerns up here. Yes, we have considered a lot of different things which I am not going to fit into the presentation, but we have looked at changing the algorithms and other parameters, and especially if you have good suggestions -- if there is something you want us to look into -- make sure we know about it. But yes, we have looked at different things, and we do have some constraints on what we can actually do, but we want to make sure we are addressing the issues; if something comes in that we think is a bad idea, we want to document that we didn't do it because of this.
MAREK VAVRUSA: I am concerned about the increased response size.
ED LEWIS: Yeah, Geoff, did ‑‑ do you want to ‑‑
GEOFF HUSTON: I have a presentation.
ED LEWIS: Geoff has a presentation on that -- your next presentation here is about that. I will defer to Geoff's presentation.
SAM WEILER: I want to make sure you take a look at CDF.344 and consider whether that could be useful here, because it could certainly affect the response size.
ED LEWIS: OK. So, I will tell you, we had a similar conversation over the weekend at DNS OARC and quite a bit of feedback there, if people ‑‑ don't be shy. I mean, this is a good opportunity to say I want to ask about this. If not, I think we are OK.
JIM REID: If people have further comments about this, have a question or observations to make on the plan, how do they make them known to you or ICANN? Is there a mailing list for this?
ED LEWIS: I would say the formal way is the public comment period. Let me say that the design team is working right now and we are going to meet for another month; we talk every week, actually. The team is going to produce a document and it's going to go out for public comment, so the formal, formal, formal way is to respond to that, and we will definitely make sure that the public comment period is advertised on whatever mailing lists it needs to be advertised to, including the Working Group list.
JIM REID: If someone wants to give comments into the design team's work and what they are doing, is that possible?
ED LEWIS: You can talk to anyone on the design team directly, and members of the design team are on lots of mailing lists, so they are kind of sniffing out ideas. This weekend the idea of 'roll over and die' came up and we hadn't been thinking about that: is that effect going to come into play? The design team is out there learning and listening to the community, so there is the informal way, in the sense that we are not tracking everything. If you are not worried about formality and just want to know something, ask anybody.
JIM REID: I was thinking more that there might be people in the community or parts of the Working Group who are not familiar with the likes of yourself or don't know how to contact you -- whether there was an e-mail address.
ED LEWIS: If it's all right, I would say probably the DNS Working Group list in RIPE; Jaap is on it, and I should be on it -- I think I am currently not.
JIM REID: Just in case, yes.
MATTHIJS: I didn't go to the mic before when you asked for discussion because we had a DNS OARC meeting as well and we had some fairly good discussions there, and I didn't feel like repeating them here. If people are interested in what was said, it has been archived; you can Google for it or something. I didn't want to repeat the same discussion here.
PETER KOCH: DENIC. So, we had a similar discussion at OARC, as you mentioned. On the 5011 option, the nice, or maybe not so nice, observation is that the people in this room are probably aware of this, and everybody who does not run a 5011-aware resolver will be able to manually change the config or, even better, migrate to a 5011-aware one. What is the vision for the long tail, all these billions of validating resolvers, or not, that might suffer from the change?
ED LEWIS: It's a good question. 5011 is actually mentioned in our document; 5011 is one tool and there are other ways to do it. There is a trust anchor draft out there -- it's an Internet draft and needs to become an RFC; we are working to get some attention on that, so that is a plug for that document. So yes, we are trying to make sure we understand who is actually consuming the trust anchor information we have, which is tough. I mean, we are not allowed to snoop, so we don't know who is getting it, but we have an idea of who is pulling it, and we are trying to go to the vendors and suppliers -- whoever is pushing these keys out to people -- and make sure they know that there is a new one coming and so on, that they trust the new key and are able to say 'we trusted it' so all their relying parties can do that too. So we are trying to figure out what the market for trust anchors out there is. That is one of the things we are doing; I call it the supply chain.
Ultimately, though, and this is where something will break, there is stuff out there where we don't know how to get to the person who is in charge of it, or whether there is anybody in charge at all. At some point, how much responsibility do we have to make sure that everyone is perfectly updated, versus how much do I have to do, like, network surveillance to find out? It gets kind of interesting at this point, and privacy makes it harder to ensure a smooth transition. I am not saying that privacy is wrong, but you have to understand that there are parts of this we don't even have the tools to test. I don't know whether some validating resolver out there has the right trust anchors without breaking you and having you complain to me. I don't know that I want to have that ability; it's a question in the air whether or not there is any remote way of managing this for someone. Yeah, I don't really want the responsibility of going out to all the resolvers out there and testing them.
PETER KOCH: What I was trying to get at is that this is at least as much a communication exercise as an engineering exercise.
ED LEWIS: Oh, yes, certainly.
PETER KOCH: And that is ‑‑ that communication plan part is something that probably needs to be more on people's radar.
ED LEWIS: And actually, suggestions to help are appreciated; I am not just saying that idly. Definitely, if people have an idea of what venues this should go to -- we do have experience with the root zone: in 2010 there was a roadshow, essentially, and let's say that wasn't exactly the right thing to do. Nowadays, if there is someone you know out there in an enclave who needs to know this but it's not apparent that I know them, then bring it up. This is a general invitation.
PETER KOCH: So, we have this usual chart with the early adopters, late majority, blah-blah-blah. From an early adopter I would expect that they keep up to speed with developments, and early adopters are probably supposed to read the root's DPS and stuff like that. However, we definitely have a population, maybe a small one, maybe a bigger one, we don't know, that is in the early adopter range and doesn't know about this, and that is kind of scary because, of course -- and you are perfectly aware of that -- if that breaks, the trust in DNSSEC and the workings of it will take another push-back, similar to what is addressed by the negative trust anchors, just waving flags and hands.
STEPHAN LAGERHOLM: Speaking for myself. One thing that changed over the last five years is that Edward Snowden came and disclosed some stuff we didn't know five years ago. And when I am listening to you, I don't hear anything that you are changing in the process to counter that, and I am a little bit concerned. So I wanted to understand: is there anything you guys are doing to, like, mitigate that additional risk that we know is out there? I am hearing you are using the same HSMs, and one is in a box somewhere now, which sounds a little bit scary. I think there should be some countermeasures, because, if you remember, at the OARC in LA there was a guy from Yahoo who spoke and said there is no way in hell we are going to do any DNSSEC signing because we don't trust it. I spoke to Dan a couple of months ago; you had this initiative with Phreebird, and it kind of died out with Snowden. So there is certainly a little bit of mistrust in the community here, and is there anything we can do during this key rollover to help restore that?
ED LEWIS: If you can quantify the concerns, they could be addressed in a much more engineering sense; you are giving me a kind of high-level concern here, and in my mind I am trying to work out which part -- I have to understand better what you are talking about to address it quickly. I mean, I get the whole idea, but I am trying to think which parts of this apply, because we are not tapping packets, so it's not that you are concerned about. But, if I am hearing you correctly, the cryptographic algorithm may be something that people are questioning the use of, just as a potential. So should we go to a different algorithm, for that reason? That is a good question. I am not going to really get into it, because we are beyond time, but it is a good question, and you can talk to some of the design team members and make sure we address it in the report. I will tell you, we have looked at the other algorithms, and at different reasons for other algorithms, but this factor may be different; for example, we are concerned about size, so obviously size is an issue. But issues like that would be interesting to see, to make sure we are mitigating any concerns in this area. So definitely raise these issues, because we may not have considered that particular part of the problem space.
WARREN: Largely responding to what Peter said. Back in 2013, SSAC published something, and its first recommendation was that ICANN, in conjunction with the other root zone management partners, "should immediately undertake a significant worldwide communication effort... process and publish this as widely as possible." So there was already supposed to be a bunch of outreach so that all the resolver people would know. It would be good if that happened soon, because a lot of people don't know.
ED LEWIS: We are building a time machine ‑‑ yes, noted.
JIM REID: With that, I think we are finished. Thank you very much, Ed.
(Applause)
JIM REID: Next up we have a very popular speaker, as always, Geoff Huston, who is going to be kicking the tyres of the DNS.
GEOFF HUSTON: Good morning. This is actually a follow-on to Ed's talk, because it's trying to look at one specific part of this issue around rolling keys. I actually looked and dug up some of this stuff from five years ago and the announcements that came out; I remember coming to RIPE meetings where all of you stood up applauding a letter saying we must tell ICANN to sign the root, and it was with great fanfare that it all happened.
And they took it very, very seriously, as you are well aware, and here is a picture of where one of the KSKs definitely lives, in some deep, dark repository inside Culpeper in Virginia. There is another KSK here in Amsterdam, just outside the RIPE offices. Somewhere.
But the commitment was made five years ago and, you know, the key will roll now. And, you know, there is no reason not to, per se; we are not any wiser or any dumber, and the commitment was made to roll it. Keys aren't eternal, no key is eternal; at some point you have to roll. And there is this large discussion about whether you roll today, tomorrow or in a fortnight's time or whatever, but what you can't say with confidence is: never roll it. And at some point, you know, five years, fair enough, five years. So this is really, really easy, right? You all change keys and you do it all the time. You publish a new KSK, include it in the zone, use the new KSK to sign the ZSK, withdraw the old signature and revoke the old KSK. Fine. You use 5011, everyone is just happy, yes? And here is a diagram to show it; it's got lots of pretty colours and we understand it, sort of. There is some fine detail inside all those phases, because two keys are rolling. Every 90 days the zone signing key already rolls, but that doesn't matter as much; you don't notice it, because the key signing key is always constant, so your validators, because they have got a static key signing key, simply run with the new zone signing key, and that is fine. Inside this is one critical little point: while we keep running with the zone signing key, we start to sign and publish with two key signing keys, and just for that critical period, while we are running both, the response sizes start to get big. If we are using a 1024-bit zone signing key, it's 1297 octets in the DNSKEY response, and with a 2048-bit one it gets to 1425. Which the v6 folk would recognise as being interesting. It's certainly over 1280.
So all this is easy, right? And we have had so much experience before, haven't we? What could possibly go wrong? We saw 'roll over and die': what was going on there was that every time, every six months, when the RIPE NCC changed their key -- this was before the root got signed, so using the old DLV ideas -- there were some distributions out there that were shipping the old key, and as soon as they failed to validate they started thrashing through name servers. When we noticed this, new software came out, and if you now look at the way most resolvers validate, if they find 'I can't validate through one name server chain', most of them just give up, because trying every last name server chain from the root down to the terminal simply thrashes the DNS. So we don't do that any more. And of course, now we all support RFC 5011 -- and everyone can cope with absolutely massive DNS responses, because you all just can, can't you? And if all that is the case, all this will go absolutely without a hitch, nobody will notice and it's just going to be wonderful, right? It's not, and there is an awful lot of weird stuff out there. And there are two kinds of concern that I can pinpoint. The first: we change the key signing key and you don't pick it up. There are many reasons why you might not pick it up. If you have got an RFC 5011 compliant setup where the old key signs the new, you will pick it up, but maybe you don't have that. Maybe you have an implementation that looks for CDS records but doesn't have 5011. Or maybe you have none of that. So, the first problem is that you may not pick it up. And the second problem, which was always there, is that in the same way that a v6 host and routers are required not to fragment anything smaller than 1280, a standards-compliant v4 host is not required to accept an IP datagram larger than 576 octets. Truly. And nothing is that small any more. And so the second real concern is that when we start to blow these things up in size, you might not receive the answer and might not be able to get the answer. The first one is hard to test from the outside; it's your resolver, I can't see what it does with 5011, and there is no way I can test it. You can test it individually for your own resolvers, but I can't. But oddly enough, the second one can be tested, and that is what I want to talk about this morning, very quickly.
Do you get bigger responses? So I am really interested in getting you, as users, millions of you, to ask a question where the response is big -- in fact, just at the magic size being contemplated in a KSK roll -- and I want to know whether you get the answer or start thrashing. Here are some interesting sizes, and if you look at that magic one, it's 1425 octets, which is the larger size of a DNSKEY response with the 2048-bit key. I find all this really weird. Most folk understand it, and there are a few whom numbers confuse. There are a number of folk who set an EDNS buffer size of 1,500. It's kind of: do you think the IP and UDP packet headers are zero bytes? You know the largest DNS payload in an unfragmented Ethernet packet -- if you want a safe value, it should be 1452 if you are dual stacked, 1472 if you are not. Why are you telling me 1,500? What is wrong with you? In fact, if you have a look at EDNS buffer sizes, here is the distribution in a cumulative sense. Almost everyone takes what they got from the package and just installs it: the package says 4096, the implementations say 4096. But there are a few with really, really low values. Let's blow them up a bit. Around about 5% come in at 512. OK. A few come in at around 1410, 1430, 1440, 1460, 1472, and, like I said, there are a few that bring in sizes of 1,500 -- God knows what they were thinking; numbers must confuse them. So the ones we are really interested in are the folk who say 'don't send me a response if the response you are going to send me is 1425 bytes or longer', because that is what we are planning to do. Which means, necessarily, the response you get from the root zone is going to be truncated. Which means, necessarily, if you really want the answer, you are going to flick to TCP.
Two things: A, the root zone servers will get slightly more TCP traffic during this transition; and B, you had better be able to receive TCP. All DNS resolvers can talk TCP, can't they? Yeah, right.
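The basic check Geoff describes can be reproduced with dig: advertise a small EDNS buffer when asking the root for its DNSKEY RRset and see whether truncation and the TCP retry behave as expected. Note that with a single KSK published the response is well under 1400 octets; the truncation only appears once a roll roughly doubles the key set towards the 1425-octet size discussed:

    # Advertise a 1400-byte EDNS buffer; with two KSKs published (as during a roll)
    # a ~1425-octet DNSKEY response would not fit, so expect the TC bit.
    dig +dnssec +bufsize=1400 +ignore DNSKEY . @k.root-servers.net
    # Check that the full answer is also retrievable over TCP, the fallback path.
    dig +dnssec +tcp DNSKEY . @k.root-servers.net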
So, what do we do? Thanks to Google, we can measure from the edge in big volume. We set up the ad; the ad does experiments and fetches, and they all come to our servers, so we can do a lot. We did a 1440-octet response, and one was signed with ECDSA -- we want to see what you are able to do.
What do we find? We did this test 76 million times; Google is amazing, Google is truly amazing. We sent this ad out to millions of people, absolutely, so we did: 76 million queries. Almost everyone sets EDNS(0). Now, this is weird, because only 30% of you then go through with validation. So, oddly enough, if you sign your domain, almost everybody -- 83% of queries -- will get the signatures. It's kind of weird, I don't know how to describe it really. You are sort of halfway there: you get back all the hard work and go 'I am not going to bother about validating it', which I find quite bizarre.
83% of you ask for the sigs. If I look just at resolvers and not queries -- we covered about one-third of the world's 3 billion users. When I say we found 777,000 resolvers: we found the 2,000 that really matter and we found a whole lot more that seem to be your personal resolvers, just serving one or two people, because that is the way the DNS works. 84% of resolvers ask for all the sigs. So, how well do we work with 1440 octets? 85% received the DNS response, fetched the web object and we are just happy people. So most of you think RSA-signed at 1440 is just fine. Of that 7.5 million, 2.76 million fetched the DS record: you did the next step, you got the sigs and asked 'do I like what I am seeing? Let's go and fetch the DS record, let's start the validation dance.' There is a lot of attention deficit disorder with ads, and let me say: if you receive an ad on your screen, watch it to the end. Don't stop, because you muck up the numbers. So watch every ad to the end. What we found there was that 850,000 of you left early. Naughty people. And another 494,000 timed out. But I can only find a tiny number, 72 folk, who were really stranded and didn't do anything logical. Out of 9 million or so, that is awfully small, and for me that is certainly down to experimental error. So 5% of experiments didn't run to completion. And don't forget, I am measuring users, not resolvers. The resolver damage might be a bit bigger, but you normally have two or three; if one fails, you move on. So I am measuring users here, not resolvers.
Then I thought, well, hell, 1440, 1,700 -- what is the difference between friends? So I tried you all with a 1,700-octet response. And interestingly, it was still pretty good. The failure rate, out of the 6.5 million experiments that did both sizes, was still pretty small: 5,000 folk just got left stranded. So most of you handle some sort of fragmentation without any real problem. But the stranding rate is just a little bit higher, just that little bit higher, at 5,000. And the next question is ECDSA.
There have been some concerns over that, both in terms of intellectual property and whether the NSA pre-cracked it. But there is one thing about these curves: they are cryptographically dense. I have been told -- and I have no idea about cryptography, it's all bits to me -- that you get around 10 times the cryptographic strength for the same key length, or, in other words, you can do the same cryptography as 2048-bit RSA in 256-bit ECDSA. Talk to someone who knows what they are talking about, don't talk to me. But what if I compare ECDSA to RSA? And just to be fair, I will pad out the ECDSA response to make sure that both responses are the same size, so the only difference now is the signing algorithm. What is the failure rate there?
Slightly worse, but not too bad. It's much the same as a really, really big packet: 9 million tests, around 5,000 seem to sort of fail with ECDSA. It's bigger than RSA, it is bigger, but, you know, it's still very, very small. So, from that point of view, ECDSA is viable. The one problem is that one in five of the folk who use resolvers that know about RSA are using old OpenSSL libraries. Like Level 3. And when they get a response signed with ECDSA, they throw up their hands and say 'it's not signed, I will treat it as an unsigned name and just press on.' So while the response rate is pretty good, the problem is that amongst the pool who received the web object, they weren't validating, whereas previously with RSA they were, so there is a certain amount of security downgrade happening.
There has been a lot of work in browser land: 'prefer v6' goes the message, it's better than v4; if you have got dual stack, prefer v6. In the DNS you are in this inverted dystopian land where all the messages are reversed, so you forget all that crap and go 'if it's v4 I will use it, if it's v6 I'll ignore it.' It's amazing, when I look at this, that the number of queries in v6 is less than 1%. And I have been playing with that a bit, because I am fascinated by this. Where is the piece of code that goes 'well, if I get back an NS record with 4 and 6, I obviously prefer 4'? Why? Very small number of resolvers, very small number of queries. If I give you an NS that is 6 only, more than half of you go there. So it's something about resolver preference that just doesn't like 6 in the DNS. So everyone over there in the next room who is talking about a 6-only Internet next year, next month, next whatever, should talk to the people in this room who are going 'it's never going to happen. Absolutely never going to happen. V4 forever.' So it is rather bizarre. V6 has died in the DNS and I don't understand why.
Some quick observations. 87% have DNSSEC OK set, which is pretty amazing. And 30% of all DNSSEC resolvers attempt to validate. So, if someone tells you there is a real problem with DNSSEC, that it's not deployed, they might be talking about the people who sign domains -- I don't know, because no one lets me walk their domains. But you guys who validate: there is an awful lot of you. The people next door would kill for 25% deployment and you are sitting on it going 'it's not good enough.' Bloody hell, it's more than good enough; it's amazing. So yes, DNSSEC is out there and it's working. There is very little v6 -- I have gone through that, I don't know why; browsers prefer 6. ECDSA is sort of viable -- sort of, because we would leave 20% of the folk behind. If we can pick that up, life would be even better. Can it work? I don't know about 5011 or any other way of picking up the key. But using RSA and keeping yourself just below 1,500, most of us will get there most of the time. Some resolvers might get stuck, but users normally have two or three resolvers in /etc/resolv.conf. That is all I had. Questions?
SPEAKER: Alan. I have a question about IPv6 -- people querying, or not querying, actually. Do you think it's a side effect of trying both 4 and 6 and, if it's faster, you go with 4, or do you think it's hard coded in the software?
GEOFF HUSTON: It's all just me and my server; there is no difference in round trip times and I don't see any exploratory round trips -- I think it's hard coded. But, you know, I don't write DNS code, because I am not kinky enough, because the folk who write DNS resolvers appear to have a different brain space from the rest of humanity and I find their code really hard to read, hint hint. So I can't find any kind of actual rule in the code that I have looked at. But the behaviour seems to indicate there is a solid preference going on: A before AAAA.
SPEAKER: A very few resolvers serve a lot of the people -- have you been asking those people what they are doing?
GEOFF HUSTON: Not yet. If some of you want to fess up there are some resolver writers here, fess up, what do you prefer if you get both.
AUDIENCE SPEAKER: 4.
GEOFF HUSTON: I am not surprised.
SPEAKER: Jared, NTT. So I was going to comment on the same thing about v4 versus v6. It's actually this: if they were doing RTT probing, there are a number of networks I know of that have discovered that, because there is no traffic engineering in IPv6, in many cases their IPv6 takes the shortest path across their network, whereas the v4 may use traffic engineering and go a slightly longer path. So if they were RTT probing, they would actually prefer v6. But I do suspect that there is something a bit simpler to explain it, which is that, similar to the probing, I suspect they are going and saying 'I have an answer for this v4 thing but I don't have an answer from the v6 server', and that is probably the most likely case: they say 'this is undefined and therefore I am just preferring the one I have a cache for.'
GEOFF HUSTON: I can offer another explanation for this, and I will be nice to them, the resolver writers. It came up at DNS OARC, from PowerDNS's distributed load balancer, and the comment was: DNS resolvers like to run hot. So if I have got five and I am load balancing, the comment was that it's better to run three red hot and two idle than all five semi-warm, because of caching; caching works when you hit it hard. If I am doing v4 and v6 and I do everything over 4, theoretically all those caches are being filled and it becomes a self-fulfilling prophecy: it sort of seems the safe thing to do, and we are all scared about v6 MTU and those issues. If we run 4, we are on known ground. So I can understand it to some extent. But I don't run a CGN and I am not trying to provision binding capacity for v4 DDoS in the DNS. So I don't live the nightmare of the service provider, who is sitting there thinking: no matter how much 6 I deploy, as long as I get all this v4 UDP traffic, my CGN still has to be provisioned like crazy and nothing I can do in 6 makes life better. You guys aren't talking to each other, and selectively making your own life better makes their life hell. And this is kind of weird, but it's typical of the Internet. And I have ranted too long. Your question.
MAREK VAVRUSA: CZ.NIC. Actually, have you tried to change the order of the records in the glue, like a carrot and stick for the resolvers?
GEOFF HUSTON: Give everyone v6 only and see how you do then -- that is the next thing I am going to report on at some future meeting. I have mucked around with the order in previous experiments. There seems to be a hard v4 ordering. It just is hard-wired, as far as I can tell.
MAREK VAVRUSA: Some resolvers are obviously configured differently; they fetch the v4 address first.
GEOFF HUSTON: A lot of the time. But if Knot does it differently, tell us.
MAREK VAVRUSA: We fetch both and decide based on the round trip time.
GEOFF HUSTON: We can test this. Thank you.
JIM REID: Thanks very much, Geoff.
(Applause)
JIM REID: OK. Then that brings us to any other business, and as far as I know there is only one item, which I will cover very briefly. As you are aware, all the Working Groups have been going through a procedure of trying to develop a process for appointing Working Group chairs. We appear to have reached the point where the DNS Working Group has some kind of silent consensus for a proposal that was put out about a week ago. Now, I think it was a little bit hasty of me to say at the time that we would leave this until this particular RIPE meeting and make the final decision on it, so I would like to leave things open for another week or so in case people have not had a chance to comment on and review the proposal in what should be its final state. It would be a great help to us if you could voice messages of support rather than relying on the principle that silence implies consent. If the Working Group doesn't speak up and/or say anything, we will adopt it and put it into effect in time for RIPE 71, so the first selection will take place in Bucharest. If anybody is interested in standing, or wants to know what being a Working Group co-chair involves, talk to myself, Jaap or Peter, and we will be happy to give you any information on that. With that, I would like to close the Working Group proceedings. I would like to thank the NCC staff, the technical support and logistics, the scribe, the person taking care of the chat room and the Jabber feed coming in, and the nice lady doing the stenography for us. Thank you very much, and I hope to see you in Bucharest.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.
WWW.DCR.IE