How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop


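Traditionally, an HDFS client read block data through the DataNode: the DataNode read the block file from disk and sent its contents to the client over a TCP socket using DataTransferProtocol. This approach was simple, but it had some downsides. For example, the DataNode had to keep a thread and a TCP socket open for each client that was reading a block. There was the overhead of the TCP protocol in the kernel, as well as the overhead of DataTransferProtocol itself. There was room to optimize.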
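To make the per-reader cost concrete, here is a minimal Python sketch of the general pattern: one thread and one TCP connection per client, with every byte copied through the kernel's TCP stack. It is an illustration under stated assumptions, not the DataNode's real (Java) code. It sends the raw file bytes with no DataTransferProtocol framing, and the names `start_server`, `handle_client`, `read_block_over_tcp` and the 64 KiB chunk size are hypothetical choices made for this example.

```python
import socket
import threading

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def handle_client(conn: socket.socket, block_path: str) -> None:
    """Hypothetical server side: stream one block file to one client."""
    with conn, open(block_path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            conn.sendall(chunk)  # each byte is copied through the kernel's TCP stack


def start_server(block_path: str) -> tuple[socket.socket, int]:
    """Listen on a loopback port and spawn one thread per reading client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen()
    port = srv.getsockname()[1]

    def accept_loop() -> None:
        while True:
            try:
                conn, _addr = srv.accept()
            except OSError:
                return  # stop if the listening socket can no longer accept
            # A dedicated thread (and socket) for every client reading a block.
            threading.Thread(
                target=handle_client, args=(conn, block_path), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, port


def read_block_over_tcp(port: int) -> bytes:
    """Client side: connect, then read until the server closes the connection."""
    parts = []
    with socket.create_connection(("127.0.0.1", port)) as s:
        while data := s.recv(CHUNK_SIZE):
            parts.append(data)
    return b"".join(parts)
```

Every concurrent reader costs the server a thread, a socket, and a full copy of the data through the network stack, even when the client runs on the same machine as the data. Those are the costs that short-circuit reads try to avoid.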

Unfortunately, the permission changes required by the old short-circuit implementation opened up a security hole: users with the permissions necessary to read the DataNode's files could simply browse through everything, not just the data they were supposed to have access to. This was a little bit like making the user a super-user! That might be acceptable for a few users, such as the “hbase” user, but in general it presented problems. So although a few dedicated administrators enabled short-circuit local reads, it was not a common choice.

The client keeps a cache of the open file descriptors it receives from the DataNode. This is better than a path cache, since the client does not have to re-open the file to re-read the block. We found that this approach improved performance over the old short-circuit local read implementation.
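The sketch below illustrates the descriptor-cache idea in Python. It assumes, as in the HDFS-347 design, that the DataNode opens the block file itself and passes the open descriptor to the client over a Unix domain socket using `SCM_RIGHTS` ancillary data. It is a sketch, not HDFS's Java implementation: `FdCache`, `datanode_pass_fd`, the block IDs, and the `b"OK"` reply are hypothetical, and both ends run in one process over a `socketpair`. It requires Python 3.9+ on a Unix system for `socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket


def datanode_pass_fd(dn_sock: socket.socket, block_path: str) -> None:
    """Hypothetical DataNode side: open the block file and hand the descriptor over."""
    fd = os.open(block_path, os.O_RDONLY)
    try:
        # SCM_RIGHTS installs a duplicate of fd in the receiving process.
        socket.send_fds(dn_sock, [b"OK"], [fd])
    finally:
        os.close(fd)  # the receiver now holds its own reference to the open file


class FdCache:
    """Client-side cache of open descriptors keyed by a (hypothetical) block ID."""

    def __init__(self, block_paths: dict[str, str]) -> None:
        self._block_paths = block_paths  # only the "DataNode" side uses these paths
        self._client_sock, self._dn_sock = socket.socketpair(
            socket.AF_UNIX, socket.SOCK_STREAM
        )
        self._fds: dict[str, int] = {}
        self.fetches = 0  # number of descriptors requested from the "DataNode"

    def _fetch_fd(self, block_id: str) -> int:
        # Single-process demo: the DataNode side runs inline on the other socket end.
        datanode_pass_fd(self._dn_sock, self._block_paths[block_id])
        msg, fds, _flags, _addr = socket.recv_fds(self._client_sock, 16, 1)
        if msg != b"OK" or len(fds) != 1:
            raise RuntimeError("DataNode did not pass a descriptor")
        self.fetches += 1
        return fds[0]

    def read(self, block_id: str, offset: int, length: int) -> bytes:
        fd = self._fds.get(block_id)
        if fd is None:  # cache miss: ask the DataNode for a descriptor once
            fd = self._fetch_fd(block_id)
            self._fds[block_id] = fd
        # Cache hit: no re-open and no DataNode round trip. pread reads at an
        # explicit offset without changing the descriptor's file position.
        return os.pread(fd, length, offset)

    def close(self) -> None:
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()
        self._client_sock.close()
        self._dn_sock.close()
```

Because a cache hit reads straight from the held descriptor, repeated reads of the same block never touch its path. Even if the block file is unlinked after the first read, later reads through the cached descriptor still succeed. A path cache would instead have to open the file again on each use.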

 is an expensive operation.
