Hadoop Common Errors and Solutions Roundup [Updated Regularly]

Posted on 2014-4-29 14:59:45
This thread collects common errors, and their fixes, encountered in day-to-day development and operations. It is kept up to date, and additions are welcome.
Error 1: java.io.IOException: Incompatible clusterIDs — typically appears after the namenode has been reformatted

2014-04-29 14:32:53,877 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
java.io.IOException: Incompatible clusterIDs in /data/dfs/data: namenode clusterID = CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb; datanode clusterID = CID-ff0faa40-2940-4838-b321-98272eb0dee3
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:722)
2014-04-29 14:32:53,885 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
2014-04-29 14:32:53,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421)
2014-04-29 14:32:55,897 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

Cause: every namenode format generates a new clusterID, while the datanode's data directory still holds the clusterID from the previous format. Formatting clears the namenode's data but not the datanode's, so the datanode fails to start. The thing to do is clear out all the datanode data directories before each format.

Fix: stop the cluster and delete everything under the problem node's data directory, i.e. the dfs.data.dir directory configured in hdfs-site.xml. Then reformat the namenode.

A less destructive alternative: stop the cluster, then edit the clusterID in the datanode's /dfs/data/current/VERSION file to match the namenode's.
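For reference, a rough command sequence for both options (a sketch only; it assumes dfs.data.dir is /data/dfs/data as in the log above, and /data/dfs/name for the namenode — substitute your own paths):

# Option 1: wipe and reformat (this destroys all HDFS data!)
stop-dfs.sh
rm -rf /data/dfs/data/*                # run on each affected datanode
hdfs namenode -format
start-dfs.sh

# Option 2: align the datanode's clusterID with the namenode's
cat /data/dfs/name/current/VERSION     # on the namenode, note the clusterID=CID-... line
vi /data/dfs/data/current/VERSION      # on the datanode, set clusterID to that same value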
Posted on 2014-4-29 17:18:44 (OP)
Error 2: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container
14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to: Application application_1398704073313_0021 failed 2 times due to Error launching appattempt_1398704073313_0021_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1398762692768 found 1398711306590
        at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
. Failing the application.
14/04/29 02:45:07 INFO mapreduce.Job: Counters: 0
Cause: the clocks on the namenode and datanodes are out of sync (the container token's expiry is checked against the local time, as the "This token is expired" line shows).

Fix: synchronize the datanodes' clocks with the namenode. On each server run: ntpdate time.nist.gov, and confirm the sync succeeds.
Better still, add a line to /etc/crontab on every server:
0 2 * * * root ntpdate time.nist.gov && hwclock -w
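To check the remaining offset without touching the clock, a query-only call can help (assuming ntpdate is installed):

ntpdate -q time.nist.gov     # prints this host's offset against the time server without setting anything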
Posted on 2014-5-6 15:14:57 (OP)
Error 3: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing READ_BLOCK operation  src: /192.168.1.191:48854 dest: /192.168.1.191:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)

Cause: I/O timeout.

Fix:
Edit the Hadoop configuration file hdfs-site.xml and add settings for the dfs.datanode.socket.write.timeout and dfs.socket.timeout properties:

    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>6000000</value>
    </property>

    <property>
        <name>dfs.socket.timeout</name>
        <value>6000000</value>
    </property>

Note: the timeout values are in milliseconds; 0 means no limit.

Comment

Setting it to 0 still triggers the error: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel  — posted 2014-5-9 16:50

Posted on 2014-5-6 15:43:15 (OP)
Error 4: DataXceiver error processing WRITE_BLOCK operation
2014-05-06 15:21:30,378 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing WRITE_BLOCK operation  src: /192.168.1.193:34147 dest: /192.168.1.191:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)
Cause: the file operation exceeded its lease; in effect, the file was deleted while the data stream was still writing to it.

Fix:
Edit hdfs-site.xml (this is the 2.x property name; on 1.x it should be dfs.datanode.max.xcievers):

<property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>8192</value>
</property>

Copy the file to every datanode and restart the datanodes.
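A sketch of one way to roll the change out, assuming a tarball install under $HADOOP_HOME and an etc/hadoop/slaves file listing the datanodes (adapt to your own layout):

for host in $(cat $HADOOP_HOME/etc/hadoop/slaves); do
    scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $host:$HADOOP_HOME/etc/hadoop/
    ssh $host "$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode; $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
done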

Posted on 2014-5-7 14:41:35 (OP)
Error 5: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
2014-05-07 12:21:41,820 WARN [Thread-115] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
        at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:860)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:925)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)

Cause: writes cannot proceed. My environment has 3 datanodes and the replication factor is set to 3, so a write pipelines through all 3 machines. The default replace-datanode-on-failure policy is DEFAULT: when the cluster has 3 or more datanodes, a failed node in the pipeline is replaced by another datanode. With only 3 machines there is no spare to swap in, so once a single datanode has a problem, writes keep failing.

Fix: edit hdfs-site.xml and add or change these two properties:
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>

dfs.client.block.write.replace-datanode-on-failure.enable controls whether the client applies a replacement policy at all when a write fails; the default of true is fine.
As for dfs.client.block.write.replace-datanode-on-failure.policy: under DEFAULT, with 3 or more replicas the client tries to swap in a new datanode before continuing, while with 2 replicas it skips the replacement and just keeps writing. On a 3-datanode cluster a single unresponsive node breaks every write, so it is reasonable to turn the replacement off with NEVER.

Posted on 2014-5-8 12:36:37 (OP)
Error 6: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for ...
14/05/08 18:24:59 INFO mapreduce.Job: Task Id : attempt_1399539856880_0016_m_000029_2, Status : FAILED
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1399539856880_0016_m_000029_2_spill_0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

Container killed by the ApplicationMaster.

Cause: one of two things — either hadoop.tmp.dir or the data directory has run out of space.

Fix: checking my DFS status showed data usage below 40%, so the likely culprit was hadoop.tmp.dir filling up, preventing job temp files from being created. It turned out core-site.xml did not configure hadoop.tmp.dir at all, so the default /tmp directory was in use; anything stored there is lost whenever the server reboots, so it needed changing anyway. Add:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
Then reformat: hadoop namenode -format
and restart.
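Before reformatting, it is worth confirming which filesystem is actually tight; for example (paths as configured above, adjust to your own setup):

df -h /tmp /data             # local free space where the job temp files live
hdfs dfsadmin -report        # per-datanode DFS capacity and usage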


Posted on 2014-6-19 10:27:52 (OP)
2014-06-19 10:00:32,181 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@17bda0f2
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1447)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1997)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.MROutputFiles.getSpillFileForWrite(MROutputFiles.java:146)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
Cause: the local disk (not HDFS) ran out of space — I was debugging the program in MyEclipse and the local tmp directory filled up.
Fix: free up, or add, disk space.
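When this happens in a local run (LocalJobRunner, as here), the spill files land under hadoop.tmp.dir on the local disk, which defaults to /tmp/hadoop-${user.name}; a quick check might look like this (paths are the defaults, adjust as needed):

du -sh /tmp/hadoop-$USER     # size of the local Hadoop scratch directory
df -h /tmp                   # free space on the filesystem holding it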

Posted on 2014-7-4 08:42:16 (OP)
2014-06-23 10:21:01,479 INFO [IPC Server handler 3 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1403488126955_0002_m_000000_0 is : 0.30801716
2014-06-23 10:21:01,512 FATAL [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1403488126955_0002_m_000000_0 - exited : java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1063)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:180)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1403488126955_0002_m_000000_0_spill_53.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
2014-06-23 10:21:01,513 INFO [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed [same stack trace as above]
2014-06-23 10:21:01,514 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed [same stack trace as above]
2014-06-23 10:21:01,516 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403488126955_0002_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
The error is clearly about insufficient disk space, but frustratingly, checking each node showed disk usage under 40%, with plenty of space left.

It took a long while to figure out: one of the map tasks produces a lot of output while it runs. Before the failure, disk usage climbs steadily until it hits 100% and the error is raised. The task then fails, its space is released, and the task is reassigned to another node. Because the space had been freed, the disks looked half empty whenever I checked, even though the error complained about space.

The lesson: monitoring while jobs are running matters.
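For instance, a crude way to catch this kind of transient spike while a job runs (interval and paths are illustrative; run on each node):

watch -n 5 'df -h /tmp /data'    # refresh local disk usage every 5 seconds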

Posted on 2015-4-1 15:50:38
Mate, how did you end up solving that last problem? Is there a fix?

Posted on 2015-4-8 13:39:15 (OP)
Quoting jerryliu_306 (2015-4-1 15:50):
Mate, how did you end up solving that last problem? Is there a fix?
Sorry, only just saw this.
As I explained above, the problem really was insufficient disk space: the intermediate output was large, but it gets cleaned up automatically once the task fails, so a later check finds plenty of free space and leads to a wrong diagnosis.
