Hadoop Common Errors and Solutions (Updated Regularly)

Posted 2014-4-29 14:59:45 (OP)
Collected here are common errors and fixes from day-to-day development and operations. The list is kept up to date, and additions are welcome.
Error 1: java.io.IOException: Incompatible clusterIDs (typically appears after the namenode is reformatted)
2014-04-29 14:32:53,877 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
java.io.IOException: Incompatible clusterIDs in /data/dfs/data: namenode clusterID = CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb; datanode clusterID = CID-ff0faa40-2940-4838-b321-98272eb0dee3
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:722)
2014-04-29 14:32:53,885 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
2014-04-29 14:32:53,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421)
2014-04-29 14:32:55,897 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

Cause: every namenode format generates a new clusterID, but the datanode's data directory still holds the clusterID from the previous format. Formatting wipes the namenode's data without touching the datanodes', so the datanodes fail at startup. The remedy is to clear out everything under the data directories before each format.
Solution: stop the cluster, delete everything under the problem node's data directory (the dfs.data.dir directory configured in hdfs-site.xml), then reformat the namenode.
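On a 2.x cluster the whole sequence might look like this (a rough sketch; /data/dfs/data is the dfs.data.dir path from the log above, so substitute your own):

    # On the namenode: stop HDFS
    stop-dfs.sh
    # On each affected datanode: clear the configured dfs.data.dir
    rm -rf /data/dfs/data/*
    # Back on the namenode: reformat and restart HDFS
    hdfs namenode -format
    start-dfs.sh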
A less drastic alternative: stop the cluster, then edit the clusterID in each datanode's /dfs/data/current/VERSION file to match the namenode's.
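A minimal sketch of that edit (the namenode metadata path /data/dfs/name is an assumption; the clusterID is the one from the log above, and /dfs/data/current/VERSION is the datanode path mentioned here):

    # On the namenode: look up the current clusterID
    grep clusterID /data/dfs/name/current/VERSION
    # clusterID=CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb
    # On each datanode: overwrite the stale clusterID
    sed -i 's/^clusterID=.*/clusterID=CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb/' /dfs/data/current/VERSION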
Posted 2014-4-29 17:18:44 (OP)
Error 2: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container
14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to: Application application_1398704073313_0021 failed 2 times due to Error launching appattempt_1398704073313_0021_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1398762692768 found 1398711306590
        at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
. Failing the application.
14/04/29 02:45:07 INFO mapreduce.Job: Counters: 0
Cause: clock drift between the namenode and the datanodes.
Solution: synchronize the datanodes' clocks with the namenode. Run on every server: ntpdate time.nist.gov, and confirm the sync succeeds.
Better still, add a line to /etc/crontab on every server:
0 2 * * * root ntpdate time.nist.gov && hwclock -w
Posted 2014-5-6 15:14:57 (OP)
Error: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
; Y" V* ]1 {; X- k8 j- R4 q2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010ataXceiver error processing READ_BLOCK operation  src: /192.168.1.191:48854 dest: /192.168.1.191:50010
% ~. Q0 v/ m" @) r7 K; Sjava.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]$ w- [" N* Y, ^% {6 _
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
$ S7 J( G6 W. P4 B+ p        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)$ {$ _5 {! W: m8 g: W  u
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
' K2 R% I+ @- @        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
4 d6 Z; ^* C$ x+ D% y, E/ Z        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
* K4 i/ a5 U8 u/ I5 U. [        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)3 p# A- x% m9 a7 L0 ?% m# @9 ^& d2 t
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
1 u$ `& S! M" o3 a4 {* S        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)0 D1 G# Q7 s3 [* _5 F+ }; X
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)7 z$ k$ n, c/ e9 S" t- M! Q
        at java.lang.Thread.run(Thread.java:722)
, R) C* m( I; j- c; U( @' {1 y( E) |  n
Cause: I/O timeout.
Solution:
Edit hdfs-site.xml and add the dfs.datanode.socket.write.timeout and dfs.socket.timeout properties:
    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>6000000</value>
    </property>

    <property>
        <name>dfs.socket.timeout</name>
        <value>6000000</value>
    </property>

Note: the timeout values are in milliseconds; 0 means no limit.

' g" \5 J' F& ^& H2 e; i6 E

Comment (posted 2014-5-9 16:50): setting it to 0 still triggers this error: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel
Posted 2014-5-6 15:43:15 (OP)
Error: DataXceiver error processing WRITE_BLOCK operation
2014-05-06 15:21:30,378 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing WRITE_BLOCK operation  src: /192.168.1.193:34147 dest: /192.168.1.191:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)
Cause: the file's lease expired mid-operation; in effect, the file was deleted while the data stream was still writing to it.
& d% A1 K8 y" R! Y解决办法:
7 b6 W% F- }- x3 i修改hdfs-site.xml (针对2.x版本,1.x版本属性名应该是:dfs.datanode.max.xcievers):$ W5 {; Q" v% L9 h6 F; }
<property> ( C1 {: K/ H' u
        <name>dfs.datanode.max.transfer.threads</name> 1 Z' X; ]- t' Z0 f5 o- G: _; e
        <value>8192</value>
( ]1 `3 m# f- r1 z; r, T</property>8 @/ c: o' r/ L! s+ h, U
拷贝到各datanode节点并重启datanode即可7 w& F2 }) o: I6 e4 G) v1 S# y
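The rollout can be scripted, for example roughly like this (hadoop-datanode1 appears in the logs above; the other hostnames and the $HADOOP_HOME layout are illustrative assumptions):

    # Push the updated config to each datanode and bounce the daemon
    for host in hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
        scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $host:$HADOOP_HOME/etc/hadoop/
        ssh $host "$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode; $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
    done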
Posted 2014-5-7 14:41:35 (OP)
Error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
2014-05-07 12:21:41,820 WARN [Thread-115] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
        at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:860)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:925)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
Cause: the write cannot proceed. My environment has 3 datanodes and the replication factor is set to 3, so each write pipeline spans all 3 machines. The default replace-datanode-on-failure.policy is DEFAULT: when the cluster has 3 or more datanodes, a failed pipeline node is replaced with another datanode. With only 3 machines total there is no spare to swap in, so once a single datanode has a problem, writes can never succeed.
9 Z* s& ]- _, e4 {% M- g0 u解决办法:修改hdfs-site.xml文件,添加或者修改如下两项:
' \) J& h& Q5 m<property>
2 {8 R! R* o5 R( s. o  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
1 S$ o, U) ]7 y7 x0 h: ?9 `2 a1 L! t! w  <value>true</value>
6 W6 d4 B  u0 k" `) a, v</property>! c) u- s; E' K7 g1 O) Y
<property>9 r: ~, @! t+ ]7 }9 l& l
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
" y8 [6 T' S9 |' B: X  <value>NEVER</value>7 n+ F  T. b  I+ L; D' i1 P
</property>2 T; C9 t1 a) X# t* A2 X7 Q
( g! w- M% B4 O/ D' h$ a
对于dfs.client.block.write.replace-datanode-on-failure.enable,客户端在写失败的时候,是否使用更换策略,默认是true没有问题。
6 e5 i+ T$ D" o* w4 A) \对于,dfs.client.block.write.replace-datanode-on-failure.policy,default在3个或以上备份的时候,是会尝试更换结点尝试写入datanode。而在两个备份的时候,不更换datanode,直接开始写。对于3个datanode的集群,只要一个节点没响应写入就会出问题,所以可以关掉。5 d6 S2 ?6 Q$ `1 V9 E$ ]/ y7 T
Posted 2014-5-8 12:36:37 (OP)
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for
14/05/08 18:24:59 INFO mapreduce.Job: Task Id : attempt_1399539856880_0016_m_000029_2, Status : FAILED
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1399539856880_0016_m_000029_2_spill_0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

Container killed by the ApplicationMaster.
3 m& T2 c! n2 S  D' J原因:两种可能,hadoop.tmp.dir或者data目录存储空间不足。* q' d. L, C4 X0 {9 r0 t

Solution: my DFS status showed data usage below 40%, so I guessed hadoop.tmp.dir had run out of space, which kept the job from creating its temporary files. It turned out core-site.xml had no hadoop.tmp.dir configured, so the default /tmp directory was in use; anything there is lost whenever the server reboots, so it needs to change. Add:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
Then reformat: hadoop namenode -format
and restart.
Posted 2014-6-19 10:27:52 (OP)
2014-06-19 10:00:32,181 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@17bda0f2
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1447)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1997)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.MROutputFiles.getSpillFileForWrite(MROutputFiles.java:146)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
Cause: the local disk (not HDFS) is out of space. (In my case I was debugging the program in MyEclipse and the local tmp directory had filled up.)
Solution: free up or add disk space.
Posted 2014-7-4 08:42:16 (OP)
2014-06-23 10:21:01,479 INFO [IPC Server handler 3 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1403488126955_0002_m_000000_0 is : 0.30801716
2014-06-23 10:21:01,512 FATAL [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1403488126955_0002_m_000000_0 - exited : java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1063)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:180)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1403488126955_0002_m_000000_0_spill_53.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
2014-06-23 10:21:01,513 INFO [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed
        [same Spill failed stack trace as above]
2014-06-23 10:21:01,514 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed
        [same Spill failed stack trace as above]
2014-06-23 10:21:01,516 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403488126955_0002_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
The error plainly says the disk is full, but maddeningly, logging into each node showed disk usage below 40%, with plenty of space left.

It took me a long time to figure out: one map task produced a lot of output at runtime, and before the failure, disk usage climbed steadily until it hit 100% and the error was raised. The task then failed, its space was released, and the work was reassigned to other nodes. Because the space had already been freed, the disks looked half empty whenever I checked after the fact, even though the error complained about space.

The lesson here: monitoring while the job is running matters.
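Even a throwaway watcher like the following would have caught the spike (a quick sketch, not a proper monitoring setup; /data is assumed to be the mount that fills up, and the interval is arbitrary):

    # Log local disk usage every 10 seconds while the job runs
    while true; do
        echo "$(date '+%F %T') $(df -h /data | tail -n 1)"
        sleep 10
    done >> /tmp/disk-usage.log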
Posted 2015-4-1 15:50:38 (jerryliu_306)
Mate, how did you solve that last problem in the end? Is there a fix?
Posted 2015-4-8 13:39:15 (OP)
jerryliu_306 wrote on 2015-4-1 15:50:
"Mate, how did you solve that last problem in the end? Is there a fix?"

Sorry, only just saw this.
As I said above, the problem really was insufficient disk space: the job wrote a lot of intermediate output. That output is cleaned up automatically once the job fails, so a later check shows plenty of free space, which is what caused the misdiagnosis.